Spec-Driven Development: Why Your AI Agent Needs a Blueprint

The most expensive mistake in AI-assisted development isn't a bug. It's building the wrong thing fast.

An AI agent can implement a feature in minutes. If the feature is wrong (wrong scope, wrong interface, wrong assumptions), those minutes bought you technical debt at machine speed. You'll spend longer untangling the confident-but-wrong implementation than you would have spent writing a spec.

The spec is the leverage point. Everything downstream is mechanical.

The Problem with "Just Build It"

The vibe coding pitch is seductive: describe what you want, let the AI build it, iterate from there. And for small tasks (a utility function, a config change, a UI tweak), it works great. The feedback loop is tight. Mistakes are cheap to fix.

Scale up and the economics invert.

A 50-line function with a wrong assumption costs 5 minutes to fix. A 500-line module with a wrong assumption costs an hour. A feature that touches 8 files with a wrong architectural assumption? You're throwing it away and starting over.

The fix isn't slower generation. It's faster specification.

What a Spec Looks Like

Not a 20-page requirements document. Not a UML diagram. A focused artifact that answers four questions:

1. What does this do? One paragraph. If you can't describe it in a paragraph, the scope is too big.

2. What does it interface with? Which files, modules, APIs, or systems does this touch? This is where architectural bugs hide, in the assumptions about interfaces.

3. What does "done" look like? Concrete acceptance criteria. Not "works correctly"; that's useless. "Writes output to ./dist/, exits 0 on success, exits 1 on failure, handles missing input with error message." Specific enough to verify mechanically.

4. What could go wrong? A /pre-mortem before implementation. Which edge cases matter? Which assumptions are fragile? Where has similar work broken before?

That's it. Four questions, usually half a page. Takes 10-15 minutes to write. Saves hours of rework.
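As a rough illustration, the four questions can be written down as a small data structure with a lint pass that rejects unverifiable criteria. This is a sketch, not a prescribed format: the Spec shape, the lintSpec helper, the list of vague phrases, and the example summary are all hypothetical names and values chosen for illustration. Only the acceptance criteria come from the example above.

```typescript
// Hypothetical shape for a half-page spec: one field per question.
interface Spec {
  summary: string;       // 1. What does this do? (one paragraph)
  interfaces: string[];  // 2. Files, modules, APIs, or systems it touches
  acceptance: string[];  // 3. Concrete, mechanically verifiable criteria
  risks: string[];       // 4. Pre-mortem: edge cases, fragile assumptions
}

// Phrases that signal an unverifiable criterion (illustrative, not exhaustive).
const VAGUE_PHRASES = ["works correctly", "works as expected", "is robust"];

// Returns a list of problems; an empty list means the spec is ready to hand off.
function lintSpec(spec: Spec): string[] {
  const problems: string[] = [];
  if (spec.summary.trim() === "") problems.push("summary is empty");
  if (spec.summary.includes("\n\n")) problems.push("summary is longer than one paragraph");
  if (spec.interfaces.length === 0) problems.push("no interfaces listed");
  if (spec.acceptance.length === 0) problems.push("no acceptance criteria");
  if (spec.risks.length === 0) problems.push("no pre-mortem risks");
  for (const criterion of spec.acceptance) {
    const lower = criterion.toLowerCase();
    if (VAGUE_PHRASES.some((phrase) => lower.includes(phrase))) {
      problems.push(`vague criterion: "${criterion}"`);
    }
  }
  return problems;
}

// Example spec; the summary and risk text are made up for illustration.
const buildSpec: Spec = {
  summary: "Build step that compiles sources and writes artifacts.",
  interfaces: ["./dist/"],
  acceptance: [
    "Writes output to ./dist/",
    "Exits 0 on success",
    "Exits 1 on failure",
    "Handles missing input with error message",
  ],
  risks: ["Input directory may be missing"],
};

// Same spec with the useless criterion swapped in.
const vagueSpec: Spec = { ...buildSpec, acceptance: ["Works correctly"] };
```

Running lintSpec on buildSpec returns no problems. Running it on vagueSpec flags the single criterion, which is the point: the check is cheap, and it runs before any code is generated.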

The Beads Pattern

In my workspace, every unit of work is a bead: a tracked issue with an ID, dependencies, and acceptance criteria.

bd create "Add rate limiting to API gateway"
bd show rl-01

The bead isn't just a ticket. It's a contract. Before any agent touches code, the bead defines:

Scope: What files and modules are in play
Dependencies: What must exist before this starts
Acceptance: How to verify it's done
Risk: What we've seen break in similar work

The agent reads the bead before implementing. The validation checks the bead after implementing. The spec drives both sides.
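To make "the spec drives both sides" concrete, here is a minimal sketch of a bead as a typed record with one check before implementation and one after. It is an assumption-heavy illustration: the Bead interface, isReady, outOfScope, and the dependency ID rl-00 are hypothetical, and the real bd data model may look quite different.

```typescript
// Hypothetical in-memory model of a bead; the real `bd` data model may differ.
interface Bead {
  id: string;
  title: string;
  scope: string[];         // files and modules in play
  dependencies: string[];  // bead IDs that must be closed first
  acceptance: string[];    // how to verify it's done
  risks: string[];         // what has broken in similar work
}

// Before implementing: a bead is ready only when every dependency is closed.
function isReady(bead: Bead, closedIds: Set<string>): boolean {
  return bead.dependencies.every((dep) => closedIds.has(dep));
}

// After implementing: list touched files that fall outside the declared scope.
function outOfScope(bead: Bead, touchedFiles: string[]): string[] {
  return touchedFiles.filter((file) => !bead.scope.includes(file));
}

// Example bead; the dependency "rl-00" and the risk text are made up for illustration.
const rateLimitBead: Bead = {
  id: "rl-01",
  title: "Add rate limiting to API gateway",
  scope: ["src/middleware/rate-limiter.ts", "rate-limiter.test.ts"],
  dependencies: ["rl-00"],
  acceptance: ["Returns 429 with Retry-After header when the limit is exceeded"],
  risks: ["Shared route registration file may be edited by another agent"],
};
```

An agent would call isReady before starting, and validation would call outOfScope with the files in the resulting diff. A non-empty result means the implementation drifted from the contract.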

RPI: Research, Plan, Implement

The workflow that emerged from running thousands of agent sessions across 31 repos:

Research → Plan → Implement → Validate
    ^                            |
    └──── Knowledge Flywheel ────┘

Research is looking before you leap. Read the codebase. Understand the current state. Find where similar patterns already exist. This takes 5-10 minutes and prevents the most common class of AI mistakes: reimplementing something that already exists, or implementing something that contradicts existing architecture.

Plan is the spec. Decompose the work into beads. Define interfaces. Write acceptance criteria. Run a pre-mortem. This is where the leverage lives. A good plan makes implementation mechanical. A bad plan (or no plan) makes implementation a gambling session.

Implement is the easy part. The agent has a spec, has context, has clear acceptance criteria. Generate the code. Run the checks. Commit or fix.

Validate closes the loop. Did the implementation match the spec? Did anything unexpected break? What did we learn that makes the next plan better?

The teams that skip straight to Implement are the teams that report "AI coding is slower than doing it myself." They're right. Unplanned AI coding IS slower. Planned AI coding is dramatically faster.

Why Specs Beat Prompts

A prompt is a wish. A spec is a contract.

"Build me a rate limiter" is a prompt. The AI will build a rate limiter. Maybe it's token-bucket. Maybe it's sliding-window. Maybe it rate-limits per IP. Maybe per user. Maybe it stores state in memory. Maybe Redis. You won't know until you read 200 lines of code.

A spec makes the decisions explicit before generation starts:

## Rate Limiter
- Algorithm: sliding window
- Scope: per API key
- Store: Redis (existing cluster)
- Limits: 100 req/min, 1000 req/hour
- Response: 429 with Retry-After header
- Files: src/middleware/rate-limiter.ts
- Tests: rate-limiter.test.ts

Same feature. Radically different implementation quality. Not because the AI is smarter, but because the input is better.
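To show how little is left to guess once the spec exists, here is a sketch of a limiter where each spec line pins down one decision. Treat it as an illustration under stated assumptions: an in-memory Map stands in for the Redis store the spec names, the class and type names are hypothetical, and rejected requests are deliberately not counted against the limit.

```typescript
// Each field mirrors one line of the spec.
const spec = {
  algorithm: "sliding-window",
  scope: "api-key",
  limits: [
    { max: 100, windowMs: 60_000 },     // 100 req/min
    { max: 1000, windowMs: 3_600_000 }, // 1000 req/hour
  ],
} as const;

type Decision =
  | { allowed: true }
  | { allowed: false; status: 429; retryAfterSeconds: number };

// Assumption: an in-memory Map stands in for the Redis store the spec names.
class SlidingWindowLimiter {
  private hits = new Map<string, number[]>(); // apiKey -> accepted request timestamps (ms)

  check(apiKey: string, nowMs: number): Decision {
    const longest = Math.max(...spec.limits.map((limit) => limit.windowMs));
    // Drop timestamps older than the longest window; they can no longer matter.
    const recent = (this.hits.get(apiKey) ?? []).filter((t) => nowMs - t < longest);

    for (const { max, windowMs } of spec.limits) {
      const inWindow = recent.filter((t) => nowMs - t < windowMs);
      if (inWindow.length >= max) {
        // A slot frees up when the oldest hit in this window expires.
        const retryAfterMs = inWindow[0] + windowMs - nowMs;
        this.hits.set(apiKey, recent);
        return { allowed: false, status: 429, retryAfterSeconds: Math.ceil(retryAfterMs / 1000) };
      }
    }

    recent.push(nowMs);
    this.hits.set(apiKey, recent);
    return { allowed: true };
  }
}
```

Passing the clock in as nowMs keeps the sketch deterministic and easy to test. A production version would read the time itself and would need the Redis-backed store so that limits hold across gateway instances.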

This is shift-left applied to AI development. The earlier you make decisions explicit, the less rework downstream.

The Planning Tax

"But writing specs takes time." Yes. 10-15 minutes per feature.

Here's the math. In my workspace, unplanned features average 3.2 iterations to get right. Planned features average 1.4 iterations. Each iteration is roughly 15-20 minutes of agent time plus human review; the estimate below uses 18 minutes per iteration and 12 minutes per spec.

Unplanned: 0 min spec + (3.2 × 18 min) = ~58 min total
Planned:   12 min spec + (1.4 × 18 min) = ~37 min total

The "tax" saves about 35% of total time (37 minutes versus 58). And that gap widens with complexity. For multi-file features, planned work is 50-60% faster than unplanned.
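The same arithmetic as a small worked example, which also yields a break-even point: planning pays off as long as the spec takes less time than the iterations it removes. The function and constant names are hypothetical; the numbers are the ones quoted above, with 18 minutes per iteration assumed.

```typescript
// Total minutes = spec time + iterations * minutes per iteration.
function totalMinutes(specMinutes: number, iterations: number, minutesPerIteration: number): number {
  return specMinutes + iterations * minutesPerIteration;
}

const MINUTES_PER_ITERATION = 18; // assumed figure inside the quoted 15-20 minute range

const unplanned = totalMinutes(0, 3.2, MINUTES_PER_ITERATION); // about 58 minutes
const planned = totalMinutes(12, 1.4, MINUTES_PER_ITERATION);  // about 37 minutes
const savings = 1 - planned / unplanned;                       // about 0.35

// Break-even: the longest a spec can take before planning stops paying off.
const breakEvenSpecMinutes = (3.2 - 1.4) * MINUTES_PER_ITERATION; // about 32 minutes
```

With these inputs, a spec could take more than twice the typical 10-15 minutes and planning would still come out ahead.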

The planning tax isn't a tax. It's an investment with immediate, measurable returns.

Shared File Conflicts

One lesson that took multiple swarm sessions to learn: identify shared files during planning, not during implementation.

When multiple agents work in parallel (which is the whole point of multi-agent orchestration), they will collide on shared files. Config files. Index modules. Type definitions. Route registrations.

If you identify these during planning, you can assign them to a single agent or sequence the work. If you discover them during implementation, you get merge conflicts, git index corruption, and wasted cycles.
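Finding these collisions at planning time can be mechanical if every work item declares the files it expects to touch. The sketch below assumes exactly that. PlannedWork, findSharedFiles, the auth-02 item, and the file paths other than the rate limiter's are hypothetical.

```typescript
// Hypothetical planning input: each work item declares the files it expects to touch.
interface PlannedWork {
  id: string;
  files: string[];
}

// Returns every file claimed by more than one work item, with the IDs that claim it.
function findSharedFiles(items: PlannedWork[]): Map<string, string[]> {
  const claimants = new Map<string, string[]>();
  for (const item of items) {
    for (const file of item.files) {
      const ids = claimants.get(file) ?? [];
      if (!ids.includes(item.id)) ids.push(item.id); // ignore duplicate entries within one item
      claimants.set(file, ids);
    }
  }

  const shared = new Map<string, string[]>();
  claimants.forEach((ids, file) => {
    if (ids.length > 1) shared.set(file, ids);
  });
  return shared;
}

// Two parallel work items that both register routes in the same index file.
const plan: PlannedWork[] = [
  { id: "rl-01", files: ["src/middleware/rate-limiter.ts", "src/routes/index.ts"] },
  { id: "auth-02", files: ["src/middleware/auth.ts", "src/routes/index.ts"] },
];

// conflicts holds one entry: src/routes/index.ts, claimed by rl-01 and auth-02.
const conflicts = findSharedFiles(plan);
```

Each entry in the result is a decision to make before any agent starts: give the file to one agent, or sequence the work items that claim it.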

The spec is where coordination happens. By the time agents are writing code, the coordination decisions should already be made.

The Takeaway

Specs aren't overhead. They're the highest-leverage artifact in an AI-assisted workflow.

Write the spec. Let the spec drive the agent. Let the agent drive the code. Validate against the spec. Feed what you learned back into the next spec.

The spec is the blueprint. Everything else is construction.
