
Spec-Driven Development: Why Your AI Agent Needs a Blueprint

Specs are the leverage point for reliable AI delivery: get intent, interfaces, and constraints right before the agent writes code.

February 12, 2026 · 6 min read
#ai-engineering #development-process

This essay is part of the reliable AI-assisted delivery trail: proof, method, and judgment for making fast AI work reviewable and safe to ship.

The most expensive mistake in AI-assisted development isn't a bug. It's building the wrong thing fast.

An AI agent can implement a feature in minutes. If the feature is wrong (wrong scope, wrong interface, wrong assumptions), those minutes bought you technical debt at machine speed. You'll spend longer untangling the confident-but-wrong implementation than you would have spent writing a spec.

The spec is the leverage point. Everything downstream is mechanical.


The Problem with "Just Build It"

The vibe coding pitch is seductive: describe what you want, let the AI build it, iterate from there. And for small tasks (a utility function, a config change, a UI tweak), it works great. The feedback loop is tight. Mistakes are cheap to fix.

Scale up and the economics invert.

A 50-line function with a wrong assumption costs 5 minutes to fix. A 500-line module with a wrong assumption costs an hour. A feature that spans 8 files and rests on a wrong architectural assumption? You're throwing it away and starting over.

The fix is faster specification, not slower generation.


What a Spec Looks Like

Not a 20-page requirements document. Not a UML diagram. A focused artifact that answers four questions:

1. What does this do? One paragraph. If you can't describe it in a paragraph, the scope is too big.

2. What does it interface with? Which files, modules, APIs, or systems does this touch? This is where architectural bugs hide, in the assumptions about interfaces.

3. What does "done" look like? Concrete acceptance criteria. Not "works correctly"; that's useless. Something like "Writes output to ./dist/, exits 0 on success, exits 1 on failure, handles missing input with an error message." Specific enough to verify mechanically.

4. What could go wrong? A /pre-mortem before implementation. Which edge cases matter? Which assumptions are fragile? Where has similar work broken before?

That's it. Four questions, and the answers usually fit in half a page. Takes 10-15 minutes to write. Saves hours of rework.
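As a minimal sketch, the four questions can be captured in a small data structure and checked before any agent starts. The `Spec` shape, the `reviewSpec` helper, the list of vague phrases, and the example build-script spec are all illustrative assumptions, not part of any existing tool.

```typescript
// Hypothetical shape for a half-page spec; field names are illustrative.
interface Spec {
  summary: string;      // 1. What does this do? (one paragraph)
  interfaces: string[]; // 2. Files, modules, APIs, or systems touched
  acceptance: string[]; // 3. Concrete, mechanically checkable criteria
  risks: string[];      // 4. Pre-mortem: edge cases, fragile assumptions
}

// Phrases too vague to verify mechanically (an assumed, non-exhaustive list).
const VAGUE_PHRASES: string[] = ["works correctly", "works as expected", "is robust"];

// Returns human-readable problems; an empty list means the spec answers
// all four questions without obviously vague acceptance criteria.
function reviewSpec(spec: Spec): string[] {
  const problems: string[] = [];
  if (spec.summary.trim().length === 0) problems.push("summary is empty");
  if (spec.summary.includes("\n\n")) problems.push("summary is longer than one paragraph");
  if (spec.interfaces.length === 0) problems.push("no interfaces listed");
  if (spec.acceptance.length === 0) problems.push("no acceptance criteria");
  if (spec.risks.length === 0) problems.push("no pre-mortem risks");
  for (const criterion of spec.acceptance) {
    const lower = criterion.toLowerCase();
    if (VAGUE_PHRASES.some((phrase) => lower.includes(phrase))) {
      problems.push(`vague criterion: "${criterion}"`);
    }
  }
  return problems;
}

// Hypothetical spec for a build script, reusing the acceptance criteria above.
const exampleSpec: Spec = {
  summary: "Bundle the site sources into static output.",
  interfaces: ["./dist/"],
  acceptance: [
    "Writes output to ./dist/",
    "Exits 0 on success, exits 1 on failure",
    "Handles missing input with an error message",
  ],
  risks: ["Input directory may be missing or empty"],
};

const specProblems: string[] = reviewSpec(exampleSpec);
```

A check like this catches only missing or obviously vague answers. Whether the answers are right still takes a human read.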


The Beads Pattern

In my workspace, every unit of work is a bead: a tracked issue with an ID, dependencies, and acceptance criteria.

bd create "Add rate limiting to API gateway"
bd show rl-01

A bead is a contract, not just a ticket. Before any agent touches code, the bead defines:

  • Scope: What files and modules are in play
  • Dependencies: What must exist before this starts
  • Acceptance: How to verify it's done
  • Risk: What we've seen break in similar work

The agent reads the bead before implementing. The validation checks the bead after implementing. The spec drives both sides.
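To make the contract idea concrete, here is a rough sketch of a bead as a typed record, with a helper that picks out the beads whose dependencies are already done. This is an assumed in-memory model for illustration only, not the `bd` tool's actual schema. The `Bead` fields, the `readyBeads` helper, and the `rl-02` bead are hypothetical.

```typescript
// Hypothetical in-memory model of a bead; not the bd tool's actual schema.
interface Bead {
  id: string;
  title: string;
  scope: string[];      // files and modules in play
  dependsOn: string[];  // bead IDs that must be done before this starts
  acceptance: string[]; // how to verify it's done
  risks: string[];      // what has broken in similar work
}

// A bead is ready when it is not done yet and every dependency is done.
function readyBeads(beads: Bead[], done: Set<string>): Bead[] {
  return beads.filter(
    (bead) => !done.has(bead.id) && bead.dependsOn.every((dep) => done.has(dep))
  );
}

const exampleBeads: Bead[] = [
  {
    id: "rl-01",
    title: "Add rate limiting to API gateway",
    scope: ["src/middleware/rate-limiter.ts"],
    dependsOn: [],
    acceptance: ["Returns 429 with Retry-After when a limit is exceeded"],
    risks: ["Shared state across gateway instances"],
  },
  {
    id: "rl-02", // hypothetical follow-up bead
    title: "Document rate limit responses",
    scope: ["docs/api.md"],
    dependsOn: ["rl-01"],
    acceptance: ["Docs describe the 429 response and Retry-After header"],
    risks: [],
  },
];

// Before anything is done, only rl-01 is ready; once it is done, rl-02 is.
const readyAtStart: string[] = readyBeads(exampleBeads, new Set<string>()).map((b) => b.id);
const readyAfterFirst: string[] = readyBeads(exampleBeads, new Set<string>(["rl-01"])).map((b) => b.id);
```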


RPI: Research, Plan, Implement

The workflow that emerged from running thousands of agent sessions across my production repos:

Research → Plan → Implement → Validate
    ^                            |
    └──── Knowledge Flywheel ────┘

Research is looking before you leap. Read the codebase. Understand the current state. Find where similar patterns already exist. This takes 5-10 minutes and prevents the most common class of AI mistakes: reimplementing something that already exists, or implementing something that contradicts existing architecture.

Plan is the spec. Decompose the work into beads. Define interfaces. Write acceptance criteria. Run a pre-mortem. This is where the leverage lives. A good plan makes implementation mechanical. A bad plan (or no plan) makes implementation a gambling session.

Implement is the easy part. The agent has a spec, has context, has clear acceptance criteria. Generate the code. Run the checks. Commit or fix.

Validate closes the loop. Did the implementation match the spec? Did anything unexpected break? What did we learn that makes the next plan better?

The teams that skip straight to Implement are the teams that report "AI coding is slower than doing it myself." They're right about unplanned AI coding. Planned AI coding is dramatically faster.


Why Specs Beat Prompts

A prompt is a wish. A spec is a contract.

"Build me a rate limiter" is a prompt. The AI will build a rate limiter. Maybe it's token-bucket. Maybe it's sliding-window. Maybe it rate-limits per IP. Maybe per user. Maybe it stores state in memory. Maybe Redis. You won't know until you read 200 lines of code.

A spec makes the decisions explicit before generation starts:

## Rate Limiter
- Algorithm: sliding window
- Scope: per API key
- Store: Redis (existing cluster)
- Limits: 100 req/min, 1000 req/hour
- Response: 429 with Retry-After header
- Files: src/middleware/rate-limiter.ts
- Tests: rate-limiter.test.ts

Same feature, same model. The input got sharper, and the output followed.
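For illustration, here is roughly what an implementation of that spec could look like. It is a sketch under one loud assumption: the spec names Redis as the store, and this version swaps in an in-memory `Map` to stay self-contained, so it would not share state across processes. The class and type names are hypothetical, and rejected requests are not counted against the limit, which is a choice the spec leaves open.

```typescript
interface Limit { windowMs: number; max: number; }
interface Decision { allowed: boolean; status: number; retryAfterSeconds?: number; }

// Limits from the spec: 100 req/min and 1000 req/hour.
const LIMITS: Limit[] = [
  { windowMs: 60_000, max: 100 },
  { windowMs: 3_600_000, max: 1000 },
];

// Sliding-window log, per API key. ASSUMPTION: in-memory Map instead of Redis.
class SlidingWindowLimiter {
  // Timestamps (ms) of accepted requests per API key, oldest first.
  private readonly log = new Map<string, number[]>();
  private readonly limits: Limit[];

  constructor(limits: Limit[] = LIMITS) {
    this.limits = limits;
  }

  check(apiKey: string, nowMs: number): Decision {
    const longest = Math.max(...this.limits.map((l) => l.windowMs));
    // Drop timestamps that have aged out of the longest window.
    const recent = (this.log.get(apiKey) ?? []).filter((t) => t > nowMs - longest);

    let retryAfterMs = 0;
    for (const { windowMs, max } of this.limits) {
      const inWindow = recent.filter((t) => t > nowMs - windowMs);
      if (inWindow.length >= max) {
        // Admissible again once this entry slides out of the window.
        const blocker = inWindow[inWindow.length - max];
        retryAfterMs = Math.max(retryAfterMs, blocker + windowMs - nowMs);
      }
    }

    this.log.set(apiKey, recent);
    if (retryAfterMs > 0) {
      // Maps to a 429 response with a Retry-After header, in whole seconds.
      return { allowed: false, status: 429, retryAfterSeconds: Math.ceil(retryAfterMs / 1000) };
    }
    recent.push(nowMs);
    return { allowed: true, status: 200 };
  }
}
```

Passing `nowMs` in explicitly, instead of calling `Date.now()` inside `check`, keeps the sketch deterministic enough to verify against the spec's limits in a test.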

This is shift-left applied to AI development. The earlier you make decisions explicit, the less rework downstream.


The Planning Tax

"But writing specs takes time." Yes. 10-15 minutes per feature.

The math is straightforward. In my workspace, unplanned features average ~3.2 iterations to get right. Planned features average 1.4. Each iteration is roughly 15-20 minutes of agent time plus human review; the figures below use 18.

Unplanned: 0 min spec + (3.2 × 18 min) = ~58 min total
Planned:   12 min spec + (1.4 × 18 min) = ~37 min total

The "tax" saves about 35% of total time. And that gap widens with complexity. For multi-file features, planned work is 50-60% faster than unplanned.
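The arithmetic is easy to rerun with your own numbers. A small sketch, using the figures above as inputs (the function and variable names are just illustrative):

```typescript
// Total minutes for one feature: spec time plus iterations × minutes per iteration.
function totalMinutes(specMin: number, iterations: number, minPerIteration: number): number {
  return specMin + iterations * minPerIteration;
}

const unplannedMin: number = totalMinutes(0, 3.2, 18); // ≈ 57.6
const plannedMin: number = totalMinutes(12, 1.4, 18);  // ≈ 37.2

// Fraction of total time saved by planning: ≈ 0.354, i.e. about 35%.
const savedFraction: number = (unplannedMin - plannedMin) / unplannedMin;

// Break-even: the spec pays for itself once it removes this many iterations.
const breakEvenIterations: number = 12 / 18; // ≈ 0.67
```

With these inputs, the spec breaks even if it removes two-thirds of one iteration. The observed gap is 1.8 iterations.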

The 12 minutes you spend planning pay for themselves in the first cycle: avoiding a single 18-minute iteration more than covers them.


Shared File Conflicts

One lesson that took multiple swarm sessions (and one spectacular 6-agent merge conflict pile-up) to learn: identify shared files during planning, not during implementation.

When multiple agents work in parallel (which is the whole point of multi-agent orchestration), they will collide on shared files. Config files. Index modules. Type definitions. Route registrations.

If you identify these during planning, you can assign them to a single agent or sequence the work. If you discover them during implementation, you get merge conflicts, git index corruption, and wasted cycles.

The spec is where coordination happens. By the time agents are writing code, the coordination decisions should already be made.
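One way to catch collisions at planning time is to list the files each agent's work is expected to touch and flag any overlap. A minimal sketch, assuming each plan records its files up front; the `Assignment` shape, the agent names, and the file paths other than the rate limiter are hypothetical:

```typescript
// Hypothetical planning-time record: which agent intends to touch which files.
interface Assignment { agent: string; files: string[]; }

// Returns every file claimed by more than one agent, with the agents claiming it.
function findSharedFiles(assignments: Assignment[]): Map<string, string[]> {
  const claims = new Map<string, string[]>();
  for (const { agent, files } of assignments) {
    // Deduplicate so one agent listing a file twice is not a conflict.
    new Set(files).forEach((file) => {
      const agents = claims.get(file) ?? [];
      agents.push(agent);
      claims.set(file, agents);
    });
  }
  const shared = new Map<string, string[]>();
  claims.forEach((agents, file) => {
    if (agents.length > 1) shared.set(file, agents);
  });
  return shared;
}

const plan: Assignment[] = [
  { agent: "agent-a", files: ["src/routes/index.ts", "src/middleware/rate-limiter.ts"] },
  { agent: "agent-b", files: ["src/routes/index.ts", "src/types.ts"] },
  { agent: "agent-c", files: ["src/types.ts", "src/types.ts"] },
];

// Files in this map need a single owner or sequenced work before agents start.
const sharedFiles: Map<string, string[]> = findSharedFiles(plan);
```

In this example the route index and the type definitions come back as shared, while the rate limiter file has a single owner and can proceed in parallel.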


The Takeaway

Specs aren't overhead. They're the highest-leverage artifact in an AI-assisted workflow.

Write the spec. Let the spec drive the agent. Let the agent drive the code. Validate against the spec. Feed what you learned back into the next spec.

This is the part that travels. Decide intent and constraints before you let the machine generate, and AI-assisted work gets safe enough to hand to people who don't write code at all. I build the rigor for engineers first, then translate it.