Devlog #4: Why Your AI Coding Specs Keep Failing
In devlog 1, the multi-agent setup collapsed under its own weight. In devlog 2, I consolidated from 60 agents down to 4. In devlog 3, the tools changed under my feet. This is what I learned about where leverage actually lives.
Everyone's optimizing the wrong thing.
We're comparing models, tweaking prompts, building elaborate agent frameworks. GPT-5 will fix it. Claude Opus is better. More context window. Better reasoning.
None of that is the bottleneck.
It's like grinding the wrong stat. You're putting points into Charisma when the boss fight needs Intelligence. The model is your gear. It upgrades automatically. Your spec-writing skill? That's the stat that actually matters.
The bottleneck is upstream. It's the spec. The thing you feed the agent. That's where leverage lives, and that's what nobody's talking about.
You're Optimizing the Wrong Thing
Here's the uncomfortable truth: the model improves every quarter without you lifting a finger. Anthropic ships updates. OpenAI ships updates. You don't control this. You're not adding value here.
What you DO control: what you feed it.
Chad Fowler nailed this in his Phoenix Architecture post:
"In a regenerable system, specifications are no longer descriptive documents. They are executable inputs."
Read that again. Specs aren't documentation. They're the actual input to your software factory.
The model is commodity. Swappable. The spec is what determines output quality.
Think Factory, Not Workbench
I stopped thinking of Claude as a smart assistant on my workbench. I started thinking of it as a factory that processes specs.
SPEC (input) → AGENT (factory) → CODE (output)
  ↑                                  │
  └─────── Does it match intent? ────┘
Three components:
- Input (Spec): Research, constraints, acceptance criteria. This is what YOU control.
- Factory (Agent): The AI model. Commodity. Interchangeable. Claude, GPT, Gemini. Doesn't matter.
- Output (Code): The artifact. You QC this against your original intent.
When the output doesn't match intent, most people blame the factory. "Claude hallucinated." "The model didn't understand."
Wrong instinct. You're blaming the tool when the input is the problem.
The factory is calibrated by people way smarter than me. If my output is garbage, the problem is almost always what I fed it. Missing context. Ambiguous requirements. Unstated constraints.
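The loop above can be sketched in a few lines. This is a minimal illustration, not a real implementation: `run_agent` and `matches_intent` are hypothetical stand-ins for an actual agent invocation and your QC step (tests plus review). The point is structural: when QC fails, the fix goes into the spec, not the factory.

```python
def run_agent(spec: str) -> str:
    """Hypothetical stand-in for an agent call (e.g. an API request)."""
    return f"code generated from: {spec}"

def matches_intent(code: str, intent: str) -> bool:
    """Hypothetical QC check -- in practice, tests plus human review."""
    return intent in code

def factory_loop(spec: str, intent: str, max_iters: int = 3) -> tuple[str, int]:
    """Run the spec -> agent -> code loop, revising the SPEC on each miss."""
    code = ""
    for version in range(1, max_iters + 1):
        code = run_agent(spec)
        if matches_intent(code, intent):
            return code, version
        # The correction goes into the input, not the factory:
        spec += f"\n# v{version} analysis: output missed '{intent}'"
    return code, max_iters
```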
Your First Spec Will Fail. That's the Point.
This is the insight that changed everything for me.
The first spec almost never works. I used to think that meant I was bad at writing specs. I was wrong. The iteration IS the process.
This is the training arc. Every failed spec is a rep. Every v2 is a level up.
Spec v1 → Execute → Wrong assumption discovered
Spec v2 → Execute → Missing edge case surfaces
Spec v3 → Execute → Works
The goal isn't "write a perfect spec on v1." That's not possible. The goal is:
- Shorten the loop between spec versions
- Capture what changed at each iteration
- Learn WHY v3 worked when v1 didn't
For months, I deleted v1 and v2 the moment v3 worked. Got the working code, cleaned up the mess. Ship it.
That was a mistake. I was throwing away the most valuable part.
The history IS the knowledge. Why v1 failed tells you what to include next time. What changed between v2 and v3 becomes a checklist for similar specs. Delete the history and you're starting from scratch every time.
Here's What I Actually Do Now
I version control specs like code:
.agents/specs/
├── feature-x-v1.md # Initial attempt
├── feature-x-v1-analysis.md # What went wrong
├── feature-x-v2.md # Second attempt
├── feature-x-v2-analysis.md # What went wrong
└── feature-x-v3.md # Working version
The analysis file is the key. Three questions:
- What assumption was wrong?
- What context was missing?
- What would have caught this earlier?
That's it. Takes two minutes after a failure. Pays dividends forever.
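To make the two-minute habit frictionless, I could scaffold the analysis file automatically. A minimal sketch, assuming specs are named like `feature-x-v1.md` (the helper name and layout are my own, not a real tool):

```python
from pathlib import Path

ANALYSIS_QUESTIONS = [
    "What assumption was wrong?",
    "What context was missing?",
    "What would have caught this earlier?",
]

def write_analysis(spec_path: str) -> Path:
    """Create a sibling '<spec>-analysis.md' pre-filled with the three questions."""
    spec = Path(spec_path)
    out = spec.with_name(spec.stem + "-analysis.md")
    lines = [f"# Analysis: {spec.name}", ""]
    for q in ANALYSIS_QUESTIONS:
        lines += [f"## {q}", "", "(fill in)", ""]
    out.write_text("\n".join(lines))
    return out
```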
My Kubernetes auth spec went through 5 versions before I started capturing why. Environment variables. Missing RBAC context. Timeout assumptions. Now I have that checklist. Every auth spec since? v1 → v2.
The compounding effect: My v1 specs now start at the sophistication level of my old v3s. Not because I'm smarter. Because I'm querying prior failures.
Fowler's framing is exactly right: "Git taught us to version text; regenerable systems force us to version intent."
The code isn't the artifact anymore. The spec is.
Simulate the Failures Before You Hit Them
Version-controlling specs is defense: it captures what went wrong after the fact. But there's offense too.
There's a shortcut I wish I'd learned earlier.
The old way:
Spec → Implement → Hit problem → Fix spec → Repeat 10x
The better way:
Spec → Simulate 10 failures → Learn → Implement once
Before burning agent cycles, I run through four questions:
- What if the input isn't what we expect?
- What if the external dependency fails?
- What happens on partial failure?
- What does debugging this at 2 AM look like?
That last one is my favorite. If I can't explain how to debug this when I'm sleep-deprived, the spec is too clever.
I skip this step half the time. Every time, I regret it. The problem I hit is always one of those four questions.
The spec iteration that could've been v1 → v3 becomes v1 → v7 because I was too lazy to think for five minutes first.
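You can even mechanize a crude version of this check. The sketch below flags which of the four failure topics a spec never mentions. The keyword lists are my own guesses, purely illustrative; the real value is in thinking through the questions, not in string matching.

```python
# Crude pre-mortem linter: flag topics a spec text never mentions.
# Keyword lists are illustrative assumptions, not a real tool.
PREMORTEM_TOPICS = {
    "unexpected input": ["invalid", "malformed", "validation"],
    "dependency failure": ["timeout", "retry", "unavailable"],
    "partial failure": ["rollback", "partial", "idempotent"],
    "2 AM debugging": ["logging", "metrics", "error message"],
}

def missing_topics(spec_text: str) -> list[str]:
    """Return pre-mortem topics the spec doesn't appear to address."""
    text = spec_text.lower()
    return [topic for topic, words in PREMORTEM_TOPICS.items()
            if not any(w in text for w in words)]
```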
One Thing You Can Do Right Now
Next time a spec fails (and it will), don't just fix it. Create a spec-v1-analysis.md:
- What assumption was wrong?
- What context was missing?
- What would have caught this earlier?
Do this three times. You'll start noticing patterns. Maybe you always forget about authentication context. Maybe you never specify error handling. Maybe you assume the database is available when it's not.
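Once you have a handful of analysis files, the patterns can be surfaced mechanically. A sketch, assuming you phrase recurring gaps consistently (e.g. lines starting with `missing:`) — that convention is my own, not something from the workflow above:

```python
from collections import Counter
from pathlib import Path

def recurring_failures(specs_dir: str, top_n: int = 5) -> list[tuple[str, int]]:
    """Count repeated 'missing:' lines across *-analysis.md files.

    Crude exact-line matching; only works if failures are phrased
    consistently (e.g. 'missing: RBAC context').
    """
    counts: Counter[str] = Counter()
    for path in Path(specs_dir).glob("*-analysis.md"):
        for line in path.read_text().splitlines():
            line = line.strip()
            if line.startswith("missing:"):
                counts[line] += 1
    return counts.most_common(top_n)
```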
That's the leverage. The model improves without you. Your spec-writing improves because of you.
The person who writes great specs will outperform the person with the best prompt library. Every time. Because the spec is the input, and the input determines the output.
That's the level-up. You stop being someone who uses AI. You become someone who understands the system well enough to adapt when it changes.
Start versioning your intent. The history is the knowledge.
Last devlog I said the tools keep changing. Here's what doesn't: the spec determines the output. Master that, and you're ready for the next patch.
Next up: what happens when you actually capture these learnings and they compound across sessions.