Devlog #1: What Running 8 Parallel AI Agents Taught Me
The problem nobody warns you about
AI coding tools work great in a single session. Then you try to scale.
You spin up a second Claude Code instance. Then a third. Suddenly you're juggling context windows, merge conflicts, and agents that cheerfully repeat solutions you already know don't work. Your single-agent productivity evaporates the moment you try to parallelize.
If that sounds familiar, this post is the failure log and the architecture that fixed it.
Where are you?
Steve Yegge's Welcome to Gas Town lays out 8 stages of AI-assisted coding.

This devlog is for Stages 5-8. If you're juggling multiple Claude Code instances and wondering why nobody else seems to be struggling with the same things -- stick around.
If you're at Stages 1-4, this will be interesting but probably not actionable yet. Bookmark it. Come back when you're running 3+ agents and feeling the chaos.
If you think AI coding is a fad, I hear you. The hype is deafening. But we've been here before. We said the same thing about compilers. We said it about garbage collection. The concerns were valid then, and they're valid now. But the abstraction layer moved up anyway. It's happening again.
The architecture shift: REPL to orchestration
It took three months and too many tokens to learn this: the standard approach to multi-agent coding is architecturally broken.
Every multi-agent system I built had workers reporting back to a coordinator. The coordinator's context fills up. Everything slows down. The coordinator becomes the bottleneck. This is the failure mode that nobody talks about because single-agent demos don't hit it.
In January, Steve Yegge released Gas Town -- an orchestration system that inverts this pattern:
Don't have agents return their work to a coordinator.
Each worker runs in complete isolation:
- Own terminal
- Own copy of the code
- Results go straight to git and a shared issue tracker
The coordinator just reads status updates from the tracker. It never loads the actual work. It doesn't know what the code looks like -- it knows what the tickets look like. It's managing work, not syntax.

A coordinator that does the work breaks at 3 agents. A coordinator that manages the work scales to 8.
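To make the inversion concrete, here's a minimal sketch of a coordinator that only ever touches ticket metadata. Everything here (the `Ticket` and `Tracker` shapes, the status strings) is hypothetical, not Gas Town's actual data model:

```python
from dataclasses import dataclass, field

@dataclass
class Ticket:
    ticket_id: str
    status: str = "open"  # open | in_progress | done | blocked

@dataclass
class Tracker:
    tickets: dict = field(default_factory=dict)

    def next_open(self):
        for t in self.tickets.values():
            if t.status == "open":
                return t
        return None

def coordinate(tracker):
    """The coordinator's whole job: read statuses, pick the next ticket.
    It never sees diffs, files, or worker output -- only metadata."""
    ticket = tracker.next_open()
    if ticket:
        ticket.status = "in_progress"
    return ticket
```

The point is what's *absent*: no code, no diffs, no worker transcripts ever enter the coordinator's context, so its context never fills.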
I stopped everything and rebuilt my workflow around this pattern. One week later, I submitted my first PR upstream.
What I think I know
These are current beliefs. Ask me again in a month.
Warning: Provisional conclusions from 90 days. Treat accordingly.
1. The 40% Rule is real.
AI tools perform best when kept below 40% of their context window. Above that, failures compound exponentially. Toyota figured this out in the 1950s: run production lines at 40% utilization, zero defects. Push to 60%, defects increase 400%. The extra capacity isn't waste -- it's what enables continuous improvement.
Same pattern shows up everywhere failure is catastrophic: aviation fuel reserves, ICU occupancy, portfolio risk. Gas Town applies it to code: each worker runs under 40% context. The coordinator stays light. Failures get isolated.
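In practice the rule reduces to a dispatch-time guard. A sketch, with a hypothetical 200k-token window and the 40% ceiling hardcoded:

```python
CONTEXT_WINDOW = 200_000      # hypothetical window size in tokens
UTILIZATION_CEILING = 0.40    # the 40% Rule

def within_budget(tokens_used, window=CONTEXT_WINDOW, ceiling=UTILIZATION_CEILING):
    """Refuse to hand a worker new work once it crosses the ceiling.
    Better to spawn a fresh worker than to push a loaded one."""
    return tokens_used / window < ceiling
```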
2. Isolation beats cleverness.
Weeks went into building clever coordination. Dumb isolation wins. Separate copies of the code mean no merge conflicts during parallel work. Failures don't cascade. Kill and restart workers without affecting others. Cattle, not pets.
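A toy model of "own copy of the code" (in reality this is a git clone or worktree per worker; here it's just a deep copy so the isolation property is visible):

```python
from copy import deepcopy

def spawn_workers(source_tree, n):
    """Each worker gets a private clone; parallel edits can't conflict."""
    return [deepcopy(source_tree) for _ in range(n)]

def kill_and_restart(workers, i, source_tree):
    """Cattle, not pets: replace worker i with a fresh clone.
    The other workers are untouched."""
    workers[i] = deepcopy(source_tree)
    return workers
```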
3. Persistent state changes everything.
Before a git-backed issue tracker, every session started from zero. Now there's persistent state -- what's done, what failed, what's blocked. The AI wakes up knowing the plan.
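The mechanics are unglamorous: a state file that survives between sessions and lives in git. A sketch (the filename and the done/failed/blocked shape are my invention, not Beads' format):

```python
import json
import pathlib

STATE_FILE = pathlib.Path("tracker_state.json")  # hypothetical; committed to git

def save_state(state, path=STATE_FILE):
    """Write the plan to disk at the end of a session."""
    path.write_text(json.dumps(state, indent=2))

def load_state(path=STATE_FILE):
    """Next session starts here: what's done, what failed, what's blocked."""
    if path.exists():
        return json.loads(path.read_text())
    return {"done": [], "failed": [], "blocked": []}
```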
4. Validation isn't optional.
Models are trained to be about 85% correct. They will say "fixed" when it's broken. They will say "tests passed" when they didn't run. 18% of tasks last week needed a second pass. If you're not validating AI output, you're not shipping code. You're shipping hope.
Successive refinement -- fail, fix, repeat -- is the only reliable pattern. It often takes multiple passes to bridge that last 15%.
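The refinement loop itself is small. A sketch where `generate` and `validate` stand in for "ask the model" and "run the checks" -- `validate` returns `None` on success or a failure message that gets fed back in:

```python
def refine(generate, validate, max_passes=5):
    """Fail, fix, repeat until validation passes or we give up.
    `generate` receives the previous failure (or None on the first pass)."""
    failure = None
    for attempt in range(1, max_passes + 1):
        candidate = generate(failure)
        failure = validate(candidate)
        if failure is None:
            return candidate, attempt
    return None, max_passes
```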
The 50 First Dates problem
Every AI coding session starts from scratch. The model doesn't remember what you were working on yesterday. It doesn't know what already failed. It's Drew Barrymore waking up fresh every morning, except instead of falling in love with you, it's falling in love with solutions you already tried that don't work.
The community calls this the "50 First Dates problem."
What worked:
- CLAUDE.md files with project context
- Persistent directories (.agents/research/, .agents/plans/)
- Vibe levels (a calibration system for how much to trust the model)
What didn't:
- Hoping the model would "just remember"
- Ever-longer system prompts (context rot is real)
- My first orchestrator, which should have been called "the money pit"
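The "what worked" items boil down to one move: assemble persistent context from disk at session start instead of hoping for memory. A sketch over the layout above (the `*.md` glob and load order are my assumptions):

```python
import pathlib

def load_project_context(root):
    """Gather CLAUDE.md plus everything under .agents/research/
    and .agents/plans/ into one session preamble."""
    root = pathlib.Path(root)
    parts = []
    claude_md = root / "CLAUDE.md"
    if claude_md.exists():
        parts.append(claude_md.read_text())
    for sub in ("research", "plans"):
        d = root / ".agents" / sub
        if d.is_dir():
            for f in sorted(d.glob("*.md")):
                parts.append(f.read_text())
    return "\n\n".join(parts)
```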

Ralph Wiggum, RIP
Then Ralph Wiggum dropped. Ralph is a loop pattern -- you have Claude write to files, read its own output, and iterate until tests pass. Viral for a reason.
Ralph worked great for about two weeks. I started running it overnight, waking up to finished work, and feeling very clever. Then the failure mode showed up.
The agent spirals on the same bug for hours, confident it's making progress. It tries the same fix seventeen times with minor variations, each time expecting different results. It's not insanity -- it's optimism without memory.
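One cheap defense is to give the loop the memory the model lacks: hash each proposed fix and abort on a repeat. A sketch, not Ralph's actual implementation:

```python
import hashlib

def ralph_loop(propose_fix, tests_pass, max_iters=20):
    """Ralph-style loop with a circuit breaker: stop the moment the
    model proposes a fix it has already tried (the overnight spiral)."""
    seen = set()
    for i in range(max_iters):
        fix = propose_fix()
        digest = hashlib.sha256(fix.encode()).hexdigest()
        if digest in seen:
            return ("stuck", i)  # same fix again: stop burning tokens
        seen.add(digest)
        if tests_pass(fix):
            return ("passed", i + 1)
    return ("gave_up", max_iters)
```

Exact-match hashing won't catch the "minor variations" case -- that needs fuzzier comparison -- but it kills the cheapest form of the spiral.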

The gatekeeper
Gas Town gave me parallelization. But the quality problem -- the "Ralph" factor -- remained.
So I built a gatekeeper.

The gatekeeper validates every push: type checks, linting, complexity analysis, builds. The gatekeeper doesn't care about feelings or deadlines. It's successive refinement -- fail, fix, repeat -- until the code is clean or you give up.
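The shape of a gatekeeper is simple: a list of named checks, run them all, report every failure at once so the worker gets one complete verdict instead of a drip-feed. A sketch with toy stand-ins for the real tools (mypy, ruff, a complexity gate, a build step):

```python
def gatekeep(code, checks):
    """Run every check; return 'clean' only if all pass,
    otherwise 'rejected' plus the full list of failures."""
    failures = [name for name, check in checks if not check(code)]
    return ("clean" if not failures else "rejected", failures)

# Hypothetical checks -- in reality these shell out to real tools.
checks = [
    ("types", lambda c: "Any" not in c),                                   # stand-in for a type checker
    ("lint", lambda c: len(max(c.splitlines(), default="", key=len)) <= 88),  # stand-in for a line-length lint
]
```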
This is the part that people miss about AI-generated code: it looks right. It type-checks. It's confidently, beautifully wrong.
The factory floor
There's a pattern emerging here, and it's not unique to AI.
TSMC dominates semiconductors because they figured out yield optimization at scale. The differentiation isn't the machines -- everyone buys those. It's the operational discipline that produces consistent quality at high throughput.
The job isn't to write code anymore. The job is to build and run AI coding foundries -- designing the factory floor, tuning the conveyor belts, watching the defect rates and adjusting the acceptance criteria. The code is just the output of the machine you're building.
It's a different skill -- more systems engineering than software engineering.

The flywheel bet
The bet: every dollar spent makes the system smarter. Retros become searchable wisdom. Patterns get promoted to CLAUDE.md rules. Standards keep the next agent from repeating the last one's mistakes. Work produces byproducts, byproducts become knowledge, knowledge enables better work.
More on this in a future devlog.

Try It
The stack:
- Beads - Git-backed issue tracking for agents
- Gas Town - Orchestration system
- Vibe Coding book - Gene Kim & Steve Yegge
What's next
This is devlog #1. Next up:
- Full breakdown of the .claude/ directory
- The journey from 60+ agents down to a select few
- How the Claude Code 2.1 command/skill changes forced another workflow overhaul
The landscape changes fast. Claude Code 2.1 shipped January 7th. By the time the next post drops, half of what's here might be obsolete. That's the game right now.
Related
- Devlog #2: How to Actually Use Your .claude/ Directory: Commands, skills, and agents -- and why you probably have too many.
- Getting Started with Vibe Coding: A practical guide to calibrating AI trust.
- 12-Factor AgentOps: Applying DevOps and SRE principles to AI reliability.