Devlog #1: What Running 8 Parallel AI Agents Taught Me

The origin story: how single-session AI coding broke at scale, and the architecture patterns that became AgentOps.

January 12, 2026 · 7 min read
#vibe-coding #ai-development #devlog #agentops #productivity

This essay is part of the reliable AI-assisted delivery trail: proof, method, and judgment for making fast AI work reviewable and safe to ship. Start with the curated writing paths or inspect the proof.

This was written in January 2026, during the multi-agent experimentation phase. The architecture patterns here (context isolation, persistent state, validation gates, the 40% rule) proved durable and became the foundation for AgentOps, the proof engine behind the discipline between generation and trust. Read devlog #3 for how the thinking evolved.

The problem nobody warns you about

AI coding tools work great in a single session. Then you try to scale.

You spin up a second Claude Code instance. Then a third. Suddenly you're juggling context windows, merge conflicts, and agents that cheerfully repeat solutions you already know don't work. Your single-agent productivity evaporates the moment you try to parallelize.

If that sounds familiar, this post is the failure log and the architecture that fixed it.


Where are you?

Steve Yegge's "Welcome to Gas Town" lays out 8 stages of AI-assisted coding.

[Figure: Vibe Coding Stages]

This devlog is for Stages 5-8. If you're juggling multiple Claude Code instances and wondering why nobody else seems to be struggling with the same things, stick around.

If you're at Stages 1-4, this will be interesting but probably not actionable yet. Bookmark it. Come back when you're running 3+ agents and feeling the chaos.

If you think AI coding is a fad, I hear you. The hype is deafening. But we've been here before. We said the same thing about compilers. We said it about garbage collection. The concerns were valid then, and they're valid now. But the abstraction layer moved up anyway. It's happening again.


The architecture shift: REPL to orchestration

It took three months and too many tokens to learn this: the standard approach to multi-agent coding is architecturally broken.

Every multi-agent system I built had workers reporting back to a coordinator. The coordinator's context fills up. Everything slows down. The coordinator becomes the bottleneck. Single-agent demos don't hit this failure mode, so nobody talks about it.

In January, Steve Yegge released Gas Town, an orchestration system that inverts this pattern:

Don't have agents return their work to a coordinator.

Each worker runs in complete isolation:

  • Own terminal
  • Own copy of the code
  • Results go straight to git and a shared issue tracker

The coordinator just reads status updates from the tracker. It never loads the actual work. It doesn't know what the code looks like; it knows what the tickets look like. The coordinator manages tickets; the workers handle syntax.
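
A minimal sketch of that split, assuming a hypothetical in-memory tracker. The `Tracker`, `Ticket`, and `Status` names are illustrative and are not Gas Town's API. The point it tries to show is that the coordinator's view is built from ticket metadata only, so its context grows with the number of tickets rather than with the size of the work.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class Status(Enum):
    OPEN = "open"
    IN_PROGRESS = "in_progress"
    DONE = "done"
    FAILED = "failed"


@dataclass
class Ticket:
    id: str
    title: str
    status: Status = Status.OPEN
    # Where the work lives in git. The coordinator stores the name only
    # and never loads the branch contents.
    branch: Optional[str] = None


@dataclass
class Tracker:
    """Hypothetical shared tracker: the only channel between workers and coordinator."""

    tickets: Dict[str, Ticket] = field(default_factory=dict)

    def add(self, ticket: Ticket) -> None:
        self.tickets[ticket.id] = ticket

    def update(self, ticket_id: str, status: Status, branch: Optional[str] = None) -> None:
        """Called by a worker after it has pushed its result to git."""
        ticket = self.tickets[ticket_id]
        ticket.status = status
        if branch is not None:
            ticket.branch = branch


def coordinator_summary(tracker: Tracker) -> Dict[str, int]:
    """Count tickets per status. Reads metadata only, never diffs or file contents."""
    counts = {status.value: 0 for status in Status}
    for ticket in tracker.tickets.values():
        counts[ticket.status.value] += 1
    return counts
```

A worker would call `tracker.update("T-1", Status.DONE, branch="agent/w1")` after pushing; the coordinator would only ever call `coordinator_summary(tracker)`.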

[Figure: Beads Network]

A coordinator that does the work breaks at 3 agents. A coordinator that manages the work scales to 8.

I stopped everything and rebuilt my workflow around this pattern. One week later, I submitted my first PR upstream.


What I believed in January

These were my beliefs in January 2026. The principles below proved durable; the tooling evolved into AgentOps.

1. The 40% Rule is real.

In my experience, AI tools perform well below about 40% of their context window. Above that, failures compound quickly. Toyota figured this out in the 1950s: production lines run far below capacity precisely because the slack is where quality lives. The lean-manufacturing literature on this is striking, and the shape holds: push utilization up and defects climb fast. The extra capacity is what enables continuous improvement.

Same pattern shows up everywhere failure is catastrophic: aviation fuel reserves, ICU occupancy, portfolio risk. Gas Town applies it to code: each worker runs under 40% context. The coordinator stays light. Failures get isolated.
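
As a rough illustration, the rule reduces to a utilization check. This is a sketch under assumptions: the function names are hypothetical, the 200,000-token window in the usage note is only an example figure, and how you count `tokens_used` depends on your tooling.

```python
def context_utilization(tokens_used: int, window_size: int) -> float:
    """Fraction of the context window currently in use."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    return tokens_used / window_size


def should_hand_off(tokens_used: int, window_size: int, threshold: float = 0.40) -> bool:
    """True once a worker reaches the budget and should checkpoint and restart fresh."""
    return context_utilization(tokens_used, window_size) >= threshold
```

With an assumed 200,000-token window, `should_hand_off(50_000, 200_000)` is false (25% used) and `should_hand_off(80_000, 200_000)` is true (40% used).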

2. Isolation beats cleverness.

Weeks went into building clever coordination. Dumb isolation wins. Separate copies of the code mean no merge conflicts during parallel work. Failures don't cascade. Kill and restart workers without affecting others. Cattle, not pets.
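
One way to get a separate copy per worker is `git worktree`, which gives each worker its own directory and branch on top of a shared repository. The sketch below only builds the command lines; the directory layout and the `agent/<worker>` branch naming are my assumptions for illustration, not a description of how Gas Town does it.

```python
from pathlib import Path
from typing import List


def _worker_path(repo: Path, worker: str) -> Path:
    # Hypothetical layout: sibling directory named "<repo>-<worker>".
    return repo.parent / f"{repo.name}-{worker}"


def worktree_add_cmd(repo: Path, worker: str, base: str = "main") -> List[str]:
    """argv that creates an isolated checkout and a new branch for one worker."""
    return [
        "git", "-C", str(repo), "worktree", "add",
        "-b", f"agent/{worker}",
        str(_worker_path(repo, worker)),
        base,
    ]


def worktree_remove_cmd(repo: Path, worker: str) -> List[str]:
    """argv that discards a worker's checkout, even if it has uncommitted changes."""
    return [
        "git", "-C", str(repo), "worktree", "remove", "--force",
        str(_worker_path(repo, worker)),
    ]
```

Passing either list to `subprocess.run(cmd, check=True)` would execute it. Killing a worker is then just the remove command followed by a fresh add, which is what makes the workers cattle rather than pets.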

3. Persistent state changes everything.

Before a git-backed issue tracker, every session started from zero. Now there's persistent state: what's done, what failed, what's blocked. The AI wakes up knowing the plan.
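
A minimal sketch of the idea, assuming an append-only JSON Lines file that is committed with the code. The file format and function names are hypothetical stand-ins for a real git-backed issue tracker; the sketch only shows how replaying a log lets a new session start from "what's done, what failed, what's blocked" instead of from zero.

```python
import json
from pathlib import Path
from typing import Dict


def record_event(log: Path, ticket: str, status: str, note: str = "") -> None:
    """Append one status event to the log file."""
    log.parent.mkdir(parents=True, exist_ok=True)
    with log.open("a", encoding="utf-8") as f:
        f.write(json.dumps({"ticket": ticket, "status": status, "note": note}) + "\n")


def load_state(log: Path) -> Dict[str, dict]:
    """Replay the log from the top; the last event per ticket wins."""
    state: Dict[str, dict] = {}
    if not log.exists():
        return state
    with log.open(encoding="utf-8") as f:
        for line in f:
            if line.strip():
                event = json.loads(line)
                state[event["ticket"]] = event
    return state


def session_briefing(log: Path) -> str:
    """Short plain-text summary to hand to an agent at the start of a session."""
    lines = []
    for ticket, event in sorted(load_state(log).items()):
        suffix = f" ({event['note']})" if event["note"] else ""
        lines.append(f"{ticket}: {event['status']}{suffix}")
    return "\n".join(lines)
```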

4. Validation isn't optional.

As a working assumption, models are about 85% correct. They will say "fixed" when it's broken. They will say "tests passed" when the tests never ran. 18% of tasks last week needed a second pass. If you're not validating AI output, you're shipping hope, not code.

Successive refinement (fail, fix, repeat) is the only reliable pattern. It often takes multiple passes to bridge that last 15%.
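
The fail, fix, repeat loop can be sketched as a bounded retry. This is an illustration under assumptions: `check` and `fix` are hypothetical callables standing in for your validator and for another agent pass, and the cap of three passes is an arbitrary default.

```python
from typing import Callable, List, Tuple

Check = Callable[[str], List[str]]      # returns failure messages; empty means clean
Fix = Callable[[str, List[str]], str]   # returns a revised candidate


def refine(candidate: str, check: Check, fix: Fix, max_passes: int = 3) -> Tuple[str, bool, int]:
    """Validate, fix, repeat. Returns (candidate, passed, passes_used)."""
    for attempt in range(1, max_passes + 1):
        failures = check(candidate)
        if not failures:
            return candidate, True, attempt
        if attempt < max_passes:
            # Feed the concrete failures back in rather than a bare "try again".
            candidate = fix(candidate, failures)
    return candidate, False, max_passes
```

The cap matters as much as the loop: a result of `passed=False` is the signal to stop and escalate to a human rather than burn more tokens.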


The 50 First Dates problem

Every AI coding session starts from scratch. The model doesn't remember what you were working on yesterday. It doesn't know what already failed. It's Drew Barrymore waking up fresh every morning, except instead of falling in love with you, it's falling in love with solutions you already tried that don't work.

The community calls this the "50 First Dates problem."

What worked:

  • CLAUDE.md files with project context
  • Persistent directories (.agents/research/, .agents/plans/)
  • Vibe levels (a calibration system for how much to trust the model)

What didn't:

  • Hoping the model would "just remember"
  • Ever-longer system prompts (context rot is real)
  • My first orchestrator, which should have been called "the money pit"

[Figure: Memory]


Ralph Wiggum, RIP

Then Ralph Wiggum dropped. Ralph is a loop pattern: you have Claude write to files, read its own output, and iterate until tests pass. Viral for a reason.

Ralph worked great for about two weeks. I started running it overnight, waking up to finished work, and feeling very clever.

Then the failure mode showed up. The agent spirals on the same bug for hours, confident it's making progress. It tries the same fix seventeen times with minor variations, each time expecting different results. Optimism without memory.
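
One hedged way to give a loop like this some memory is to fingerprint each attempted patch and stop when the same one keeps coming back. The `AttemptLog` class below is hypothetical, and the normalization is deliberately crude: it only collapses whitespace, so it catches near-identical retries, not every minor variation.

```python
import hashlib
from typing import Dict


def fingerprint(patch: str) -> str:
    """Hash a patch after collapsing whitespace, so trivial variations collide."""
    normalized = " ".join(patch.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class AttemptLog:
    """Remembers which patches a loop has already tried."""

    def __init__(self, max_repeats: int = 2) -> None:
        self.counts: Dict[str, int] = {}
        self.max_repeats = max_repeats

    def should_stop(self, patch: str) -> bool:
        """Record the attempt; True once the same patch exceeds max_repeats."""
        key = fingerprint(patch)
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key] > self.max_repeats
```

Calling `should_stop(patch)` before applying each iteration's patch turns "seventeen times" into a hard stop on the third identical attempt.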

[Figure: Ralph v Gas Town]


The gatekeeper

Gas Town gave me parallelization. The quality problem (the "Ralph" factor) remained.

So I built a gatekeeper.

[Figure: Gatekeeper]

The gatekeeper validates every push: type checks, linting, complexity analysis, builds. The gatekeeper doesn't care about feelings or deadlines. Successive refinement (fail, fix, repeat) until the code is clean or you give up.
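
A gate like this can be sketched as "run every check, trust only the exit codes." The function below is an illustrative sketch, not the actual gatekeeper; which commands go in `checks` is up to your stack.

```python
import subprocess
import sys
from typing import Dict, List, Tuple


def run_gate(checks: Dict[str, List[str]]) -> Tuple[bool, List[str]]:
    """Run every check command; return (all_passed, names_of_failed_checks)."""
    failed: List[str] = []
    for name, argv in checks.items():
        # The process exit code decides, not the agent's claim that it passed.
        result = subprocess.run(argv, capture_output=True, text=True)
        if result.returncode != 0:
            failed.append(name)
    return (not failed, failed)


if __name__ == "__main__":
    # Stand-in checks so the sketch runs anywhere; swap in your own type
    # checker, linter, and build commands.
    demo = {
        "always-passes": [sys.executable, "-c", "raise SystemExit(0)"],
        "always-fails": [sys.executable, "-c", "raise SystemExit(1)"],
    }
    passed, failed = run_gate(demo)
    print("gate passed" if passed else f"gate failed: {failed}")
```

Running all checks instead of stopping at the first failure gives the next fix pass the complete list to work from.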

This is the part people miss about AI-generated code: it looks right. It types right. It's confidently, beautifully wrong.


The factory floor

There's a pattern emerging here, and it's not unique to AI.

TSMC dominates semiconductors because they figured out yield optimization at scale. Everyone buys the same machines; the operational discipline produces consistent quality at high throughput.

The job isn't to write code anymore. The job is to build and run AI coding foundries: designing the factory floor, tuning the conveyor belts, watching the defect rates and adjusting the acceptance criteria. The code is just the output of the machine you're building.

A different skill. More systems engineering than software engineering. The same shift is coming for everyone who uses AI, not just people who ship code, which is the whole reason I started writing this down in plain language.

[Figure: Coding Foundry]


The flywheel bet

The bet: every dollar spent makes the system smarter. Retros become searchable wisdom. Patterns get promoted to CLAUDE.md rules. Standards keep the next agent from repeating the last one's mistakes. Work produces byproducts, byproducts become knowledge, knowledge enables better work.

More on this in a future devlog.

[Figure: Flywheel]


Try It

The stack:


What came next

This was devlog #1. The series continued:

  • Devlog #2: The great consolidation, 60+ agents down to 4
  • Devlog #3: From vibe coder to vibe engineer; learning the system, not just the tools
  • Devlog #4: Why specs are the leverage point, not agents
  • Devlog #5: When the platform catches up, and why the patterns still matter

The landscape changed fast. The principles (isolation, persistent state, validation, the 40% rule) held up. The tooling evolved into AgentOps. And the deeper bet held too: the discipline I built here to make AI output trustworthy for engineers is the same discipline I now translate for people who aren't engineers at all. That's the throughline behind everything on this site, and the whole point of AI Partner.