Multi-Agent Orchestration: Lessons from Running AI Agents Across Dozens of Production Repos

What survives multi-agent work at real scale: durable queues, scoped workers, validation gates, and context that outlives the session.

February 12, 2026·7 min read
#ai-engineering#multi-agent#agentops

This essay is part of the reliable AI-assisted delivery trail: proof, method, and judgment for making fast AI work reviewable and safe to ship.

I run multiple AI agents in parallel across my production repos: infrastructure, applications, and tooling.

This is what happens when you try to scale AI-assisted development to an actual workload: DoD and Intel environments, GPU clusters, Kubernetes platforms, and the tooling that holds it all together. I hold it to the same reliability bar I hold in production today, on a platform/SRE team in defense autonomy.

This is what I learned.


The Architecture

The workspace is one registry over many repositories. Each repository is a unit of work with its own issue tracker (beads), its own context, its own agents. A central registry tracks which repositories exist, which agents are assigned, and what work is available.

Workspace (registry)
  ├── Repositories
  │   ├── agentops
  │   ├── personal_site
  │   ├── gastown_operator
  │   ├── ocpeast, ocphpc, ocppoc...
  │   └── ...27 more
  ├── Beads (per-repo issue tracking)
  ├── Agents (stateless workers)
  └── Knowledge (flywheel across all repos)

The key design decision: agents are stateless workers. They pick up work from a hook, execute it, and return the results. No agent owns a repository permanently. No agent carries state between sessions. All state lives in the repository: in git, in beads, in .agents/ directories.

This is the cattle-not-pets pattern from infrastructure. Agents are cattle. Repositories are the infrastructure.


Lesson 1: Workers Write, Leads Commit

The first time I ran multiple agents against the same repo, the git index was corrupted within 20 minutes.

Two agents writing to the same git index simultaneously is a race condition. Git's index file isn't designed for concurrent writers. The result: corrupted index, lost work, manual recovery.

The fix: workers write files. A team lead commits. One agent owns the git index at any time. Workers produce artifacts. The lead stages, reviews, and commits them.

Worker A → writes files → ready for review
Worker B → writes files → ready for review
Worker C → writes files → ready for review
          ↓
Team Lead → reviews → stages → commits (one at a time)

This is the same pattern as a merge queue in CI. Serialize the commits. Parallelize the work.
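
A minimal sketch of that split, assuming an in-process queue stands in for the handoff between workers and the lead. The names `worker`, `lead`, and `run` are hypothetical, and the real review, stage, and commit steps are replaced by appending to a list so the example runs without a repository.

```python
import queue
import threading


def worker(name, files, ready):
    """Produce artifacts in parallel. Workers never touch the git index."""
    for path in files:
        ready.put((name, path))  # hand the artifact to the lead for review


def lead(ready, expected, commit_log):
    """Single consumer: the only place a stage/commit would happen."""
    for _ in range(expected):
        name, path = ready.get()
        # Stand-in for review + `git add` + `git commit`, one artifact at a time.
        commit_log.append(f"{name}:{path}")


def run(assignments):
    ready = queue.Queue()
    commit_log = []
    expected = sum(len(files) for files in assignments.values())
    lead_thread = threading.Thread(target=lead, args=(ready, expected, commit_log))
    workers = [
        threading.Thread(target=worker, args=(name, files, ready))
        for name, files in assignments.items()
    ]
    lead_thread.start()
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    lead_thread.join()
    return commit_log


commit_log = run({"A": ["a.py"], "B": ["b.py", "b_test.py"], "C": ["c.py"]})
print(commit_log)
```

The order of entries in `commit_log` varies between runs, which is the point: the work arrives in any order, but only one thread ever performs the commit step.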


Lesson 2: Pre-Identify Shared Files

Parallel agents will collide on shared files. Config files, type definitions, index modules, route registrations. Anything that multiple features touch.

Discovering this during implementation means merge conflicts, duplicated work, and agents overwriting each other's changes. Discovering it during planning means you can either:

  1. Assign shared files to one agent who handles all changes
  2. Sequence the work so shared-file changes happen in order
  3. Define interfaces so agents work against contracts, not shared state

Option 1 is simplest. Option 3 is best. In practice, I use a mix depending on how intertwined the shared files are.

The pattern: during planning, grep for files that appear in multiple beads. Flag them. Decide the coordination strategy before anyone writes code.
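
The planning check can be sketched as below, assuming each bead's plan already lists the files it expects to touch. The bead IDs, file paths, and the `find_shared_files` helper are hypothetical, made up for illustration.

```python
from collections import defaultdict


def find_shared_files(planned):
    """Return {file: [beads]} for every file that two or more beads plan to touch."""
    touched_by = defaultdict(set)
    for bead, files in planned.items():
        for path in files:
            touched_by[path].add(bead)
    return {
        path: sorted(beads)
        for path, beads in touched_by.items()
        if len(beads) > 1
    }


# Hypothetical plan: bead ID -> files it expects to modify.
planned = {
    "bead-1": ["src/routes.ts", "src/users.ts"],
    "bead-2": ["src/routes.ts", "src/billing.ts"],
    "bead-3": ["src/types.ts", "src/billing.ts"],
}

shared = find_shared_files(planned)
for path, beads in shared.items():
    print(f"{path}: shared by {', '.join(beads)}")
```

Every file in `shared` needs one of the three coordination strategies before work starts.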


Lesson 3: One Commit Per Issue

Every bead gets its own commit. No batching. No "fixed a few things" commits. Atomic commits are operational infrastructure.

When something breaks in production, git bisect needs atomic commits to find the cause. When a feature needs to be reverted, a clean single commit reverts cleanly. When you're reviewing agent output, one commit per issue lets you evaluate each unit of work independently.

# Clean revert
git revert abc123   # Reverts exactly one feature

# vs. batched commits
git revert def456   # Reverts three features, two of which were fine

The agents resist this, by the way. They want to batch. They want to "also fix this other thing I noticed." Fight it. One bead, one commit.
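
One way to enforce the rule is a check over commit subjects before anything merges. This is a sketch under a loud assumption: that each commit subject carries a bead ID matching `bd-<id>`. That ID format, the sample commits, and the `check_commits` helper are hypothetical; adjust the pattern to whatever your tracker emits.

```python
import re
from collections import Counter

# Assumed ID format for illustration only.
BEAD_ID = re.compile(r"\bbd-[a-z0-9]+\b")


def check_commits(commits):
    """commits: list of (sha, subject). Return a list of rule violations."""
    problems = []
    seen = Counter()
    for sha, subject in commits:
        ids = set(BEAD_ID.findall(subject))
        if len(ids) != 1:
            problems.append(f"{sha}: references {len(ids)} beads, expected 1")
        seen.update(ids)
    for bead, count in sorted(seen.items()):
        if count > 1:
            problems.append(f"{bead}: split across {count} commits")
    return problems


commits = [
    ("abc123", "bd-101: add retry to uploader"),
    ("def456", "bd-102 bd-103: fix a few things"),
    ("0a1b2c", "tidy imports"),
]

problems = check_commits(commits)
for line in problems:
    print(line)
```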


Lesson 4: Fewer Workers, More Waves

My first instinct was maximum parallelism. Five agents, five features, everything at once. Fast.

Reality: foundation work can't be parallelized effectively. When multiple agents need to modify the same architectural layer (adding a new module type, changing a shared interface, restructuring a directory) parallel execution creates cascading merge conflicts.

The fix: waves.

Wave 1 (2 agents): Foundation, types, interfaces, shared config
Wave 2 (3 agents): Features, independent modules built on wave 1
Wave 3 (2 agents): Integration, connect features, update routes, final tests

Each wave completes and commits before the next wave starts. Within a wave, agents work on files that don't overlap. Between waves, the shared state is stable.

More waves, fewer agents per wave, clear handoff points. A single pass takes longer on the wall clock, but failures drop dramatically. Net throughput is higher because you're not spending half your time resolving conflicts.
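
A sketch of how waves could be planned, assuming each task declares what it depends on and which files it touches. The greedy placement, the task names, and `plan_waves` are illustrative assumptions, not a description of my tooling.

```python
def plan_waves(tasks, max_agents):
    """tasks: {name: {"needs": [names], "files": [paths]}} -> list of waves.

    A task lands in the earliest wave that comes after all of its
    dependencies, still has a free agent slot, and has no file overlap.
    """
    waves = []       # waves[i] is the list of task names in wave i
    wave_files = []  # wave_files[i] is the set of files claimed in wave i
    placed = {}      # task name -> wave index
    pending = dict(tasks)
    while pending:
        progressed = False
        for name, spec in list(pending.items()):
            if any(dep not in placed for dep in spec["needs"]):
                continue  # a dependency has not been scheduled yet
            files = set(spec["files"])
            i = max((placed[dep] + 1 for dep in spec["needs"]), default=0)
            while True:
                if i == len(waves):
                    waves.append([])
                    wave_files.append(set())
                if len(waves[i]) < max_agents and not (wave_files[i] & files):
                    break
                i += 1
            waves[i].append(name)
            wave_files[i] |= files
            placed[name] = i
            del pending[name]
            progressed = True
        if not progressed:
            raise ValueError("dependency cycle or unknown dependency")
    return waves


tasks = {
    "types": {"needs": [], "files": ["types.ts"]},
    "config": {"needs": [], "files": ["config.ts"]},
    "users": {"needs": ["types"], "files": ["users.ts"]},
    "billing": {"needs": ["types", "config"], "files": ["billing.ts"]},
    "search": {"needs": ["types"], "files": ["search.ts"]},
    "routes": {"needs": ["users", "billing", "search"], "files": ["routes.ts"]},
}

waves = plan_waves(tasks, max_agents=3)
for number, names in enumerate(waves, start=1):
    print(f"Wave {number}: {', '.join(names)}")
```

Lowering `max_agents` pushes work into additional waves instead of crowding a single one, which is the trade described above.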


Lesson 5: The Hook System

Agents need a way to find work. Not "here's a task," which requires a human in the loop for every assignment. The system should be self-serve.

The hook pattern: available work hangs on a hook. Agents check the hook, grab work, execute, return results.

bd ready             # What work is available on the hook?
# assign a ready issue to a worker, it executes, results go back

The hook is a coordination primitive. It answers "what should I work on next?" without requiring a human to answer that question every time. Agents pull work; humans stock the hook.

This scales. Five agents checking the hook in parallel will each grab different work. The hook is the load balancer.
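
"Each grab different work" only holds if claiming an item is atomic. The sketch below shows that property with an in-memory stand-in guarded by a lock. It does not show how `bd` implements claiming; the `Hook` class and the bead names are hypothetical.

```python
import threading


class Hook:
    """In-memory stand-in for the ready queue. claim() is atomic."""

    def __init__(self, items):
        self._ready = list(items)
        self._lock = threading.Lock()

    def claim(self):
        # Check-and-remove happens under one lock, so no item is handed out twice.
        with self._lock:
            return self._ready.pop(0) if self._ready else None


def agent(hook, done, name):
    """Pull work until the hook is empty."""
    while (item := hook.claim()) is not None:
        done.append((name, item))  # stand-in for executing the work


hook = Hook([f"bead-{n}" for n in range(20)])
done = []
threads = [
    threading.Thread(target=agent, args=(hook, done, f"agent-{i}"))
    for i in range(5)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

claimed = [item for _, item in done]
print(f"{len(claimed)} items claimed, {len(set(claimed))} unique")
```

Without the lock, two agents could both see the same first item before either removes it, and the hook stops being a load balancer.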


Lesson 6: Knowledge Crosses Repos

A learning from the personal_site repo applies to the agentops repo. A pattern discovered in ocpeast matters for ocphpc. Knowledge doesn't respect repository boundaries.

The flywheel operates at the workspace level, not the repo level. When an agent discovers that "acceptance checks must be token-specific, not category-level," that learning applies everywhere, not just the repo where it was discovered.

This means the inject system needs to be cross-repo. An agent starting work on the CI container should have access to learnings from the Kubernetes platform repos, because the failure patterns are similar.

ao inject            # Loads workspace-level knowledge, not just repo-level

The cross-pollination effect is real. Patterns from infrastructure work improved my application development. Patterns from writing tooling improved my infrastructure automation. The knowledge flywheel works best when it's not siloed.
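
A sketch of what workspace-level selection could look like, assuming learnings are stored with tags and matched to a task by tag overlap rather than by repository. This is not how `ao inject` works internally; the `Learning` record, the tags, and the placeholder texts are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Learning:
    repo: str        # where the learning was discovered
    tags: frozenset  # topics it applies to (assumed metadata)
    text: str


def select_learnings(store, task_tags, limit=3):
    """Rank learnings from every repo by tag overlap with the task."""
    scored = [(len(item.tags & task_tags), item) for item in store]
    relevant = [pair for pair in scored if pair[0] > 0]
    relevant.sort(key=lambda pair: -pair[0])  # stable sort: ties keep store order
    return [item for _, item in relevant[:limit]]


store = [
    Learning("personal_site", frozenset({"frontend"}), "placeholder: frontend learning"),
    Learning("ocpeast", frozenset({"kubernetes", "ci"}), "placeholder: cluster failure pattern"),
    Learning(
        "agentops",
        frozenset({"validation", "ci"}),
        "acceptance checks must be token-specific, not category-level",
    ),
]

# A task in a CI container repo still receives learnings from other repos.
picked = select_learnings(store, frozenset({"ci", "kubernetes"}))
for item in picked:
    print(f"[{item.repo}] {item.text}")
```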


What Breaks

What still doesn't work:

Context overflow. Large repos with deep dependency trees blow past the 40% context budget. Agents start hallucinating file paths, inventing APIs, and confidently referencing code that doesn't exist. The fix is aggressive scoping: never load the whole repo, only load what's relevant to the current bead.
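
Aggressive scoping can be sketched as a greedy selection under a token budget. The 40% figure comes from the text; the 200,000-token window, the relevance scores, the token counts, and `scope_context` are assumptions for illustration.

```python
def scope_context(candidates, window_tokens, budget_percent=40):
    """candidates: list of (path, relevance, tokens).

    Pick the most relevant files first and skip anything that would
    push the total past the budget.
    """
    budget = window_tokens * budget_percent // 100
    chosen, used = [], 0
    for path, relevance, tokens in sorted(candidates, key=lambda c: -c[1]):
        if relevance <= 0:
            continue  # irrelevant to the current bead, never load it
        if used + tokens <= budget:
            chosen.append(path)
            used += tokens
    return chosen, used, budget


# Hypothetical candidates for one bead.
candidates = [
    ("src/uploader.py", 0.9, 30_000),
    ("src/retry.py", 0.8, 25_000),
    ("src/legacy_api.py", 0.5, 40_000),
    ("docs/README.md", 0.3, 20_000),
    ("vendor/big.js", 0.0, 90_000),
]

chosen, used, budget = scope_context(candidates, window_tokens=200_000)
print(f"loaded {len(chosen)} files, {used} of {budget} budgeted tokens")
```

The hard part in practice is the relevance score, which this sketch takes as given.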

Cross-repo dependencies. When a change in one repo requires a coordinated change in another, the orchestration gets manual fast. I have no good automation for "update the API in repo A, then update the client in repo B, then test the integration." This is a gap.

Agent drift. Long sessions where the agent gradually loses the plot. Starts strong, makes good progress, then somewhere around the 40-minute mark begins making changes that don't serve the original goal. The fix is short sessions with hard scope boundaries. But it means more session overhead.
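
A hard scope boundary can be checked mechanically at the end of a session. The sketch below assumes each bead declares the path patterns it may touch; the patterns, the file paths, and `out_of_scope` are hypothetical.

```python
from fnmatch import fnmatch


def out_of_scope(changed, allowed_patterns):
    """Return changed paths that match none of the bead's allowed patterns."""
    return [
        path
        for path in changed
        if not any(fnmatch(path, pattern) for pattern in allowed_patterns)
    ]


# Hypothetical bead scope and the files a session actually changed.
allowed = ["src/uploader/*"]
changed = [
    "src/uploader/retry.py",
    "src/uploader/tests/test_retry.py",
    "src/billing/invoice.py",
]

violations = out_of_scope(changed, allowed)
print(violations)
```

Any non-empty result is a signal to stop the session and review, rather than let the extra changes ride along. Note that `fnmatch` lets `*` match across directory separators, so `src/uploader/*` covers nested paths too.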


The Scale Test

This isn't a toy example. The workspace includes:

  • Kubernetes platform management (3 clusters)
  • GPU infrastructure (100+ GPUs)
  • Application deployment (50+ AI applications)
  • Developer tooling (CLI tools, MCP servers, automation)
  • Knowledge management (the flywheel itself)

The orchestration patterns that work at this scale aren't sophisticated. They're simple, enforced, and relentlessly consistent. Serialize commits. Pre-identify shared files. One commit per issue. Waves over parallelism. Knowledge crosses boundaries.

This is the discipline between generation and trust, and it's the same discipline I now translate into safe, plain-language practice for people who aren't engineers.