Devlog #3: From Vibe Coder to Vibe Engineer
In devlog 1, the multi-agent setup collapsed under its own weight. In devlog 2, I consolidated from 60+ agents down to 4. This is what happened next.
The tools change faster than you can learn them. My .claude/ directory (the configuration files that control how Claude Code behaves) has been rewritten 4 times in 6 months. Not because I was wrong, but because the platform changed underneath me.
If you're happy letting Claude Code handle everything without understanding why it works, this isn't for you. This is for people who want to know what's actually happening under the hood.

The Kelsey Hightower Test
Kelsey Hightower became a distinguished engineer at Google by doing something unfashionable: taking the time to see through the hype, get his hands dirty, and really learn what technologies are about.
He has this approach whenever new technology drops. Three questions:
1. What is it?
2. What problem does it solve?
3. How is it different from what's currently out there?
If you can't answer those questions about your coding tool of choice (Plan Mode, Skills API, whatever), I challenge you to spend more time learning. Not learning the builds. Learning the system.
Real vibe coding should embrace this. It's what every other engineering profession does. Mechanical engineers understand material properties, not just CAD software. Electrical engineers understand circuit theory, not just simulation tools. Software engineers using AI should understand how the AI actually works, not just what prompts to copy.
Here's the thing people get wrong about vibe coding: they think it means turning your brain off. Let the AI handle it. Stop thinking so hard.
That's backwards.
Vibe coding means you need to understand MORE, not less. You need to know how Claude Code actually works, not just what buttons to press. How does Explore mode spawn cheap Haiku subagents? What triggers auto-compaction? When does the skill system decide to load context vs skip it?
This is what separates a vibe coder from a vibe engineer.
The Hypothesis
I've been taking the Kelsey test to the extreme for the past 100 days.
Testing every coding CLI. Every framework. Every orchestrator. Building three of my own (now on v4 of what I hope to release soon). Logging everything: "this worked today," so when it stops working I can go back and analyze the data, the assumptions, and the knowledge I had at the time.
The knowledge flywheel from devlog 1? This is the other side of it. Not just the markdown artifacts the AI generates, but the knowledge that ends up in my head. I've learned more about software design in the past 100 days than in the years before, just by trying to operationalize coding agents and their output.
Here's the hypothesis I've been operating under:
What if everything we do interfaces through natural language to an agent that orchestrates our lives and knowledge work?
I've been acting as if that's true. Not because I'm certain it is, but because there's value in taking a hypothesis and acting as if it's true. You learn things you wouldn't otherwise.
The theory of relativity served us for decades. Then quantum mechanics came along when we needed it. But that didn't invalidate relativity. It just meant each theory is a useful tool within its own domain.
Same with this. Maybe the "everything through agents" hypothesis breaks down somewhere. Maybe there are tasks that will always need direct human execution. But until I find that boundary, I'm going to keep pushing on the hypothesis. And the knowledge I gain along the way? That transfers regardless.

Don't Blindly Follow Acronyms
Every month there's a new viral methodology. GSD. TDD. SPECDD. Some framework someone on Twitter swears by.
(Looking at you, Ralph. You know who you are.)
Here's the uncomfortable truth: most people adopting these don't understand them. They copy the acronym, miss the reasoning, and wonder why it doesn't work.

The conventional wisdom says don't engineer solutions around model weaknesses. But I'd argue: learning the weaknesses IS the skill. Not building brittle hacks, but developing intuition for:
- When the model is confabulating vs reasoning
- When context is degrading vs holding
- When to push through vs start fresh
- WHY a feature exists, not just that it exists
That intuition doesn't come from reading docs. It comes from watching things fail. A lot. And then asking "why did that fail?" instead of just retrying.
Gene Kim and Steve Yegge say it in the Vibe Coding book:
"You need at least one year of working with LLMs before you can start to even trust them."
Not trust as in "believe what they say." Trust as in "know when to believe them."

Year one is about learning the failure modes. Year two is about predicting them. Year three, maybe, is about preventing them.
I'm somewhere between one and two.
What CC Improved (And What That Means)
The manual practices I built are becoming native features.
| What I Built | What CC Ships Now |
|---|---|
| 40% rule vigilance | Context warnings, auto-compaction |
| Research-Plan-Implement workflow | Plan Mode enforces research-first |
| Model routing tables | /model menu with presets |
| Progressive disclosure | Skills API three-level loading |

Plan Mode is good. It triggers Explore agents that spawn parallel Haiku subagents to research your codebase simultaneously. Multi-threaded exploration with automatic model routing. That's not "think before coding." That's a real architectural improvement.
But here's what separates people who use Claude Code from people who understand it:
Using it: "Plan Mode helps me think through problems."
Understanding it: "Haiku gathers: cheap, fast, parallel exploration across the codebase. Opus gleans: synthesizing the findings, validating the approach, building the knowledge flywheel. I can explore 5 directions simultaneously without blowing my context budget. But I need to be specific. Haiku takes instructions literally." (I call this "gather and glean," five cheap brains exploring while Opus synthesizes the summary.)
The person who understands it knows when to use Plan Mode (complex multi-file changes), when to skip it (quick fixes), and how to structure their plan file to get good results from the Explore agents.
But here's what nobody tells you: Plan Mode still isn't all the way there. You have to hack it to get the best results. For example, I explicitly tell Explore agents to ignore test/ directories during the gather phase; otherwise they burn tokens reading mocks instead of actual logic. The default behaviors aren't optimal for every workflow. I've built workarounds on top of workarounds, and that's part of the learning.
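To make "gather and glean" concrete, here's a minimal sketch of the shape of that workflow. This is not Claude Code's internals: the `llm()` stub, the model labels, and the `plan()` helper are placeholders you'd wire to your own client. The one real trick from above is baked in, telling each explorer to skip test/ directories.

```python
from concurrent.futures import ThreadPoolExecutor

# Placeholder: wire this to your actual client (Anthropic SDK, a proxy, etc.).
# The model labels are illustrative, not exact model IDs.
def llm(model: str, prompt: str) -> str:
    raise NotImplementedError("swap in a real model call here")

GATHER_MODEL = "haiku"   # cheap, fast, takes instructions literally
GLEAN_MODEL = "opus"     # expensive, good at synthesis

def gather(question: str) -> str:
    # Each explorer gets its own narrow, explicit scope.
    # Being specific matters: ignoring test/ saves tokens on mocks.
    scoped = (
        f"Explore the repo to answer ONLY this question: {question}\n"
        "Ignore test/ and fixture directories. Report file paths and findings."
    )
    return llm(GATHER_MODEL, scoped)

def plan(task: str, questions: list[str]) -> str:
    # Fan out: five cheap brains exploring in parallel.
    with ThreadPoolExecutor(max_workers=5) as pool:
        findings = list(pool.map(gather, questions))
    # Fan in: one expensive brain synthesizes the findings into a plan.
    digest = "\n\n".join(findings)
    return llm(
        GLEAN_MODEL,
        f"Task: {task}\n\nFindings:\n{digest}\n\nWrite an implementation plan.",
    )
```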
The tools got better. The judgment requirement didn't disappear. It shifted. And judgment requires understanding.
The Judgment Layer
CC absorbed the mechanics (the boring stuff: context warnings, model routing, progressive loading). The judgment is still yours.
You still need to know:
- When the model is confabulating (the "tests passing lie")
- When to trust Plan Mode vs when to skip it
- When context is degrading before CC warns you
- When to push through vs start fresh
The 12 failure patterns from the Vibe Coding book? They still happen. CC just made some of them less frequent.

Inner loop failures (seconds to minutes), CC helps with these:
- The "tests passing" lie (AI claims tests pass, never ran them)
- Context amnesia (forgets instructions from 5 minutes ago)
- Instruction drift (gradually deviates from requirements)
Outer loop failures (days to weeks), still on you:
- Bridge torching (API changes break downstream services)
- Deletion disaster (deletes "unused" code that was critical)
- Stewnami (yes, that's a real term: the AI overwrites everything with confident garbage)
The model still doesn't understand impact. It still deletes code without knowing what depends on it. It still breaks APIs without understanding the blast radius.
The mechanics got automated. The architecture decisions didn't.
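For the inner-loop stuff, the cheapest defense against the "tests passing" lie is a gate that runs the suite itself and trusts the exit code, not the transcript. A minimal sketch, assuming a pytest project in a git checkout; swap the commands for your own stack:

```python
import subprocess
import sys

def gate(cmd: list[str], label: str) -> bool:
    """Run the real command and trust the real exit code, not the transcript."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    status = "PASS" if result.returncode == 0 else "FAIL"
    print(f"[{status}] {label}")
    if result.returncode != 0:
        print(result.stdout[-2000:], result.stderr[-2000:], sep="\n")
    return result.returncode == 0

if __name__ == "__main__":
    checks = [
        gate(["python", "-m", "pytest", "-q"], "tests actually ran"),
        gate(["git", "diff", "--check"], "no conflict markers or stray whitespace"),
    ]
    sys.exit(0 if all(checks) else 1)
```

Wire something like this into a pre-commit hook or the end of your "crank" loop, so a confident "all tests pass" never goes unverified.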
The Skill Evolution
Here's the timeline of Claude Code's skill system. Each shift required adaptation.
Early 2025: Custom slash commands. Markdown files in .claude/commands/. Simple enough, except invocation was flaky. The model would miss triggers constantly, so I built command wrappers that explicitly invoked skills.
Adaptation: Work around unreliable triggers.
Mid 2025: Skills as a separate concept. The key innovation was third-person trigger phrases: "This skill should be used when..." That dramatically improved invocation. But commands and skills coexisted awkwardly. I had thousands of lines of commands, always loaded.
Adaptation: Restructure everything for JIT loading.
January 2026: Unification. Commands merged into skills overnight. My /commands/ directory got archived. Three-level progressive disclosure became native.
Adaptation: Archive old patterns, learn new primitives.
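To see why that restructuring mattered, here's a toy version of three-level progressive disclosure. It is not Claude Code's implementation, and it assumes skills live as `SKILL.md` files with simple name/description frontmatter (roughly the real layout): only metadata enters context at startup, the body loads just-in-time, and referenced files would load even later.

```python
from pathlib import Path

def frontmatter(path: Path) -> dict:
    """Read only the frontmatter (name, description); leave the body on disk."""
    meta, lines = {}, path.read_text().splitlines()
    if lines and lines[0] == "---":
        for line in lines[1:]:
            if line == "---":
                break
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
    return meta

class SkillIndex:
    def __init__(self, root: Path):
        # Level 1: at session start, only name + description enter context.
        self.skills = {p: frontmatter(p) for p in root.glob("*/SKILL.md")}

    def maybe_load(self, intent: str) -> str | None:
        # Level 2: pull a full SKILL.md body in just-in-time when intent matches.
        # (The real system lets the model decide from the third-person
        # description; the string match here is only to show the shape.)
        for path, meta in self.skills.items():
            name = meta.get("name", "").lower()
            if name and name in intent.lower():
                return path.read_text()
        return None  # Level 3 would be loading referenced files on demand.

# index = SkillIndex(Path(".claude/skills"))
# body = index.maybe_load("help me review this pull request")
```

The point of the exercise: thousands of lines of always-loaded commands collapse into a handful of descriptions, and you only pay for a body when it's actually needed.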

Each shift wasn't "CC got better so I do less." It was "CC changed so I adapt differently."
The January 7th release alone broke half my setup. I spent three days rebuilding. By the time I finished, I had a better system than before, but only because I'd already rebuilt it three times and knew what to optimize for.
The Meta Keeps Shifting
That's not a bug. That's the game.
- Early 2025: Raw capability, figure it out yourself
- Late 2025: Power users build survival patterns
- 2026: CC absorbs patterns, raises the floor
- 2027: ???
Here's my actual advice: start now.
Yes, the meta will change tomorrow. That's exactly why you should start today. The people who will struggle in 2027 are the ones who waited for the tools to "stabilize." They'll memorize "use Plan Mode for complex tasks" without understanding WHY. When Plan Mode changes (and it will), they'll be lost.
The people who will thrive? They're putting in reps now. Building intuition. Asking the Kelsey questions about every new feature.

The tools changed. The output stayed consistent. That's not because of any one tool. It's because understanding how the system works lets you adapt when the platform changes.
What early adopters actually have:
- Pattern recognition: you've seen enough failures to predict them. When Claude says "I've fixed the issue" with that particular confidence, you know to verify.
- Adaptation speed: you've rebuilt your workflow 4+ times. The January 7th release was a speed bump, not a wall.
- Calibrated trust: you know which tasks need human review and which can run overnight. This isn't gut feel. It's understanding which failure patterns apply to which task types.
- Mental model of the system: you know Claude Code is a REPL wrapping an LLM with tool use. You know subagents get isolated contexts. This lets you reason about new features without waiting for tutorials.
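That mental model is small enough to sketch. This is the rough shape, not Claude Code's source: a loop that feeds an LLM, executes whatever tool call comes back, and gives subagents a fresh, isolated message history so exploration never pollutes the parent context. `call_llm` and the reply format are placeholders.

```python
# Rough shape of the mental model, not Claude Code's actual code:
# a REPL around an LLM with tool use, where subagents get fresh contexts.

def call_llm(messages: list[dict]) -> dict:
    """Placeholder: returns either {'text': ...} or {'tool': ..., 'args': {...}}."""
    raise NotImplementedError

TOOLS = {
    "read_file": lambda path: open(path).read(),
    # "edit_file", "run_command", ... would live here too.
}

def agent(task: str, messages: list[dict] | None = None) -> str:
    messages = (messages or []) + [{"role": "user", "content": task}]
    while True:
        reply = call_llm(messages)
        if reply.get("tool") == "spawn_subagent":
            # The key detail: the subagent starts with an EMPTY history.
            # Only its summary flows back into the parent context.
            result = agent(reply["args"]["task"], messages=[])
        elif reply.get("tool"):
            result = TOOLS[reply["tool"]](**reply["args"])
        else:
            return reply["text"]  # no tool call means the turn is done
        messages.append({"role": "tool", "content": str(result)})
```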
Someone starting today gets better defaults. The floor is higher. But they haven't built the mental model for when those defaults are wrong.
That's the real lesson: the tools change, but the principles don't. Context management matters. Isolation beats cleverness. Verification isn't optional. Understanding beats memorization.
| Vibe Coder (me on a bad day) | Vibe Engineer (me on a good day) |
|---|---|
| Prompts "fix this bug" | Prompts "research dependency impact first" |
| Copies the result blindly | Treats AI output like untrusted code: validation gates, not vibes |
| Blames AI when it breaks | Asks "which failure pattern was this?" |
| Follows tutorials | Asks the Kelsey questions |
| Memorizes acronyms | Builds mental models |
| Waits for tools to stabilize | Puts in reps now |
The APIs will change. The skills you build learning them won't.
If you're hiring in 2026: Don't ask "Can you use AI?" Ask "How do you structure your context for a complex migration?" or "Walk me through how you'd debug a multi-file refactor that went wrong." The answer reveals whether they're a coder or an engineer.
Try It
Don't just read what I'm saying. Don't just copy my setup from devlog 2. I'm sharing my experiences, trying to find the meta alongside everyone else. This stuff is so fluid that exploring it yourself is the best way to learn and cut through the hype.
That's the real message here.
Ask the Kelsey questions about your current setup:
- What is Plan Mode? (Not "it helps me think." What does it actually do?)
- What problem does it solve? (Why was it built? What was broken before?)
- How is it different? (From just... thinking before coding?)
Check your adaptation debt:
- When did you last restructure your .claude/ directory?
- Are you using features from 3 months ago that have better alternatives now?
- What's in your setup that you don't remember adding?
Most people have never audited this. You might be surprised what you find.
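A five-minute audit is enough to start. Here's a small script for it, assuming a conventional `.claude/` full of markdown; adjust the path and glob for your layout:

```python
from datetime import datetime
from pathlib import Path

# Quick audit of a .claude/ directory: how big is each file, and when was
# it last touched? Stale, heavy files are usually adaptation debt.
root = Path(".claude")
for path in sorted(root.rglob("*.md")):
    lines = len(path.read_text(errors="ignore").splitlines())
    touched = datetime.fromtimestamp(path.stat().st_mtime).date()
    print(f"{lines:>6} lines  last touched {touched}  {path}")
```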
Resources that helped me (but explore your own path):
- Vibe Coding: Gene Kim & Steve Yegge on the 1-year threshold
- Claude Code docs: worth re-reading every few weeks
What's Next
I'm building a Claude Code plugin marketplace: boshu2/agentops. Skills that activate on natural language intent. A tiered architecture that scales from solo dev to multi-agent orchestration. The "plan then crank" workflow I keep mentioning, packaged so you can try it yourself.
It's the patterns from this devlog series turned into installable plugins. Will parts of it be obsolete in 3 months? Probably. That's the game. Fork it, break it, rebuild it. That's how you become an engineer.
Things I'm still working through:
- The Agent Skills open standard and what it means for portability
- Running Gas Town workers on open source models
- The multi-session memory problem (still unsolved)
What's one AI failure pattern you've hit recently that the tools didn't catch? I'm collecting examples. Reply or DM me. The best insights I've had came from comparing notes with other people in the trenches.
Devlog #3. Four rewrites. Same velocity. The tools keep changing. Learning to adapt is the skill. Start with devlog 1 if you're new here.