This guide is for developers who've tried AI coding tools and found them... unpredictable. Maybe Copilot felt like autocomplete that thinks it's smarter than it is. Maybe Claude generated code that worked perfectly, then code that broke in weird ways, and you couldn't tell which you'd get.

If you're still skeptical about AI coding, that's fair. Stick around anyway. Calibration is the part that makes it actually work.

OK, so here's the thing nobody talks about: that unpredictability matters more than capability.

Andrej Karpathy coined the term "vibe coding" for letting AI write code while you direct. Gene Kim and Steve Yegge's methodology builds on that with a key insight: AI reliability varies dramatically by task type. Boilerplate config? Nearly perfect. Novel architecture decisions? Needs verification every step. Most people use the same level of trust for everything, and that's where things go sideways.

> INFO:

This post is about calibration. The personal story is at /principles, and the tooling deep-dive is at /builds/vibe-check.

The Vibe Levels

The solution is simple. Declare how much you'll trust AI before you start, then verify accordingly.

Level	Trust	How to Verify	Example Tasks
L5	95%	Final result only	`npm install`, formatting, linting
L4	80%	Spot check major outputs	Boilerplate, config files, CRUD endpoints
L3	60%	Verify key outputs	Features with known patterns, tests
L2	40%	Check every change	New integrations, unfamiliar APIs
L1	20%	Review every line	Architecture, security, core business logic
L0	0%	AI for research only	Novel problems, exploration, spikes

The levels measure your ability to evaluate AI output. If you don't know TypeScript well enough to spot a subtle type error, that's L2 work for you even if it's L4 for someone else.
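
If you want the table within reach in your terminal, here is a minimal sketch of a lookup helper. The function name `verification_for` is my own invention for illustration and is not part of vibe-check. The descriptions are copied from the table above.

```bash
# Hypothetical helper (not part of vibe-check): print how to verify
# AI output at a given vibe level, using the wording from the table.
verification_for() {
  case "$1" in
    L5) echo "Final result only" ;;
    L4) echo "Spot check major outputs" ;;
    L3) echo "Verify key outputs" ;;
    L2) echo "Check every change" ;;
    L1) echo "Review every line" ;;
    L0) echo "AI for research only" ;;
    *)  echo "unknown level: $1" >&2; return 1 ;;
  esac
}
```

Calling `verification_for L3` prints `Verify key outputs`. Anything outside L0-L5 returns a non-zero status, so a typo fails loudly instead of silently.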

Your First Session

Here's how to try this today.

Step 1: Install vibe-check

bash

npm install -g @boshu2/vibe-check

This gives you metrics on your actual work patterns, measured from git commits.

Step 2: Pick a Small Task

Choose something you'd normally complete in 30-60 minutes:

Add a new API endpoint
Create a UI component
Write tests for existing code
Fix a specific bug

Starting small matters. You need one session's worth of data before the metrics mean anything.

Step 3: Declare the Vibe Level

Before you start, ask: "How much should I trust AI for this specific task?"

Look at the table above and pick a level. Then write it down somewhere, even if it's just a comment in your terminal:

bash

# L3: Adding auth middleware - known pattern, verify key outputs

The act of declaring forces you to think about what you're doing before you do it.
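
If a terminal comment feels too easy to lose, one option is to append each declaration to a small log file. This is only a sketch: the `declare_level` function, the `VIBE_LOG` variable, and the tab-separated line format are all assumptions of mine, not something vibe-check reads or requires.

```bash
# Hypothetical helper (not part of vibe-check): append a declared vibe
# level and a short note to a plain-text log, one line per session.
declare_level() {
  local level="$1"
  local note="$2"
  # VIBE_LOG is an assumed variable; the default path is arbitrary.
  local log="${VIBE_LOG:-$HOME/.vibe-levels.log}"

  # Accept only the six levels from the table: L0 through L5.
  case "$level" in
    L[0-5]) ;;
    *) echo "usage: declare_level L0-L5 \"note\"" >&2; return 1 ;;
  esac

  # Timestamp, level, and note, separated by tabs.
  printf '%s\t%s\t%s\n' "$(date +%Y-%m-%dT%H:%M:%S)" "$level" "$note" >> "$log"
}
```

For the example above, that would be `declare_level L3 "Adding auth middleware"`. A running log also gives you something to compare against when you measure later.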

Step 4: Work According to Your Level

Match your verification to your declared level:

L4-L5: Let the AI generate, review at the end
L3: Check after each logical chunk (one function, one component)
L2: Verify every change before accepting
L1: Review line by line, question everything

> TIP:

If you find yourself constantly checking at a lower level than you declared, that's a signal. You underestimated the task complexity.

Step 5: Measure

After your session, run:

bash

vc --since "1 hour ago"

Compare the output to your declared level. Did you spiral (lots of fix commits)? Did it go smoothly? The metrics tell you whether your calibration was right.

Common Mistakes

I made all of these. Repeatedly.

1. Running L4 on L2 tasks. You trust the AI to generate a feature you don't fully understand, accept the output without deep review, and spend the next hour debugging subtle issues. The time you "saved" gets paid back with interest. (Ask me about the three hours I lost to a single misconfigured auth middleware.)

2. Running L1 on L5 tasks. You review every line of a package.json update. The work is trivially correct, but you've burned mental energy on low-stakes decisions. I did this for a week before I noticed I was exhausted by lunch.

3. Not declaring a level at all. You work reactively, trusting when things feel right and doubting when they don't. Without a declared level, you can't calibrate. Without calibration, you can't improve.

4. Ignoring debug spirals. You commit fix after fix after fix, each one addressing the side effects of the last. Three consecutive fix commits is your signal to stop. I ignored this signal once and burned an entire afternoon on what should have been a 20-minute fix.
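
The three-consecutive-fix-commits rule is easy to check by hand with plain git. Below is a rough sketch that reads commit subjects on stdin and reports the longest run of fix commits. It assumes a fix commit is one whose subject starts with "fix" or "Fix", which is my own crude definition. vibe-check may well classify commits differently.

```bash
# Hypothetical helper (not part of vibe-check). Reads commit subjects on
# stdin, one per line, and prints the longest run of consecutive
# subjects that start with "fix" or "Fix".
# Assumption: a subject prefix is a good enough marker of a fix commit.
# It will also match unrelated words such as "fixture".
longest_fix_run() {
  local run=0
  local max=0
  local subject
  while IFS= read -r subject; do
    case "$subject" in
      fix*|Fix*) run=$((run + 1)) ;;
      *)         run=0 ;;
    esac
    if [ "$run" -gt "$max" ]; then
      max=$run
    fi
  done
  echo "$max"
}
```

To use it on a recent session, pipe in the subjects with `git log --since="1 hour ago" --format=%s | longest_fix_run`. A result of 3 or more is the stop signal described above.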

The Feedback Loop

After a few sessions, you'll notice patterns:

// The Loop

Declare Level → Work → Measure → Adjust

Some things you'll discover:

Which tasks are actually L4 vs L2 for you specifically
When to escalate trust (this is easier than expected) or de-escalate (I'm out of my depth)
Your personal weak spots (CSS? Types? Database queries?)

The loop compounds. Each session teaches you something about your own reliability patterns, and that knowledge carries forward.
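
The Adjust step can be as simple as one rule of thumb. The sketch below encodes a rule I am assuming for illustration: if the session contained a spiral (three or more consecutive fix commits), declare one level lower for that kind of task next time. The `next_level` function is hypothetical, and the rule is a starting point, not something taken from the book or from vibe-check.

```bash
# Hypothetical rule of thumb (an assumption, not from vibe-check):
# after a spiral of 3+ consecutive fix commits, suggest one level lower
# for the next session on similar work. Otherwise keep the level.
next_level() {
  local declared="$1"   # e.g. L4
  local fix_run="$2"    # longest run of consecutive fix commits
  local n="${declared#L}"

  if [ "$fix_run" -ge 3 ] && [ "$n" -gt 0 ]; then
    echo "L$((n - 1))"
  else
    echo "$declared"
  fi
}
```

For example, `next_level L4 3` suggests `L3`, while `next_level L3 0` keeps `L3`. L0 is the floor, so it never goes lower.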

The Bigger Picture

This methodology comes from Gene Kim and Steve Yegge's book Vibe Coding. Their framing: you're not a line cook typing every character anymore. You're the head chef, directing AI sous chefs, tasting results, responsible for every dish that leaves the kitchen.

That last part matters. The AI generates. You're still responsible.

Kim and Yegge's research shows a concrete threshold. Context utilization above 40% degrades AI performance dramatically:

Context Used	Success Rate
Under 35%	98%
35-40%	95%
40-60%	72%
Over 60%	24%

That's why calibration matters. You're managing cognitive load, not just code.
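
To see which row of the table a session falls in, you need only two numbers: tokens used and the size of the context window. Here is a small sketch that maps them to a band. The `context_band` function and both inputs are hypothetical, since how you obtain token counts depends on your tool. I have treated exactly 40% as still inside the 35-40% row, because the text puts the threshold at "above 40%".

```bash
# Hypothetical helper: print which band of the table a session falls in.
# Arguments: tokens used, context window size (both assumed inputs).
# Compares used*100 against band*window to avoid rounding errors.
context_band() {
  local used="$1"
  local window="$2"
  local scaled=$((used * 100))

  if [ "$scaled" -lt $((35 * window)) ]; then
    echo "under 35%"
  elif [ "$scaled" -le $((40 * window)) ]; then
    echo "35-40%"
  elif [ "$scaled" -le $((60 * window)) ]; then
    echo "40-60%"
  else
    echo "over 60%"
  fi
}
```

With an assumed 200,000-token window, `context_band 90000 200000` prints `40-60%`, which is past the threshold and a reasonable cue to start a fresh session.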

The book documents 12 failure patterns where vibe coding destroys work in minutes, from "Tests Passing Lie" (AI claims tests pass but never ran them) to "Eldritch Code Horror" (3,000-line functions where everything connects to everything). The vibe levels and calibration loop are how you avoid them.

> INFO:

Dario Amodei (Anthropic CEO) wrote the foreword: "We are probably going to be the last generation of developers to write code by hand."

Real Example: This Website

I built this site in 48 hours without prior Next.js experience. The key numbers from vibe-check:

7% rework ratio: not constantly fixing mistakes
0 debug spirals: never got stuck in a fix loop
L3-L4 for most work: features with known patterns, verify key outputs

When I hit unfamiliar territory like OpenGraph images, I dropped to L2 and verified every change. The metrics reflect that calibration.

Tools That Help

vibe-check: Metrics from your git history

bash

npm install -g @boshu2/vibe-check
vc --since "today"

npm: @boshu2/vibe-check | GitHub

Claude Code: AI pair programmer with context awareness

Declare your level in the prompt:

// Example prompt

This is L2 work (new integration). Verify each change with me before moving on. Stop if anything seems off.

claude.com/claude-code

Pre-commit hooks: Automatic checks

Add vibe-check to your pre-push hook to catch spirals before they hit the remote.

What's Next

When you're comfortable with...	Try...
Single sessions	Multi-session projects with progress tracking
Basic vibe levels	Tracer tests for L1-L2 work
Individual work	Context bundles for resuming work

The full methodology is at 12factoragentops.com. Start with single sessions and expand from there.

Try It

bash

# Install
npm install -g @boshu2/vibe-check

# Do whatever you were going to do today
# (just declare a vibe level before you start)

# Check
vc --since "today"

The first session won't tell you much. By the third, you'll see patterns. By the tenth, you'll have calibrated intuition about what works for you and what doesn't.

The goal isn't perfect AI output. It's knowing when to trust and when to verify. That's the skill that compounds.

