Getting Started with Vibe Coding
This guide is for developers who've tried AI coding tools and found them... unpredictable. Maybe Copilot felt like autocomplete that thinks it's smarter than it is. Maybe Claude generated code that worked perfectly, then code that broke in weird ways, and you couldn't tell which you'd get.
If you're still skeptical about AI coding, that's fair. Stick around anyway. Calibration is the part that makes it actually work.
OK, so here's the thing nobody talks about: that unpredictability matters more than capability.
Andrej Karpathy coined the term "vibe coding" for letting AI write code while you direct. Gene Kim and Steve Yegge's methodology builds on that with a key insight: AI reliability varies dramatically by task type. Boilerplate config? Nearly perfect. Novel architecture decisions? They need verification at every step. Most people apply the same level of trust to everything, and that's where things go sideways.
This post is about calibration. The personal story is at /principles, and the tooling deep-dive is at /builds/vibe-check.
The Vibe Levels
The solution is simple. Declare how much you'll trust AI before you start, then verify accordingly.
| Level | Trust | How to Verify | Example Tasks |
|---|---|---|---|
| L5 | 95% | Final result only | npm install, formatting, linting |
| L4 | 80% | Spot check major outputs | Boilerplate, config files, CRUD endpoints |
| L3 | 60% | Verify key outputs | Features with known patterns, tests |
| L2 | 40% | Check every change | New integrations, unfamiliar APIs |
| L1 | 20% | Review every line | Architecture, security, core business logic |
| L0 | 0% | AI for research only | Novel problems, exploration, spikes |
The levels measure your ability to evaluate AI output. If you don't know TypeScript well enough to spot a subtle type error, that's L2 work for you even if it's L4 for someone else.
Your First Session
Here's how to try this today.
Step 1: Install vibe-check
npm install -g @boshu2/vibe-check
This gives you metrics on your actual work patterns, measured from git commits.
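If you want a baseline before your first calibrated session, you can point it at recent history in any repo you already work in. The repo path here is just a placeholder, and the relative date assumes `--since` accepts the same kind of values as the examples later in this post:

```bash
# Sanity-check the install and get a baseline from existing history.
# Assumption: --since accepts relative dates like the other examples here.
cd ~/code/your-existing-repo   # any repo with recent commits
vc --since "2 weeks ago"
```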
Step 2: Pick a Small Task
Choose something you'd normally complete in 30-60 minutes:
- Add a new API endpoint
- Create a UI component
- Write tests for existing code
- Fix a specific bug
Starting small matters. You need one session's worth of data before the metrics mean anything.
Step 3: Declare the Vibe Level
Before you start, ask: "How much should I trust AI for this specific task?"
Look at the table above and pick a level. Then write it down somewhere, even if it's just a note in your terminal:
L3: Adding auth middleware - known pattern, verify key outputs
The act of declaring forces you to think about what you're doing before you do it.
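If you want the declaration somewhere more durable than terminal scrollback, one low-friction option is appending it to a session log from the shell. The file name below is just an example; vibe-check doesn't read it:

```bash
# Record the declared level before starting; the log file name is arbitrary.
echo "$(date +%F) | L3: Adding auth middleware - known pattern, verify key outputs" >> .vibe-session.log
```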
Step 4: Work According to Your Level
Match your verification to your declared level:
- L4-L5: Let the AI generate, review at the end
- L3: Check after each logical chunk (one function, one component)
- L2: Verify every change before accepting
- L1: Review line by line, question everything
If you find yourself constantly checking at a lower level than you declared, that's a signal: you underestimated the task's complexity.
Step 5: Measure
After your session, run:
vc --since "1 hour ago"
Compare the output to your declared level. Did you spiral (lots of fix commits)? Did it go smoothly? The metrics tell you whether your calibration was right.
Common Mistakes
I made all of these. Repeatedly.
1. Running L4 on L2 tasks. You trust the AI to generate a feature you don't fully understand, accept the output without deep review, and spend the next hour debugging subtle issues. The time you "saved" gets paid back with interest. (Ask me about the three hours I lost to a single misconfigured auth middleware.)
2. Running L1 on L5 tasks. You review every line of a package.json update. The work is trivially correct, but you've burned mental energy on low-stakes decisions. I did this for a week before I noticed I was exhausted by lunch.
3. Not declaring a level at all. You work reactively, trusting when things feel right and doubting when they don't. Without a declared level, you can't calibrate. Without calibration, you can't improve.
4. Ignoring debug spirals. You commit fix after fix after fix, each one addressing the side effects of the last. Three consecutive fix commits is your signal to stop. I ignored this signal once and burned an entire afternoon on what should have been a 20-minute fix.
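You don't need vibe-check to spot a spiral in the moment. A plain git one-liner over your recent history is enough to make you pause:

```bash
# Count recent commits whose subject mentions "fix" (case-insensitive).
# Three or more in the last hour is the stop-and-rethink signal.
git log --oneline --since="1 hour ago" -i --grep="fix" | wc -l
```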
The Feedback Loop
After a few sessions, you'll notice patterns:
Declare Level → Work → Measure → Adjust
Some things you'll discover:
- Which tasks are actually L4 vs L2 for you specifically
- When to escalate trust ("this is easier than expected") or de-escalate ("I'm out of my depth")
- Your personal weak spots (CSS? Types? Database queries?)
The loop compounds. Each session teaches you something about your own reliability patterns, and that knowledge carries forward.
The Bigger Picture
This methodology comes from Gene Kim and Steve Yegge's book Vibe Coding. Their framing: you're not a line cook typing every character anymore. You're the head chef, directing AI sous chefs, tasting results, responsible for every dish that leaves the kitchen.
That last part matters. The AI generates. You're still responsible.
Kim and Yegge's research shows a concrete threshold. Context utilization above 40% degrades AI performance dramatically:
| Context Used | Success Rate |
|---|---|
| Under 35% | 98% |
| 35-40% | 95% |
| 40-60% | 72% |
| Over 60% | 24% |
That's why calibration matters. You're managing cognitive load, not just code.
The book documents 12 failure patterns where vibe coding destroys work in minutes, from "Tests Passing Lie" (AI claims tests pass but never ran them) to "Eldritch Code Horror" (3,000-line functions where everything connects to everything). The vibe levels and calibration loop are how you avoid them.
Dario Amodei (Anthropic CEO) wrote the foreword: "We are probably going to be the last generation of developers to write code by hand."
Real Example: This Website
I built this site in 48 hours without prior Next.js experience. The key numbers from vibe-check:
- 7% rework ratio: not constantly fixing mistakes
- 0 debug spirals: never got stuck in a fix loop
- L3-L4 for most work: features with known patterns, verify key outputs
When I hit unfamiliar territory like OpenGraph images, I dropped to L2 and verified every change. The metrics reflect that calibration.
Tools That Help
vibe-check: Metrics from your git history
npm install -g @boshu2/vibe-check
vc --since "today"
Available on npm as @boshu2/vibe-check (source on GitHub).
Claude Code: AI pair programmer with context awareness
Declare your level in the prompt:
This is L2 work (new integration). Verify each change with me before moving on. Stop if anything seems off.
Git hooks: Automatic checks
Add vibe-check to your pre-push hook to catch spirals before they hit the remote.
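A minimal sketch of what that hook could look like, assuming vc exits non-zero when it flags a problem (check the vibe-check docs for its actual flags and exit behavior):

```bash
#!/bin/sh
# .git/hooks/pre-push - make executable with: chmod +x .git/hooks/pre-push
# Assumption: vc returns a non-zero exit code when it detects a spiral.
if ! vc --since "today"; then
  echo "vibe-check flagged this session - review before pushing."
  exit 1
fi
```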
What's Next
| When you're comfortable with... | Try... |
|---|---|
| Single sessions | Multi-session projects with progress tracking |
| Basic vibe levels | Tracer tests for L1-L2 work |
| Individual work | Context bundles for resuming work |
The full methodology is at 12factoragentops.com. Start with single sessions and expand from there.
Try It
1. Install: npm install -g @boshu2/vibe-check
2. Do whatever you were going to do today (just declare a vibe level before you start)
3. Check: vc --since "today"
The first session won't tell you much. By the third, you'll see patterns. By the tenth, you'll have calibrated intuition about what works for you and what doesn't.
The goal isn't perfect AI output. It's knowing when to trust and when to verify. That's the skill that compounds.