
/builds/vibe-check

// validation signals for AI-assisted development

Fast AI output needs instrumentation. vibe-check turns git history into a reliability signal: what stuck, what churned, and when the session started spiraling.

It tells you whether an AI-assisted session produced durable progress or just confident motion.

5 core metrics
npm published
git history based

// the 5 core metrics

Iteration Velocity: >5/hr healthy, <3/hr warning
How tight are feedback loops?

Rework Ratio: <30% healthy, >50% warning
Building or debugging?

Trust Pass Rate ← KEY: >95% healthy, <80% warning
Does code stick?

Debug Spiral Duration: <15m healthy, >45m warning
How long stuck?

Flow Efficiency: >90% healthy, <70% warning
What % productive?

Trust Pass Rate is the key metric. It measures whether the trust level matched the task risk.
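
To make the thresholds concrete, here is a minimal sketch that rates a metric value against the numbers on the cards above. The names (`THRESHOLDS`, `rateMetric`), the three labels, and the middle "watch" band are assumptions for illustration; this page does not show how vibe-check itself computes ratings.

```typescript
// Hypothetical rating helper. Thresholds are copied from the metric cards;
// the "watch" band between the two thresholds is an assumption.
type Rating = "healthy" | "watch" | "warning";

interface Threshold {
  healthy: number; // boundary of the healthy band
  warning: number; // boundary of the warning band
  higherIsBetter: boolean;
}

const THRESHOLDS: Record<string, Threshold> = {
  iterationVelocity: { healthy: 5, warning: 3, higherIsBetter: true }, // per hour
  reworkRatio: { healthy: 30, warning: 50, higherIsBetter: false }, // percent
  trustPassRate: { healthy: 95, warning: 80, higherIsBetter: true }, // percent
  debugSpiralMinutes: { healthy: 15, warning: 45, higherIsBetter: false }, // minutes
  flowEfficiency: { healthy: 90, warning: 70, higherIsBetter: true }, // percent
};

function rateMetric(name: string, value: number): Rating {
  const t = THRESHOLDS[name];
  if (t === undefined) throw new Error(`unknown metric: ${name}`);
  if (t.higherIsBetter) {
    if (value > t.healthy) return "healthy";
    if (value < t.warning) return "warning";
  } else {
    if (value < t.healthy) return "healthy";
    if (value > t.warning) return "warning";
  }
  return "watch"; // between the two thresholds
}
```

For example, `rateMetric("reworkRatio", 60)` lands in the warning band, while a 30-minute spiral sits between the two thresholds.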

// vibe_check_output.log
Iteration Velocity (DPS uptime): >3/hr
🔄 Rework Ratio (Wipe count): <50%
Trust Pass Rate (First try kills): >80%
🌀 Debug Spiral (Time to reset): <30m
🎯 Flow Efficiency (Boss uptime): >75%
OVERALL: ELITE

npx @boshu2/vibe-check

// why this matters

AI reliability varies by task type. Formatting is nearly always correct. Architecture needs line-by-line verification. The vibe levels answer the question: when can you trust AI output, and when do you need to verify every line?

L5, 95% trust: formatting, linting. Run it and move on.
L4, 80% trust: boilerplate, config files. Spot check the output.
L3, 60% trust: standard features, CRUD. Verify the key parts work.
L2, 40% trust: new features, integrations. Check every change before committing.
L1, 20% trust: architecture, security. Read every line the AI writes.
L0, 0% trust: novel research where the AI has no training data.

Declaring the level upfront forces you to think about what kind of task you're doing. After the session, compare what actually happened to what you expected.
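
One way to picture the declare-then-compare loop is a lookup table plus a small check. This is a sketch under assumptions: `VIBE_LEVELS` and `compareToDeclared` are hypothetical names, and comparing the declared trust percentage directly against an observed Trust Pass Rate is my reading of the idea, not something the tool is confirmed to do.

```typescript
// Vibe levels as listed above: expected trust (percent) and how much to verify.
const VIBE_LEVELS: Record<number, { trust: number; verify: string }> = {
  5: { trust: 95, verify: "run it and move on" },
  4: { trust: 80, verify: "spot check the output" },
  3: { trust: 60, verify: "verify the key parts work" },
  2: { trust: 40, verify: "check every change before committing" },
  1: { trust: 20, verify: "read every line" },
  0: { trust: 0, verify: "novel research, no training data" },
};

// Hypothetical comparison: did the observed pass rate reach the declared trust?
function compareToDeclared(level: number, observedPassRate: number) {
  const declared = VIBE_LEVELS[level];
  if (declared === undefined) throw new Error(`unknown vibe level: ${level}`);
  const gap = observedPassRate - declared.trust;
  return {
    expected: declared.trust,
    observed: observedPassRate,
    gap, // negative means the session went worse than declared
    overTrusted: gap < 0,
  };
}
```

Declaring L4 and then seeing a 70% pass rate gives a gap of -10: the task deserved more verification than it got.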

// the 40% rule

Gene Kim and Steve Yegge found a hard threshold in their research. When context utilization stays under 40%, success rate is 98%. Above 60%, it drops to 24%. The AI starts forgetting instructions and contradicting itself.

<40% context: 98% success
>60% context: 24% success

This is why spiral detection matters. When you're stuck in a fix loop, context fills up fast.
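
The rule is easy to turn into a guard. A minimal sketch, assuming you can get a token count and a window size from your own tooling; `contextZone` and its labels are hypothetical, and the "caution" band between 40% and 60% is an assumption, since only the two outer thresholds are quoted above.

```typescript
type ContextZone = "safe" | "caution" | "danger";

// Classify context utilization against the 40% and 60% thresholds.
function contextZone(tokensUsed: number, windowSize: number): ContextZone {
  if (windowSize <= 0) throw new Error("windowSize must be positive");
  const utilization = tokensUsed / windowSize;
  if (utilization < 0.4) return "safe";
  if (utilization > 0.6) return "danger";
  return "caution"; // assumption: the source gives no figure for this band
}
```

With a 100,000-token window, 30,000 tokens used is still in the safe zone and 70,000 is past the point where the quoted success rate collapses.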

// the insight

Git history is the receipt. The commits tell the truth.

vibe-check analyzes your commit patterns to detect debug spirals before they consume your whole session. If you're stuck for 30 minutes on the same thing, that's a wipe: reset, do some research, and come back with a plan.
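
To show the kind of signal git history can carry, here is a rough sketch of spiral detection over a list of commits. It is not vibe-check's actual algorithm, which this page does not describe. The heuristic (a run of three or more consecutive commits whose message starts with "fix") and the names `findSpirals` and `needsReset` are assumptions, and it does not check that the fixes touch the same code.

```typescript
interface Commit {
  timestamp: Date;
  message: string;
}

interface Spiral {
  start: Date;
  end: Date;
  minutes: number;
  commits: number;
}

// Assumption: a commit is a "fix" if its message starts with the word "fix".
const isFix = (c: Commit): boolean => /^fix\b/i.test(c.message.trim());

// Find runs of at least minCommits consecutive fix commits.
function findSpirals(commits: Commit[], minCommits = 3): Spiral[] {
  const sorted = commits
    .slice()
    .sort((a, b) => a.timestamp.getTime() - b.timestamp.getTime());
  const spirals: Spiral[] = [];
  let run: Commit[] = [];

  // Close the current run and record it if it is long enough.
  const flush = (): void => {
    if (run.length >= minCommits) {
      const start = run[0].timestamp;
      const end = run[run.length - 1].timestamp;
      const minutes = (end.getTime() - start.getTime()) / 60000;
      spirals.push({ start, end, minutes, commits: run.length });
    }
    run = [];
  };

  for (const c of sorted) {
    if (isFix(c)) run.push(c);
    else flush();
  }
  flush();
  return spirals;
}

// The 30-minute limit comes from the text above.
const needsReset = (s: Spiral, limitMinutes = 30): boolean =>
  s.minutes >= limitMinutes;
```

Feeding it parsed `git log` output (timestamps and subject lines) would be the natural next step; the parsing is left out to keep the sketch self-contained.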

"Last week, the CLI flagged a spiral at 18 minutes. I realized I was arguing with the LLM about a circular dependency. I stepped away, drew the schema on paper, and fixed it in one commit. Without the alert, I would have wasted two hours."

// results

I've been running this methodology since 2023. When I follow the discipline, it works. When I skip calibration because I'm in a hurry, I pay for it in rework.

95% success rate
34x commit throughput
2x first-pass acceptance
10:1 ROI on time

// for autonomous agents

vibe-check measures human-AI collaboration sessions. For autonomous agent workflows, 12-Factor AgentOps applies DevOps and SRE discipline to the delivery system around the agents.

12-Factor AgentOps →
npm → GitHub →