Skip to content
← /work

/work/vibe-check

// validation signals for AI-assisted development

Fast AI output needs instrumentation. vibe-check turns git history into a reliability signal: what stuck, what churned, and when the session started spiraling.

It tells you whether an AI-assisted session produced durable progress or just confident motion.

5
core metrics
npm
published
git
history based

// the 5 core metrics

Trust Pass Rate is the key metric: it measures whether the trust level matched the task risk.

Iteration Velocity>5/hr / <3/hr
How tight are feedback loops?
Rework Ratio<30% / >50%
Building or debugging?
Trust Pass Rate ← KEY>95% / <80%
Does code stick?
Debug Spiral Duration<15m / >45m
How long stuck?
Flow Efficiency>90% / <70%
What % productive?

// why this matters

AI reliability varies by task type. The vibe levels answer: when can you trust AI output, and when do you verify every line?

L595% trust: formatting, linting. Run it and move on.L480% trust: boilerplate, config files. Spot-check the output.L360% trust: standard features, CRUD. Verify the key parts work.L240% trust: new features, integrations. Check every change before committing.L120% trust: architecture, security. Read every line the AI writes.L00% trust: novel research where the AI has no training data.

// the 40% rule

Gene Kim and Steve Yegge found a hard threshold. Under 40% context utilization, success rate is 98%. Above 60%, it drops to 24%; the AI starts forgetting instructions and contradicting itself.

<40% context
98% success
>60% context
24% success

// the insight

Git history is the receipt. The commits tell the truth.

vibe-check analyzes your commit patterns to detect debug spirals before they consume your whole session. Stuck for 30 minutes on the same thing? That's a wipe. Reset, do some research, come back with a plan.

"Last week, the CLI flagged a spiral at 18 minutes. I realized I was arguing with the LLM about a circular dependency. I stepped away, drew the schema on paper, and fixed it in one commit. Without the alert, I would have wasted two hours."

// results

Running this methodology since 2023. When I follow the discipline, it works. When I skip calibration because I'm in a hurry, I pay for it in rework.

95%
success rate
2x
first-pass acceptance
10:1
ROI on time

// ship it

npm →GitHub →