Validation signals for AI-assisted development.

Fast AI output needs instrumentation. vibe-check turns git history into a reliability signal: what stuck, what churned, and when the session started spiraling.

It tells you whether an AI-assisted session produced durable progress or just confident motion.

5 core metrics · published on npm · based on git history

// the 5 core metrics

Trust Pass Rate is the key metric: it measures whether the trust level matched the task risk.

Metric                  Healthy   Problem   Question
Iteration Velocity      >5/hr     <3/hr     How tight are feedback loops?
Rework Ratio            <30%      >50%      Building or debugging?
Trust Pass Rate (KEY)   >95%      <80%      Does code stick?
Debug Spiral Duration   <15m      >45m      How long stuck?
Flow Efficiency         >90%      <70%      What % productive?
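The thresholds read naturally as a traffic light. Here is a minimal TypeScript sketch, not vibe-check's own code. It assumes the five metrics are already computed, treats anything between the two bounds as a middle "watch" band, and uses invented names (`rate`, `THRESHOLDS`, `Metric`).

```typescript
// Hypothetical sketch: rate metric values against the Healthy/Problem
// thresholds from the table. Names and the middle "watch" band are
// illustrative assumptions, not vibe-check's API.

type Rating = "good" | "watch" | "bad";

type Metric =
  | "iterationVelocity"
  | "reworkRatio"
  | "trustPassRate"
  | "debugSpiralDuration"
  | "flowEfficiency";

interface Threshold {
  good: number; // the "Healthy" bound
  bad: number; // the "Problem" bound
  higherIsBetter: boolean;
}

const THRESHOLDS: Record<Metric, Threshold> = {
  iterationVelocity: { good: 5, bad: 3, higherIsBetter: true }, // per hour
  reworkRatio: { good: 30, bad: 50, higherIsBetter: false }, // percent
  trustPassRate: { good: 95, bad: 80, higherIsBetter: true }, // percent
  debugSpiralDuration: { good: 15, bad: 45, higherIsBetter: false }, // minutes
  flowEfficiency: { good: 90, bad: 70, higherIsBetter: true }, // percent
};

function rate(metric: Metric, value: number): Rating {
  const t = THRESHOLDS[metric];
  const isGood = t.higherIsBetter ? value > t.good : value < t.good;
  const isBad = t.higherIsBetter ? value < t.bad : value > t.bad;
  if (isGood) return "good";
  if (isBad) return "bad";
  // Between the two bounds, or exactly on one of them.
  return "watch";
}

console.log(rate("iterationVelocity", 6)); // good
console.log(rate("reworkRatio", 40)); // watch
console.log(rate("trustPassRate", 75)); // bad
```

Keeping the direction (`higherIsBetter`) in the data means one comparison function covers all five metrics, including the ones where lower is better.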

// why this matters

AI reliability varies by task type. The vibe levels answer: when can you trust AI output, and when do you verify every line?

L5 (95% trust): formatting, linting. Run it and move on.
L4 (80% trust): boilerplate, config files. Spot-check the output.
L3 (60% trust): standard features, CRUD. Verify the key parts work.
L2 (40% trust): new features, integrations. Check every change before committing.
L1 (20% trust): architecture, security. Read every line the AI writes.
L0 (0% trust): novel research where the AI has no training data.
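The levels work as a lookup from task risk to review depth. A minimal sketch under stated assumptions: `VibeLevel`, `VIBE_LEVELS`, and `levelFor` are hypothetical names, and the rule that a mixed change takes the lowest level it touches is an assumption made here, not something vibe-check defines.

```typescript
// Hypothetical sketch: the vibe levels as a lookup table. Trust
// percentages and guidance are copied from the list of levels; the names
// are invented for illustration.

type VibeLevel = 0 | 1 | 2 | 3 | 4 | 5;

interface LevelPolicy {
  trust: number; // percent
  tasks: string;
  verification?: string; // L0 has no stated guidance
}

const VIBE_LEVELS: Record<VibeLevel, LevelPolicy> = {
  5: { trust: 95, tasks: "formatting, linting", verification: "Run it and move on." },
  4: { trust: 80, tasks: "boilerplate, config files", verification: "Spot-check the output." },
  3: { trust: 60, tasks: "standard features, CRUD", verification: "Verify the key parts work." },
  2: { trust: 40, tasks: "new features, integrations", verification: "Check every change before committing." },
  1: { trust: 20, tasks: "architecture, security", verification: "Read every line the AI writes." },
  0: { trust: 0, tasks: "novel research with no training data" },
};

// Assumption: a change spanning several task types is calibrated to the
// riskiest (lowest) level it touches.
function levelFor(taskLevels: VibeLevel[]): VibeLevel {
  if (taskLevels.length === 0) {
    throw new Error("levelFor needs at least one task level");
  }
  return Math.min(...taskLevels) as VibeLevel;
}

// A config change (L4) that also touches auth code (L1).
const level = levelFor([4, 1]);
console.log(level, VIBE_LEVELS[level].verification);
// prints: 1 Read every line the AI writes.
```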

// the context budget

Gene Kim and Steve Yegge published the threshold research: once a session fills more than a certain fraction of the context window, the AI starts forgetting instructions and contradicting itself. vibe-check treats that budget as a first-class reliability signal.

Load just-in-time, watch utilization, and reset early instead of arguing with a degraded session. The full belief is at /method: context is a budget.
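One way to make "context is a budget" concrete is a small tracker that counts tokens as they are loaded and says when to reset. This is a hypothetical sketch, not part of vibe-check. `ContextBudget` is an invented name, token counts are assumed to come from elsewhere, and the reset fraction is a constructor parameter because no specific threshold is given here.

```typescript
// Hypothetical sketch: track context utilization and signal an early reset.
// The reset fraction is supplied by the caller; no default is assumed.

class ContextBudget {
  private used = 0;
  private readonly windowTokens: number;
  private readonly resetAt: number; // fraction of the window, in (0, 1]

  constructor(windowTokens: number, resetAt: number) {
    if (windowTokens <= 0) throw new Error("windowTokens must be positive");
    if (resetAt <= 0 || resetAt > 1) throw new Error("resetAt must be in (0, 1]");
    this.windowTokens = windowTokens;
    this.resetAt = resetAt;
  }

  // Record tokens as they are loaded just-in-time (files, docs, tool output).
  load(tokens: number): void {
    this.used += tokens;
  }

  utilization(): number {
    return this.used / this.windowTokens;
  }

  // True once the session has spent its budget and should be reset.
  shouldReset(): boolean {
    return this.utilization() >= this.resetAt;
  }

  // Start a fresh session, optionally carrying over a short written plan.
  reset(carryOverTokens = 0): void {
    this.used = carryOverTokens;
  }
}

// Illustrative numbers only, not a recommended window or threshold.
const budget = new ContextBudget(200_000, 0.5);
budget.load(60_000);
console.log(budget.shouldReset()); // false (30% used)
budget.load(50_000);
console.log(budget.shouldReset()); // true (55% used)
```

The point of the design is that the reset decision is made from a number you watch, not from how the session feels.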

// the insight

Git history is the receipt. The commits tell the truth.

vibe-check analyzes your commit patterns to detect debug spirals before they consume your whole session. Stuck for 30 minutes on the same thing? That's a wipe. Reset, do some research, come back with a plan.

The pattern it catches: eighteen minutes into arguing with the model about a circular dependency, certain the next prompt will fix it. The alert is what tells you to step away, draw the schema on paper, and come back to fix it in one commit. Minute eighteen instead of hour two.
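The detection heuristic isn't spelled out here, so the following is a hypothetical sketch of one way to spot a spiral in commit history, not vibe-check's actual algorithm. It assumes conventional-commit messages (`fix:`, `revert`), treats consecutive fix commits that keep touching overlapping files as one spiral, and measures duration from the first to the last commit in the run. `findSpirals`, `spiralAlerts`, and the three-commit minimum are invented for illustration.

```typescript
// Hypothetical heuristic: a debug spiral is a run of consecutive fix
// commits that keep touching the same files.

interface Commit {
  timestamp: number; // milliseconds since epoch
  message: string;
  files: string[];
}

interface Spiral {
  start: number;
  end: number;
  commits: number;
  minutes: number;
}

// Assumes conventional-commit style messages.
const FIX_PATTERN = /^(fix|revert)\b/i;

const isFix = (c: Commit): boolean => FIX_PATTERN.test(c.message);

function overlaps(a: string[], b: string[]): boolean {
  const seen = new Set(a);
  return b.some((f) => seen.has(f));
}

// Commits must be sorted oldest first. minCommits is an arbitrary choice.
function findSpirals(commits: Commit[], minCommits = 3): Spiral[] {
  const spirals: Spiral[] = [];
  let run: Commit[] = [];

  const closeRun = (): void => {
    if (run.length >= minCommits) {
      const start = run[0].timestamp;
      const end = run[run.length - 1].timestamp;
      spirals.push({ start, end, commits: run.length, minutes: (end - start) / 60_000 });
    }
    run = [];
  };

  for (const c of commits) {
    const continuesRun =
      isFix(c) && run.length > 0 && overlaps(run[run.length - 1].files, c.files);
    if (continuesRun) {
      run.push(c);
    } else {
      closeRun();
      if (isFix(c)) run = [c];
    }
  }
  closeRun();
  return spirals;
}

// Keep only spirals that have lasted at least alertAfterMinutes.
function spiralAlerts(commits: Commit[], alertAfterMinutes: number): Spiral[] {
  return findSpirals(commits).filter((s) => s.minutes >= alertAfterMinutes);
}

// Made-up session: one feature commit, then four fixes on one dependency cycle.
const at = (minute: number): number => minute * 60_000;
const commits: Commit[] = [
  { timestamp: at(0), message: "feat: add orders table", files: ["schema.ts"] },
  { timestamp: at(2), message: "fix: circular import", files: ["orders.ts", "users.ts"] },
  { timestamp: at(8), message: "fix: move shared types", files: ["orders.ts", "types.ts"] },
  { timestamp: at(14), message: "fix: break cycle again", files: ["users.ts", "types.ts"] },
  { timestamp: at(20), message: "fix: restore type import", files: ["types.ts"] },
];

const alerts = spiralAlerts(commits, 15);
console.log(alerts.length, alerts[0]?.minutes); // 1 18
```

Measuring from the first fix commit undercounts the time spent before it; a real tool might start the clock at the commit that preceded the run.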

// status

Published to npm in 2025 and quiet since. The project did its job: the ideas moved upstream. Trust calibration and spiral detection grew into the validation gates that now run inside AgentOps.

I still run the discipline it encodes: calibrate trust to task risk, watch for the spiral, reset early. The standalone CLI is the archaeology; the practice lives on in the method.

// ship it

npm → · GitHub →