Building vibe-check: 1,563 Lines I Deleted

How vibe-check became validation infrastructure for AI coding sessions by turning git history into reliability metrics.

November 29, 2025 · 5 min read
#vibe-coding #ai-development #developer-tools #open-source

This essay is part of the reliable AI-assisted delivery trail: proof, method, and judgment for making fast AI work reviewable and safe to ship.

1,563 Lines of Math That Solved a Problem Nobody Had

Five hours into building vibe-check, an ML prediction system existed. Ordered logistic regression. Expected calibration error. The whole academic stack. It predicted which trust level to use for a given task.

The problem: you already know what task you're doing. Nobody needs a model to tell them that OAuth integration is riskier than formatting code.

git rm -rf src/recommend/    # Ordered logistic regression
git rm -rf src/calibration/  # ECE calculations
git rm src/commands/level.ts # The prediction command

1,563 lines deleted. One commit. Twenty-one hours after implementing it.

In the two hours after that deletion, more useful features shipped than in the entire previous day. The ML system was occupying the mental space where real features needed to be.

| Metric | Value |
| --- | --- |
| Time building ML | 5h 18m |
| Time ML existed | 21h 31m |
| Lines deleted | 1,563 |
| Time from delete to next ship | 1h 58m |

The lesson was about trust calibration. At L4 (high trust), the instinct was to ask the AI to "fix the ML tests." It would have done it. The feature would have shipped and required maintenance forever. By dropping to L1 (verify every line), the real problem became visible: the feature itself was wrong.

That experience is why vibe-check exists. It surfaces patterns from your git history so you can catch yourself before sinking five hours into the wrong thing. It makes the discipline between generation and trust measurable: generation is cheap; proving the output is worth shipping is the scarce part.


The Five Core Metrics

All five come from git history rather than code content. The tool never reads your source files, only commit metadata. Commit timestamps are much harder to fudge than a self-reported impression of how a session went, and behavior reveals more than intentions.
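
To make "just commit metadata" concrete, here is a minimal sketch of turning `git log` output into records the metrics could consume. The `CommitMeta` shape, the `parseGitLog` helper, and the tab-separated format string are illustrative assumptions, not vibe-check's actual internals.

```typescript
// Hypothetical shape for one commit's metadata; not vibe-check's actual type.
interface CommitMeta {
  hash: string;
  date: Date;
  subject: string;
}

// Parses text assumed to come from:
//   git log --pretty=format:"%h%x09%aI%x09%s"
// which prints short hash, ISO 8601 author date, and subject, separated by tabs.
function parseGitLog(output: string): CommitMeta[] {
  return output
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => {
      const [hash = "", isoDate = "", ...rest] = line.split("\t");
      // Subjects may themselves contain tabs, so rejoin whatever follows the date.
      return { hash, date: new Date(isoDate), subject: rest.join("\t") };
    });
}

// Made-up sample lines in the assumed format.
const sampleLog = [
  "a1b2c3d\t2025-11-28T10:00:00+00:00\tfeat(auth): add OAuth callback",
  "d4e5f6a\t2025-11-28T10:06:00+00:00\tfix(auth): handle missing state param",
].join("\n");

const parsed = parseGitLog(sampleLog);
console.log(parsed.length, parsed[1]?.subject);
```

Nothing here opens a source file: hash, timestamp, and subject line are enough input for every metric that follows.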

1. Trust Pass Rate

The percentage of commits that don't require an immediate fix. When a commit lands and no fix follows within 10 minutes, that's a trust pass. A high rate means your calibration is accurate: you're trusting AI on tasks where it's reliable. A low rate means you're over-trusting on complex work.
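
A rough sketch of that rule, under some assumptions the text does not settle: a "fix" is recognized by a Conventional Commits `fix:` subject, only the immediately following commit is checked, and every commit (fixes included) counts in the denominator. The `Commit` type and `trustPassRate` name are hypothetical.

```typescript
// Hypothetical commit record; only the timestamp and subject line are used.
interface Commit {
  date: Date;
  subject: string;
}

const FIX_WINDOW_MS = 10 * 60 * 1000; // the 10-minute window from the text

// Assumption: a fix commit has a Conventional Commits subject such as
// "fix: ..." or "fix(scope): ...".
const isFix = (c: Commit): boolean => /^fix(\(.+\))?!?:/.test(c.subject);

// Expects commits sorted oldest first. A commit passes unless the very next
// commit is a fix that lands within the window.
function trustPassRate(commits: Commit[]): number {
  if (commits.length === 0) return 1; // nothing committed, nothing failed
  let passes = 0;
  commits.forEach((commit, i) => {
    const next = commits[i + 1];
    const fixedSoon =
      next !== undefined &&
      isFix(next) &&
      next.date.getTime() - commit.date.getTime() <= FIX_WINDOW_MS;
    if (!fixedSoon) passes += 1;
  });
  return passes / commits.length;
}

const at = (hhmm: string): Date => new Date(`2025-11-28T${hhmm}:00Z`);

// Made-up session: the first commit is fixed five minutes later.
const session: Commit[] = [
  { date: at("10:00"), subject: "feat(auth): add OAuth callback" },
  { date: at("10:05"), subject: "fix(auth): handle missing state param" },
  { date: at("10:30"), subject: "feat(ui): add login button" },
  { date: at("11:00"), subject: "docs: describe login flow" },
];

const rate = trustPassRate(session); // 3 of 4 commits pass: 0.75
```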

2. Rework Ratio

Fix commits as a percentage of all commits. Some rework is healthy; zero fixes probably means you're over-verifying. But when rework climbs above 25%, more than one commit in four is a correction rather than new work.
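
As a sketch, the ratio is a single count over commit subjects. Recognizing fixes by a Conventional Commits `fix:` prefix is an assumption, and `reworkRatio` and `REWORK_WARNING` are illustrative names rather than the tool's API.

```typescript
// Assumption: fix commits follow Conventional Commits ("fix:" or "fix(scope):").
const FIX_SUBJECT = /^fix(\(.+\))?!?:/;
const REWORK_WARNING = 0.25; // the 25% line from the text

function reworkRatio(subjects: string[]): number {
  if (subjects.length === 0) return 0;
  const fixes = subjects.filter((s) => FIX_SUBJECT.test(s)).length;
  return fixes / subjects.length;
}

// Made-up commit subjects for illustration.
const subjects = [
  "feat(auth): add OAuth callback",
  "fix(auth): handle missing state param",
  "feat(ui): add login button",
  "test(ui): cover login button",
  "docs: describe login flow",
];

const ratio = reworkRatio(subjects); // 1 fix out of 5 commits: 0.2
const tooMuchRework = ratio > REWORK_WARNING; // false
```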

3. Debug Spirals

Three or more consecutive fix commits on the same component. One fix is normal. Two happens. Three means you're patching symptoms while the AI keeps generating broken code. The count tells you how often you get stuck.
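
One way to sketch the detection: walk the history in order and collect runs of consecutive fix commits that share a component. Taking the component from the Conventional Commits scope (`auth` in `fix(auth): ...`) is an assumption, as are the `fixScope` and `detectSpirals` names; unscoped fixes are simply ignored here.

```typescript
interface Commit {
  subject: string;
}

interface Spiral {
  component: string;
  commits: Commit[];
}

// Assumption: the component is the scope of a "fix(scope): ..." subject.
// Returns null for non-fix commits and for fixes without a scope.
function fixScope(subject: string): string | null {
  const match = /^fix\(([^)]+)\)!?:/.exec(subject);
  return match?.[1] ?? null;
}

// Expects commits sorted oldest first.
function detectSpirals(commits: Commit[], minLength = 3): Spiral[] {
  const spirals: Spiral[] = [];
  let start = 0;
  while (start < commits.length) {
    const scope = fixScope(commits[start].subject);
    let end = start + 1;
    if (scope !== null) {
      // Extend the run while the following commits fix the same component.
      while (end < commits.length && fixScope(commits[end].subject) === scope) {
        end += 1;
      }
      if (end - start >= minLength) {
        spirals.push({ component: scope, commits: commits.slice(start, end) });
      }
    }
    start = end;
  }
  return spirals;
}

// Made-up history: three auth fixes in a row, then a single ui fix.
const spirals = detectSpirals([
  { subject: "feat(auth): add OAuth callback" },
  { subject: "fix(auth): handle missing state param" },
  { subject: "fix(auth): retry token exchange" },
  { subject: "fix(auth): stop retrying on 401" },
  { subject: "feat(ui): add login button" },
  { subject: "fix(ui): align login button" },
]);
```

The three `auth` fixes form one spiral; the lone `ui` fix does not, which matches the "one fix is normal" reading above.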

4. Spiral Duration

Total time spent inside those fix loops. Five minutes of debugging is fine. Forty-five minutes means you should have stepped back, switched approaches, or dropped to a lower trust level twenty minutes ago.
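
A small sketch of the arithmetic, assuming a spiral's duration runs from its first fix commit to its last (the text does not say whether the commit that triggered the fixes is included). The function names are hypothetical.

```typescript
const MINUTE_MS = 60 * 1000;

// A spiral is represented here only by the timestamps of its fix commits.
// Assumption: duration is measured from the first fix to the last fix.
function spiralMinutes(fixTimes: Date[]): number {
  if (fixTimes.length === 0) return 0;
  const times = fixTimes.map((d) => d.getTime());
  return (Math.max(...times) - Math.min(...times)) / MINUTE_MS;
}

function totalSpiralMinutes(spirals: Date[][]): number {
  return spirals.reduce((sum, fixTimes) => sum + spiralMinutes(fixTimes), 0);
}

const at = (hhmm: string): Date => new Date(`2025-11-28T${hhmm}:00Z`);

// Made-up timestamps: a short loop and a long one.
const shortLoop = [at("10:00"), at("10:05"), at("10:12")]; // 12 minutes
const longLoop = [at("14:00"), at("14:20"), at("14:45")]; // 45 minutes

const totalMinutes = totalSpiralMinutes([shortLoop, longLoop]); // 57
```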

5. Flow Efficiency

The meta-metric: (Active time - Spiral duration) / Active time. Are you in a productive flow state, or stuck in the weeds? Active time comes from commit timestamps; spiral duration is subtracted to get productive building time.
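
The formula can be sketched directly. How active time is derived from timestamps is not spelled out, so the version below assumes it is the sum of gaps between consecutive commits, skipping gaps longer than a hypothetical 60-minute idle cutoff; both function names are illustrative.

```typescript
const MINUTE_MS = 60 * 1000;

// Assumption: gaps longer than idleCutoffMinutes are breaks, not active work.
// Expects timestamps sorted oldest first.
function activeMinutes(times: Date[], idleCutoffMinutes = 60): number {
  let total = 0;
  for (let i = 1; i < times.length; i += 1) {
    const gap = (times[i].getTime() - times[i - 1].getTime()) / MINUTE_MS;
    if (gap <= idleCutoffMinutes) total += gap;
  }
  return total;
}

// (Active time - Spiral duration) / Active time, as given in the text.
function flowEfficiency(active: number, spiral: number): number {
  if (active <= 0) return 0; // avoid dividing by zero on an empty session
  return (active - spiral) / active;
}

const at = (hhmm: string): Date => new Date(`2025-11-28T${hhmm}:00Z`);

// Made-up day: two work blocks separated by a four-hour break.
const commitTimes = [
  at("09:00"), at("09:30"), at("10:00"),
  at("14:00"), at("14:45"), at("15:00"),
];

const active = activeMinutes(commitTimes); // 30 + 30 + 45 + 15 = 120
const flow = flowEfficiency(active, 12); // (120 - 12) / 120 = 0.9
```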


Vibe Levels

The framework underneath these metrics is a trust scale you declare before starting work:

| Level | Trust | Verification | Example Tasks |
| --- | --- | --- | --- |
| L5 | 95% | Final only | Formatting, linting |
| L4 | 80% | Spot check | Boilerplate, config |
| L3 | 60% | Key outputs | CRUD, standard tests |
| L2 | 40% | Every change | Features, integrations |
| L1 | 20% | Every line | Architecture, security |
| L0 | 0% | N/A | Novel research |

Declaring the level upfront forces a decision about what kind of task you're doing. After the session, comparing what actually happened against what you expected sharpens the intuition. Do that fifty times and you stop guessing.
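
The scale can be written down as a lookup table. The comparison step is sketched here under a loud assumption: the post does not say how "expected" versus "actual" is scored, so treating the Trust column as an expected trust pass rate is purely illustrative, as are the `VIBE_LEVELS` and `calibrationGap` names.

```typescript
type VibeLevel = 0 | 1 | 2 | 3 | 4 | 5;

interface LevelSpec {
  trust: number; // Trust column, as a fraction
  verification: string;
  examples: string;
}

// Values copied from the Vibe Levels table.
const VIBE_LEVELS: Record<VibeLevel, LevelSpec> = {
  5: { trust: 0.95, verification: "Final only", examples: "Formatting, linting" },
  4: { trust: 0.8, verification: "Spot check", examples: "Boilerplate, config" },
  3: { trust: 0.6, verification: "Key outputs", examples: "CRUD, standard tests" },
  2: { trust: 0.4, verification: "Every change", examples: "Features, integrations" },
  1: { trust: 0.2, verification: "Every line", examples: "Architecture, security" },
  0: { trust: 0, verification: "N/A", examples: "Novel research" },
};

// Assumption: compare the observed trust pass rate with the declared level's
// Trust figure. A negative gap suggests the task deserved a lower level.
function calibrationGap(declared: VibeLevel, observedPassRate: number): number {
  return observedPassRate - VIBE_LEVELS[declared].trust;
}

// Made-up session: declared L4, but only 60% of commits survived without a fix.
const gap = calibrationGap(4, 0.6); // roughly -0.2
const policy = VIBE_LEVELS[4].verification; // "Spot check"
```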


The Tool

npm install -g @boshu2/vibe-check

Or run directly:

npx @boshu2/vibe-check

Sample output:

$ vc --since "1 week ago"

VIBE-CHECK Nov 21 - Nov 28

  Trust: 94%
  Rework: 18%
  Spirals: 1 detected (12 min)
  Flow: 87%

Note: vibe-check is a tool for you, not your manager. It measures your own patterns so you can improve your AI collaboration. Don't use it to measure other people.

The trust scale started as an engineering tool. The move underneath it (decide upfront how much to trust the machine, then prove the result before you ship) is the same discipline I translate for people who aren't engineers.


Try It

# Install
npm install -g @boshu2/vibe-check

# Run your first check
vc --since "1 week ago"

# Or use npx
npx @boshu2/vibe-check

Links: npm · GitHub