July 1, 2026 · 6 min read · vibe-coding · ai-development · devlog · validation · codebase-audit

Devlog #7: I Ran the Gatekeeper Against My Own Site

Four audit skills inspected The Construct. I fixed 8 findings, declined 2 with evidence, and caught the validator trying to break the CSP it was supposed to protect.

I turned the gatekeeper around and pointed it at my own house.

Not the agent framework. Not a toy repo. The public site you are reading right now: the Next.js app, the MDX pipeline, the strict CSP, the green phosphor skin, the writing index, the whole thing I use as proof that I can make AI work reviewable.

Then I asked four audit skills to tell me what was wrong with it.

They did.

The run produced a small stack of receipts: an executive summary, an architecture map, a multi-domain audit, a pattern extraction report, and the resolution ledger that records what actually happened.

That last file matters most. Audits are cheap. Resolution is where the truth shows up.

The scoreboard

The audit found 13 code findings and two human-decision items. I did not fix everything. I fixed the things that were real, cheap, and safe to land.

The final ledger:

Result	Count	What it means
Fixed	8	Real defects or high-value cleanup landed and verified
Partial	1	The cheap safe piece landed; the deeper version stayed out
Deferred	2	Valid work, wrong moment or wrong collision surface
Declined	2	The finding was wrong or the fix would break a public contract
Human	2	One-way decisions I should not make by myself
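As a quick arithmetic check on the table, here is a minimal sketch that tallies the ledger. The `Status` type and the `ledger` object are illustrative names, not the format of the actual ledger file; only the counts come from the table above.

```typescript
// Illustrative only: the statuses and counts mirror the table,
// but the data shape is an assumption.
type Status = "fixed" | "partial" | "deferred" | "declined" | "human";

const ledger: Record<Status, number> = {
  fixed: 8,
  partial: 1,
  deferred: 2,
  declined: 2,
  human: 2,
};

// Human-decision items are tracked separately from the code findings.
const codeFindings =
  ledger.fixed + ledger.partial + ledger.deferred + ledger.declined;

console.log(codeFindings); // 13
```

The four code-finding rows add up to the 13 findings; the human row sits outside that total.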

One extra fix happened while I was there: the CI audit gate. That was not part of the original 13 findings. It became obvious once the Next.js advisory was on the table, so I added it.

The fixed set was boring in the best way. Next.js moved to a release that is not covered by the CSP-nonce XSS advisory. The home JSON-LD moved onto the nonce-aware JsonLdScript path. Six copy-pasted Open Graph images collapsed into the existing renderer. The collapsed mobile nav stopped accepting keyboard focus while hidden. The low-contrast article metadata got its contrast raised. img-src narrowed from any https: host to the one external image host the app actually uses. Author JSON-LD got single-sourced. The repeated cache header became a constant.

That is the clean part of the story.

The useful part is the part I did not do.

The trap was `connection()`

The site uses a strict per-request CSP. Middleware mints a fresh nonce. JsonLdScript reads that nonce. The browser only trusts scripts carrying that request's nonce.
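To make the shape of that policy concrete, here is a minimal sketch of a nonce-based CSP header builder. The function name `buildCsp` and the exact directive list are assumptions for illustration, not the site's real policy; in real middleware the nonce would come from a cryptographically secure random source on every request rather than a literal.

```typescript
// Sketch only: builds a strict, nonce-based CSP header value for one request.
// The directive list is an assumption, not the site's actual policy.
function buildCsp(nonce: string): string {
  const directives: string[] = [
    "default-src 'self'",
    // Only scripts carrying this request's nonce are trusted;
    // 'strict-dynamic' lets those scripts load their own dependencies.
    `script-src 'self' 'nonce-${nonce}' 'strict-dynamic'`,
    "object-src 'none'",
    "base-uri 'self'",
  ];
  return directives.join("; ");
}

// A fixed nonce is used here only so the output is readable.
const header = buildCsp("abc123");
console.log(header);
```

The important property is that the header value changes with every nonce, so nothing that embeds a nonce can be computed ahead of the request.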

That choice forces a real cost. src/app/layout.tsx calls await connection(), which makes the app render dynamically so the nonce is fresh. The audit saw the cost: blog posts are MDX, syntax highlighted on the server, and dynamic rendering means each request can repeat work that static generation would do once at build time.

The tempting fix is obvious: delete connection() and get static generation back.

That would make the site faster. It would also break production.

Static HTML would bake in a build-time nonce. Middleware would mint a different nonce on the live request. The browser would see framework scripts with the wrong nonce under strict-dynamic and block them. Dev would look fine because dev CSP is looser. Production would fail in the exact place the security model is supposed to protect.
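The failure mode can be modeled in a few lines. This is a deliberately simplified stand-in for the browser's nonce check (real CSP enforcement has more rules), and the `ScriptTag` type, the `blockedScripts` helper, the script path, and the nonce values are all hypothetical.

```typescript
// Simplified model of a nonce check; not a full CSP implementation.
interface ScriptTag {
  src: string;
  nonce?: string;
}

// Returns the scripts that do not carry the nonce from the response header.
function blockedScripts(headerNonce: string, scripts: ScriptTag[]): ScriptTag[] {
  return scripts.filter((s) => s.nonce !== headerNonce);
}

// Static HTML: the nonce was frozen into the markup at build time.
const staticHtmlScripts: ScriptTag[] = [
  { src: "/_next/static/chunks/main.js", nonce: "build-time-nonce" },
];

// Live request: middleware minted a different nonce for the header.
const blocked = blockedScripts("request-nonce", staticHtmlScripts);
console.log(blocked.length); // 1
```

With dynamic rendering, the markup and the header share the same per-request nonce, so the blocked list would be empty.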

That is the trap. The audit correctly found a cost center. The naive fix would have cut the load-bearing beam.

So the resolution was narrower:

Keep connection().
Cache the repeated post read with React cache().
Mark deeper cross-request MDX compile caching as deferred.
Record why static generation is not available unless the CSP model changes.
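The second item is plain memoization. In a Next.js server component tree that would typically mean wrapping the loader with `cache` imported from `react`; the sketch below swaps in a hand-rolled Map-based stand-in so it runs without React. The names `memoizeBySlug` and `readPost` and the slug value are hypothetical, and unlike React's `cache()` this Map is not scoped to a request automatically.

```typescript
// Hand-rolled stand-in for request-scoped memoization.
// React's cache() handles the per-request scoping itself; here the caller
// would have to create a fresh memoized function per request.
function memoizeBySlug<T>(load: (slug: string) => T): (slug: string) => T {
  const seen = new Map<string, T>();
  return (slug: string): T => {
    if (!seen.has(slug)) {
      seen.set(slug, load(slug));
    }
    return seen.get(slug) as T;
  };
}

let reads = 0;
const readPost = memoizeBySlug((slug: string) => {
  reads += 1; // stands in for the expensive file read
  return `body of ${slug}`;
});

// Two callers in the same render ask for the same post.
readPost("devlog-7");
readPost("devlog-7");
console.log(reads); // 1
```

This removes the duplicate read within one render. It does not cache across requests, which is the deeper MDX compile caching that stayed deferred.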

The validator needed a validator.

That is the whole thesis in miniature. A finding is not a fix. A fix is not done because it sounds plausible. The change has to survive contact with the system it claims to improve.

Two declines I stand by

I declined the sitemap runtime finding.

The audit said sitemap.ts shells out to git log and framed it as request-time cost. The build output contradicted that: sitemap.xml is static. The git calls run at build, not per visitor request. There is still cleanup to do there, especially around hand-maintained route dates, but the claimed runtime defect was not true.

I declined the API envelope change too.

/api/posts returns { count, posts }. /api/posts/[slug] returns the raw post. That is inconsistent. It is also a public JSON surface. Breaking it for a Low-severity consistency cleanup is not discipline. It is aesthetic churn wearing a correctness costume.
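A small sketch of why the rename is not free. The `Post` fields, the `titlesFromList` consumer, and the `{ data: ... }` wrapper are all hypothetical; only the `{ count, posts }` list shape comes from the API described above.

```typescript
// Hypothetical post fields; only { count, posts } is taken from the real API.
interface Post {
  slug: string;
  title: string;
}

interface PostList {
  count: number;
  posts: Post[];
}

// A consumer written against today's list shape.
function titlesFromList(body: PostList): string[] {
  return body.posts.map((p) => p.title);
}

const current: PostList = {
  count: 1,
  posts: [{ slug: "devlog-7", title: "Devlog #7" }],
};
console.log(titlesFromList(current).length); // 1

// A hypothetical "consistent" envelope would move the list under a new key.
const wrapped = { data: current };
let broke = false;
try {
  titlesFromList(wrapped as unknown as PostList);
} catch {
  broke = true; // body.posts is undefined, so .map throws
}
console.log(broke); // true
```

Any existing consumer that reads `posts` directly would fail the same way, which is the cost the Low-severity finding did not price in.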

So those went into the ledger as declined, with evidence.

That felt better than forcing the audit to be right.

The work I refused to touch

There was WIP in the tree before the audit landed: sitemap.ts, ai-partner/page.tsx, the Construct primitives, first-win/, and some career files. The reports noticed it. The resolution respected it.

That boundary matters. Agent work fails when it treats the repo like an empty lab bench. This repo was not empty. There was live work on the table.

So the color-ramp extraction stayed deferred because it would have churned the same Construct primitives Bo was editing. The title-description metadata builder stayed deferred because it would have fanned through WIP and low-value pages. The Dockerfile and coverage-threshold questions stayed human because they are policy calls, not code fixes.

Good automation knows when not to move.

The receipts

The fixed set got deterministic checks:

Type-check passed.
Lint passed.
The unit suite passed.
The production build passed.
A prod-start smoke verified the CSP nonce path.
Production later served the narrowed img-src header.

The numbers changed as the site changed. Before this wave, the ledger recorded 201 tests. After adding RSS and first-party events, the suite is at 216. That is the ratchet doing its job: every fix left a sharper test behind.

And the audit artifacts stayed in the repo. Not a chat transcript. Files. Paths. Evidence another agent can read cold.

What this proved

It did not prove the site is perfect. It proved the loop works on its own author.

The same system that tells me a model's code is not done until an independent check confirms it also told me my site had real defects. Then it told me two of its own recommendations were wrong or too blunt. Then it forced the resolution into a ledger where "fixed," "deferred," "declined," and "human" mean different things.

That is the gatekeeper I keep rebuilding.

Not a vibe. Not a clean sweep. A machine that can say yes, no, not yet, and prove which one it means.
