Devlog #2: How to Actually Use Your .claude/ Directory

A cleanup guide for AI coding workbenches: use commands, skills, and agents only where they strengthen the delivery loop.

January 15, 2026 · 7 min read
#vibe-coding #ai-development #devlog #claude-code #context-engineering

This essay is part of the reliable AI-assisted delivery trail: proof, method, and judgment for making fast AI work reviewable and safe to ship. Start with the curated writing paths or inspect the proof.

In devlog 1, the setup collapsed under its own weight. This is the cleanup guide.

This is tactical .claude/ directory documentation from January 2026. For the strategic delivery method that evolved from these patterns, see /workflow.


If you've looked at your .claude/ directory and thought "what the hell am I doing?", or you've got 20+ agents and can't remember what half of them do, this post is for you.

Commands vs Skills vs Agents

Nobody explains this clearly, because honestly it keeps changing.

Claude Code has three main extension points: commands, skills, and agents. They sound similar. They live in similar directories. The APIs have merged and shifted. If you're confused about what goes where, you're not alone.

Here is how the three divide up as of early 2026, while the APIs were still settling.

Commands are for workflows you consciously invoke. You type a slash, it runs. Use these when you want explicit control: "I am now doing research" or "I am now implementing this issue." You remember the name, you type it, it executes.

Skills are for context you need but don't want to think about. They auto-trigger based on phrases in your conversation. You say "what's ready to work on" and the relevant skill loads without you asking. The magic is that you don't have to remember anything. The system recognizes intent and loads what you need.

Agents are for parallel specialist review, and almost nothing else. You run them when you want multiple perspectives on the same code simultaneously. Security expert, architecture expert, code quality expert, all reviewing the same PR at once.

The decision tree:

  • Will you invoke it by name? → Command
  • Should it load automatically based on context? → Skill
  • Do you need parallel specialist perspectives? → Agent
  • Everything else? → You probably don't need it
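
The decision tree above can be sketched as a small function. This is only an illustration of the ordering of the questions; the names (Extension, choose_extension, and the three boolean parameters) are made up for this sketch and are not part of Claude Code.

```python
from enum import Enum


class Extension(Enum):
    COMMAND = "command"
    SKILL = "skill"
    AGENT = "agent"
    NONE = "you probably don't need it"


def choose_extension(
    invoked_by_name: bool,
    loads_from_context: bool,
    needs_parallel_review: bool,
) -> Extension:
    """Ask the questions in the same order as the list; the first yes wins."""
    if invoked_by_name:
        return Extension.COMMAND
    if loads_from_context:
        return Extension.SKILL
    if needs_parallel_review:
        return Extension.AGENT
    return Extension.NONE


# A research workflow you start on purpose by typing its name.
print(choose_extension(True, False, False).value)   # command
# Context that should appear when you ask "what's ready to work on".
print(choose_extension(False, True, False).value)   # skill
# Something that fits none of the three questions.
print(choose_extension(False, False, False).value)  # you probably don't need it
```

The order matters in this sketch: if something is both invoked by name and useful as auto-loaded context, it comes out as a command, which matches the advice later in the post not to create skills for things you invoke explicitly.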

If you're loading dozens of agents at startup, you're doing it wrong. I built 60+ "specialist" agents before realizing my context window was 50% full before I even started working, just from loading agent definitions.


The Consolidation

Then Claude Code 2.1 shipped that January and the frontmatter schema changed under me. Everything broke.

Not broke-broke. But broke enough that I had to rethink the whole setup. Commands became proper slash commands. Skills got a new auto-trigger system.

I took it as an opportunity to clean house.

January 11: The great consolidation. I archived 11 agents that were duplicates or replaced by commands. Went from ~60 agents to 4 domain specialists.

January 15: Further trimmed. Archived 6 redundant commands:

  • /autopilot and /autopilot-polecat → both replaced by /crank
  • /doc-coverage → just use the /doc coverage subcommand
  • /load-epic → bd show does this
  • /plan-to-beads → integrated into /plan
  • /synthesis → nobody was using it

Current state:

  • 25 commands (down from 40+)
  • 36 skills (down from 39)
  • 4 agents (down from 60+)

The impact: Sessions that used to hit context limits after 20 minutes now run for hours. Hallucinations dropped noticeably. The model actually remembers what it's doing because it's not drowning in agent definitions. And startup is instant instead of that 3-second pause while everything loads.

Less overhead, same capability.

Those numbers are a January snapshot. The point is the direction (fewer moving parts), not the count.


The Walkthrough: How to Set Up Your .claude/

If I were starting over today, this is what I'd do.

Step 1: Start with CLAUDE.md

This is your universal config. It loads for every project. Put things here that apply everywhere:

  • Your workflow preferences
  • Session protocol (what to do at start/end)
  • Links to governance docs

Keep it under 200 lines. Anything longer and you're wasting context.
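
A quick way to check that budget is to count lines. The sketch below assumes the file lives at ~/.claude/CLAUDE.md; that path, and the helper names count_lines and within_budget, are assumptions for illustration, so point it at wherever your file actually is.

```python
from pathlib import Path

MAX_LINES = 200  # the budget suggested in the text


def count_lines(text: str) -> int:
    """Number of lines in the file contents."""
    return len(text.splitlines())


def within_budget(text: str, limit: int = MAX_LINES) -> bool:
    """True when the file is strictly under the line limit."""
    return count_lines(text) < limit


if __name__ == "__main__":
    # Assumed location; adjust to your own setup.
    path = Path.home() / ".claude" / "CLAUDE.md"
    if path.is_file():
        text = path.read_text(encoding="utf-8")
        status = "within" if within_budget(text) else "over"
        print(f"{path}: {count_lines(text)} lines ({status} the {MAX_LINES}-line budget)")
    else:
        print(f"{path} not found")
```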

Step 2: Add commands for workflows you repeat

Don't go crazy. Ask yourself: "Do I type this sequence of steps more than twice a week?"

My core commands:

  • /research - Deep codebase exploration
  • /crank - Autonomous epic execution
  • /implement - Single issue work
  • /retro - Extract learnings

That's it. Four commands handle 80% of my work.

Step 3: Let skills handle the implicit stuff

Skills should fire without you thinking about them. Good skill triggers:

  • "what's ready" → shows available work
  • "create a task" → creates a beads issue
  • "research this" → runs exploration

Don't create skills for things you'll invoke explicitly. That's what commands are for.

Step 4: Only add agents for parallel specialist review

You probably don't need custom agents. The built-in Task() subagent types handle most exploration and implementation.

I only have 4 agents, and they're all for the same use case: running parallel code review before a merge. Security expert, architecture expert, code quality expert, UX expert. They each look at the same PR from a different angle.

If you're not doing parallel specialist review, you probably don't need custom agents.
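
The shape of parallel specialist review is a fan-out and a fan-in: the same diff goes to every reviewer at once, and short findings come back. The sketch below shows only that shape using Python threads. The four reviewer functions are hypothetical stand-ins with toy string checks; this is not how Claude Code runs agents.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for the four specialists. A real setup would call
# a model here; these toy checks only make the example runnable.
def security(diff: str) -> str:
    return "security: found eval()" if "eval(" in diff else "security: ok"


def architecture(diff: str) -> str:
    return "architecture: ok"


def code_quality(diff: str) -> str:
    return "code quality: TODO left in" if "TODO" in diff else "code quality: ok"


def ux(diff: str) -> str:
    return "ux: ok"


REVIEWERS = [security, architecture, code_quality, ux]


def review_in_parallel(diff: str) -> list[str]:
    """Send the same diff to every reviewer and collect findings in order."""
    with ThreadPoolExecutor(max_workers=len(REVIEWERS)) as pool:
        futures = [pool.submit(reviewer, diff) for reviewer in REVIEWERS]
        return [future.result() for future in futures]


for finding in review_in_parallel("+ result = eval(user_input)  # TODO validate"):
    print(finding)
```

Each reviewer returns one short line rather than its full reasoning, which keeps what flows back to the coordinator small.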

Step 5: The 40% Rule

This is the most important thing I learned.

Stay under 40% context utilization. I tracked this obsessively for a month. Below 40%, tasks complete reliably. Above 60%, the model starts hallucinating, forgetting what it was doing, cheerfully producing garbage. The failure rate isn't linear. It's a cliff.

How do you stay under 40%?

  • Don't load everything at startup
  • Use skills with JIT (just-in-time) loading
  • Compact frequently: write summaries to files and start fresh sessions
  • Kill agents that return too much context to the coordinator

Complexity is where tokens go to die.


What I Learned

Simple beats clever. I spent weeks building elaborate coordination systems. Turns out dumb isolation wins. Separate copies of the code mean no merge conflicts. Failures don't cascade. You can kill and restart one copy without affecting the others.

Skills replaced most of my agents. The stuff I thought needed a "specialist agent" actually just needed a skill that auto-loads relevant context. Auto-trigger beats explicit invocation.

The 40% rule is real. Every time I've ignored it, I've regretted it. Context overflow is the silent killer of AI productivity. Above 60%, the model doesn't degrade gracefully. It lies.

Consolidation is ongoing. I archived 6 commands while writing this post. The setup I have today will probably look different in a month. That's fine. The goal is to stop wasting tokens on stuff that doesn't help.

This is the same discipline I run in production at the highest reliability bar: generation is cheap, proving the output is trustworthy is the scarce part. I build that for engineers first, then translate it into plain operating habits for people who aren't.


Try It

Want to audit your own setup? Open a terminal and list what's in your Claude config directory. Count how many markdown files you have. Check the total size. If you're surprised by what you find, you're not alone.
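
One way to run that audit, as a minimal sketch: walk the directory, count the markdown files, and add up the size of everything in it. It assumes your config lives at ~/.claude; AuditResult and audit are names invented for this example.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class AuditResult:
    markdown_files: int
    total_bytes: int


def audit(root: Path) -> AuditResult:
    """Count .md files under root and sum the size of every regular file."""
    if not root.is_dir():
        return AuditResult(0, 0)
    files = [p for p in root.rglob("*") if p.is_file()]
    markdown = [p for p in files if p.suffix.lower() == ".md"]
    return AuditResult(len(markdown), sum(p.stat().st_size for p in files))


if __name__ == "__main__":
    # Assumed location of the config directory; adjust if yours differs.
    result = audit(Path.home() / ".claude")
    print(f"markdown files: {result.markdown_files}")
    print(f"total size: {result.total_bytes / 1024:.1f} KiB")
```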


Want This Setup?

The workflow I landed on is available as a plugin. Check out vibe-kit on GitHub in the plugins folder, or read about 12-Factor AgentOps for the methodology behind it.

Or just grab the config template and adapt it yourself.


What's Next

The tooling kept evolving. Devlog #3 covers the next shift, from optimizing the tools to learning the system underneath them. That's where the payoff actually lived.


Devlog #2. 25 commands. 36 skills. 4 agents. Down from 60+. This is the second in a series documenting how I rebuilt my AI delivery loop. Start with devlog 1 if you haven't already.