Preface — None of This Is Mine
Introduction — The Discipline, Named
- The argument in one paragraph
- How the book is structured
- A note on Claude as the working example
- What this book is not
- How to read it
Part I — Foundations and Taxonomy
Chapter 1 — The Agent Is Already in Your Codebase
- A Tuesday with the agent and a Friday with the same agent
- The naive response and why it breaks
- Why the book is structured around the repo
- The repo audit, in concrete terms
- The one-page repo audit
Chapter 2 — Old Principles, New Substrate
- The agent did not invent any of this
- The amplification thesis
- Two pressures the canon now has to absorb
- Why “stronger standards” is the wrong slogan
- I did not invent these standards
- Canon to harness — what each part applies
Chapter 3 — The Reliability Problem (and Why Determinism Is the Wrong Goal)
- The model is non-deterministic. That part is not a defect.
- Reliability is not determinism
- The three reliability levers
- The levers are filters; reliability is what survives them
- Workflow variance, not model variance
- What this book is not
- The harness idea is not original to this book
- Reading an unreliable run
Chapter 4 — What a Harness Is, Precisely
- The working definition
- The two channels every harness has
- The harness is code
- Every artifact in the harness passes the downstream-input test
- Interactive sessions produce artifacts; headless sessions consume artifacts
- Tools poke out; contracts stay in
- Why a tight definition matters
- A labeled inventory of a representative harness
Chapter 5 — The Taxonomy, Pinned
- The four neighbors, named
- The central pin
- Why the word matters this much
- The supporting terms
- Boundary cases, ruled
- A note on aphorisms
- The glossary card
Chapter 6 — The Repo as a Behavioral System
- Three classes of signal
- Why instruction loses to imitation
- The habitat frame
- A worked example
- Why each channel matters on its own
- What this means for where to invest
- Habitat audit
Chapter 7 — The Layers and the Reliability Levers
- Five layers, drawn as rings
- Concentric, not stacked
- The litmus test for placement
- The three reliability levers, restated
- The 5×3 grid
- Layers do not have to be five
- Concrete failure modes by layer
- How the rest of the book uses the grid
- The blank grid, ready to copy
Part II — The Habitat: Files That Govern
Chapter 8 — CLAUDE.md and the Instruction Layer
- What the file actually does
- Eight things the file should say
- Length matters
- The hierarchy
- How the file fails
- Who owns it, when it gets re-read
- The skeleton
Chapter 9 — Rule Files: Iron Laws, Golden Rules, Preferences
- Why rule files exist at all
- Three tiers, one file
- Markers on the page
- Path-scoped activation
- Traces-to footers
- The cardinal sin: aspirational rules
- How rules get written
- How rules get pruned
- Rule file skeleton
Chapter 10 — Guides, Corpus, and Playbooks
- Rules bound a move; playbooks walk a job
- Corpus is what the agent pulls from when one line is not enough
- The five corpus layers
- Where each layer is consulted
- The lazy-load discipline
- Stale corpus, and which layer lies loudest
- Ontologies and knowledge graphs — an honorable mention
- A playbook is a skill in this taxonomy
- The full picture, in one walk
- Two maps, one word
- Artifact — the corpus map
Chapter 11 — Cascade, Scope, and Inheritance
- The three rules that resolve the cascade
- The default rule: specificity wins
- Memory files: the cascade in one slice
- The override rule: locks
- The diagnostic rule: the resolved view is queryable
- Required declarations and the visible gap
- Three failure modes, all structural
- A worked example: test-coverage policy across three layers
Chapter 12 — The Token Budget
- The number that matters is not the ceiling
- Three tiers
- Lazy-load by region
- Two sample baselines
- The cost of bloat
- The pre-flight check
- Three rules for keeping the budget honest
- Compact with focus, not by reflex
- Prompt patterns that respect the budget
- The context-budget worksheet
Part III — The Codebase: Design for Agent Legibility
Chapter 13 — Architecture as Communication
- Architecture has a second reader now
- Dependencies point inward
- Prefer deep modules
- Layering as a behavioral guarantee
- Same feature, two repos
- What the agent reads in architecture
- Architecture-legibility checklist
Chapter 14 — Explicit Seams and Bounded Contexts
- What a seam is, and why the agent uses them
- Bounded contexts, operationalized
- The god module
- Actions and policies inside a context
- Ports and adapters inside one repo
- Why this is a Chapter 14 problem, not a Chapter 15 problem
- When boundaries collapse
- The seam catalog
Chapter 15 — Naming as a Behavioral Signal
- Names are constraints on the next token
- Naming conventions as priors
- Vocabulary as identity
- Domain language
- Aggressive renames over apologetic comments
- Anti-patterns to notice
- The naming audit
Chapter 16 — Boring Code: Predictability Over Cleverness
- No magic
- Boring is sophistication moved
- The agent thrives on boring code
- Cleverness traps the agent amplifies
- The agent reads what is in front of it
- The boring-code lift
Chapter 17 — Testing as Agent Infrastructure
- Tests are guides and sensors
- TDD as the agent loop
- The test pyramid still holds
- Characterization tests before refactoring legacy code
- Output-based tests beat implementation-based tests
- Mocks only for unmanaged dependencies
- What a good test teaches
- Flaky tests are harness failures
- A make-this-testable recipe
Chapter 18 — Fast Feedback Gates
- A 30-second hook beats a five-minute hook the agent skips
- Three families on the fast layer
- Type checkers as a deterministic reviewer
- Linters as encoded preferences
- What the hook output has to look like
- Pre-commit catches fast, CI catches slow
- Concrete patterns that keep the hook fast
- Hook hygiene
- The slow ratchet
- The hook template
Chapter 19 — Documentation as Corpus
- The agent reads the docs
- Marketing voice is not teaching voice
- The README contract
- Architecture decision records as agent context
- Diátaxis as a sorting frame
- C4 for the architecture docs
- Stale documentation as an aspirational rule in disguise
- The documentation flywheel
- Artifact — a docs audit
Part IV — Tools: Extending Agent Capability
Chapter 20 — What Tools Are and Why They Belong Here
- The working definition
- Three intents tools serve
- Why tools sit inside the harness
- The principle of minimum capability
- The tool ladder
- The wider primitive ladder
- Tool definitions vs tool reach
- Skills, sub-agents, hooks, plugins, MCP — a map of the part
- Model selection is part of the primitive design
- A tool inventory for a representative repo
Chapter 21 — Skills: Reusable Multi-Phase Workflows
- A skill is a walk, not an instruction
- The 6–8 checkpoint anatomy
- Skills as executable playbooks
- A slash command is not a skill
- Skill failure modes
- Acceptance evidence as the closing checkpoint
- The skill skeleton
Chapter 22 — Agents: Read-Only Specialists, Sub-Agents, Fan-Out
- The move
- Three named roles
- Read-only is the default
- Fan-out
- Adversarial verification
- Compression discipline
- Sub-agent vs MCP server
- The agent definition skeleton
Chapter 23 — Hooks: Where the Harness Acts on Its Own
- Why hooks are special
- The five events
- Use cases worth naming
- Hook hygiene
- The relationship to skills
- Anti-patterns
- The hook configuration skeleton
Chapter 24 — Plugins: Encapsulating Repeatable Agent Behaviors
- What a plugin is
- When to write a plugin
- Plugin anatomy
- Distribution patterns
- Versioning
- Anti-patterns
- The reference experiment
- The manifest skeleton
Chapter 25 — MCP: Giving Agents Access to the World
- What MCP actually gives you
- The MCP server is a behavior change
- Wrap vs shell out
- Designing the surface
- The trust boundary
- A short design walkthrough
- Public, hosted, self-hosted
- The MCP tool design checklist
Chapter 26 — Designing Tools for Predictable Agent Use
- What the agent actually reads
- Tool names as constraints
- Argument signatures
- Output shape
- Error messages as feedback
- Idempotency, and saying so
- Description text as a contract
- Before and after
Chapter 27 — Tool Boundaries and Safety
- Every tool is a capability grant
- Permission scopes, named
- Permission posture as a productivity lever
- Destructive operations
- Secret handling
- Sandboxing
- Rate limits as a safety mechanism
- Audit logs that read like a forensic record
- What the safety layer is not
- The tool-safety matrix
Chapter 28 — Verifying the Harness Itself
- The harness is code; verify it like code
- Pre-commit safeguards for the harness’s own files
- Code Health for the harness
- A failing safeguard is not negotiated away
- The verify, review, ship loop
- The harness verifies itself
- The harness pre-flight
- The verify skill
Part V — The Team / Delivery Harness
Chapter 29 — Intent Contracts (and Why Vibe Coding Breaks at Team Scale)
- What vibe coding hides
- Narrow the input, at team scale
- Seven sections, every time
- Falsifiable matters
- Tag what is known
- Goal mode and debate mode
- The ticket is the contract
- Split the contract before dispatch
- A context budget per contract
- What this discipline is borrowed from
- The honorable mention: spec-driven development
- What the next two chapters do
- The contract template
Chapter 30 — The Four Phases
- Why four
- Three touchpoints, and no more
- Each phase has an exit, and the exit is explicit
- The observability test for any gate
- The ticket is the contract
- What goes wrong without phase discipline
- A short note on tooling
- The phase-transition checklist
Chapter 31 — Interrupts: How Agents Ask
- Guessing is the failure mode
- The shape of an interrupt
- The shape of a resolution
- One question per interrupt
- The interrupt rate is a sensor on the harness
- Anti-patterns
- When the interrupt is the agent’s only honest move
- The inverse: when the human has to correct the agent
- The interrupt template
Chapter 32 — Continuous Delivery as a Harness Constraint
- Every closed contract must be independently releasable
- Main stays shippable after merge
- User-visible behavior gated by flags in the same PR
- No phase-one-of-N contracts
- Why this is a harness concern, not a release-process concern
- The agent-side leverage
- Slop gates
- The release-readiness gate
Chapter 33 — The Daemon in the Middle (Reconciler Pattern)
- What a reconciler is
- The sixty-second tick
- Fresh clones, every time
- Fire-and-forget dispatch
- The operational concerns the daemon has to handle
- Daemon versus orchestrator
- Webhook chains and why I stopped writing them
- Intent-driven delivery as the reference implementation
- A worked tick
Chapter 34 — Code Review and PRs as Checkpoints
- Review as a pattern-recognition activity
- Reviewer agents for the first pass; humans for judgment calls
- The PR is a checkpoint, not a delivery vehicle
- Closing a PR closes a contract
- Review fatigue as a harness failure
- The four quadrants of review findings
- Scope drift and regression risk: where to look first
- The PR template
Part VI — The Organization: Scaling Behavior Engineering
Chapter 35 — What Belongs at the Org Level
- Survives a process change. Survives a stack change.
- Universal principles — the actual ones
- Org-wide non-negotiables — the iron laws
- Identity as iron law
- The rule of three for org promotion
- What does not belong at the Org level
- The org-rule charter
Chapter 36 — Project Harness vs Org Harness
- Two scopes, two masters
- The harness consumer pattern
- The promotion path
- The demotion path
- Patch propagation, forward-only
- The org-harness repo
- The promotion-candidate template
Chapter 37 — Ports and Adapters Across the Org
- The pattern, scaled up
- Four surfaces, four ports
- What the spec contains, what the spec refuses
- Why this matters more than it sounds
- Where the spec lives, who owns it
- The daemon, unchanged
- Spec artifact — signal source, two adapters
Chapter 38 — Three Artifacts, Three Lifespans
- Three artifacts, three lifespans, three storage locations
- Each artifact has one job
- Why mixing fails
- Lifespan as the load-bearing rule
- The decision tree
- Anti-patterns
- Tooling, not vigilance
Chapter 39 — Cross-Repo and Multi-Stack Reality
- The portable subset and the stack-bound subset
- Translation discipline
- Multi-language monorepos
- The walking translation
Chapter 40 — Topology: When One Agent Is Not Enough
- Three layers, three names
- When a second agent earns its slot
- When the second agent does NOT earn its slot
- The four topology patterns
- Pipeline vs barrier: when stages have to wait
- Three more patterns for discovery work
- Two operational rules for any pattern
- Topology nodes meet the harness
- “Is this a topology problem or a harness problem?”
- Honest credit
- The topology decision tree
Chapter 41 — Multi-Agent, One Harness
- The shape of the problem
- What lives in the canonical layer
- What lives in the adapter directories
- Where to compromise and where not
- Adapter discipline — thin translation, nothing else
- The cascade in a multi-agent world
- What it costs to skip this discipline
- Three signs the canonical layer has broken down
- The reference implementation, briefly
- The artifact — a multi-agent adapter map
Chapter 42 — Patterns and Anti-Patterns
- Patterns — the shapes that work
- Anti-patterns — the shapes that fail
- The artifact — a printable cheat sheet
Part VII — Operating the Harness
Chapter 43 — The Two Flywheels: Learning and Pruning
- The two motions
- Learning, batched
- Pruning, audited
- How the catalog feeds the wheels
- Cadence
- The flywheel ledger
Chapter 44 — The Codebase That Gets Better With Use
- The compounding claim
- The mechanism, in five moves
- The opposite mechanism, in the absence of discipline
- Measuring the compounding
- The cultural signal
- The quarterly health snapshot
Chapter 45 — Harness Debt
- The symptoms
- The boy-scout rule applies to the harness
- The debt ledger
- What letting debt grow actually costs
- Debt versus the flywheels
- The artifact — a harness-debt ledger row
Chapter 46 — Versioning the Harness
- Why cadence is the answer to drift
- Forward-only, always
- Diff and ask
- Versioning semantics
- The release loop
- Changelog discipline
- Tying back to the chapters that earned this one
- The CHANGELOG entry template
Chapter 47 — Post-Mortems for Agent Runs
- When to run a post-mortem
- Near-misses are free post-mortems
- Routine retros after every workflow
- The five recurring patterns
- Blameless, but for the harness
- The patch is structural
- The flywheel consumes the output
- The template
Chapter 48 — Modes: Paired, Solo, Autopilot
- Pacing, not phasing
- Paired — the cadence for unknown ground
- Solo — the cadence for known ground
- Autopilot — the cadence for high-confidence stretches
- Modes are not skill levels
- Stepping up, stepping back
- Mode selection lives in the harness
- The rubric
Chapter 49 — The Curator’s Role and Pair-Programming with an Agent
- The work, named
- Pair-programming with a constrained partner
- Anti-patterns — what curation is not
- Why the framing matters culturally
- The curator’s daily rhythm
- What is not the curator’s job
- The artifact — a curator’s daily checklist
Part VIII — From Experiments to Tool: The Path to Keystone
Chapter 50 — The Experiments: Bridle, Sellier, Intent-Driven Delivery
- Bridle — the harness as a living system
- Sellier — defaults over flexibility
- Intent-driven delivery — the team layer made real
- What carried forward
Chapter 51 — Keystone: The Synthesis
- What Keystone is, in one paragraph
- What survived from each experiment
- The first command
- The directory layout
- Sensors, actions, playbooks, adapters — the lifecycle in markdown
- The companion MCP server
- The three-ring principle
- Multi-agent support, wired in
- The cascade engine
- Forward-only patches
- A worked first task, end-to-end
- The full chain, every contract
- What the agent reads on each turn
- What Keystone does not do
- Tying back to the experiments
- A copyable getting-started recipe
Chapter 52 — Composing the Layers in Practice
- Why composition works at all
- Example one — small team, single repo
- Example two — multi-team org, many repos
- When to add a layer (and when not to)
- What composition looks like in practice
- The composition decision tree
Part IX — Where This Goes
Chapter 53 — The Honest State of the Art
- What works
- What sort-of works
- What does not work yet
- Confidence intervals, named
- The next two years, my best guess
Chapter 54 — Building Your Own Harness: A 30-Day Plan
- Days 1–3: Tests and a pre-commit hook
- Days 4–7: CLAUDE.md, one rule file, one skill
- Days 8–14: One reviewer agent, one investigator agent, hooks for state capture
- Days 15–21: First learning audit, first pruning audit, harness-debt ledger
- Days 22–30: Pick one team-level constraint and encode it
- Iterate
- For multi-team and org rollouts
- The artifact — a 30-day calendar template
Chapter 55 — The Discipline That Was Happening Before Anyone Named It
- What the reader owes the discipline
- The work is ongoing
- A note to the engineer who has not started yet
- The apprentice
Appendix A — Vocabulary Reference
- Part I — Foundations and Taxonomy
- Part II — The Habitat
- Part III — The Codebase
- Part V — The Team / Delivery Harness
- Part VI — The Organization
- Part VII — Operating the Harness
- Part VIII — From Experiments to Tool
Appendix B — Reference Implementations
- Keystone — the synthesis tool
- Bridle — the maintenance discipline
- Sellier — the prescriptive scaffold
- Intent-driven delivery — the team-layer reference
- Reading a foreign harness
Appendix C — Failure-Pattern Catalog
- Pattern 1 — Acted on Ambiguous Request
- Pattern 2 — Worked in Unfamiliar Area Without Orienting
- Pattern 3 — Followed an Aspirational Rule Too Literally
- Pattern 4 — Escalated Wrong (Ladder Mistuned)
- Pattern 5 — Mental Model Drifted from Reality
- Pattern 6 — Token-Budget Collapse
- Pattern 7 — Tool Over-Fit
- Pattern 8 — Imitation Cascade
- Pattern 9 — Silent Override at Cascade
- Pattern 10 — Mock Anchoring
- Pattern 11 — Phase Skipping
- Pattern 12 — Reviewer Fatigue Collapse
Appendix D — Templates
- CLAUDE.md skeleton (Ch 8)
- Rule file skeleton — iron, golden, preference (Ch 9)
- Corpus map — five directories (Ch 10)
- Skill skeleton — eight checkpoints (Ch 21)
- Agent definition skeleton (Ch 22)
- Plugin manifest skeleton (Ch 24)
- MCP tool design checklist — eight questions (Ch 25)
- Intent contract template — seven sections (Ch 29)
- Phase-transition checklist (Ch 30)
- Interrupt template (Ch 31)
- Release-readiness gate — five checks (Ch 32)
- PR template — five sections (Ch 34)
- Org-rule charter (Ch 35)
- Promotion-candidate template (Ch 36)
- Three-artifact decision tree (Ch 38)
- Post-mortem template (Ch 47)
- Harness-debt ledger row (Ch 45)
- Harness CHANGELOG entry (Ch 46)
- Mode-selection rubric (Ch 48)
- Curator’s daily checklist (Ch 49)
- 30-day calendar template (Ch 54)
Appendix E — Further Reading
- Foundational engineering canon
- Foundational agent literature
- Adjacent disciplines
- Mathematics and theory
- The author’s blog archive
Appendix F — The 90-Day Org Rollout
- Weeks 1–2: Charter, pilot pick, universal-principles-only org repo
- Weeks 3–6: Pilot team installs Keystone at the project layer
- Weeks 7–10: Second team adopts
- Weeks 11–13: The team layer lands
- Exit criterion and the maintenance rhythm
- A note on growing past two teams
- The artifact — a 90-day calendar template