L5 - Autonomous Context Engineering

Self-healing context: agents detect stale docs and update them

At L5, agents monitor context quality continuously - detecting when CLAUDE.md entries, README sections, or architecture diagrams no longer match the codebase, and automatically proposing or applying updates.

  • Agents maintain persistent identity and memory across sessions (Beads/Git-backed)
  • Production telemetry feeds back into agent context automatically (deploy, error, and performance data)
  • Agents detect stale documentation and update it without human initiation
  • Agent memory persists architectural decisions and their rationale across sessions
  • Self-healing context updates are validated by automated tests before commit
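The last capability above - gating automatic updates behind the test suite - can be sketched as a simple commit-or-revert flow. The function and its callback parameters are hypothetical, injected here so the control flow is visible without depending on any particular VCS or test runner:

```python
def apply_with_validation(apply_fix, run_tests, commit, revert) -> bool:
    """Gate an automatically applied context fix behind the test suite:
    commit only when tests pass, otherwise roll back the change."""
    apply_fix()
    if run_tests():
        commit()
        return True
    revert()
    return False

# Simulated run where the automated fix fails validation.
log = []
ok = apply_with_validation(
    apply_fix=lambda: log.append("fix applied"),
    run_tests=lambda: False,  # e.g. a doc-linting test fails
    commit=lambda: log.append("committed"),
    revert=lambda: log.append("reverted"),
)
print(ok, log)
```

In a real pipeline, `run_tests` would invoke the project's CI checks and `revert` would discard the working-tree change; the point is that the agent's write path is never commit-first.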

Evidence

  • Agent memory store with session-spanning entries and timestamps
  • Production telemetry-to-context pipeline configuration with update frequency
  • Git history showing agent-authored documentation updates with passing CI

What It Is

Documentation rot is one of the oldest unsolved problems in software engineering. Teams invest in writing good documentation, and then the codebase evolves while the documentation stays still. At L1, teams discover the rot when something breaks or someone asks a question the documentation answers incorrectly. The fix is manual, reactive, and easy to forget.

Self-healing context is an L5 pattern where agents actively monitor the accuracy of context files - CLAUDE.md, README, architecture diagrams, convention documents - and detect when they've drifted from reality. When a drift is detected, the agent doesn't just flag it; it generates a proposed correction and, depending on the configured policy, either creates a pull request for human review or applies the correction directly.

The detection mechanism works by comparing claims in context files against the actual codebase. A CLAUDE.md that says "we use Jest for testing" when the test runner has been switched to Vitest is a detectable inconsistency - an agent can read the CLAUDE.md claim, check package.json and the test file imports, and determine that the claim is stale. A README that says "deploy with npm run deploy" when the deployment script has been renamed to npm run ship is similarly detectable. An architecture diagram that references AuthService when the service was renamed to IdentityService six months ago is likewise a falsifiable claim.
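A minimal sketch of this kind of claim verification, using the Jest-vs-Vitest example above. The function name and the hard-coded list of runners are illustrative, not part of any real tool:

```python
import json
import re

def check_test_runner_claim(claude_md: str, package_json: str) -> list[str]:
    """Compare test-runner claims in a CLAUDE.md file against the
    devDependencies actually declared in package.json."""
    deps = json.loads(package_json).get("devDependencies", {})
    drift = []
    for runner in ("jest", "vitest", "mocha"):  # illustrative set
        claimed = re.search(rf"\b{runner}\b", claude_md, re.IGNORECASE)
        if claimed and runner not in deps:
            drift.append(f"CLAUDE.md mentions {runner}, but it is not in devDependencies")
    return drift

# CLAUDE.md still says Jest; package.json has moved to Vitest.
claude_md = "## Testing\nWe use Jest for testing. Run `npm test`."
package_json = '{"devDependencies": {"vitest": "^1.0.0"}}'
print(check_test_runner_claim(claude_md, package_json))
```

The same pattern generalizes: extract a falsifiable claim from the doc (a command, a dependency, a file path), resolve the ground truth from the repo, and report any mismatch.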

Not all documentation drift is detectable by static analysis. "We prioritize reliability over feature velocity" can't be verified from code. But a large fraction of the most practically harmful documentation drift - wrong commands, stale dependencies, incorrect file paths, outdated service names, wrong tech stack claims - is mechanically verifiable.

At L5, this detection runs continuously: after each significant commit, or on a scheduled basis, agents scan context files for falsifiable claims and verify them. Drift reports are generated, and depending on policy, corrections are proposed or automatically applied.
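The scan-and-route loop described above can be sketched as follows. Everything here is a simplified assumption - context files are passed in as text, verifiers are plain functions, and the "propose" and "auto" policies are stubbed as labeled findings rather than real PR or commit actions:

```python
from typing import Callable

# A verifier inspects one context file's text and returns drift findings.
Verifier = Callable[[str], list[str]]

def scan_context_files(files: dict[str, str],
                       verifiers: list[Verifier],
                       policy: str = "propose") -> list[str]:
    """Run every verifier over every context file and route each
    finding according to policy: 'propose' (open a PR) or 'auto'
    (apply directly, then validate in CI)."""
    actions = []
    for name, text in files.items():
        for verify in verifiers:
            for finding in verify(text):
                prefix = "PR" if policy == "propose" else "AUTO"
                actions.append(f"{prefix}: {name}: {finding}")
    return actions

# Toy verifier: flag docs that still claim Jest after a Vitest migration.
def jest_claim(text: str) -> list[str]:
    return ["stale claim: Jest is no longer the test runner"] if "Jest" in text else []

print(scan_context_files({"CLAUDE.md": "We use Jest for testing."}, [jest_claim]))
```

A real deployment would trigger this from a post-commit hook or a scheduled CI job, with each verifier checking one category of falsifiable claim.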

Why It Matters

Self-healing context closes the loop that manual documentation processes always leave open:

  • Documentation is continuously accurate - not accurate until the next undocumented change, but continuously maintained by the same agents that generate the changes
  • Documentation debt doesn't accumulate - each change that would create a context file inconsistency is caught at the time it occurs, not discovered months later
  • Agent behavior is predictably correct - agents that read context files and find accurate information make better decisions; the quality of agent behavior is a direct function of context quality
  • Human attention is preserved for nuance - mechanical documentation errors (wrong commands, stale file paths) are fixed automatically; humans review changes that require judgment (updated architectural guidance, revised conventions)
  • Builds trust in context - when developers know that CLAUDE.md is actively maintained and verified, they trust it more and use it more. This creates a virtuous cycle.

The irony of reaching L5 is that the same agents whose behavior depends on accurate context can be responsible for maintaining that context. The system becomes self-reinforcing: good context enables good agents; good agents maintain good context.

Tip

Before deploying automatic context updates, run the agent in "proposal only" mode for 30 days - every proposed update generates a pull request for human review. This validates that the agent's drift detection is accurate before enabling any automatic updates.
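One way to encode this rollout rule is as a policy function: stay in proposal-only mode through the trial window, and graduate to automatic updates only if reviewers accepted most proposals. The 30-day window comes from the tip above; the 90% precision threshold is an illustrative assumption, not a recommendation from the source:

```python
from datetime import date, timedelta

def current_policy(rollout_start: date, today: date,
                   accepted: int, proposed: int,
                   trial_days: int = 30, min_precision: float = 0.9) -> str:
    """Return 'propose' during the trial window or while acceptance is
    low; return 'auto' once the agent's drift detection has earned it."""
    in_trial = today < rollout_start + timedelta(days=trial_days)
    precision = accepted / proposed if proposed else 0.0
    if in_trial or precision < min_precision:
        return "propose"   # every update becomes a PR for human review
    return "auto"          # mechanical fixes applied directly, gated by CI

print(current_policy(date(2025, 1, 1), date(2025, 2, 15), accepted=27, proposed=30))
```

Tracking accepted-versus-proposed during the trial also produces the evidence artifact mentioned earlier: a reviewable record of agent-authored documentation updates.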

Getting Started

6 steps to get from here to the next level

Common Pitfalls

Mistakes teams actually make at this stage - and how to avoid them

How Different Roles See It

Bob - Head of Engineering

Bob's team has excellent context engineering infrastructure at L3-L4. CLAUDE.md files are well-maintained, MCP servers are running, and BYOC pipelines are assembled. But Bob keeps getting complaints from developers that the CLAUDE.md "is wrong again." The problem is that the files were written once but are being updated manually, and manual updates lag behind the pace of codebase change. Bob is frustrated - he invested heavily in context engineering and now it's becoming a maintenance burden.

What Bob should do - role-specific action plan

Sarah - Productivity Lead

Sarah is tracking context quality as a metric - she runs quarterly audits where developers rate the accuracy of their team's CLAUDE.md files and README documentation. The scores have been declining quarter over quarter despite increasing context engineering investment. More documentation is being written, but the codebase is evolving faster than documentation can be updated manually.

What Sarah should do - role-specific action plan

Victor - Staff Engineer, AI Champion

Victor has been manually auditing the CLAUDE.md files in his repositories every month. It takes 4-5 hours and invariably turns up the same categories of errors: renamed commands, updated dependencies, reorganized file paths. He knows this is mechanical work that shouldn't require his time, but he also knows that if he stops doing it, the CLAUDE.md files will drift and agent quality will degrade. He's caught in a maintenance loop.

What Victor should do - role-specific action plan