Agent detects stale context → updates → validates
- Knowledge base is self-evolving (agents add, update, and validate knowledge entries continuously)
- Agent detects stale context, updates it, and validates the update, all without human initiation
- Organizational memory is Git-backed, agent-readable, and provably current
- Knowledge base freshness score exceeds 95% (percentage of entries updated within their defined freshness window)
- Self-evolving updates are validated against the codebase to prevent knowledge drift
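The freshness score in the criteria above can be computed directly from entry timestamps. The following is a minimal sketch, assuming each entry records its last update time and its own freshness window; the `KnowledgeEntry` type, its field names, and the example paths are hypothetical and not taken from any particular knowledge-base tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class KnowledgeEntry:
    """Hypothetical knowledge-base entry; field names are illustrative."""
    path: str
    last_updated: datetime
    freshness_window: timedelta


def freshness_score(entries, now):
    """Return the percentage of entries updated within their freshness window."""
    if not entries:
        # Assumption: an empty knowledge base is treated as vacuously fresh.
        return 100.0
    fresh = sum(1 for e in entries if now - e.last_updated <= e.freshness_window)
    return 100.0 * fresh / len(entries)


now = datetime(2024, 6, 1)
entries = [
    KnowledgeEntry("docs/api.md", datetime(2024, 5, 25), timedelta(days=30)),
    KnowledgeEntry("runbooks/deploy.md", datetime(2024, 1, 1), timedelta(days=30)),
]
print(freshness_score(entries, now))  # 50.0: one of the two entries is inside its window
```

A score like this measures recency only. It does not say whether a recently updated entry is correct, which is why the criteria also require validation against the codebase.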
Evidence
- Knowledge base with agent-authored entries and update timestamps
- Stale context detection and auto-update logs
- Git-backed knowledge store with provenance tracking
What It Is
Stale context detection is the capability of an agent to recognize that the documentation, runbooks, or contextual information it is working with no longer accurately describes the codebase it is working on — and to proactively update and validate that context before continuing. The agent does not passively consume inaccurate context and produce incorrect output. It detects the discrepancy, generates a correction, validates the correction against the code, and either applies it automatically or surfaces it for human review.
The detection mechanism varies by context type. For API documentation, the agent compares documented function signatures against the actual signatures in code. For configuration references, it compares documented parameter names against the actual configuration schema. For runbooks, it attempts to follow the documented procedure in a sandbox and detects where the procedure fails. For ADRs, it identifies when the decision they describe has been reversed or superseded by changes in the codebase. Each detection mechanism requires the agent to have both the documentation and the ground truth it should reflect.
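For the API documentation case, the comparison can be sketched in a few lines. This example assumes the documented parameter names have already been extracted from the docs into a list; `detect_signature_drift` and the sample `connect` function are hypothetical names used only for illustration. The ground truth comes from the standard library's `inspect.signature`.

```python
import inspect


def detect_signature_drift(func, documented_params):
    """Compare documented parameter names against the function's real signature."""
    actual = list(inspect.signature(func).parameters)
    return {
        # Parameters that exist in code but are absent from the docs.
        "missing_from_docs": [p for p in actual if p not in documented_params],
        # Parameters the docs mention that no longer exist in code.
        "removed_from_code": [p for p in documented_params if p not in actual],
    }


# Hypothetical function whose documentation has fallen behind.
def connect(host, port, timeout=30, retries=3):
    ...


drift = detect_signature_drift(connect, ["host", "port", "timeout_seconds"])
print(drift)
```

An empty result in both lists means the documented names match the code. Any non-empty list is a discrepancy the agent can feed into the update step.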
The update step generates a corrected version of the stale documentation based on what the agent found in the code. For straightforward cases — a renamed function, an added configuration parameter — the update is high-confidence and can be applied with minimal human review. For complex cases — an ADR whose architectural rationale no longer applies because the system has been redesigned — the agent surfaces the discrepancy and proposes options for how to handle it, but defers the decision to a human.
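The split between high-confidence updates and cases deferred to a human can be expressed as a small routing rule. This is a sketch under stated assumptions: the `ProposedUpdate` type, the `kind` labels, and the 0.9 threshold are all hypothetical, and a real system would calibrate the threshold against observed correction accuracy.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    AUTO_APPLY = "auto_apply"
    HUMAN_REVIEW = "human_review"


@dataclass
class ProposedUpdate:
    """Hypothetical record of an agent-generated documentation correction."""
    doc_path: str
    kind: str          # e.g. "api_doc", "config_ref", "runbook", "adr"
    confidence: float  # agent's confidence in the correction, 0.0 to 1.0


def route_update(update, threshold=0.9):
    """Decide whether an update is applied automatically or sent for review."""
    # Assumption: ADR changes involve architectural judgment, so they always
    # go to a human regardless of the confidence score.
    if update.kind == "adr":
        return Route.HUMAN_REVIEW
    if update.confidence >= threshold:
        return Route.AUTO_APPLY
    return Route.HUMAN_REVIEW


print(route_update(ProposedUpdate("docs/api.md", "api_doc", 0.97)))      # Route.AUTO_APPLY
print(route_update(ProposedUpdate("adr/0007-queue.md", "adr", 0.99)))    # Route.HUMAN_REVIEW
```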
The validation step closes the loop. After generating an update, the agent verifies that the updated documentation is internally consistent, does not conflict with other documentation, and accurately describes the current state of the code. For runbooks, validation means running the updated procedure in a sandbox and confirming it succeeds. For API docs, validation means parsing the generated documentation and confirming it matches the code. Validation converts the update from a generated draft to a verified correction.
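Runbook validation can be sketched as running each step in order and reporting the first one that fails. The example below assumes each step is already an argument list suitable for `subprocess.run`, and it uses a temporary directory as a stand-in for the sandbox. That is an assumption for illustration only: a temporary directory isolates the working files but does not provide the process, network, or infrastructure isolation a real sandbox would need. The `validate_runbook` name is hypothetical.

```python
import subprocess
import sys
import tempfile


def validate_runbook(steps, timeout=30):
    """Run each step in a scratch directory; return the index of the first
    failing step, or None if every step exits with status 0."""
    with tempfile.TemporaryDirectory() as workdir:
        for index, argv in enumerate(steps):
            result = subprocess.run(
                argv, cwd=workdir, capture_output=True, text=True, timeout=timeout
            )
            if result.returncode != 0:
                return index
    return None


# Illustrative steps: the second one fails on purpose.
steps = [
    [sys.executable, "-c", "print('step one ok')"],
    [sys.executable, "-c", "import sys; sys.exit(1)"],
]
failed_at = validate_runbook(steps)
print(failed_at)  # 1: the second step is where the procedure breaks
```

The same function serves the detection step for runbooks, since a failing index on the current documentation is exactly the signal that the procedure has gone stale.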
Why It Matters
- Stale context is an agent safety issue - an agent that trusts stale documentation will make changes based on incorrect assumptions; stale context that goes undetected propagates errors through the system; detection is a prerequisite for reliable agent operation in long-lived codebases
- The detection loop closes the knowledge maintenance gap - without automatic staleness detection, documentation drift is only discovered when a human notices an inconsistency or an agent produces incorrect output; with detection, discrepancies are found and corrected continuously
- Validation creates trusted documentation - documentation that has been validated by an agent running the procedure or parsing the code is more trustworthy than documentation written by a human and not subsequently verified; the validation step is what distinguishes auto-generated documentation from auto-generated noise
- Detect-update-validate runs continuously without human initiative - the most valuable property of this loop is that it runs whether or not any human thinks to check; documentation accuracy does not depend on anyone's memory, availability, or discipline
- Stale context metrics become observable - when an agent is continuously detecting and correcting stale context, the detection rate, correction accuracy, and residual staleness are all measurable; this makes documentation health a quantified, monitorable property
How Different Roles See It
Bob has watched the knowledge base improve significantly over the past two years, but there is still a class of documentation problem he cannot solve: the documentation that becomes stale slowly, through many small changes, none of which is large enough to trigger a formal update. An API that acquires five new optional parameters over 18 months ends up with documentation that omits all five. A runbook that was accurate when written becomes inaccurate through six small infrastructure changes.
The detect-update-validate loop addresses exactly this class of problem. Bob should give Victor an explicit mandate to build this loop as a strategic infrastructure project, with a timeline and success metrics defined upfront. The success metric he should track is the documentation staleness rate: the percentage of a sample of documentation artifacts that have discrepancies with the codebase, measured monthly by automated detection. He should set a target, perhaps a staleness rate below 5%, and review progress toward it quarterly. When the loop is working, this metric should decline steadily without requiring any human initiative.
Sarah has been the most consistent voice for documentation quality, and she has seen every manual approach to maintaining it fail under pressure. The detect-update-validate loop is qualitatively different: it does not depend on human initiative, it does not degrade under sprint pressure, and it produces measurable output. Sarah should champion this investment to Bob as the highest-leverage documentation infrastructure project available.
Her specific contribution is defining the accuracy measurement framework. She should design the monthly human audit process: sample size, sampling methodology, discrepancy categorization, and the staleness score calculation. She should run this audit monthly and report the results to Bob alongside the automated detection metrics. The combination of automated detection rates (volume) and human audit accuracy scores (quality) gives a complete picture of whether the system is working. Sarah should also track engineer confidence in documentation quality, surveyed quarterly, as the leading indicator that the loop is building the trustworthy knowledge base the team needs.
Victor designed and built most of the knowledge infrastructure that makes this loop possible. The final step is closing it: connecting the detection agents to the update agents, connecting the update agents to the validation agents, and establishing the human review interface for escalations. This is primarily integration and orchestration work, but it requires careful design to avoid the failure modes that are common at this stage.
Victor should build the loop incrementally: detection only for one month, detection plus update with human review for one month, detection plus update plus automated validation for one month, then full loop operation with selective automation based on confidence thresholds. He should instrument every step and share the metrics weekly with Sarah and Bob during the rollout period. He should also document the loop architecture explicitly: what triggers each step, what the confidence thresholds are, what goes to human review versus automatic application. This documentation is itself part of the knowledge base the loop maintains — a test of the system's ability to maintain documentation about itself.
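The incremental rollout can be made explicit as a phase gate that decides which steps of the loop are active. This is a minimal sketch of the four phases described above; the `RolloutPhase` enum and the step names are hypothetical labels, not part of any existing tool.

```python
from enum import IntEnum


class RolloutPhase(IntEnum):
    """Hypothetical labels for the four rollout phases."""
    DETECT_ONLY = 1
    UPDATE_WITH_REVIEW = 2
    AUTOMATED_VALIDATION = 3
    FULL_LOOP = 4


def enabled_steps(phase):
    """Return the loop steps that are active in the given rollout phase."""
    steps = ["detect"]
    if phase >= RolloutPhase.UPDATE_WITH_REVIEW:
        steps += ["update", "human_review"]
    if phase >= RolloutPhase.AUTOMATED_VALIDATION:
        steps.append("validate")
    if phase >= RolloutPhase.FULL_LOOP:
        # Only updates above the confidence threshold skip human review.
        steps.append("auto_apply_above_threshold")
    return steps


for phase in RolloutPhase:
    print(phase.name, enabled_steps(phase))
```

Keeping the gate in one place gives the loop documentation a single source to describe, and gives the weekly metrics a clear phase to be reported against.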