Production telemetry → context auto-update

At L5, agent context updates automatically based on production signals - when a service degrades, agents working on related code receive updated operational context without manual intervention.

  • Agents maintain persistent identity and memory across sessions (Beads/Git-backed)
  • Production telemetry feeds back into agent context automatically (deploy, error, performance data)
  • Agents detect stale documentation and update it without human initiation
  • Agent memory persists architectural decisions and their rationale across sessions
  • Self-healing context updates are validated by automated tests before commit
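
As an illustration of the stale-documentation capability above, here is a minimal sketch that flags a document as stale when any source file it describes was modified after the document itself. The function name `stale_docs`, the `doc_covers` mapping, and the file paths are hypothetical; a real setup would likely read modification times from Git history rather than a hand-built dictionary.

```python
from datetime import datetime


def stale_docs(doc_covers, last_modified):
    """Return docs older than at least one source file they describe.

    doc_covers:    doc path -> list of source paths it documents (assumed mapping)
    last_modified: path -> datetime of the last change to that path
    """
    return sorted(
        doc
        for doc, sources in doc_covers.items()
        if any(last_modified[src] > last_modified[doc] for src in sources)
    )


# Hypothetical example data.
doc_covers = {
    "docs/payments.md": ["src/payments/client.py"],
    "docs/search.md": ["src/search/index.py"],
}
last_modified = {
    "docs/payments.md": datetime(2024, 1, 10),
    "src/payments/client.py": datetime(2024, 2, 1),  # changed after its doc
    "docs/search.md": datetime(2024, 3, 1),
    "src/search/index.py": datetime(2024, 2, 15),
}

stale = stale_docs(doc_covers, last_modified)
print(stale)  # ['docs/payments.md']
```

A check like this only nominates candidates; the update an agent proposes would still go through the automated tests mentioned in the last bullet before it is committed.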

Evidence

  • Agent memory store with session-spanning entries and timestamps
  • Production telemetry-to-context pipeline configuration with update frequency
  • Git history showing agent-authored documentation updates with passing CI

What It Is

At L3 and L4, operational context (deployment status, error rates, service health) is provided to agents at task start through MCP servers and BYOC pipelines. This works well when context is fetched once and the production state is stable. But production systems change constantly: a service that was healthy when the agent started a task may be degraded by the time the agent proposes a change. Static operational context, even if it was accurate at session start, becomes stale.

Production telemetry context auto-update closes this loop. Agent context is subscribed to production signals - not just fetched once at session start, but continuously updated as production state changes. When an error rate crosses a threshold, when a deployment is rolled back, when an alert fires, when a circuit breaker opens - the relevant agents receive updated context automatically, without a human relaying the information.
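
A minimal sketch of the "crosses a threshold" trigger, assuming error rates arrive as periodic samples: the watcher emits an event only when the state changes (healthy to degraded, or back), so agents are not flooded with one update per sample. `ErrorRateWatcher`, `ThresholdEvent`, and the 5% threshold are illustrative names and values, not part of any monitoring product's API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ThresholdEvent:
    service: str
    state: str          # "degraded" or "recovered"
    error_rate: float


class ErrorRateWatcher:
    """Edge-triggered watcher: reports transitions, not every sample."""

    def __init__(self, service: str, threshold: float):
        self.service = service
        self.threshold = threshold
        self.degraded = False

    def observe(self, error_rate: float) -> Optional[ThresholdEvent]:
        now_degraded = error_rate >= self.threshold
        if now_degraded == self.degraded:
            return None  # no state change, nothing to push to agents
        self.degraded = now_degraded
        state = "degraded" if now_degraded else "recovered"
        return ThresholdEvent(self.service, state, error_rate)


watcher = ErrorRateWatcher("payment-processor", threshold=0.05)
events = [watcher.observe(rate) for rate in (0.01, 0.07, 0.09, 0.02)]
states = [e.state if e else None for e in events]
print(states)  # [None, 'degraded', None, 'recovered']
```

Real alerting systems usually add hysteresis or a minimum duration to avoid flapping around the threshold; that is left out here to keep the sketch short.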

The architecture is event-driven: production monitoring systems (Datadog, Grafana, PagerDuty, etc.) emit events when significant state changes occur. A context update pipeline subscribes to these events, determines which agents and tasks are affected, and pushes updated operational context to those agents. The agent's operational understanding of the system stays synchronized with reality, not with a snapshot taken at session start.
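
The subscribe, route, and push steps described above can be sketched as follows. This is an in-memory illustration under stated assumptions: events are plain dictionaries, the `depends_on` service map is maintained somewhere else, and only direct dependencies are considered. `AgentSession` and `ContextUpdatePipeline` are hypothetical names, and no real monitoring or MCP API is used.

```python
from dataclasses import dataclass, field


@dataclass
class AgentSession:
    agent_id: str
    services: set                      # services the agent's current task touches
    operational_context: dict = field(default_factory=dict)


class ContextUpdatePipeline:
    def __init__(self, depends_on):
        # depends_on maps a service to the services it calls directly (assumed input).
        self.depends_on = depends_on
        self.sessions = []

    def register(self, session):
        self.sessions.append(session)

    def affected_sessions(self, service):
        # A session is affected if its task touches the changed service
        # or touches a service that directly depends on it.
        impacted = {service} | {
            s for s, deps in self.depends_on.items() if service in deps
        }
        return [s for s in self.sessions if s.services & impacted]

    def handle_event(self, event):
        # Store the latest event per service in each affected agent's context.
        updated = []
        for session in self.affected_sessions(event["service"]):
            session.operational_context[event["service"]] = event
            updated.append(session.agent_id)
        return updated


pipeline = ContextUpdatePipeline({"payment-service": {"payment-processor"}})
payments_agent = AgentSession("agent-1", {"payment-service"})
search_agent = AgentSession("agent-2", {"search"})
pipeline.register(payments_agent)
pipeline.register(search_agent)

notified = pipeline.handle_event(
    {"service": "payment-processor", "kind": "error_rate_high"}
)
print(notified)  # ['agent-1']
```

Keying the context by service means a later "recovered" event overwrites the earlier "degraded" one, so the agent sees current state rather than an ever-growing event log.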

At L5 (Autonomous), this capability becomes essential. When agents are autonomously making code changes, deploying those changes, and monitoring the result, the feedback loop between production behavior and agent context must be automatic. An agent that doesn't know a service is currently degraded might confidently generate changes that make the situation worse. An agent that receives a real-time alert about increased error rates in a related service will incorporate that information into its next decision.

Why It Matters

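The gap between agent context and production reality is a safety risk that grows with the agent's autonomy level. The first three items below are failure modes of stale context; the last two are capabilities that live telemetry makes possible: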

  • Agents making unsafe changes - an agent that doesn't know a downstream service is operating at reduced capacity might generate a change that removes a safety check "to simplify the code"
  • Missed integration with incident response - when production is degraded, agents working on related code should know about the degradation and adjust their suggestions accordingly
  • Compounding errors in autonomous workflows - in multi-step agent workflows, a wrong assumption about production state in step 3 can lead to a cascade of wrong decisions in steps 4-10
  • Faster incident response - when agents are automatically informed about production anomalies, they can proactively generate hypotheses and diagnostic steps without waiting for a human to relay the information
  • Closed-loop validation - agents that have deployed a change and are monitoring its impact need real-time telemetry to evaluate whether the change was successful

The L5 vision is an agent that can observe the production environment, reason about what's happening, and act - all without requiring a human to bridge between production monitoring tools and the agent's context window.

Tip

Start with read-only telemetry subscriptions before giving agents any ability to act on production signals. Verify that the agent correctly interprets and acts on telemetry context for 30 days before introducing any automated production actions.
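
The gate in this tip can be expressed as a small check. This is a sketch under assumptions: `may_act_on_production` is a hypothetical helper, and `interpretation_verified` stands in for whatever review process a team uses to confirm the agent read the telemetry correctly.

```python
from datetime import date, timedelta

OBSERVATION_PERIOD = timedelta(days=30)  # read-only period from the tip above


def may_act_on_production(subscribed_since: date, today: date,
                          interpretation_verified: bool) -> bool:
    """Allow automated production actions only after a verified read-only period."""
    observed_long_enough = (today - subscribed_since) >= OBSERVATION_PERIOD
    return interpretation_verified and observed_long_enough


start = date(2024, 1, 1)
too_early = may_act_on_production(start, date(2024, 1, 20), True)
unverified = may_act_on_production(start, date(2024, 3, 1), False)
allowed = may_act_on_production(start, date(2024, 1, 31), True)
print(too_early, unverified, allowed)  # False False True
```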

How Different Roles See It

Bob, Head of Engineering

Bob's team is running production-level autonomous agent workflows. An incident occurs: an agent working on a performance optimization to the payment service didn't know that the payment processor had been experiencing elevated error rates for the past 30 minutes. The agent's "optimization" removed a defensive timeout that was acting as a circuit breaker. The change made it to production (through automated review) and worsened the incident.
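
One way to picture the missing safeguard in this incident is a review gate that consults live operational context before approving a change. The sketch below is hypothetical: the change labels in `RISKY_CHANGE_KINDS`, the `review_change` function, and the idea that changes arrive pre-labelled are all assumptions made for illustration.

```python
# Change kinds that weaken defensive behaviour (illustrative labels).
RISKY_CHANGE_KINDS = {"removes_timeout", "removes_retry", "removes_safety_check"}


def review_change(change_kinds, dependencies_touched, degraded_services):
    """Hold changes that weaken safeguards around a currently degraded dependency."""
    risky = sorted(set(change_kinds) & RISKY_CHANGE_KINDS)
    degraded = sorted(set(dependencies_touched) & set(degraded_services))
    if risky and degraded:
        reason = f"{', '.join(risky)} while degraded: {', '.join(degraded)}"
        return ("hold", reason)
    return ("proceed", "")


# The scenario above: a timeout removal while the payment processor is degraded.
during_incident = review_change(
    ["performance_optimization", "removes_timeout"],
    ["payment-processor"],
    degraded_services={"payment-processor"},
)
# The same change when nothing is degraded.
healthy = review_change(
    ["performance_optimization", "removes_timeout"],
    ["payment-processor"],
    degraded_services=set(),
)
print(during_incident)  # ('hold', 'removes_timeout while degraded: payment-processor')
print(healthy)          # ('proceed', '')
```

The gate is only as good as its `degraded_services` input, which is why it depends on the telemetry subscription rather than a snapshot taken at session start.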

Sarah, Productivity Lead

Sarah is tracking incident frequency and notes that a new category of incidents has emerged: "agent-assisted regressions" - changes made by AI agents that were correct in isolation but incorrect given the production context at the time they were made. These incidents are difficult to prevent because they don't represent agent errors in the traditional sense - the agent produced valid code, but its operational context was incomplete.

Victor, Staff Engineer - AI Champion

Victor runs an on-call rotation for the platform team. He's noticed that when production incidents occur, AI agents continue working on related features without knowing about the incident. This creates two problems: agents generate changes that are inappropriate during an incident (optimizations when stability is needed), and developers manually context-switch between their agent sessions and the incident response, losing context in both directions.
