Context Engineering

Development · L3 (Systematic)

5-level context: System, Code, Org, Historical, Operational

A structured framework for the five types of context an AI agent needs to make good decisions - and a diagnostic tool for understanding why your agents keep getting things wrong.

  • MCP servers provide structured context (architecture, ownership, SLAs) to agents
  • Context is organized across at least 3 of the 5 levels: System, Code, Org, Historical, Operational
  • Token budget management is implemented (agents receive context within defined token limits)
  • Context sources are versioned and tested for correctness
  • Context budgeting policy defines priority order when token limits are reached
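The last two criteria above can be sketched as a simple priority-ordered assembly loop: sources are ranked, and any source that would exceed the token budget is skipped. All names, priorities, and token counts here are hypothetical, not a real API:

```python
# Illustrative context budgeting policy: include sources in priority
# order, skipping any source that would blow the token budget.
# Source names, priorities, and token counts are invented.

def assemble_context(sources, budget_tokens):
    """Greedily include context sources in priority order within the budget."""
    included, used = [], 0
    for source in sorted(sources, key=lambda s: s["priority"]):
        if used + source["tokens"] <= budget_tokens:
            included.append(source["name"])
            used += source["tokens"]
    return included, used

sources = [
    {"name": "system-architecture", "priority": 1, "tokens": 2_000},
    {"name": "ownership-map",       "priority": 2, "tokens": 1_500},
    {"name": "adr-index",           "priority": 3, "tokens": 3_000},
    {"name": "deployment-status",   "priority": 4, "tokens": 2_500},
]

# With a 6,000-token budget, "adr-index" is skipped because it would
# exceed the budget; the smaller "deployment-status" still fits.
included, used = assemble_context(sources, budget_tokens=6_000)
print(included, used)
```

Whether a skipped source should be summarized rather than dropped is a policy decision; the point is that the priority order is explicit and written down, not improvised per session.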

Evidence

  • MCP server configuration files listing active context sources
  • Token budget configuration in agent settings
  • Context coverage audit showing 3+ context levels populated
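As an illustration of the first evidence item, an MCP configuration might look like the sketch below. The `mcpServers` / `command` / `args` shape follows common MCP client configs, but the server names and commands are invented; real entries depend on which servers your organization runs.

```json
{
  "mcpServers": {
    "architecture-registry": {
      "command": "npx",
      "args": ["-y", "example-architecture-mcp"]
    },
    "ownership-map": {
      "command": "python",
      "args": ["-m", "example_ownership_mcp"]
    }
  }
}
```

Here the first (hypothetical) server would supply system-level context such as service topology and SLAs, and the second would supply org-level context such as team ownership and escalation contacts.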

What It Is

When an AI agent makes a bad suggestion, the cause is almost always a context gap: the agent was missing information it needed to reason correctly. But "the agent needs more context" is not an actionable diagnosis. The 5-level context framework provides a structured vocabulary for describing exactly what kind of context is missing - and therefore what specific investment would fix it.

The five levels are:

1. System-level context - The architecture and infrastructure of the system: how services are deployed, what infrastructure they run on, how they communicate, what the scalability and reliability constraints are. An agent working on a microservice without system-level context doesn't know whether it's a standalone service or one of two hundred in a mesh. It doesn't know if horizontal scaling is available or if there's a single-region constraint.

2. Code-level context - The codebase itself: module structure, dependency graph, existing abstractions, test patterns, build conventions. Code-level context is what IDE tools naturally provide - file contents, imports, definitions. The agent can read the files, but it can only read what fits in its context window.

3. Org-level context - Team structure, ownership maps, who to contact for what, each team's current priorities, and what's in progress vs. completed. An agent that doesn't know the authentication team owns the auth-service might modify that service in a way that's technically correct but violates a team boundary.

4. Historical context - Why decisions were made: Architecture Decision Records (ADRs), git commit history with meaningful messages, design documents, post-mortems. An agent that doesn't know a particular pattern was explicitly rejected three years ago after a production incident will confidently re-suggest the rejected pattern.

5. Operational context - The current state of the running system: deployment status, error rates, latency metrics, recent incidents, on-call alerts. An agent working on code related to a currently-degraded service needs to know the service is degraded - and why - to make safe suggestions.

At L3 (Systematic), organizations have identified which of these context types they're providing and which they're not, and are systematically building the infrastructure to provide all five.
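Tracking which of the five levels are populated (the "context coverage audit" from the criteria above) can be as simple as a manifest that maps each level to its sources and reports the gaps. The coverage entries below are hypothetical:

```python
# Hypothetical context coverage audit: map each of the five levels to
# the sources that populate it, then report which levels are empty.

CONTEXT_LEVELS = ["system", "code", "org", "historical", "operational"]

coverage = {
    "system": ["infrastructure docs in CLAUDE.md"],
    "code": ["repo index", "CLAUDE.md build conventions"],
    "org": ["ownership map via MCP"],
    "historical": [],   # no ADRs wired in yet
    "operational": [],  # no telemetry source yet
}

populated = [lvl for lvl in CONTEXT_LEVELS if coverage.get(lvl)]
gaps = [lvl for lvl in CONTEXT_LEVELS if not coverage.get(lvl)]
print(f"{len(populated)}/5 levels populated; gaps: {gaps}")
```

An organization at this stage (3 of 5 levels populated) would meet the coverage criterion while the manifest makes the remaining investments explicit.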

Why It Matters

The framework is diagnostic. When an agent makes a bad suggestion, you can classify the failure by context level:

  • Agent suggested a pattern we explicitly rejected in 2022 → Historical context gap → fix: ADRs in context
  • Agent modified a service owned by another team without coordination → Org-level context gap → fix: ownership data via MCP
  • Agent didn't account for the fact that this runs on single-region infrastructure → System-level context gap → fix: infrastructure documentation in CLAUDE.md
  • Agent created a utility function that already exists in a shared library → Code-level context gap → fix: better codebase indexing, RAG over the repo
  • Agent suggested adding caching to a service that's currently in a degraded state due to a memory leak → Operational context gap → fix: production telemetry in context

Without this framework, context gaps get described as "the AI is bad" - which leads to no actionable fix. With the framework, they become specific infrastructure investments.
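The mapping from gap class to fix can be captured as a lookup table. The table below just restates the bullet list above; the function name and fallback string are illustrative:

```python
# Illustrative mapping from context-gap class to the infrastructure
# investment that addresses it (mirrors the diagnostic examples above).

FIX_FOR_GAP = {
    "system":      "infrastructure documentation in CLAUDE.md",
    "code":        "better codebase indexing, RAG over the repo",
    "org":         "ownership data via MCP",
    "historical":  "ADRs in context",
    "operational": "production telemetry in context",
}

def recommend_fix(gap_level: str) -> str:
    """Return the investment for a classified gap, or a prompt to investigate."""
    return FIX_FOR_GAP.get(gap_level, "unclassified: investigate further")

print(recommend_fix("historical"))  # -> ADRs in context
```

The value of the table is less the code than the discipline: every agent failure gets classified into one of five buckets, each with a known remediation.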

The five levels also map naturally to the maturity progression: L2 provides basic Code-level context (CLAUDE.md with project info). L3 systematically addresses all five levels. L4 automates context assembly. L5 makes context self-maintaining.

Tip

Run a "context failure audit" with your team. Take the last 10 agent mistakes that made it to code review and classify each one by which context level was missing. The level with the most failures is your highest-priority context engineering investment.


How Different Roles See It

Bob - Head of Engineering

Bob has invested in L2 context engineering - most repositories have CLAUDE.md files, and developers are using them. But he's hearing a new category of complaint: "The agent writes good code but it doesn't understand how things fit together." Agents are creating duplicate abstractions, violating service boundaries, and suggesting changes that are technically correct but operationally risky. Bob doesn't know how to categorize or prioritize these issues.


Sarah - Productivity Lead

Sarah has been trying to explain to stakeholders why AI tool ROI varies so much across teams. Some teams are seeing 40% productivity improvements; others are seeing marginal gains at best. She suspects the difference is context engineering maturity but hasn't been able to articulate it precisely enough to be actionable.


Victor - Staff Engineer, AI Champion

Victor has built a sophisticated context system for his own workflow: he manually assembles context from different sources before starting agent sessions. He has CLAUDE.md, he keeps ADRs, he checks the service registry, he looks at deployment status. It works well but takes 20-30 minutes of preparation per session, and none of it is automated.
