Context Engineering

Development · L3 (Systematic)

5-level context: System, Code, Org, Historical, Operational

A structured framework for the five types of context an AI agent needs to make good decisions - and a diagnostic tool for understanding why your agents keep getting things wrong.

  • MCP servers provide structured context (architecture, ownership, SLAs) to agents
  • Context is organized across at least 3 of the 5 levels: System, Code, Org, Historical, Operational
  • Token budget management is implemented (agents receive context within defined token limits)
  • Context sources are versioned and tested for correctness
  • Context budgeting policy defines priority order when token limits are reached
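The last two criteria above can be sketched as a simple priority-ordered assembly loop: sources are ranked, and any source that would exceed the token budget is skipped. All names, priorities, and token counts here are hypothetical, not a real API:

```python
# Illustrative context budgeting policy: include sources in priority
# order, skipping any source that would blow the token budget.
# Source names, priorities, and token counts are invented.

def assemble_context(sources, budget_tokens):
    """Greedily include context sources in priority order within the budget."""
    included, used = [], 0
    for source in sorted(sources, key=lambda s: s["priority"]):
        if used + source["tokens"] <= budget_tokens:
            included.append(source["name"])
            used += source["tokens"]
    return included, used

sources = [
    {"name": "system-architecture", "priority": 1, "tokens": 2_000},
    {"name": "ownership-map",       "priority": 2, "tokens": 1_500},
    {"name": "adr-index",           "priority": 3, "tokens": 3_000},
    {"name": "deployment-status",   "priority": 4, "tokens": 2_500},
]

# With a 6,000-token budget, "adr-index" is skipped because it would
# exceed the budget; the smaller "deployment-status" still fits.
included, used = assemble_context(sources, budget_tokens=6_000)
print(included, used)
```

Whether a skipped source should be summarized rather than dropped is a policy decision; the point is that the priority order is explicit and written down, not improvised per session.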

Evidence

  • MCP server configuration files listing active context sources
  • Token budget configuration in agent settings
  • Context coverage audit showing 3+ context levels populated
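As an illustration of the first evidence item, an MCP configuration might look like the sketch below. The `mcpServers` / `command` / `args` shape follows common MCP client configs, but the server names and commands are invented; real entries depend on which servers your organization runs.

```json
{
  "mcpServers": {
    "architecture-registry": {
      "command": "npx",
      "args": ["-y", "example-architecture-mcp"]
    },
    "ownership-map": {
      "command": "python",
      "args": ["-m", "example_ownership_mcp"]
    }
  }
}
```

Here the first (hypothetical) server would supply system-level context such as service topology and SLAs, and the second would supply org-level context such as team ownership and escalation contacts.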

What It Is

When an AI agent makes a bad suggestion, the cause is almost always a context gap: the agent was missing information it needed to reason correctly. But "the agent needs more context" is not an actionable diagnosis. The 5-level context framework provides a structured vocabulary for describing exactly what kind of context is missing - and therefore what specific investment would fix it.

The five levels are:

1. System-level context - The architecture and infrastructure of the system: how services are deployed, what infrastructure they run on, how they communicate, what the scalability and reliability constraints are. An agent working on a microservice without system-level context doesn't know whether it's a standalone service or one of two hundred in a mesh. It doesn't know if horizontal scaling is available or if there's a single-region constraint.

2. Code-level context - The codebase itself: module structure, dependency graph, existing abstractions, test patterns, build conventions. Code-level context is what IDE tools naturally provide - file contents, imports, definitions. The agent can read the files, but it can only read what fits in its context window.

3. Org-level context - Team structure, ownership maps, who to contact for what, each team's current priorities, and what's in progress vs. completed. An agent that doesn't know the authentication team owns the auth-service might modify that service in a way that's technically correct but violates a team boundary.

4. Historical context - Why decisions were made: Architecture Decision Records (ADRs), git commit history with meaningful messages, design documents, post-mortems. An agent that doesn't know a particular pattern was explicitly rejected three years ago after a production incident will confidently re-suggest the rejected pattern.

5. Operational context - The current state of the running system: deployment status, error rates, latency metrics, recent incidents, on-call alerts. An agent working on code related to a currently-degraded service needs to know the service is degraded - and why - to make safe suggestions.

At L3 (Systematic), organizations have identified which of these context types they're providing and which they're not, and are systematically building the infrastructure to provide all five.
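Tracking which of the five levels are populated (the "context coverage audit" from the criteria above) can be as simple as a manifest that maps each level to its sources and reports the gaps. The coverage entries below are hypothetical:

```python
# Hypothetical context coverage audit: map each of the five levels to
# the sources that populate it, then report which levels are empty.

CONTEXT_LEVELS = ["system", "code", "org", "historical", "operational"]

coverage = {
    "system": ["infrastructure docs in CLAUDE.md"],
    "code": ["repo index", "CLAUDE.md build conventions"],
    "org": ["ownership map via MCP"],
    "historical": [],   # no ADRs wired in yet
    "operational": [],  # no telemetry source yet
}

populated = [lvl for lvl in CONTEXT_LEVELS if coverage.get(lvl)]
gaps = [lvl for lvl in CONTEXT_LEVELS if not coverage.get(lvl)]
print(f"{len(populated)}/5 levels populated; gaps: {gaps}")
```

An organization at this stage (3 of 5 levels populated) would meet the coverage criterion while the manifest makes the remaining investments explicit.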

Why It Matters

The framework is diagnostic. When an agent makes a bad suggestion, you can classify the failure by context level:

  • Agent suggested a pattern we explicitly rejected in 2022 → Historical context gap → fix: ADRs in context
  • Agent modified a service owned by another team without coordination → Org-level context gap → fix: ownership data via MCP
  • Agent didn't account for the fact that this runs on single-region infrastructure → System-level context gap → fix: infrastructure documentation in CLAUDE.md
  • Agent created a utility function that already exists in a shared library → Code-level context gap → fix: better codebase indexing, RAG over the repo
  • Agent suggested adding caching to a service that's currently in a degraded state due to a memory leak → Operational context gap → fix: production telemetry in context

Without this framework, context gaps get described as "the AI is bad" - which leads to no actionable fix. With the framework, they become specific infrastructure investments.
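The mapping from gap class to fix can be captured as a lookup table. The table below just restates the bullet list above; the function name and fallback string are illustrative:

```python
# Illustrative mapping from context-gap class to the infrastructure
# investment that addresses it (mirrors the diagnostic examples above).

FIX_FOR_GAP = {
    "system":      "infrastructure documentation in CLAUDE.md",
    "code":        "better codebase indexing, RAG over the repo",
    "org":         "ownership data via MCP",
    "historical":  "ADRs in context",
    "operational": "production telemetry in context",
}

def recommend_fix(gap_level: str) -> str:
    """Return the investment for a classified gap, or a prompt to investigate."""
    return FIX_FOR_GAP.get(gap_level, "unclassified: investigate further")

print(recommend_fix("historical"))  # -> ADRs in context
```

The value of the table is less the code than the discipline: every agent failure gets classified into one of five buckets, each with a known remediation.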

The five levels also map naturally to the maturity progression: L2 provides basic Code-level context (CLAUDE.md with project info). L3 systematically addresses all five levels. L4 automates context assembly. L5 makes context self-maintaining.

Tip

Run a "context failure audit" with your team. Take the last 10 agent mistakes that made it to code review and classify each one by which context level was missing. The level with the most failures is your highest-priority context engineering investment.


How Different Roles See It

Bob - Head of Engineering

Bob has invested in L2 context engineering - most repositories have CLAUDE.md files, and developers are using them. But he's hearing a new category of complaint: "The agent writes good code but it doesn't understand how things fit together." Agents are creating duplicate abstractions, violating service boundaries, and suggesting changes that are technically correct but operationally risky. Bob doesn't know how to categorize or prioritize these issues.


Sarah - Productivity Lead

Sarah has been trying to explain to stakeholders why AI tool ROI varies so much across teams. Some teams are seeing 40% productivity improvements; others are seeing marginal gains at best. She suspects the difference is context engineering maturity but hasn't been able to articulate it precisely enough to be actionable.


Victor - Staff Engineer, AI Champion

Victor has built a sophisticated context system for his own workflow: he manually assembles context from different sources before starting agent sessions. He has CLAUDE.md, he keeps ADRs, he checks the service registry, he looks at deployment status. It works well but takes 20-30 minutes of preparation per session, and none of it is automated.
