Infrastructure
The technical layer that enables (or blocks) agents. From shared Jenkins to ephemeral agent sandboxes.
Agent Runtime & Sandboxing
Where and how AI agents execute code - isolation, security, and resource management.
- Agent in developer's IDE
  At the earliest stage of AI-assisted development, the agent lives inside the developer's IDE, running as an extension or plugin within VS Code, Cursor, or JetBrains.
- Agent runs in the developer's local environment
  "No isolation" describes the default state at L1: the AI agent runs directly in the developer's working environment with no boundary between the agent's execution context and the d…
- Agent access is coarse-grained (all or none)
  At the L1 maturity level, organizations face a binary permission model for AI agents: either the agent has full access to the developer's environment (everything), or access is so…
- 01 Agents can run in the developer's local environment
- 02 Agents have file-system and shell access in their run environment
- 03 Developers are aware of the security implications of agents with full local access
- 04 Agent access scope (file system, network) is understood even if not restricted
- Dedicated dev environments
  Dedicated dev environments move agent execution off the developer's laptop and into isolated cloud-hosted workspaces.
- Basic sandboxing (Docker, cc-mini, bubblewrap wrappers)
  Basic Docker sandboxing wraps the agent's execution environment in a container that is isolated from the host system.
- Agent credentials scoped per project; per-session spend caps
  Project-scoped agent credentials mean that each project has its own dedicated set of credentials that agents use when working on that project, rather than agents inheriting the de…
- 01 Dedicated development environments exist for agent execution (separate from developer's primary workspace)
- 02 Basic sandboxing via Docker or equivalent containers is implemented
- 03 Agent credentials are scoped per project (not a single org-wide key)
- 04 Container images for agent environments are versioned and reproducible
- 05 Credential rotation schedule exists for agent-scoped keys
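As a rough illustration of the basic Docker sandboxing item above, the sketch below assembles a `docker run` command line that confines an agent task to a single mounted workspace. The image tag, workspace path, and resource limits are hypothetical placeholders; the flags themselves are standard `docker run` options. It only builds the argument list and does not start a container.

```python
from typing import List


def sandbox_command(image: str, workspace: str, task_cmd: List[str]) -> List[str]:
    """Build a `docker run` argv that confines an agent task to one workspace."""
    return [
        "docker", "run",
        "--rm",                           # discard the container when the task ends
        "--network", "none",              # no network access from inside the sandbox
        "--read-only",                    # root filesystem is immutable
        "--cap-drop", "ALL",              # drop all Linux capabilities
        "--pids-limit", "256",            # cap the process count (assumed limit)
        "--memory", "2g",                 # hard memory ceiling (assumed limit)
        "--cpus", "2",                    # CPU quota (assumed limit)
        "--tmpfs", "/tmp",                # writable scratch space
        "-v", f"{workspace}:/workspace",  # the only host path exposed
        "-w", "/workspace",
        image,                            # pinned tag keeps the environment reproducible
        *task_cmd,
    ]


if __name__ == "__main__":
    # Hypothetical image tag and path, for illustration only.
    print(" ".join(sandbox_command("agent-env:1.4.2", "/srv/project-a", ["pytest", "-q"])))
```

Pinning the image to a version tag rather than `latest` is one simple way to approach the "versioned and reproducible" checklist item.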
- Isolated agent environments (devbox model)
  The devbox model is the architectural pattern where each agent task gets its own isolated environment, created at task start and destroyed at task end.
- Pre-warmed containers with codebase
  Pre-warmed containers are agent environments that have been prepared in advance and are waiting in a ready state before any task is assigned to them.
- Network isolation: agent can't see production; granular permission layers (Claude Code leak pattern)
  Network isolation for agents means that the agent's execution environment has a constrained network configuration: it can reach the systems it needs for development work (GitHub, p…
- 01 Isolated agent environments (devbox model) prevent agents from accessing other projects
- 02 Pre-warmed containers with codebase at HEAD and dependencies installed are available
- 03 Network isolation prevents agents from reaching production systems
- 04 Container warm pool size matches team's agent usage patterns
- 05 Network isolation rules are tested and audited quarterly
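One way to make the network isolation rules above testable is to express the egress policy as data and check hosts against it. The sketch below is a minimal deny-by-default allowlist; the domain names are hypothetical examples, and a real deployment would enforce the same policy at the network layer rather than in application code.

```python
from typing import Iterable

# Hypothetical allowlist: development systems only, no production hosts.
ALLOWED_DOMAINS = ("github.com", "pypi.org", "registry.dev.example")


def egress_allowed(host: str, allowed: Iterable[str] = ALLOWED_DOMAINS) -> bool:
    """Return True only if host is an allowed domain or a subdomain of one."""
    host = host.lower().rstrip(".")
    # Match on a full label boundary so "evilgithub.com" does not pass.
    return any(host == domain or host.endswith("." + domain) for domain in allowed)


if __name__ == "__main__":
    for candidate in ("api.github.com", "prod-db.internal.example", "evilgithub.com"):
        print(candidate, egress_allowed(candidate))
```

A function like this can double as the fixture for the periodic audit: the same list of should-pass and should-fail hosts is replayed against the live rules.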
- Ephemeral devboxes: 10s spin-up (Stripe benchmark)
  The 10-second devbox spin-up is the performance target that Stripe's agent infrastructure team set as the benchmark for production-grade agent environments.
- Pre-loaded services, code, MCP tools
  A pre-loaded devbox is one where everything the agent needs to do its work is already running and available when the task starts - not just the codebase, but the dependent services…
- Kernel-level policy enforcement
  Kernel-level policy enforcement means using Linux security mechanisms - seccomp (secure computing mode), AppArmor, and eBPF (extended Berkeley Packet Filter) - to enforce what an a…
- 01 Ephemeral devboxes spin up in under 10 seconds (Stripe benchmark)
- 02 Devboxes come pre-loaded with codebase, dependencies, and MCP tools
- 03 Kernel-level policy enforcement restricts agent actions (syscall filtering, resource limits)
- 04 Devbox spin-up P99 latency is under 30 seconds
- 05 Firecracker microVMs or equivalent provide VM-level isolation with container-level startup speed
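The P99 spin-up target above only means something if the percentile is computed the same way every time. The sketch below uses the nearest-rank method on a list of measured spin-up times in seconds; the sample values and the helper names are illustrative assumptions, not measurements from any real fleet.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def meets_spinup_target(samples: Sequence[float], p99_limit_s: float = 30.0) -> bool:
    """Check the 'P99 under 30 seconds' criterion against measured spin-up times."""
    return percentile(samples, 99) < p99_limit_s


if __name__ == "__main__":
    spinups = [8.2, 9.1, 7.9, 12.4, 9.8, 41.0, 8.8, 9.5, 10.1, 8.4]  # made-up data
    print(percentile(spinups, 50), percentile(spinups, 99), meets_spinup_target(spinups))
```

With few samples the P99 is simply the slowest observation, so a single cold start can fail the check; that is usually the behavior you want from a tail-latency target.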
- Agent fleet on dedicated compute
  An agent fleet on dedicated compute is the infrastructure pattern where AI agent workloads run on a distinct, purpose-built compute layer that is separate from developer laptops, C…
- Auto-scaling: agents scale with load
  Auto-scaling for agent fleets means that the compute infrastructure automatically adds or removes capacity in response to agent task load, without manual intervention.
- Each agent = isolated machine (Cursor approach) or shared with smart resource management
  At L5, organizations running agent fleets at scale face a fundamental architectural choice: does each agent get its own isolated machine (strong isolation, higher cost), or do mult…
- 01 Dedicated compute infrastructure exists for agent fleet (not shared with developer workstations or production)
- 02 Agent fleet auto-scales with load (agents scale up during business hours, scale down off-hours)
- 03 Each agent runs in a fully isolated environment (Cursor approach: one machine per agent, or smart resource management)
- 04 Cost per agent-hour is tracked and optimized
- 05 Fleet scaling responds to demand within 60 seconds
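The auto-scaling and cost-tracking items above reduce to two small calculations: how many workers the current load calls for, and what an agent-hour costs. The sketch below shows both under simple assumptions (one task per worker by default, fixed floor and ceiling); the parameter names and default values are hypothetical.

```python
import math


def desired_workers(queued: int, running: int, tasks_per_worker: int = 1,
                    min_workers: int = 2, max_workers: int = 200) -> int:
    """Target fleet size for the current load, clamped to a floor and a ceiling."""
    needed = math.ceil((queued + running) / tasks_per_worker)
    return max(min_workers, min(max_workers, needed))


def cost_per_agent_hour(total_cost_usd: float, agent_seconds: float) -> float:
    """Compute spend divided by the agent time it paid for, in USD per hour."""
    if agent_seconds <= 0:
        raise ValueError("agent_seconds must be positive")
    return total_cost_usd / (agent_seconds / 3600)


if __name__ == "__main__":
    print(desired_workers(queued=30, running=10))
    print(cost_per_agent_hour(total_cost_usd=12.0, agent_seconds=7200))
```

The floor keeps a small warm pool available off-hours, and the ceiling acts as a budget guard if the queue grows without bound.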
MCP & Tool Integration
How agents connect to external tools, APIs, and internal systems via MCP (now the de facto standard) and plugins.
- Agent uses built-in tools only
  Zero MCP is the baseline state: your AI agent has no programmatic connection to any tool, system, or data source outside its training data.
- Agent relies on public / general knowledge
  When an agent operates with only public API knowledge, it answers questions about your codebase using information from its training data - open-source documentation, public GitHub…
- Integrations done by copy-paste
  Copy-paste integration is the universal first approach to giving AI agents context: a developer encounters a problem, grabs the relevant information (an error message, a stack trac…
- 01 Agents use their built-in tools
- 02 Agents draw on their general / built-in knowledge
- 03 Team is aware of MCP as a standard for agent-tool integration
- 04 Integrations, if any, are manual (copy-paste between tools)
- 1-3 basic MCP servers (Git, Jira, docs)
  The first practical MCP deployment replaces the most expensive copy-paste operations with programmatic connections.
- Manual MCP setup per developer
  Manual MCP setup per developer is the phase where MCP servers exist and work, but each developer is responsible for installing and configuring them independently.
- Basic tool authorization; mcp-auth-proxy for OAuth/token-persistence issues
  Basic tool authorization is the first deliberate access control layer on MCP tool usage: decisions about which agents can call which tools, under what conditions.
- 01 1-3 MCP servers are configured (e.g., Git, Jira, documentation)
- 02 MCP setup is documented but configured manually per developer
- 03 Basic tool authorization is implemented (agents authenticate to MCP servers)
- 04 MCP server configurations are shared via repository (not local-only)
- 05 At least one MCP server provides internal documentation or codebase context
- MCP platform: centralized server management
  A centralized MCP platform moves server configuration, deployment, and credential management from individual developer machines to organization-managed infrastructure.
- Architecture MCP, Ownership MCP, SLA MCP; deep-system MCPs (pentester-mcp, windbg-mcp, Pepper for iOS runtime)
  Specialized MCP servers organized by knowledge domain represent the L3 evolution from general-purpose data access to structured organizational intelligence.
- RBAC per MCP tool
  Role-Based Access Control per MCP tool means defining precisely which agents can call which tools, based on the agent's role, the task it's performing, and the data it's operating on.
- 01 Centralized MCP platform manages server provisioning, configuration, and lifecycle
- 02 Domain-specific MCP servers exist (Architecture MCP, Ownership MCP, SLA MCP)
- 03 RBAC controls which agents can access which MCP tools
- 04 MCP server health is monitored with alerting on downtime
- 05 New MCP servers go through a standardized review and onboarding process
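RBAC per MCP tool, as described above, can be pictured as a lookup from agent role to the set of tools that role may call. The sketch below is deliberately minimal and deny-by-default; the role names and tool identifiers are hypothetical, and it covers only the role dimension, not the task or data conditions the description also mentions.

```python
from typing import Dict, Set

# Hypothetical policy: role -> tools that role may invoke.
POLICY: Dict[str, Set[str]] = {
    "reviewer": {"git.read", "docs.search"},
    "implementer": {"git.read", "git.write", "jira.comment"},
}


def can_call(role: str, tool: str, policy: Dict[str, Set[str]] = POLICY) -> bool:
    """Deny by default: unknown roles and unlisted tools are both rejected."""
    return tool in policy.get(role, set())


if __name__ == "__main__":
    print(can_call("reviewer", "git.read"))    # allowed by the policy above
    print(can_call("reviewer", "git.write"))   # not listed for this role
    print(can_call("unknown-role", "git.read"))
```

Keeping the policy as plain data makes it easy to review in a pull request, which fits the standardized onboarding process in the checklist.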
- Toolshed model: 400+ tools behind one MCP (Stripe)
  The Toolshed model, pioneered by Stripe, consolidates hundreds of distinct tools behind a single MCP endpoint.
- Agent discovery: agent knows what tools are available
  Agent discovery is the capability for an agent to dynamically enumerate what tools are available in its current environment and adapt its behavior accordingly.
- MCP governance: lifecycle, versioning, audit; Rust low-latency context retrievers (webclaw, ferris-search)
  MCP governance means treating MCP servers as production services with the same lifecycle management, change control, versioning, and audit requirements as any other production soft…
- 01 Toolshed model: 400+ tools accessible behind a unified MCP gateway (Stripe model)
- 02 Agent discovery: agents can query available tools and their capabilities at runtime
- 03 MCP governance covers lifecycle management, versioning, and audit logging
- 04 MCP tool usage analytics track which tools are used, by which agents, how often
- 05 MCP server versioning allows rollback to previous versions without downtime
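Agent discovery, as listed above, corresponds in MCP to the `tools/list` JSON-RPC method, which returns the tools a server exposes. The sketch below builds such a request and extracts tool names from a response; the sample response and its tool names are made up for illustration, and transport (stdio or HTTP) is left out entirely.

```python
import json
from typing import List


def list_tools_request(request_id: int) -> str:
    """Serialize a JSON-RPC 2.0 request for the MCP `tools/list` method."""
    return json.dumps({"jsonrpc": "2.0", "id": request_id, "method": "tools/list"})


def tool_names(response_text: str) -> List[str]:
    """Pull the tool names out of a `tools/list` response body."""
    response = json.loads(response_text)
    return [tool["name"] for tool in response["result"]["tools"]]


if __name__ == "__main__":
    # Hypothetical response from a gateway exposing two tools.
    sample = json.dumps({
        "jsonrpc": "2.0",
        "id": 1,
        "result": {"tools": [
            {"name": "search_docs", "description": "Search internal docs",
             "inputSchema": {"type": "object"}},
            {"name": "get_owner", "description": "Look up service owner",
             "inputSchema": {"type": "object"}},
        ]},
    })
    print(list_tools_request(1))
    print(tool_names(sample))
```

Counting how often each returned name is later invoked is one straightforward way to start on the usage-analytics item.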
- MCP as nervous system: bidirectional context flow
  At L5, MCP is no longer just a protocol for giving agents access to tools - it is the real-time information backbone that connects every part of the software delivery system.
- Production → MCP → Agent → Code → Deploy → Production
  The Production-MCP-Agent-Code-Deploy-Production loop is the fully autonomous software delivery cycle: production systems detect a condition (anomaly, performance degradation, confi…
- Agent-to-Agent Protocol (A2A) + MCP combined
  The Agent-to-Agent Protocol (A2A), developed by Google and adopted as an open standard, defines how autonomous agents communicate with each other: how one agent delegates a task to…
- 01 MCP operates as a bidirectional nervous system: production data flows to agents, agent actions flow to production
- 02 Full production loop: Production → MCP → Agent → Code → Deploy → Production
- 03 Agent-to-Agent Protocol (A2A) and MCP are combined for multi-agent coordination
- 04 MCP latency for context delivery is under 500ms P95
- 05 A2A protocol enables agents to discover and delegate to other agents without human configuration
Build System
Build tooling optimized for agent-scale throughput - caching, incrementality, and speed.
- Maven/Gradle default config
  Maven and Gradle ship with sensible defaults for single-developer, sequential workflows.
- Full rebuild on every change
  A full rebuild recompiles every source file and re-runs every build step from scratch on each invocation, regardless of what changed.
- Shared CI queue
  A shared CI queue processes all build and test requests in a single pool of runners, regardless of their source or priority.
- 01 A build system is in place (default configuration is fine)
- 02 Builds run on each change
- 03 Build completes (even if slowly)
- 04 CI runs builds on a shared queue (even if everyone waits)
- Basic build caching
  Basic build caching stores the outputs of build steps and reuses them when the inputs haven't changed.
- Parallel build steps
  Parallel build steps execute independent stages of the build and test pipeline concurrently rather than sequentially.
- Dedicated CI resources
  Dedicated CI resources assign specific pools of compute capacity to specific types of work, rather than all work competing in a single shared queue.
- 01 Build caching is implemented (dependency cache, compilation cache)
- 02 Parallel build steps are configured (test and lint run concurrently)
- 03 Dedicated CI resources are allocated (not shared across all teams)
- 04 Cache hit rate exceeds 60%
- 05 Build time has improved by at least 30% compared to uncached baseline
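The build caching idea above (reuse outputs when inputs have not changed) is usually implemented by hashing a step's command and input contents into a cache key. The sketch below is a toy in-memory version that also tracks the hit rate named in the checklist; the class and function names are illustrative, and real build tools persist the cache and hash many more inputs (toolchain, environment, flags).

```python
import hashlib
from typing import Callable, Dict


def cache_key(command: str, inputs: Dict[str, bytes]) -> str:
    """Derive a key from the command and the content of every input file."""
    digest = hashlib.sha256(command.encode())
    for path in sorted(inputs):  # sort so key does not depend on dict order
        digest.update(b"\0" + path.encode() + b"\0")
        digest.update(hashlib.sha256(inputs[path]).digest())
    return digest.hexdigest()


class BuildCache:
    """In-memory cache of step outputs, keyed by command plus input contents."""

    def __init__(self) -> None:
        self.store: Dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    def run(self, command: str, inputs: Dict[str, bytes],
            build: Callable[[], bytes]) -> bytes:
        key = cache_key(command, inputs)
        if key in self.store:
            self.hits += 1
            return self.store[key]
        self.misses += 1
        self.store[key] = build()  # only executed on a miss
        return self.store[key]

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

Changing a single byte of any input, or the command itself, yields a different key and therefore a rebuild, which is the property that makes reuse safe.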
- Bazel / Buck2 / Pants
  Bazel, Buck2, and Pants are hermetic build systems that originated at Google, Meta, and Twitter respectively (Pants v2 is led by Toolchain), built to handle the scale and correctness requirements of massive monorepos.
- Remote execution (EngFlow)
  Remote execution distributes build actions across a cluster of machines rather than running them locally.
- Incremental builds: only changed targets
  Incremental builds rebuild exactly and only the build targets that depend on files that have changed since the last build.
- 01 Advanced build system (Bazel, Buck2, or Pants) is adopted for primary codebase
- 02 Remote execution (EngFlow or equivalent) distributes build steps across multiple machines
- 03 Incremental builds run only changed targets (not full rebuild)
- 04 BUILD file maintenance is assigned to specific team members or automated
- 05 Remote cache hit rate exceeds 80%
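Incremental builds of only changed targets, as described above, come down to a graph query: given the changed targets, find everything that transitively depends on them. The sketch below does this with a breadth-first walk over reversed dependency edges; the target names are hypothetical, and this is a generic illustration rather than how Bazel, Buck2, or Pants implement it internally.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Set


def affected_targets(deps: Dict[str, Set[str]], changed: Iterable[str]) -> Set[str]:
    """Return the changed targets plus every target that transitively depends on them.

    deps maps each target to the set of targets it depends on.
    """
    # Invert the edges: dependency -> targets that depend on it.
    dependents = defaultdict(set)
    for target, requirements in deps.items():
        for requirement in requirements:
            dependents[requirement].add(target)

    seen = set(changed)
    queue = deque(seen)
    while queue:
        current = queue.popleft()
        for dependent in dependents[current]:
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen


if __name__ == "__main__":
    graph = {"app": {"lib", "ui"}, "lib": {"core"}, "ui": {"core"},
             "tools": set(), "core": set()}
    print(sorted(affected_targets(graph, {"lib"})))
```

In the example graph, a change to `lib` rebuilds `lib` and `app` but leaves `ui`, `core`, and `tools` untouched.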
- Sub-2min feedback on any change
  Sub-2-minute feedback means that for any change an agent or developer makes to the codebase, the signal "this compiles and the relevant tests pass" arrives within 120 seconds.
- Agent-specific build profiles; multi-root workspaces, worktree isolation per agent (Cursor 3.2)
  Agent-specific build profiles are lightweight build configurations optimized for the agent iteration use case rather than the human pre-merge or release use case.
- Build system aware of agent iteration patterns
  A build system aware of agent iteration patterns goes beyond passive responsiveness - it actively anticipates what agents will need to build next and prepares accordingly.
- 01 Any change gets build feedback in under 2 minutes
- 02 Agent-specific build profiles exist (optimized for agent iteration patterns - fast feedback over comprehensive build)
- 03 Build system understands agent iteration patterns and pre-caches likely next builds
- 04 Build profiles are auto-selected based on invoker (agent vs. human vs. CI)
- 05 Pre-caching hit rate exceeds 70% for agent iterations
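Auto-selecting a build profile by invoker, as in the checklist above, can be as simple as inspecting the environment the build was launched from. In the sketch below, `AGENT_SESSION_ID` is a hypothetical variable that an agent harness would need to set, while `CI` is a variable many CI systems export; the profile contents are placeholders.

```python
from typing import Dict, Mapping

# Hypothetical profiles: what each invoker gets by default.
PROFILES: Dict[str, Dict[str, object]] = {
    "agent": {"tests": "affected-only", "lint": False, "optimize": False},
    "human": {"tests": "affected-only", "lint": True, "optimize": False},
    "ci": {"tests": "all", "lint": True, "optimize": True},
}


def detect_invoker(env: Mapping[str, str]) -> str:
    """Classify the caller from environment variables (assumed conventions)."""
    if env.get("AGENT_SESSION_ID"):  # hypothetical marker set by the agent harness
        return "agent"
    if env.get("CI", "").lower() in ("1", "true"):
        return "ci"
    return "human"


def select_profile(env: Mapping[str, str]) -> Dict[str, object]:
    """Pick the build profile for whoever invoked the build."""
    return PROFILES[detect_invoker(env)]
```

Checking the agent marker first means an agent running inside a CI job still gets the fast-feedback profile; reverse the order if CI should win.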
- Build = commodity (near-instant for agents)
  When build time is a commodity, it has ceased to be a meaningful variable in agent throughput calculations.
- Compilation bottleneck eliminated via crate/module architecture
  Compilation bottleneck elimination through crate/module architecture means restructuring a codebase so that the unit of compilation is small, focused, and independently compilable.
- Disk I/O optimized for concurrent agent workloads (Cursor lesson)
  Disk I/O is the hidden bottleneck when running hundreds of concurrent agents on shared infrastructure.
- 01 Build is a commodity: near-instant feedback for agents regardless of codebase size
- 02 Codebase is structured into self-contained modules/crates to eliminate compilation bottleneck (Cursor lesson)
- 03 Disk I/O is optimized for concurrent agent workloads (parallel reads/writes across modules)
- 04 Build latency is under 30 seconds for 90%+ of changes
- 05 Module dependency graph is automatically maintained and optimized
Observability & Feedback Loop
Monitoring agent behavior, costs, and outcomes to close the improvement loop.
- Basic logging
  Basic logging is the first and most primitive form of production visibility: writing text output to stdout, stderr, or a log file so that when something goes wrong you have some re…
- Alerting on errors
  Alerting on errors is the practice of automatically notifying a human when something goes wrong in production - before a customer reports it.
- Prod feedback and token-cost visibility not yet wired to dev
  "No connection: prod to dev feedback" describes the state where production incidents have no automatic path back to the developer or agent that caused them.
- 01 Basic application logging exists
- 02 Alerting fires on application errors
- 03 Logs are searchable (centralized logging, not just local files)
- 04 Production issues do not yet feed back into dev priorities
- Structured logging
  Structured logging replaces free-form text log output with machine-parseable records - typically JSON - where every field has a defined name and type.
- OpenTelemetry basic
  OpenTelemetry (OTel) is the open standard for collecting and exporting telemetry data - traces, metrics, and logs - from distributed systems.
- Post-deploy monitoring; per-session token cost (ccusage, /usage, /context) as table stakes
  Post-deploy monitoring is the practice of actively watching key production metrics in the minutes and hours after a deployment, with the goal of detecting deployment-induced regres…
- 01 Structured logging is implemented (JSON logs with consistent fields)
- 02 OpenTelemetry basic instrumentation is deployed (traces and metrics)
- 03 Post-deploy monitoring checks run after each deployment
- 04 Traces are correlated across services
- 05 Post-deploy checks include automated smoke tests
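Structured logging, as described above, can be approximated with the Python standard library alone by swapping the formatter for one that emits JSON. The sketch below is one minimal way to do it; the field names (`ts`, `level`, `logger`, `message`) and the convention of passing extra fields under a `fields` key are assumptions of this example, not a standard.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object with consistent field names."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Extra fields arrive via `extra={"fields": {...}}` (convention assumed here).
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload, sort_keys=True)


def make_logger(name: str, stream=sys.stdout) -> logging.Logger:
    """Create a logger that writes JSON lines to the given stream."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate plain-text output from the root logger
    return logger


if __name__ == "__main__":
    log = make_logger("deploy")
    log.info("deploy finished", extra={"fields": {"service": "checkout", "duration_ms": 5120}})
```

Because every record is a single JSON line, a post-deploy check can filter on `service` or `level` without regex parsing.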
- Full observability stack (OTel + Grafana)
  A full observability stack means having all three telemetry pillars - metrics, traces, and logs - collected, correlated, and queryable in a unified system.
- Production metrics → dashboards; agent telemetry: thinking length, files-read-before-edit, KV cache TTL
  Production metrics dashboards are the operational nerve center of a mature engineering team: real-time, continuously updated views into the health and behavior of every production service.
- Incident data available for context; Claude-Code-Usage-Monitor live predictions
  "Incident data available for context" means that when an AI agent or human engineer begins investigating a production issue, all the relevant historical context is immediately acce…
- 01 Full observability stack is operational (OpenTelemetry + Grafana/Datadog or equivalent)
- 02 Production metrics feed into dashboards accessible to all developers
- 03 Incident data (post-mortems, error patterns) is available as agent context
- 04 SLOs are defined and tracked for key services
- 05 Incident data is structured for machine consumption (not just human-readable post-mortem docs)
- Production anomaly → auto-ticket → agent investigation
  The production anomaly to auto-ticket to agent investigation pipeline automates the first phase of incident response.
- Self-healing basic: known patterns auto-fixed
  Self-healing for known patterns means that specific, well-understood failure conditions are remediated automatically without human intervention.
- Vercel SDI model: infra recommends code changes
  The Vercel Software-Defined Infrastructure (SDI) model describes a paradigm where infrastructure does not just run code - it actively analyzes production behavior and surfaces spec…
- Session audit pattern (Stella Laurenzo, 6,852 sessions / 234k tool calls) - regression detection on agent quality timeline
- 01 Production anomaly detection auto-creates tickets and triggers agent investigation
- 02 Self-healing for known patterns: agent detects known error pattern, applies known fix, deploys, and verifies
- 03 Infrastructure recommends code changes based on production data (Vercel SDI model)
- 04 Auto-created tickets include full context (traces, logs, affected users, similar past incidents)
- 05 Self-healing success rate is tracked (% of auto-fixes that resolve the issue without human intervention)
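Self-healing for known patterns, as described above, starts with a routing decision: does this error match a pattern with a known fix, or does it go to a human? The sketch below shows that decision plus the success-rate metric from the checklist. The regular expressions and fix names are hypothetical examples, and the actual remediation, deployment, and verification steps are out of scope.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical catalogue: error pattern -> name of the remediation to run.
KNOWN_FIXES = [
    (re.compile(r"connection pool exhausted", re.IGNORECASE), "restart_connection_pool"),
    (re.compile(r"disk usage above \d+%", re.IGNORECASE), "rotate_logs"),
]


def triage(error_message: str) -> Tuple[str, Optional[str]]:
    """Route an error: auto-fix if it matches a known pattern, otherwise escalate."""
    for pattern, fix_name in KNOWN_FIXES:
        if pattern.search(error_message):
            return ("auto_fix", fix_name)
    return ("escalate", None)


def success_rate(outcomes: List[bool]) -> float:
    """Share of auto-fix attempts that resolved the issue without a human."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


if __name__ == "__main__":
    print(triage("ERROR: Connection pool exhausted on db-replica-2"))
    print(triage("segfault in worker 7"))
    print(success_rate([True, True, False, True]))
```

Anything that falls through to `escalate` is, by definition, a candidate for a new catalogue entry once a human has diagnosed it.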
- Full production → agent loop
  The full production-to-agent loop is the L5 realization of observability as an agent input channel.
- Anomaly → investigate → fix → test → deploy autonomous
  The anomaly-to-deploy autonomous pipeline is end-to-end automated incident response: from the moment a production anomaly is detected to the moment a fix is deployed and verified i…
- Infrastructure self-drives: code defines infra, production informs code
  "Infrastructure self-drives" describes the fully realized bidirectional relationship between code and infrastructure at L5.
- 01 Full production-to-agent loop operates autonomously: anomaly detected, investigated, fixed, tested, deployed
- 02 Infrastructure self-drives: code defines infrastructure, production performance informs code changes
- 03 Anomaly-to-deploy cycle completes without human intervention for 80%+ of known issue categories
- 04 Novel anomalies (not matching known patterns) are escalated to humans with full investigation context
- 05 Mean time from anomaly detection to autonomous fix deployment is under 15 minutes
May 2026 update: observability stopped being a "nice to have" and became the area where the most money was made and lost.
ccusage is at 13.2k GitHub stars; /usage and /context are now built into Claude Code; multiple Reddit threads documented overnight bills of $3,800 from runaway subagent loops. The two new disciplines this month are cost telemetry (token spend per session, per project, per merged PR) and quality telemetry (thinking length, files-read-before-edit, KV cache hit rate). Stella Laurenzo's audit of 6,852 Claude Code sessions is the template for the latter; Anthropic's April 23 postmortem made it official that harness changes - not the model - cause regressions, which makes harness telemetry a first-class concern.
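Cost telemetry per session, and the runaway-loop bills mentioned above, suggest a simple guard: accumulate token spend per session and stop when a cap is reached. The sketch below is a minimal version of that idea. The per-million-token prices and the cap are placeholder numbers, not real pricing, and the class name is made up for this example.

```python
from dataclasses import dataclass


@dataclass
class SessionBudget:
    """Track token spend for one agent session against a hard cap."""

    cap_usd: float
    usd_per_million_input: float   # placeholder price, not real pricing
    usd_per_million_output: float  # placeholder price, not real pricing
    spent_usd: float = 0.0

    def record(self, input_tokens: int, output_tokens: int) -> float:
        """Add the cost of one model call and return the running total."""
        cost = (input_tokens * self.usd_per_million_input
                + output_tokens * self.usd_per_million_output) / 1_000_000
        self.spent_usd += cost
        return self.spent_usd

    def exhausted(self) -> bool:
        """True once the session has reached or passed its cap."""
        return self.spent_usd >= self.cap_usd


if __name__ == "__main__":
    budget = SessionBudget(cap_usd=5.0, usd_per_million_input=3.0, usd_per_million_output=15.0)
    budget.record(input_tokens=1_000_000, output_tokens=100_000)
    print(budget.spent_usd, budget.exhausted())
```

A harness would check `exhausted()` before each model call or subagent spawn; tagging each budget with a project or PR identifier gives the per-project and per-merged-PR rollups.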
MCP also evolved this month - from "code tools" to deep-system access. pentester-mcp (offensive security), windbg-mcp (kernel), Pepper (iOS runtime), with mcp-auth-proxy emerging as middleware for OAuth/token-persistence issues. Rust-based context retrievers (webclaw, ferris-search) and Go orchestrators (jig) bring low-latency multi-agent profiles within reach. Infrastructure, not the model, is now the thing that decides whether your agent fleet scales gracefully or burns the budget on a Tuesday night.