Maturity Matrix

Infrastructure

The technical layer that enables (or blocks) agents. From shared Jenkins to ephemeral agent sandboxes.

4 capabilities · 20 levels · 61 practices · 61 guides
The matrix · at a glance
Maturity levels: L1 · Ad-hoc, L2 · Guided, L3 · Systematic, L4 · Optimized (sweet spot), L5 · Autonomous
Capabilities:
  1. Agent Runtime & Sandboxing
  2. MCP & Tool Integration
  3. Build System
  4. Observability & Feedback Loop
Capability 01 · Infrastructure

Agent Runtime & Sandboxing

Where and how AI agents execute code - isolation, security, and resource management.

L2 · Stage 02 · Guided
Criteria - what to measure
  1. Dedicated development environments exist for agent execution (separate from developers' primary workspaces)
  2. Basic sandboxing via Docker or equivalent containers is implemented
  3. Agent credentials are scoped per project (not a single org-wide key)
  4. Container images for agent environments are versioned and reproducible
  5. A credential rotation schedule exists for agent-scoped keys
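The scoping and rotation criteria above lend themselves to a small periodic audit. The sketch below assumes a hypothetical 30-day rotation policy and a hypothetical key inventory where `"*"` marks an org-wide key; neither comes from the matrix itself.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical policy: agent-scoped keys must be rotated at least every 30 days.
MAX_KEY_AGE = timedelta(days=30)


@dataclass(frozen=True)
class AgentKey:
    key_id: str
    project: str        # "*" marks an org-wide key, which criterion 3 rules out
    created: datetime   # timezone-aware creation (or last rotation) time


def audit_agent_keys(keys, now=None, max_age=MAX_KEY_AGE):
    """Return human-readable findings for keys that break the L2 criteria."""
    now = now or datetime.now(timezone.utc)
    findings = []
    for key in keys:
        if key.project == "*":
            findings.append(f"{key.key_id}: org-wide scope, should be per project")
        if now - key.created > max_age:
            findings.append(f"{key.key_id}: not rotated in over {max_age.days} days")
    return findings


if __name__ == "__main__":
    now = datetime(2026, 5, 1, tzinfo=timezone.utc)
    keys = [
        AgentKey("agent-payments", "payments", datetime(2026, 3, 1, tzinfo=timezone.utc)),
        AgentKey("agent-frontend", "frontend", datetime(2026, 4, 20, tzinfo=timezone.utc)),
        AgentKey("agent-legacy", "*", datetime(2026, 4, 25, tzinfo=timezone.utc)),
    ]
    for finding in audit_agent_keys(keys, now=now):
        print(finding)
```

Running it on a schedule (for example from CI) turns "a rotation schedule exists" into something that is actually checked.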
L3 · Stage 03 · Systematic
Criteria - what to measure
  1. Isolated agent environments (devbox model) prevent agents from accessing other projects
  2. Pre-warmed containers with the codebase at HEAD and dependencies installed are available
  3. Network isolation prevents agents from reaching production systems
  4. Container warm pool size matches the team's agent usage patterns
  5. Network isolation rules are tested and audited quarterly
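One way to make "warm pool size matches usage patterns" concrete is Little's law (L = λW). This is a sizing heuristic we are swapping in, not a method the matrix prescribes: if every session that checks out a warm container triggers a refill, the average number of containers being refilled at once is the start rate times the refill time. The `headroom` burst factor below is a hypothetical tuning knob.

```python
import math


def warm_pool_size(peak_starts_per_minute, refill_seconds, headroom=1.5):
    """Estimate how many pre-warmed containers to keep ready.

    By Little's law, containers in refill = start rate * refill time, so the
    pool must be at least that large to avoid cold starts. `headroom` is a
    hypothetical burst factor; tune it from observed cold-start rates.
    """
    in_refill = peak_starts_per_minute * refill_seconds / 60.0
    return max(1, math.ceil(in_refill * headroom))


if __name__ == "__main__":
    # Example: 12 sessions/minute at peak, 90 s to build a fresh devbox at HEAD.
    print(warm_pool_size(12, 90))
```

Feeding it the measured peak start rate from session logs ties pool size to actual usage rather than a fixed guess.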
L4 · Stage 04 · Optimized (most teams aim here)
Criteria - what to measure
  1. Ephemeral devboxes spin up in under 10 seconds (Stripe benchmark)
  2. Devboxes come pre-loaded with codebase, dependencies, and MCP tools
  3. Kernel-level policy enforcement restricts agent actions (syscall filtering, resource limits)
  4. Devbox spin-up P99 latency is under 30 seconds
  5. Firecracker microVMs or equivalent provide VM-level isolation with container-level startup speed
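The two spin-up criteria above measure different things: the typical case (under 10 s) and the tail (P99 under 30 s). A minimal sketch of checking both from recorded spin-up times, assuming "typical" means the median and using the nearest-rank percentile definition:

```python
import math


def percentile_nearest_rank(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def check_devbox_spinup(spinup_seconds, typical_target=10.0, p99_target=30.0):
    """Compare measured devbox spin-up times (seconds) with the L4 targets."""
    p50 = percentile_nearest_rank(spinup_seconds, 50)
    p99 = percentile_nearest_rank(spinup_seconds, 99)
    return {
        "p50": p50,
        "p99": p99,
        "typical_ok": p50 < typical_target,
        "p99_ok": p99 < p99_target,
    }


if __name__ == "__main__":
    samples = [4.0] * 95 + [12.0] * 4 + [45.0]
    print(check_devbox_spinup(samples))
```

With 100 samples, a single 45 s outlier does not move P99; it takes at least two slow spin-ups per hundred before the tail target starts to fail.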
L5 · Stage 05 · Autonomous
Criteria - what to measure
  1. Dedicated compute infrastructure exists for the agent fleet (not shared with developer workstations or production)
  2. Agent fleet auto-scales with load (agents scale up during business hours, scale down off-hours)
  3. Each agent runs in a fully isolated environment (Cursor approach: one machine per agent, or smart resource management)
  4. Cost per agent-hour is tracked and optimized
  5. Fleet scaling responds to demand within 60 seconds
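Tracking cost per agent-hour needs both compute and token spend attributed to each session. A minimal sketch, assuming a hypothetical session record that already carries those two costs in USD:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentSession:
    agent_id: str
    hours: float         # wall-clock hours the agent's environment was running
    compute_usd: float   # infrastructure cost attributed to the session
    tokens_usd: float    # model/API spend attributed to the session


def cost_per_agent_hour(sessions):
    """Fleet-wide cost per agent-hour (compute plus tokens)."""
    hours = sum(s.hours for s in sessions)
    if hours == 0:
        return 0.0
    return sum(s.compute_usd + s.tokens_usd for s in sessions) / hours


def cost_per_hour_by_agent(sessions):
    """Per-agent breakdown, useful for spotting which agents to optimize first."""
    totals = defaultdict(lambda: [0.0, 0.0])  # agent_id -> [cost, hours]
    for s in sessions:
        totals[s.agent_id][0] += s.compute_usd + s.tokens_usd
        totals[s.agent_id][1] += s.hours
    return {agent: cost / hours for agent, (cost, hours) in totals.items() if hours}


if __name__ == "__main__":
    sessions = [
        AgentSession("review", 2.0, 1.0, 5.0),
        AgentSession("review", 1.0, 0.5, 1.5),
        AgentSession("migrate", 1.0, 3.0, 9.0),
    ]
    print(cost_per_agent_hour(sessions), cost_per_hour_by_agent(sessions))
```

The per-agent view matters because a fleet average can hide one expensive agent type behind many cheap ones.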
Capability 02 · Infrastructure

MCP & Tool Integration

How agents connect to external tools, APIs, and internal systems via MCP (now the de facto standard) and plugins.

L2 · Stage 02 · Guided
Criteria - what to measure
  1. One to three MCP servers are configured (e.g., Git, Jira, documentation)
  2. MCP setup is documented but configured manually per developer
  3. Basic tool access control is implemented (agents authenticate to MCP servers)
  4. MCP server configurations are shared via the repository (not local-only)
  5. At least one MCP server provides internal documentation or codebase context
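Once MCP configuration lives in the repository, it can be linted in CI like any other file. The sketch below assumes a Claude Code-style project `.mcp.json` with a top-level `mcpServers` object, where each entry either launches a local server via `command` or points at a remote one via `url`, and assumes the client expands `${VAR}` references so secrets need not be committed. The `jira-mcp` command and the docs URL are hypothetical.

```python
import json

EXAMPLE = """
{
  "mcpServers": {
    "git": {"command": "uvx", "args": ["mcp-server-git"]},
    "docs": {"type": "http", "url": "https://docs.example.internal/mcp"},
    "jira": {"command": "jira-mcp", "env": {"JIRA_TOKEN": "abc123"}}
  }
}
"""


def validate_mcp_config(text):
    """Return a list of problems found in a repository-shared MCP config."""
    config = json.loads(text)
    servers = config.get("mcpServers")
    if not isinstance(servers, dict) or not servers:
        return ["no 'mcpServers' entries found"]
    problems = []
    for name, spec in servers.items():
        if not isinstance(spec, dict):
            problems.append(f"{name}: entry must be an object")
            continue
        # A local server is launched via `command`; a remote one is reached via `url`.
        if ("command" in spec) == ("url" in spec):
            problems.append(f"{name}: expected exactly one of 'command' or 'url'")
        # Secrets should be variable references, never committed literally.
        for key, value in spec.get("env", {}).items():
            if not (str(value).startswith("${") and str(value).endswith("}")):
                problems.append(f"{name}: env {key} looks like a committed literal; use a variable reference")
    return problems


if __name__ == "__main__":
    for problem in validate_mcp_config(EXAMPLE):
        print(problem)
```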
L4 · Stage 04 · Optimized (most teams aim here)
Criteria - what to measure
  1. Toolshed model (Stripe): 400+ tools are accessible behind a unified MCP gateway
  2. Agent discovery: agents can query available tools and their capabilities at runtime
  3. MCP governance covers lifecycle management, versioning, and audit logging
  4. MCP tool usage analytics track which tools are used, by which agents, and how often
  5. MCP server versioning allows rollback to previous versions without downtime
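If the gateway writes an audit log, the usage-analytics criterion above reduces to counting. A minimal sketch, assuming each audit event is a dict with hypothetical `agent` and `tool` fields (the real schema depends on your gateway):

```python
from collections import Counter


def tool_usage_report(events):
    """Aggregate gateway audit events into per-tool and per-(agent, tool) counts."""
    by_tool = Counter(e["tool"] for e in events)
    by_agent_and_tool = Counter((e["agent"], e["tool"]) for e in events)
    return by_tool, by_agent_and_tool


def unused_tools(registered_tools, events):
    """Registered tools no agent called in this window: candidates for review or retirement."""
    used = {e["tool"] for e in events}
    return sorted(set(registered_tools) - used)


if __name__ == "__main__":
    events = [
        {"agent": "review-bot", "tool": "jira.search"},
        {"agent": "review-bot", "tool": "jira.search"},
        {"agent": "fix-bot", "tool": "git.diff"},
    ]
    by_tool, by_pair = tool_usage_report(events)
    print(by_tool.most_common(), by_pair.most_common())
    print(unused_tools(["jira.search", "git.diff", "docs.lookup"], events))
```

With hundreds of tools behind one gateway, the unused-tool list is often as useful for governance as the top-N list.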
L5 · Stage 05 · Autonomous
Criteria - what to measure
  1. MCP operates as a bidirectional nervous system: production data flows to agents, and agent actions flow to production
  2. Full production loop: Production -> MCP -> Agent -> Code -> Deploy -> Production
  3. The Agent2Agent (A2A) protocol and MCP are combined for multi-agent coordination
  4. MCP context delivery latency is under 500 ms at P95
  5. A2A enables agents to discover and delegate to other agents without human configuration
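The 500 ms P95 criterion is easiest to act on per MCP server, so a slow server is not hidden behind a fast fleet average. A minimal sketch using the standard library's interpolated quantiles; the server names and samples are hypothetical:

```python
import statistics


def p95(latencies_ms):
    """Interpolated 95th percentile (older Pythons need at least two samples)."""
    # quantiles(..., n=100) returns the 99 cut points P1..P99, so index 94 is P95.
    return statistics.quantiles(latencies_ms, n=100, method="inclusive")[94]


def servers_over_budget(samples_by_server, budget_ms=500.0):
    """Names of MCP servers whose P95 context-delivery latency exceeds the budget."""
    return sorted(name for name, samples in samples_by_server.items()
                  if p95(samples) > budget_ms)


if __name__ == "__main__":
    samples = {
        "docs": list(range(1, 101)),
        "prod-metrics": [100] * 90 + [900] * 10,
    }
    print(servers_over_budget(samples))
```

Note that `method="inclusive"` interpolates between samples, so its result can differ slightly from a nearest-rank percentile on the same data.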
Capability 03 · Infrastructure

Build System

Build tooling optimized for agent-scale throughput - caching, incrementality, and speed.

L2 · Stage 02 · Guided
Criteria - what to measure
  1. Build caching is implemented (dependency cache, compilation cache)
  2. Parallel build steps are configured (test and lint run concurrently)
  3. Dedicated CI resources are allocated (not shared across all teams)
  4. Cache hit rate exceeds 60%
  5. Build time has improved by at least 30% compared to the uncached baseline
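The two numeric L2 targets above can be computed directly from cache counters and timed builds. A minimal sketch; where the hit/miss counts come from (build tool logs, cache server metrics) is left open:

```python
def cache_hit_rate(hits, misses):
    """Fraction of cache lookups served from cache."""
    total = hits + misses
    return hits / total if total else 0.0


def build_time_improvement(uncached_seconds, cached_seconds):
    """Fractional reduction in build time relative to the uncached baseline."""
    return 1 - cached_seconds / uncached_seconds


def meets_l2_build_targets(hits, misses, uncached_seconds, cached_seconds):
    """True when hit rate exceeds 60% and build time dropped by at least 30%."""
    return (cache_hit_rate(hits, misses) > 0.60
            and build_time_improvement(uncached_seconds, cached_seconds) >= 0.30)


if __name__ == "__main__":
    # Example: 700 hits / 300 misses; a 600 s uncached build now takes 360 s.
    print(cache_hit_rate(700, 300), build_time_improvement(600, 360))
    print(meets_l2_build_targets(700, 300, 600, 360))
```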
L3 · Stage 03 · Systematic
Criteria - what to measure
  1. An advanced build system (Bazel, Buck2, or Pants) is adopted for the primary codebase
  2. Remote execution (EngFlow or equivalent) distributes build steps across multiple machines
  3. Incremental builds run only changed targets (not a full rebuild)
  4. BUILD file maintenance is assigned to specific team members or automated
  5. Remote cache hit rate exceeds 80%
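"Run only changed targets" in practice means rebuilding the changed targets plus everything that transitively depends on them. In Bazel that set comes from a query such as `bazel query 'rdeps(//..., //lib:util)'`; the sketch below only illustrates the underlying graph walk on a hypothetical target graph.

```python
from collections import deque


def affected_targets(deps, changed):
    """Targets to rebuild when `changed` targets change.

    `deps` maps each target to the targets it depends on. We invert it into
    reverse dependencies and walk outward breadth-first from the changed set.
    """
    rdeps = {}
    for target, target_deps in deps.items():
        for dep in target_deps:
            rdeps.setdefault(dep, set()).add(target)
    seen = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for dependent in rdeps.get(current, ()):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen


if __name__ == "__main__":
    deps = {
        "//app:server": ["//lib:db", "//lib:http"],
        "//lib:db": ["//lib:util"],
        "//lib:http": [],
        "//tools:lint": [],
    }
    print(sorted(affected_targets(deps, {"//lib:util"})))
```

Here a change to `//lib:util` touches three targets and leaves `//lib:http` and `//tools:lint` cached.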
L4 · Stage 04 · Optimized (most teams aim here)
Criteria - what to measure
  1. Any change gets build feedback in under 2 minutes
  2. Agent-specific build profiles exist (optimized for agent iteration patterns: fast feedback over comprehensive builds)
  3. The build system understands agent iteration patterns and pre-caches likely next builds
  4. Build profiles are auto-selected based on the invoker (agent vs. human vs. CI)
  5. Pre-caching hit rate exceeds 70% for agent iterations
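Auto-selecting a build profile by invoker can be as simple as a wrapper that inspects the environment. This sketch assumes Bazel `--config=<name>` groups defined in a `.bazelrc`; the profile names and the `AGENT_SESSION_ID` variable an agent harness would export are hypothetical. Many CI systems, GitHub Actions among them, set `CI=true`.

```python
import os

# Hypothetical profiles, assumed to exist as `build:<name> ...` groups in .bazelrc.
PROFILES = {
    "agent": ["--config=agent"],  # fast feedback: affected targets, fewer checks
    "ci": ["--config=ci"],        # comprehensive: full test suite
    "human": [],                  # local defaults
}


def detect_invoker(env):
    """Classify who is invoking the build; agent takes precedence over CI."""
    if env.get("AGENT_SESSION_ID"):  # hypothetical variable set by the agent harness
        return "agent"
    if env.get("CI", "").lower() == "true":
        return "ci"
    return "human"


def build_command(targets, env=None):
    """Assemble a `bazel test` command line with the invoker's profile."""
    env = os.environ if env is None else env
    return ["bazel", "test", *PROFILES[detect_invoker(env)], *targets]


if __name__ == "__main__":
    print(build_command(["//..."]))
```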
L5 · Stage 05 · Autonomous
Criteria - what to measure
  1. Build is a commodity: near-instant feedback for agents regardless of codebase size
  2. The codebase is structured into self-contained modules/crates to eliminate the compilation bottleneck (Cursor lesson)
  3. Disk I/O is optimized for concurrent agent workloads (parallel reads/writes across modules)
  4. Build latency is under 30 seconds for 90%+ of changes
  5. The module dependency graph is automatically maintained and optimized
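Keeping the module graph healthy starts with two checks: it must stay acyclic, and it should fan out into wide layers that compile in parallel. A minimal sketch with the standard library's `graphlib`, on a hypothetical four-module graph:

```python
from graphlib import CycleError, TopologicalSorter


def build_layers(module_deps):
    """Group modules into layers whose members can compile in parallel.

    `module_deps` maps each module to the modules it depends on (its
    predecessors). Raises CycleError if the graph contains a cycle.
    """
    sorter = TopologicalSorter(module_deps)
    sorter.prepare()  # raises CycleError on a cycle
    layers = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        layers.append(ready)
        sorter.done(*ready)
    return layers


if __name__ == "__main__":
    deps = {"app": ["db", "http"], "db": ["util"], "http": ["util"], "util": []}
    print(build_layers(deps))
    try:
        build_layers({"a": ["b"], "b": ["a"]})
    except CycleError as err:
        print("cycle:", err.args[1])
```

The number of layers approximates the critical path of a clean build, so a change that deepens the graph shows up as an extra layer.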
Capability 04 · Infrastructure

Observability & Feedback Loop

Monitoring agent behavior, costs, and outcomes to close the improvement loop.

L4 · Stage 04 · Optimized (most teams aim here)
L5 · Stage 05 · Autonomous
Criteria - what to measure
  1. Full production-to-agent loop operates autonomously: anomaly detected, investigated, fixed, tested, deployed
  2. Infrastructure is self-driving: code defines infrastructure, and production performance informs code changes
  3. The anomaly-to-deploy cycle completes without human intervention for 80%+ of known issue categories
  4. Novel anomalies (not matching known patterns) are escalated to humans with full investigation context
  5. Mean time from anomaly detection to autonomous fix deployment is under 15 minutes
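Criteria 3 to 5 above imply a routing rule and two metrics. The sketch below assumes a hypothetical catalogue of known issue categories and incident records with detection and deploy timestamps; real routing would also attach the investigation context that criterion 4 requires.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional

# Hypothetical catalogue of issue categories the loop is trusted to fix alone.
KNOWN_CATEGORIES = {"dependency-bump", "flaky-test", "slow-query"}


@dataclass(frozen=True)
class Incident:
    category: str
    detected_at: float                   # seconds since epoch
    deployed_at: Optional[float] = None  # set when an autonomous fix shipped


def route(incident):
    """Known categories go to the autonomous loop; novel ones escalate to a human."""
    return "autonomous" if incident.category in KNOWN_CATEGORIES else "escalate"


def loop_metrics(incidents):
    """Autonomy rate over known categories, and mean detection-to-deploy minutes."""
    known = [i for i in incidents if i.category in KNOWN_CATEGORIES]
    fixed = [i for i in known if i.deployed_at is not None]
    autonomy_rate = len(fixed) / len(known) if known else 0.0
    durations = [(i.deployed_at - i.detected_at) / 60 for i in fixed]
    return autonomy_rate, (mean(durations) if durations else None)


if __name__ == "__main__":
    incidents = [
        Incident("slow-query", 0.0, 300.0),
        Incident("flaky-test", 1000.0, 1900.0),
        Incident("dependency-bump", 2000.0),  # needed a human
        Incident("kernel-panic", 3000.0),     # novel: escalated
    ]
    print(loop_metrics(incidents), [route(i) for i in incidents])
```

In this example the mean fix time (10 minutes) meets criterion 5, but the autonomy rate (2 of 3 known incidents) falls short of the 80% in criterion 3.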
Climb the matrix

You don't have to figure this out alone.

Every level in this matrix has a path. Read the playbooks written by the teams that have climbed it. Run the assessment with our consultants. Start where you are.

Live with Visdom

Book an AI Maturity Assessment session with your team.

We walk you through all four perspectives, score where you actually are, and leave you with a 90-day plan to climb in the dimensions that matter most.

Included: 90-day plan, scored assessment, coaching.
Author Commentary

May 2026 update: observability stopped being a "nice to have" and became the area where the most money was made and lost.

ccusage has reached 13.2k GitHub stars; /usage and /context are now built into Claude Code; and multiple Reddit threads documented overnight bills of $3,800 from runaway subagent loops. Two new disciplines emerged this month: cost telemetry (token spend per session, per project, per merged PR) and quality telemetry (thinking length, files read before each edit, KV cache hit rate). Stella Laurenzo's audit of 6,852 Claude Code sessions is the template for quality telemetry. Anthropic's April 23 postmortem made it official that harness changes, not the model, caused the regressions, which makes harness telemetry a first-class concern.
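Cost telemetry at the three granularities named above can start from per-session token counts. A minimal sketch: the per-million-token prices are placeholders to replace with your provider's actual rates, the PR IDs are hypothetical, and the per-session cap is a simple tripwire for runaway loops rather than a complete budget system.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

# Placeholder prices in USD per million tokens; substitute your provider's rates.
PRICE_PER_MTOK = {"input": 3.00, "output": 15.00}


@dataclass(frozen=True)
class SessionUsage:
    session_id: str
    project: str
    input_tokens: int
    output_tokens: int
    merged_pr: Optional[str] = None  # PR the session contributed to, if it merged


def session_cost(u):
    return (u.input_tokens * PRICE_PER_MTOK["input"]
            + u.output_tokens * PRICE_PER_MTOK["output"]) / 1_000_000


def spend_by(usages, key):
    """Total spend grouped by key(usage); usages whose key is None are skipped."""
    totals = defaultdict(float)
    for u in usages:
        k = key(u)
        if k is not None:
            totals[k] += session_cost(u)
    return dict(totals)


def over_budget(usages, cap_usd):
    """Sessions whose spend exceeds a per-session cap (a runaway-loop tripwire)."""
    return [u.session_id for u in usages if session_cost(u) > cap_usd]


if __name__ == "__main__":
    usages = [
        SessionUsage("s1", "api", 1_000_000, 0, merged_pr="#101"),
        SessionUsage("s2", "api", 0, 1_000_000, merged_pr="#101"),
        SessionUsage("s3", "web", 0, 2_000_000),  # never merged
    ]
    print(spend_by(usages, lambda u: u.project))
    print(spend_by(usages, lambda u: u.merged_pr))
    print(over_budget(usages, cap_usd=20.0))
```

Spend on sessions that never produced a merged PR drops out of the per-PR view, which is exactly the number worth watching.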

MCP also evolved this month, from "code tools" to deep-system access: pentester-mcp (offensive security), windbg-mcp (kernel), and Pepper (iOS runtime), with mcp-auth-proxy emerging as middleware for OAuth and token-persistence issues. Rust-based context retrievers (webclaw, ferris-search) and Go orchestrators (jig) bring low-latency multi-agent profiles within reach. Infrastructure, not the model, now decides whether your agent fleet scales gracefully or burns the budget on a Tuesday night.
