Infrastructure

The technical layer that enables (or blocks) agents. From shared Jenkins to ephemeral agent sandboxes.

Agent Runtime & Sandboxing

L1 (Ad-hoc) practices
  • Agents run inside the developer's local IDE (no separate runtime)
  • No isolation between agent execution and the developer's local environment

Evidence

  • Agent runs as an IDE plugin with no containerization or isolation
  • No sandboxing configuration exists
L2 (Guided) practices
  • Dedicated development environments exist for agent execution (separate from the developer's primary workspace)
  • Basic sandboxing via Docker or equivalent containers is implemented
  • Agent credentials are scoped per project (not a single org-wide key)

Evidence

  • Docker or container configuration files for agent environments
  • Credential management configuration showing per-project scoping
  • Environment provisioning documentation or scripts
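
The "basic sandboxing via Docker" practice can be sketched as an agent launcher that denies network access, caps resources, and mounts only the project directory. The image name, entrypoint, and limits below are illustrative assumptions, not a prescribed configuration.

```python
# Sketch: launching an agent task in a throwaway Docker sandbox.
# Image name ("agent-sandbox:latest"), entrypoint, and limits are
# illustrative assumptions.
def run_agent_sandboxed(workdir: str, image: str = "agent-sandbox:latest") -> list[str]:
    """Build the docker run command for one isolated agent task."""
    return [
        "docker", "run", "--rm",
        "--network", "none",               # no network access by default
        "--memory", "4g",                  # cap memory
        "--cpus", "2",                     # cap CPU
        "--read-only",                     # read-only root filesystem
        "-v", f"{workdir}:/workspace:rw",  # only the project dir is writable
        image, "agent", "--task-dir", "/workspace",
    ]
```

In practice the returned command would be handed to `subprocess.run(cmd, check=True)`; returning the argument list keeps the policy inspectable and testable.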
L3 (Systematic) practices
  • Isolated agent environments (devbox model) prevent agents from accessing other projects
  • Pre-warmed containers with the codebase at HEAD and dependencies installed are available
  • Network isolation prevents agents from reaching production systems

Evidence

  • Devbox configuration showing per-project isolation boundaries
  • Pre-warmed container pool metrics (pool size, warm hit rate, cold start rate)
  • Network policy configuration (Kubernetes NetworkPolicy, firewall rules) blocking production access
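
The network-isolation evidence above could look like the following Kubernetes NetworkPolicy, expressed here as a Python dict. Namespace, label, and policy names are illustrative assumptions; the key point is that egress is allowlisted, so production endpoints are unreachable by default.

```python
# Sketch of a Kubernetes NetworkPolicy denying agent pods egress to
# production. Namespace and label names ("agent-devboxes", "role: agent")
# are illustrative assumptions.
agent_egress_policy = {
    "apiVersion": "networking.k8s.io/v1",
    "kind": "NetworkPolicy",
    "metadata": {"name": "agent-deny-prod", "namespace": "agent-devboxes"},
    "spec": {
        "podSelector": {"matchLabels": {"role": "agent"}},
        "policyTypes": ["Egress"],
        # Egress is allowed only to pods in devbox-labeled namespaces;
        # anything not listed (including production CIDRs) is dropped.
        "egress": [
            {"to": [{"namespaceSelector": {"matchLabels": {"env": "devbox"}}}]}
        ],
    },
}
```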
L4 (Optimized) practices
  • Ephemeral devboxes spin up in under 10 seconds (Stripe benchmark)
  • Devboxes come pre-loaded with codebase, dependencies, and MCP tools
  • Kernel-level policy enforcement restricts agent actions (syscall filtering, resource limits)

Evidence

  • Devbox spin-up latency dashboard showing P50 under 10 seconds
  • Devbox snapshot configuration showing pre-loaded codebase, deps, and MCP tools
  • Kernel policy configuration (seccomp profiles, cgroup limits)
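
The seccomp-profile evidence can be sketched as a deny-by-default profile in the standard Docker/OCI JSON shape. The syscall allowlist below is illustrative and far smaller than a real profile would need.

```python
# Sketch of a seccomp profile for agent containers: deny-by-default,
# allow a minimal syscall set. The allowlist is illustrative only.
seccomp_profile = {
    "defaultAction": "SCMP_ACT_ERRNO",   # unlisted syscalls fail with an errno
    "architectures": ["SCMP_ARCH_X86_64"],
    "syscalls": [
        {
            "names": ["read", "write", "openat", "close", "mmap",
                      "exit_group", "futex", "brk"],
            "action": "SCMP_ACT_ALLOW",
        }
    ],
}
```

The profile would be passed to the runtime (e.g., `docker run --security-opt seccomp=profile.json`) alongside cgroup resource limits.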
L5 (Autonomous) practices
  • Dedicated compute infrastructure exists for the agent fleet (not shared with developer workstations or production)
  • Agent fleet auto-scales with load (agents scale up during business hours, scale down off-hours)
  • Each agent runs in a fully isolated environment (Cursor approach: one machine per agent, or smart resource management)

Evidence

  • Infrastructure allocation showing dedicated agent compute (separate from dev and prod)
  • Auto-scaling configuration and scaling event logs
  • Agent fleet dashboard showing per-agent isolation and resource utilization
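
The business-hours scaling pattern can be sketched as a simple time-based sizing function; real fleets would combine this with load-based signals. The hours and bounds below are assumptions for illustration.

```python
# Sketch: time-based fleet sizing for an agent pool, scaling up during
# business hours and down off-hours. Hours and bounds are assumptions.
def desired_fleet_size(hour: int, base: int = 10, peak: int = 100) -> int:
    """Return the target agent count for a given local hour (0-23)."""
    if 9 <= hour < 18:                    # business hours: full capacity
        return peak
    if 7 <= hour < 9 or 18 <= hour < 20:  # ramp-up / ramp-down windows
        return (base + peak) // 2
    return base                           # overnight floor
```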

MCP & Tool Integration

L1 (Ad-hoc) practices
  • No MCP servers are configured
  • Agents rely solely on public API knowledge from training data

Evidence

  • No MCP configuration files in the repository or developer environment
  • Absence of tool integration beyond IDE built-ins
L2 (Guided) practices
  • 1-3 MCP servers are configured (e.g., Git, Jira, documentation)
  • MCP setup is documented but configured manually per developer
  • Basic tool authorization is implemented (agents authenticate to MCP servers)

Evidence

  • MCP server configuration files (mcp.json or equivalent)
  • Setup documentation for MCP server installation per developer
  • MCP server authentication configuration
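
An mcp.json at this level might look like the following, shown here as a Python dict. The shape follows the common MCP client convention (`mcpServers` with `command`/`args`/`env` per server); the specific package names and the env-var placeholder are assumptions.

```python
# Sketch of a minimal mcp.json with a Git and a Jira server. Package
# names and the JIRA_TOKEN placeholder are illustrative assumptions.
import json

mcp_config = {
    "mcpServers": {
        "git": {
            "command": "uvx",
            "args": ["mcp-server-git"],
        },
        "jira": {
            "command": "npx",
            "args": ["-y", "jira-mcp-server"],
            "env": {"JIRA_TOKEN": "${JIRA_TOKEN}"},  # per-developer secret
        },
    }
}
print(json.dumps(mcp_config, indent=2))
```

At L2 this file is copied and maintained manually per developer, which is exactly the gap the centralized L3 platform closes.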
L3 (Systematic) practices
  • A centralized MCP platform manages server provisioning, configuration, and lifecycle
  • Domain-specific MCP servers exist (Architecture MCP, Ownership MCP, SLA MCP)
  • RBAC controls which agents can access which MCP tools

Evidence

  • MCP platform configuration showing centralized server management
  • RBAC policy configuration for MCP tool access
  • MCP server inventory listing domain-specific servers with owners
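
The RBAC practice reduces to a policy table mapping agent roles to permitted tools. The roles and tool names below are illustrative assumptions.

```python
# Sketch: role-based access control for MCP tools. Roles, tool names,
# and the policy table are illustrative assumptions.
RBAC_POLICY: dict[str, set[str]] = {
    "code-review-agent": {"git.read", "architecture.query"},
    "ops-agent": {"git.read", "sla.query", "ownership.lookup"},
}

def can_use_tool(agent_role: str, tool: str) -> bool:
    """Deny by default: unknown roles and unlisted tools are rejected."""
    return tool in RBAC_POLICY.get(agent_role, set())
```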
L4 (Optimized) practices
  • Toolshed model: 400+ tools accessible behind a unified MCP gateway (Stripe model)
  • Agent discovery: agents can query available tools and their capabilities at runtime
  • MCP governance covers lifecycle management, versioning, and audit logging

Evidence

  • MCP gateway configuration showing 400+ registered tools
  • Agent discovery API or protocol documentation with runtime tool listing
  • MCP governance logs showing lifecycle events (deploy, version, deprecate, audit)
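
Runtime tool discovery can be sketched with an in-memory registry standing in for the gateway's tool listing; the tool names and capability tags are illustrative assumptions.

```python
# Sketch: runtime tool discovery against a unified gateway. The registry
# is an in-memory stand-in for a real MCP gateway's tool listing; tool
# names and capability tags are illustrative.
from dataclasses import dataclass

@dataclass
class Tool:
    name: str
    capability: str
    description: str

REGISTRY = [
    Tool("payments.refund", "payments", "Issue a refund for a charge"),
    Tool("logs.search", "observability", "Search structured logs"),
    Tool("deploy.status", "delivery", "Check deployment status"),
]

def discover(capability: str) -> list[Tool]:
    """Agents query available tools by capability at runtime."""
    return [t for t in REGISTRY if t.capability == capability]
```

The point of the pattern is that agents do not hardcode a tool list; with 400+ tools behind the gateway, discovery keeps prompts small and tool inventories current.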
L5 (Autonomous) practices
  • MCP operates as a bidirectional nervous system: production data flows to agents, agent actions flow to production
  • Full production loop: Production -> MCP -> Agent -> Code -> Deploy -> Production
  • Agent-to-Agent Protocol (A2A) and MCP are combined for multi-agent coordination

Evidence

  • MCP configuration showing bidirectional data flow (production to agent, agent to production)
  • End-to-end production loop traces (anomaly detected, agent invoked, fix deployed)
  • A2A protocol configuration showing agent-to-agent communication channels

Build System

L1 (Ad-hoc) practices
  • Build uses default tool configuration (Maven/Gradle defaults, npm scripts without optimization)
  • A full rebuild runs on every change (no incremental build support)

Evidence

  • Build configuration file with default/untuned settings
  • CI logs showing a full rebuild on every PR
L2 (Guided) practices
  • Build caching is implemented (dependency cache, compilation cache)
  • Parallel build steps are configured (test and lint run concurrently)
  • Dedicated CI resources are allocated (not shared across all teams)

Evidence

  • Build cache configuration (Gradle build cache, npm cache, Docker layer cache)
  • CI pipeline configuration showing parallel step execution
  • Dedicated runner or resource pool configuration
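
Dependency caching typically keys the cache on the lockfile's content, so CI restores the cache only when dependencies actually changed. A minimal sketch of that keying scheme (the `deps` prefix is an assumption):

```python
# Sketch: content-addressed cache keys for dependency caching. The key
# changes exactly when the lockfile changes, so a cache hit means the
# dependency set is identical.
import hashlib

def cache_key(lockfile_contents: bytes, prefix: str = "deps") -> str:
    digest = hashlib.sha256(lockfile_contents).hexdigest()[:16]
    return f"{prefix}-{digest}"
```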
L3 (Systematic) practices
  • An advanced build system (Bazel, Buck2, or Pants) is adopted for the primary codebase
  • Remote execution (EngFlow or equivalent) distributes build steps across multiple machines
  • Incremental builds run only changed targets (not a full rebuild)

Evidence

  • Bazel/Buck2/Pants BUILD files in the repository
  • Remote execution configuration (EngFlow, BuildBuddy, or equivalent)
  • Build logs showing incremental target selection
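
Incremental target selection is at heart a reverse-dependency traversal: map changed files to owning targets, then walk dependents. The toy graph and file mapping below are illustrative assumptions; Bazel-style systems derive both from BUILD files.

```python
# Sketch: incremental target selection. Given reverse-dependency edges
# (target -> its dependents), rebuild only targets affected by changed
# files. The graph and file->target mapping are illustrative.
from collections import deque

REVERSE_DEPS = {
    "//lib/core": ["//lib/api", "//app/server"],
    "//lib/api": ["//app/server"],
    "//app/server": [],
}
FILE_OWNERS = {"lib/core/util.py": "//lib/core"}

def affected_targets(changed_files: list[str]) -> set[str]:
    seeds = {FILE_OWNERS[f] for f in changed_files if f in FILE_OWNERS}
    seen, queue = set(seeds), deque(seeds)
    while queue:                       # BFS over reverse dependencies
        for dependent in REVERSE_DEPS.get(queue.popleft(), []):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen
```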
L4 (Optimized) practices
  • Any change gets build feedback in under 2 minutes
  • Agent-specific build profiles exist (optimized for agent iteration patterns: fast feedback over comprehensive builds)
  • The build system understands agent iteration patterns and pre-caches likely next builds

Evidence

  • Build duration dashboard showing sub-2-minute feedback for all change types
  • Agent-specific build profile configuration
  • Pre-cache hit rate metrics for agent iteration patterns
L5 (Autonomous) practices
  • Build is a commodity: near-instant feedback for agents regardless of codebase size
  • The codebase is structured into self-contained modules/crates to eliminate the compilation bottleneck (Cursor lesson)
  • Disk I/O is optimized for concurrent agent workloads (parallel reads/writes across modules)

Evidence

  • Build duration dashboard showing near-instant feedback for standard changes
  • Codebase architecture showing modular structure (crate/module boundaries)
  • Disk I/O benchmarks for concurrent agent build workloads

Observability & Feedback Loop

L1 (Ad-hoc) practices
  • Basic application logging exists
  • Alerting fires on application errors

Evidence

  • Logging configuration in application code
  • Alert configuration (PagerDuty, Opsgenie, or equivalent)
L2 (Guided) practices
  • Structured logging is implemented (JSON logs with consistent fields)
  • OpenTelemetry basic instrumentation is deployed (traces and metrics)
  • Post-deploy monitoring checks run after each deployment

Evidence

  • Structured logging configuration showing JSON format with standard fields
  • OpenTelemetry SDK configuration in application code
  • Post-deploy monitoring job configuration in the CD pipeline
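
Structured logging can be sketched with a JSON formatter over the standard library logger. The field names (`level`, `logger`, `message`, `service`) follow common conventions but are assumptions, not a standard.

```python
# Sketch: JSON structured logging with a consistent field set. Field
# names and the static "service" value are illustrative assumptions.
import json
import logging
import sys

class JsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "service": "checkout",   # illustrative static field
        })

handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JsonFormatter())
log = logging.getLogger("app")
log.addHandler(handler)
log.setLevel(logging.INFO)
log.info("order placed")   # emits one JSON object per log line
```

One JSON object per line is what makes logs machine-parseable for agents and log pipelines alike.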
L3 (Systematic) practices
  • A full observability stack is operational (OpenTelemetry + Grafana/Datadog or equivalent)
  • Production metrics feed into dashboards accessible to all developers
  • Incident data (post-mortems, error patterns) is available as agent context

Evidence

  • Observability stack configuration (OTel collector, Grafana dashboards)
  • Production metrics dashboards with developer access
  • Incident data accessible via MCP or a structured API
L4 (Optimized) practices
  • Production anomaly detection auto-creates tickets and triggers agent investigation
  • Self-healing for known patterns: the agent detects a known error pattern, applies the known fix, deploys, and verifies
  • Infrastructure recommends code changes based on production data (Vercel SDI model)

Evidence

  • Auto-ticket creation logs triggered by production anomalies
  • Self-healing event logs showing detection, fix, deploy, and verification steps
  • Infrastructure recommendation pipeline configuration (production data to code change suggestions)
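
The detection step of self-healing can be sketched as a table of known error patterns mapped to known remediations, with everything unmatched escalating to a human. Patterns and fix names below are illustrative assumptions.

```python
# Sketch: self-healing for known error patterns. A matched log line maps
# to a known remediation; unmatched patterns escalate to a human. The
# pattern table and fix names are illustrative assumptions.
import re
from typing import Optional

KNOWN_FIXES = [
    (re.compile(r"OutOfMemoryError"), "bump-heap-and-redeploy"),
    (re.compile(r"connection pool exhausted"), "raise-pool-size"),
]

def remediation_for(log_line: str) -> Optional[str]:
    for pattern, fix in KNOWN_FIXES:
        if pattern.search(log_line):
            return fix      # apply fix, deploy, verify via post-deploy checks
    return None             # unknown pattern: escalate to on-call
```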
L5 (Autonomous) practices
  • The full production-to-agent loop operates autonomously: anomaly detected, investigated, fixed, tested, deployed
  • Infrastructure self-drives: code defines infrastructure, production performance informs code changes
  • The anomaly-to-deploy cycle completes without human intervention for 80%+ of known issue categories

Evidence

  • End-to-end autonomous fix traces (anomaly to deployed fix with no human steps)
  • Infrastructure-as-code showing production-informed code changes
  • Autonomous resolution rate dashboard showing 80%+ for known issue categories

Author Commentary

April 2026 update: MCP is now the universal standard for agent-tool integration. With 97M+ npm downloads and Cursor 3 shipping with 30+ built-in MCP plugins, the "should we adopt MCP?" question is settled. The question is now "how mature is your MCP layer?" — L1 (zero) to L5 (nervous system). Teams without any MCP servers are falling behind the baseline.

Disk I/O is the hidden bottleneck of multi-agent systems. Cursor discovered this while building a browser with hundreds of agents: compiling a monolith means many GB/s of reads and writes. The solution was to restructure the project into self-contained crates/modules. The same applies to the JVM: modularization isn't just clean code, it's agent throughput.

Stripe's devbox (10-second spin-up, pre-warmed) is the gold standard for isolated agent runtimes. Replicating it requires investment, but the alternative (agents on developers' laptops) doesn't scale beyond L2.