Maturity Matrix

Agent iterates tests to green in sandbox (doesn't block team CI)

AI agents fix failing tests in isolated sandboxes - running their own private CI loop - so agent work-in-progress never pollutes the shared pipeline or slows the team.

  • TORS exceeds 95%
  • Agents iterate tests to green in isolated sandbox CI without blocking the team CI queue
  • Mutation testing validates that tests catch real defects (not just achieve coverage)
  • Sandbox CI iteration count per PR is tracked (ITS target: 1-3)
  • Mutation testing kill rate exceeds 80%
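The numeric criteria above can be checked mechanically. The following is a minimal sketch, assuming the kill rate is computed as killed mutants divided by total mutants and that TORS is already available as a percentage. The names `LevelMetrics`, `mutation_kill_rate`, and `check_criteria` are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class LevelMetrics:
    """Hypothetical container for the numbers the criteria refer to."""
    tors_percent: float       # TORS expressed as a percentage (0-100)
    mutants_killed: int       # mutants detected by the test suite
    mutants_total: int        # all mutants generated
    sandbox_iterations: int   # sandbox CI iterations for one PR


def mutation_kill_rate(killed: int, total: int) -> float:
    """Kill rate as a percentage, assuming rate = killed / total."""
    if total <= 0:
        raise ValueError("total mutants must be positive")
    return 100.0 * killed / total


def check_criteria(m: LevelMetrics) -> dict[str, bool]:
    """Compare one set of metrics against the thresholds listed above."""
    kill_rate = mutation_kill_rate(m.mutants_killed, m.mutants_total)
    return {
        "tors_above_95": m.tors_percent > 95.0,
        "kill_rate_above_80": kill_rate > 80.0,
        "iterations_in_target": 1 <= m.sandbox_iterations <= 3,
    }
```

A team might run a check like this per service or per PR and surface the failing keys on the dashboard that serves as evidence.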

Evidence

  • TORS dashboard showing 95%+ with per-service breakdown
  • Sandbox CI logs showing agent iteration cycles separate from team CI
  • Mutation testing reports showing kill rate and surviving mutants

What It Is

When an AI agent's code changes cause test failures, the naive approach is to let the agent iterate in the shared CI pipeline: submit, fail, fix, resubmit, fail again, fix again. This works for a single agent, but it creates serious problems at scale. Each iteration cycle occupies CI resources, clutters the PR history, and creates noise in the shared build queue. With multiple agents running simultaneously - the L4 pattern of 3-5 agents per developer - the shared CI pipeline becomes a bottleneck.

Agent sandbox iteration solves this by giving each agent its own isolated CI environment. The agent makes code changes, runs tests in its sandbox, observes the results, makes corrections, and repeats - all without touching the shared pipeline. Only when all tests pass in the sandbox does the agent submit its PR to the shared CI for final validation and merge. The shared pipeline sees only finished, green work - never in-progress iterations.
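The loop described above can be sketched in a few lines. This is an illustration under stated assumptions, not a definitive implementation: `iterate_to_green`, `run_tests`, and `apply_fix` are hypothetical names, `run_tests` is assumed to return the list of failing test names, and the default cap of 3 iterations is borrowed from the upper end of the ITS target.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SandboxResult:
    green: bool       # True if the last sandbox test run had no failures
    iterations: int   # number of sandbox test runs performed


def iterate_to_green(
    run_tests: Callable[[], list[str]],
    apply_fix: Callable[[list[str]], None],
    max_iterations: int = 3,
) -> SandboxResult:
    """Run tests in the sandbox, fix failures, and repeat until green or capped."""
    for iteration in range(1, max_iterations + 1):
        failures = run_tests()          # runs only inside the sandbox
        if not failures:
            return SandboxResult(green=True, iterations=iteration)
        if iteration < max_iterations:  # no point fixing without a re-run
            apply_fix(failures)
    return SandboxResult(green=False, iterations=max_iterations)


# Simulated usage: two failing tests, one fixed per iteration.
failing = {"test_a", "test_b"}
result = iterate_to_green(
    run_tests=lambda: sorted(failing),
    apply_fix=lambda failures: failing.discard(failures[0]),
)
```

Only a result with `green=True` would lead to a PR against the shared pipeline. A run that hits the cap without going green is a signal to escalate to a human rather than to keep iterating.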

The sandbox is not a stripped-down environment. It runs the same tests as the shared CI: unit tests, integration tests, the acceptance test suite. It has access to the same test fixtures, the same environment configuration, the same database schemas. What makes it a sandbox is isolation: each agent's environment is independent from every other agent's environment, and from the shared team pipeline. Failures in one sandbox don't affect another.

At Level 4 (Optimized), sandbox isolation is the technical prerequisite for running multiple agents in parallel. Without it, parallel agents interfere with each other through shared CI state, compete for resources, and generate noise in the shared build system that humans have to triage.

Why It Matters

Sandbox CI isolation is the infrastructure that enables the parallel agent model:

  • No CI pollution - The shared pipeline only sees PR-ready code. Developers are not interrupted by agent work-in-progress failures. Build queues are not congested by iterating agents.
  • Parallel agent execution - Three to five agents per developer running simultaneously each need independent CI feedback. Sandbox isolation makes this feasible without scaling the shared pipeline in proportion to agent count.
  • Agent iteration speed - In a shared pipeline with queue wait times, each agent iteration might take 15-20 minutes including the wait. In a dedicated sandbox, the same iteration takes 3-5 minutes, so agents reach green roughly 3-5x faster.
  • Failure attribution clarity - When a test fails in the shared pipeline, it may not be clear whether it was caused by the most recent agent, a previous agent, or a human developer's change. With sandboxed agents, every failure is attributable to the agent that owns the sandbox.
  • Cost predictability - Sandbox CI for multiple agents requires infrastructure investment, but the cost is predictable and proportional to agent count. Uncontrolled shared pipeline congestion is unpredictable and grows super-linearly.
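The iteration-speed figures above can be turned into a quick worked calculation. The sketch below uses only the 15-20 minute and 3-5 minute ranges quoted in the list. The function names `speedup_bounds` and `time_to_green` are hypothetical, and the model assumes every iteration costs the same amount of time.

```python
def speedup_bounds(
    shared_minutes: tuple[float, float],
    sandbox_minutes: tuple[float, float],
) -> tuple[float, float]:
    """Worst-case and best-case per-iteration speedup from two time ranges."""
    shared_lo, shared_hi = shared_minutes
    sandbox_lo, sandbox_hi = sandbox_minutes
    # Worst case: fastest shared run vs. slowest sandbox run, and vice versa.
    return shared_lo / sandbox_hi, shared_hi / sandbox_lo


def time_to_green(iterations: int, minutes_per_iteration: float) -> float:
    """Total minutes to reach green, assuming a constant cost per iteration."""
    return iterations * minutes_per_iteration


low, high = speedup_bounds((15, 20), (3, 5))
print(f"per-iteration speedup: {low:.1f}x to {high:.1f}x")
print("3 iterations, shared pipeline:", time_to_green(3, 20), "minutes")
print("3 iterations, sandbox:", time_to_green(3, 5), "minutes")
```

Under these assumptions, an agent that needs three iterations spends up to an hour in a congested shared pipeline but about a quarter of an hour in a sandbox, even at the slow end of the sandbox range.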

Tip

Design sandboxes to be ephemeral - created on agent start, destroyed on completion. Ephemeral sandboxes prevent state accumulation between agent runs, which is one of the leading causes of "works in sandbox, fails in shared CI" surprises. Use container orchestration (Kubernetes Jobs, Fargate Tasks, or a CI platform with ephemeral environments) to manage sandbox lifecycle.
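The create-on-start, destroy-on-completion lifecycle can be illustrated with a context manager. This is a local stand-in only: it uses a temporary directory from the Python standard library in place of a real container or Kubernetes Job, and the name `ephemeral_sandbox` is hypothetical.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def ephemeral_sandbox(prefix: str = "agent-sandbox-"):
    """Create a fresh working directory and always remove it afterwards."""
    workdir = Path(tempfile.mkdtemp(prefix=prefix))  # created on agent start
    try:
        yield workdir
    finally:
        # Destroyed on completion, even if the agent run raised an error,
        # so no state carries over into the next run.
        shutil.rmtree(workdir, ignore_errors=True)


# Usage: each agent run gets its own directory that vanishes afterwards.
with ephemeral_sandbox() as sandbox:
    (sandbox / "notes.txt").write_text("work in progress")
    existed_during_run = sandbox.exists()
exists_after_run = sandbox.exists()
```

The same shape applies when the sandbox is a container: provisioning happens on entry, teardown happens in the `finally` branch, and nothing created during one run is visible to the next.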

Getting Started

6 steps to get from here to the next level

Common Pitfalls

Mistakes teams actually make at this stage - and how to avoid them

How Different Roles See It

Bob, Head of Engineering

Bob's team has deployed three agents working in parallel, but shared CI is now congested with agent iterations. Developers are waiting 25 minutes for their own PRs to get CI feedback because agent jobs fill the queue. The team is frustrated and starting to question whether the agents are worth it.


Sarah, Productivity Lead

Sarah has been tracking developer satisfaction and has seen a dip since agents were deployed. Developers feel like CI has gotten slower and less responsive. She expected agents to improve productivity, not degrade it.


Victor, Staff Engineer - AI Champion

Victor architected the initial agent deployment and is now being asked to fix the CI congestion problem. He has three options: scale CI infrastructure (expensive), throttle agent concurrency (defeats the purpose), or implement sandbox isolation (correct but complex).
