Maturity Matrix

Self-driving CI: auto-scaling per agent load

Self-driving CI is a CI system that observes its own load, predicts demand, and scales its infrastructure automatically without any human intervention.

  • CI provides sub-minute feedback for standard changes
  • CI auto-scales runner capacity based on agent load (no manual capacity planning)
  • Production feedback loop auto-adjusts the CI test suite (adds tests for observed failures, removes redundant tests)
  • CI runner utilization stays between 50% and 80% (auto-scaling prevents both waste and queuing)
  • Test suite evolution is auditable (each auto-added/removed test has a provenance record)
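
The 50-80% utilization band in the criteria above can be expressed as a simple check. The sketch below is illustrative only: the function names and the three hint strings are hypothetical, and a real autoscaler would likely smooth utilization over a time window rather than act on a single sample.

```python
def utilization(busy_runners: int, total_runners: int) -> float:
    """Fraction of runners currently executing a job (0.0 for an empty pool)."""
    if total_runners <= 0:
        return 0.0
    return busy_runners / total_runners


def scaling_hint(busy_runners: int, total_runners: int,
                 low: float = 0.50, high: float = 0.80) -> str:
    """Map a utilization sample onto the 50-80% target band."""
    u = utilization(busy_runners, total_runners)
    if u > high:
        return "scale_up"    # above the band: jobs are likely to queue
    if u < low:
        return "scale_down"  # below the band: paying for idle runners
    return "hold"            # inside the band: leave the pool alone


print(scaling_hint(9, 10))  # 90% utilization
print(scaling_hint(6, 10))  # 60% utilization
print(scaling_hint(2, 10))  # 20% utilization
```

Utilization alone says nothing about an empty pool with a waiting queue, so in practice this check would sit alongside a queue-depth signal.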

Evidence

  • CI run duration dashboard showing sub-minute median for standard changes
  • Auto-scaling configuration and runner utilization metrics
  • Test suite change log showing production-feedback-driven additions and removals
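
A provenance record for the test suite change log could be as small as one JSON line per change. This is a minimal sketch under stated assumptions: the field names (`action`, `test_id`, `reason`, `source_signal`, `recorded_at`) and the example identifiers are hypothetical, not a schema from any particular tool.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class SuiteChange:
    action: str         # "added" or "removed"
    test_id: str        # identifier of the affected test (hypothetical format)
    reason: str         # human-readable justification
    source_signal: str  # what triggered the change, e.g. an incident ID
    recorded_at: str    # ISO 8601 timestamp in UTC


def record_change(action: str, test_id: str, reason: str, source_signal: str,
                  now: Optional[datetime] = None) -> str:
    """Return one JSON line suitable for an append-only change log."""
    if action not in ("added", "removed"):
        raise ValueError(f"unsupported action: {action!r}")
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    change = SuiteChange(action, test_id, reason, source_signal, timestamp)
    return json.dumps(asdict(change), sort_keys=True)


print(record_change("added", "tests/checkout/test_retry.py::test_timeout",
                    "covers a failure observed in production", "incident-example-1"))
```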

What It Is

Self-driving CI is a CI system that observes its own load, predicts demand, and scales its infrastructure automatically without any human intervention. When an agent fleet begins an intensive iteration session and submits 200 CI jobs in a 10-minute window, the CI system detects the rising queue depth, provisions additional runners ahead of the backlog, and serves all 200 jobs with minimal queuing, then scales back down when the burst ends. No engineer manually adjusts runner counts and no on-call rotation manages CI capacity; the system regulates itself.
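
To get a feel for the size of that burst, a back-of-the-envelope estimate using Little's law (average jobs in flight = arrival rate x time per job) is sketched below. The 2-minute job duration and the 80% target utilization are assumptions chosen for illustration; the source only gives the 200 jobs and the 10-minute window. The function name is hypothetical.

```python
import math


def runners_for_burst(jobs: int, window_minutes: int, job_minutes: int,
                      target_utilization_pct: int = 80) -> int:
    """Estimate the runners needed to absorb a burst without a growing queue.

    arrival rate   = jobs / window_minutes
    jobs in flight = arrival rate * job_minutes   (Little's law)
    runners        = jobs in flight / target utilization, rounded up
    """
    return math.ceil(
        jobs * job_minutes * 100 / (window_minutes * target_utilization_pct)
    )


# 200 jobs in 10 minutes, assuming each job occupies a runner for 2 minutes.
print(runners_for_burst(200, 10, 2))  # 50
```

This is a steady-state approximation: it ignores runner start-up time, which is exactly the gap that pre-warmed capacity is meant to cover.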

The "self-driving" label captures the key property: the CI system drives its own operational parameters based on observed conditions. This goes beyond simple reactive autoscaling (add runners when queue grows, remove runners when queue shrinks). Self-driving CI incorporates predictive scaling (anticipate load based on historical patterns and current agent activity signals), heterogeneous scaling (provision different runner types for different job types - large machines for compilation, small machines for lint), and capacity reservation (maintain a pre-warmed pool sized for the team's peak agent usage, not just current load).
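
The three ingredients named above (predictive scaling, heterogeneous runner types, and a pre-warmed reservation) can be combined in one sizing rule: per runner class, take the largest of the reactive need, the forecast need, and the warm floor. The sketch below assumes hypothetical runner classes, per-runner throughput figures, and floor sizes; none of these numbers come from the source.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class RunnerClass:
    name: str             # machine size label (hypothetical)
    jobs_per_runner: int  # jobs one runner clears per planning interval (assumed)
    warm_floor: int       # pre-warmed runners kept regardless of load (assumed)


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division without floating point."""
    return -(-a // b)


def desired_runners(rc: RunnerClass, queued_jobs: int, forecast_jobs: int) -> int:
    reactive = ceil_div(queued_jobs, rc.jobs_per_runner)      # what the queue needs now
    predictive = ceil_div(forecast_jobs, rc.jobs_per_runner)  # what history says is coming
    return max(reactive, predictive, rc.warm_floor)


# Heterogeneous pools: large machines for compilation, small ones for lint.
CLASSES: Dict[str, RunnerClass] = {
    "compile": RunnerClass("large", jobs_per_runner=2, warm_floor=4),
    "lint": RunnerClass("small", jobs_per_runner=10, warm_floor=1),
}


def plan(queued: Dict[str, int], forecast: Dict[str, int]) -> Dict[str, int]:
    return {
        job_type: desired_runners(rc, queued.get(job_type, 0), forecast.get(job_type, 0))
        for job_type, rc in CLASSES.items()
    }


print(plan({"compile": 3, "lint": 45}, {"compile": 30, "lint": 20}))
# {'compile': 15, 'lint': 5}
```

In this example the compile pool is sized by the forecast (15 runners) while the lint pool is sized by the current queue (5 runners), which is the difference between predictive and purely reactive scaling.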

At L5, self-driving CI is the infrastructure that enables "hundreds of agents on a codebase" patterns. Organizations like Stripe have published accounts of running 1,000+ merges per week; Google runs millions of CI jobs per day. At this scale, static runner pools and manual capacity management are not viable. The CI infrastructure must be as dynamic as the agent workloads it serves.

Self-driving CI at the autonomous maturity level also includes loop-closing feedback: the CI system reports its own performance metrics back to an observability layer that agents can read. An orchestrating agent that sees "CI queue time is elevated, sandbox pool exhausted" can reduce its submission rate, batch changes, or escalate to a human. The CI system and the agent fleet are coupled in a feedback loop, not operating independently.
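
The orchestrator's side of that feedback loop might look like the decision function below. It is a sketch, not a prescribed policy: the `CIHealth` fields, the 60-second wait SLO, the 5x escalation multiplier, and the action names are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CIHealth:
    queue_wait_seconds: float  # current time a job waits before starting
    sandbox_pool_free: int     # sandboxes still available to agents


def next_action(health: CIHealth, wait_slo_seconds: float = 60.0) -> str:
    """Pick one of the responses described above from a CI health reading."""
    pool_exhausted = health.sandbox_pool_free <= 0
    if pool_exhausted and health.queue_wait_seconds > 5 * wait_slo_seconds:
        return "escalate_to_human"  # badly degraded: do not keep pushing work
    if pool_exhausted:
        return "batch_changes"      # fewer, larger submissions
    if health.queue_wait_seconds > wait_slo_seconds:
        return "reduce_rate"        # queue is elevated but capacity exists
    return "submit"


print(next_action(CIHealth(queue_wait_seconds=20, sandbox_pool_free=8)))
print(next_action(CIHealth(queue_wait_seconds=90, sandbox_pool_free=3)))
print(next_action(CIHealth(queue_wait_seconds=90, sandbox_pool_free=0)))
print(next_action(CIHealth(queue_wait_seconds=400, sandbox_pool_free=0)))
```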

Why It Matters

  • Agent fleets scale without CI infrastructure becoming the bottleneck - a fleet of 50 concurrent agents generating 500 CI jobs per hour needs a CI system that matches its scale; self-driving CI provides this without manual intervention
  • Zero engineering on-call for CI capacity - capacity management is automated; engineers focus on CI correctness and optimization, not on "how many runners do we have right now"
  • Cost optimization through elastic scaling - self-driving CI scales down during off-hours and weekends, paying only for the capacity actually used; static over-provisioning wastes money continuously
  • Predictive scaling reduces cold-start latency - a CI system that anticipates morning load peaks and pre-warms runners before they're needed delivers consistent sub-minute job start times, even during demand spikes
  • Provides a CI health signal to the agent layer - agents and orchestrators that can observe CI system health can adapt their behavior (throttle submission, prioritize certain job types) based on actual infrastructure state

How Different Roles See It

Bob, Head of Engineering

Bob's engineering team has grown from 15 to 40 developers over the past year, and AI agent adoption is generating 10x the CI load of a year ago. The platform team spends 3-4 hours per week manually adjusting runner counts based on observed load: scaling up on Monday morning, scaling down on Friday afternoon, and emergency scaling during sprint-end pushes. This manual capacity management is unsustainable and error-prone.

Bob should fund the platform team to implement autoscaling properly. The work is well-defined: Actions Runner Controller with KEDA, configured for the team's load patterns. It is a 1-2 sprint project with a clear success metric: zero manual runner count adjustments for 60 consecutive days after rollout. Bob should frame this as operational work with a direct cost impact: 3-4 hours per week of platform engineer time at fully-loaded cost, recovered every week after rollout. The work is non-trivial but well understood, and the ROI calculation is straightforward.

Sarah, Productivity Lead

Sarah's CI feedback latency dashboard shows a pattern: CI queue times spike Monday mornings and Thursday afternoons, correlating with sprint start (Monday) and sprint-end push (Thursday). These are predictable, recurring patterns that predictive scaling could address. But the current system only responds reactively, meaning CI is slow for 20-30 minutes at the start of each peak period while the reactive scaling catches up.
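
A recurring pattern like this lends itself to a schedule-based pre-warm check that runs ahead of each known peak. The sketch below assumes peak start times of 09:00 on Monday and 14:00 on Thursday and a 30-minute lead and hold; those times, the names, and the use of naive local datetimes are all illustrative assumptions.

```python
from datetime import datetime, time, timedelta

# (weekday, peak start) pairs; Monday is 0, Thursday is 3. Times are assumed.
PEAKS = [(0, time(9, 0)), (3, time(14, 0))]


def should_prewarm(now: datetime,
                   lead: timedelta = timedelta(minutes=30),
                   hold: timedelta = timedelta(minutes=30)) -> bool:
    """True when `now` falls between `lead` before a peak and `hold` after it."""
    for weekday, start in PEAKS:
        if now.weekday() != weekday:
            continue
        peak = datetime.combine(now.date(), start)
        if peak - lead <= now < peak + hold:
            return True
    return False


print(should_prewarm(datetime(2024, 1, 1, 8, 45)))   # a Monday, 15 minutes before the peak
print(should_prewarm(datetime(2024, 1, 4, 13, 40)))  # a Thursday, 20 minutes before the peak
print(should_prewarm(datetime(2024, 1, 2, 8, 45)))   # a Tuesday
```

A scheduler that raises the minimum runner count while this returns true would have capacity ready before the queue starts to build, instead of catching up afterwards.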

Sarah should present this pattern analysis to the platform team: "CI queue time exceeds SLO for 25 minutes every Monday morning and every Thursday afternoon. This is a predictable pattern that scheduled pre-warming would eliminate." She should estimate the developer-time cost of these recurring slowdowns: if 40 developers each experience 5 minutes of elevated queue time twice a week (Monday morning, Thursday afternoon), that's 400 developer-minutes per week (about 6.7 developer-hours) lost to predictable, preventable CI slowness. The scheduled pre-warming implementation is a few hours of work that recovers about 6.7 developer-hours per week indefinitely.
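
The estimate above is simple enough to keep as a small calculation that can be re-run as team size or queue times change. The function name is hypothetical; the inputs are the figures from the paragraph.

```python
def weekly_lost_minutes(developers: int, minutes_per_event: int,
                        events_per_week: int) -> int:
    """Developer-minutes lost per week to recurring CI slowdowns."""
    return developers * minutes_per_event * events_per_week


minutes = weekly_lost_minutes(developers=40, minutes_per_event=5, events_per_week=2)
hours = minutes / 60

print(minutes)         # 400
print(f"{hours:.1f}")  # 6.7
```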

Victor, Staff Engineer - AI Champion

Victor has implemented KEDA-based autoscaling for his team's runner pool and it works well: the pool scales from 0 to 20 runners within 60 seconds of a demand spike and back to 0 within 15 minutes of inactivity. His CI costs dropped 60% compared to the previous static pool of 10 runners.

Victor should take the next step: make the CI system observable to agents. He should expose a simple HTTP endpoint that returns current CI metrics (/ci/health returning queue depth, wait time, utilization). He should then create a Claude Code MCP tool that agents can call to check CI health before submitting large batches of changes. An agent that checks CI health before submitting 20 sandbox runs can decide to stagger them (submit 5 at a time, wait for results, submit the next 5) when the queue is already long. This agent-CI feedback loop is the defining capability of L5 CI infrastructure: the CI system and the agents that use it are mutually aware and coordinate their behavior.
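
The staggering behavior described above could be sketched as follows. The payload carries the three metrics named in the text (queue depth, wait time, utilization); the JSON key names, the "long queue" threshold of 20, and the helper names are assumptions for illustration, and the HTTP serving and MCP wiring are deliberately left out.

```python
import json
from typing import Iterator, List, Sequence


def health_payload(queue_depth: int, wait_seconds: float, utilization: float) -> str:
    """Body a /ci/health endpoint might return (key names are hypothetical)."""
    return json.dumps({
        "queue_depth": queue_depth,
        "wait_seconds": wait_seconds,
        "utilization": utilization,
    })


def batch_size(health: dict, total_runs: int,
               long_queue: int = 20, stagger: int = 5) -> int:
    """Submit everything at once unless the queue is already long."""
    if health["queue_depth"] >= long_queue:
        return max(1, min(stagger, total_runs))
    return max(1, total_runs)


def batches(runs: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` runs."""
    for start in range(0, len(runs), size):
        yield list(runs[start:start + size])


health = json.loads(health_payload(queue_depth=35, wait_seconds=120.0, utilization=0.92))
runs = [f"sandbox-run-{i}" for i in range(20)]
waves = list(batches(runs, batch_size(health, len(runs))))
print(len(waves), [len(w) for w in waves])  # 4 [5, 5, 5, 5]
```

An agent would submit one wave, wait for results, and re-check health before sending the next, so the batch size can grow again once the queue drains.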