CI < 2 minutes

CI under 2 minutes is the Optimized (L4) milestone and represents a qualitative shift in how CI is used.

·CI completes in under 2 minutes (median)
·Ephemeral sandbox environments spin up in under 10 seconds for agent CI loops
·Agent sandbox CI supports 50+ iteration attempts in 5 minutes without blocking team CI queue

·P95 CI duration is under 3 minutes
·CI feedback latency (from push to result) is tracked and reported

Evidence

·CI run duration dashboard showing median under 2 minutes
·Sandbox spin-up time metrics showing sub-10-second P50
·Agent CI iteration logs showing 50+ attempts within 5-minute windows

What It Is

CI under 2 minutes is the Optimized (L4) milestone and represents a qualitative shift in how CI is used. At this speed, CI is no longer a gate that agents and developers wait for - it's a feedback mechanism they interact with continuously. A 90-second CI run means an agent can iterate up to 40 times per hour. The loop becomes tight enough for exploratory iteration: try an approach, get feedback, adjust, repeat - in a way that resembles test-driven development but at machine speed.

Achieving sub-2-minute CI is not primarily a configuration problem at this level. The pipeline architecture has already been optimized with caching, parallelization, and incremental builds. What remains is infrastructure-level work: pre-built base images that eliminate per-run toolchain installation, warm runner pools that eliminate job startup latency, and distributed build systems (Bazel, Pants, Buck) that can execute build and test steps in parallel across many machines rather than within a single runner.

The 2-minute target also changes what "passing CI" means. At 5-minute CI, teams run a meaningful fast-path and accept that some slower tests run post-merge. At 2-minute CI, teams invest in making the meaningful fast-path comprehensive enough to be a real quality gate. This requires significant test architecture work: tests that run in 2 minutes across a large codebase must be highly selective, highly parallelized, and architecturally isolated from I/O. Teams at L4 have typically invested 6-12 months getting their test suite to this shape.

The payoff at 2-minute CI is visible in throughput numbers. Stripe's engineering blog describes internal tooling that supports 1,000+ merges per week on monorepos, enabled by fast CI. Teams running dozens of parallel agents can sustain high PR throughput without CI becoming the bottleneck. The 2-minute threshold is where CI scales with agent usage rather than constraining it.

Why It Matters

Agent iteration reaches machine speed - 40+ iteration cycles per hour means agents can work through complex multi-step implementations within a single 30-minute window
CI becomes a development tool, not a gate - at 2 minutes, developers and agents use CI as a rapid feedback mechanism during development, not just before merge
Merge queues become high-throughput - a 2-minute pipeline can process 30 PRs per hour per runner, enabling teams running dozens of parallel agents to sustain throughput without queuing
Enables agent autonomy - when CI feedback arrives in 2 minutes, agents can close their own iteration loops without human intervention; the human reviews the final result, not each intermediate step
Dramatically reduces total delivery time - a feature that requires 15 CI iterations spends 30 minutes waiting on CI at 2-minute CI versus 75 minutes at 5-minute CI; at scale, this saving repeats for every feature the team ships
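
The arithmetic behind these figures can be checked with a short sketch. It treats CI duration as the only cost per iteration, so the results are upper bounds; the function names are illustrative and not part of any tool.

```python
def max_iterations_per_hour(ci_seconds: float) -> float:
    """Upper bound on CI-gated iterations per hour (ignores agent think time)."""
    return 3600 / ci_seconds


def total_ci_wait_minutes(iterations: int, ci_minutes: float) -> float:
    """Minutes spent waiting on CI for a change that needs `iterations` runs."""
    return iterations * ci_minutes


if __name__ == "__main__":
    for ci_seconds in (300, 120, 90):
        rate = max_iterations_per_hour(ci_seconds)
        wait = total_ci_wait_minutes(15, ci_seconds / 60)
        print(f"{ci_seconds:>3}s CI: up to {rate:.0f} iterations/hour, "
              f"{wait:.1f} min of CI wait for 15 iterations")
```

Real iteration rates will be lower than these ceilings, because the time an agent spends generating each change adds to every loop.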

Getting Started

Invest in pre-built runner images - Cold runner startup (downloading and installing tools) can add 30-90 seconds to every CI run. Pre-build custom Docker images with your full toolchain already installed and push them to a registry. In GitHub Actions, run jobs inside a custom image with the container key, or use self-hosted runners built from pre-configured AMIs. Eliminating runner startup is often a 30-60 second win.
Implement a distributed build system - Bazel, Pants, or Buck can distribute build and test steps across multiple machines. This is the step that takes large codebases from "5 minutes with parallelism on one machine" to "90 seconds with distribution across 8 machines." See the Bazel + Remote Caching guide for implementation details.
Maintain a warm runner pool - On-demand runners have startup latency (30-120 seconds for cloud VMs). Self-hosted persistent runners or pre-warmed runner pools eliminate this. GitHub's larger runners and Buildkite clusters support warm pools. The investment pays back immediately on every CI run.
Instrument CI with sub-stage timing - At 2-minute CI, every 10 seconds matters. Instrument each stage with explicit timing output so you know exactly where time is spent. Use GitHub Actions' job summary output or Buildkite's annotation API to surface timing data on every run.
Eliminate all remaining real I/O from unit tests - Any test that opens a real file, real socket, or real database connection does not belong in the 2-minute path. Use strict linting rules (custom ESLint rules, Checkstyle rules) to prevent new real-I/O tests from being added to the fast suite. Put them in an explicitly labeled slow suite that runs post-merge.
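Set and enforce a per-stage time budget - Define explicit time budgets: lint must complete in 20 seconds, unit tests in 60 seconds, build in 40 seconds. Enforce these with timeouts and fail the build if any stage exceeds its budget. GitHub Actions' timeout-minutes is expressed in minutes, so sub-minute budgets need a wrapper around the command (for example the coreutils timeout command) or a check on measured duration. Budget enforcement prevents gradual drift back toward slowness.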
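
The stage timing and budget steps above can be sketched as one small wrapper script. This is a minimal sketch, not a definitive implementation: the make targets are hypothetical placeholders for your own commands, and only the budgets (20, 60, and 40 seconds) come from the text.

```python
import subprocess
import sys
import time
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical stage commands; the budgets are the ones named in the text.
STAGES = [
    ("lint", ["make", "lint"], 20.0),
    ("unit-tests", ["make", "test-fast"], 60.0),
    ("build", ["make", "build"], 40.0),
]


@dataclass
class StageResult:
    name: str
    elapsed_s: float
    budget_s: float
    returncode: Optional[int]  # None when the stage was killed for exceeding its budget

    @property
    def ok(self) -> bool:
        return self.returncode == 0


def run_stage(name: str, cmd: List[str], budget_s: float) -> StageResult:
    """Run one stage, measure wall-clock time, and kill it at the budget."""
    start = time.monotonic()
    try:
        returncode: Optional[int] = subprocess.run(cmd, timeout=budget_s).returncode
    except subprocess.TimeoutExpired:
        returncode = None
    return StageResult(name, time.monotonic() - start, budget_s, returncode)


def main() -> int:
    for name, cmd, budget_s in STAGES:
        result = run_stage(name, cmd, budget_s)
        if result.ok:
            status = "ok"
        elif result.returncode is None:
            status = "over budget"
        else:
            status = "failed"
        print(f"{result.name}: {result.elapsed_s:.1f}s of {result.budget_s:.0f}s budget [{status}]")
        if not result.ok:
            return 1  # fail the build on the first failed or over-budget stage
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

The printed per-stage lines are the timing data; in a real pipeline they could be forwarded to whatever summary or annotation mechanism the CI system provides.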

Tip

Pre-built runner images and warm runner pools are often the highest-ROI investment at this level. They're infrastructure changes (not test architecture changes) that provide immediate speedup across all CI runs. Prioritize them before the more complex distributed build work.

Common Pitfalls

Distributed build complexity without organizational support. Bazel is powerful but requires significant investment in BUILD file maintenance and a team that understands the tool. Adopting Bazel to hit 2-minute CI without the organizational commitment to maintain it creates a brittle system. Assess team capacity for Bazel ownership before adopting it.

Achieving 2 minutes on the happy path only. Fast CI that takes 8 minutes when a certain module changes is not 2-minute CI. Profile your CI across a representative sample of recent PRs - not just simple changes - to understand the actual distribution of CI times. If the P95 is 6 minutes, you have a 2-minute median with a 6-minute tail, not 2-minute CI; the criteria for this level call for a P95 under 3 minutes.
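
One way to check the distribution rather than the happy path is to compute the median and P95 over a sample of recent run durations. The sketch below assumes the durations have already been exported as a list of seconds (how you export them depends on your CI system) and uses the nearest-rank definition of a percentile, which is one of several in common use.

```python
import math
import statistics
from typing import Dict, Sequence


def percentile(values: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(durations_s: Sequence[float],
              median_target_s: float = 120,
              p95_target_s: float = 180) -> Dict[str, object]:
    """Compare a sample of CI durations with the median and P95 targets for this level."""
    median_s = statistics.median(durations_s)
    p95_s = percentile(durations_s, 95)
    return {
        "median_s": median_s,
        "p95_s": p95_s,
        "meets_target": median_s < median_target_s and p95_s < p95_target_s,
    }


if __name__ == "__main__":
    # Illustrative sample: most runs take 90 seconds, but two hit a slow module.
    sample = [90] * 18 + [360] * 2
    print(summarize(sample))
```

In the illustrative sample the median looks healthy while the tail fails the check, which is exactly the situation this pitfall describes.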

Undermining quality to hit the time target. If the only way to hit 2 minutes is to remove tests that would catch real bugs, the speed target has been set incorrectly relative to the test suite quality. The right response is to improve test architecture (faster tests with equivalent coverage), not to remove coverage to hit an arbitrary time budget.

Neglecting the local development environment. If CI is 2 minutes but local tests take 15 minutes, developers stop running tests locally and push speculatively. The local test path should mirror the CI fast path and run in comparable time. This requires the same test architecture investments - isolation, parallelization, incremental execution.

Allowing the runner pool to become a bottleneck. A 2-minute CI pipeline only delivers 2-minute feedback if runners are available immediately. If the runner pool is undersized for the team's push rate, jobs queue and wall-clock time to feedback grows. Size the runner pool for peak agent load, not average human load - agents generate bursts of CI activity.
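
A rough way to size the pool is to multiply the peak push rate by the pipeline duration, which gives the average number of runners busy at once, and then add headroom for bursts. The sketch below is a steady-state estimate under that assumption; the parameter names and the 0.7 utilization default are illustrative choices, not recommendations from any vendor.

```python
import math


def runners_needed(pushes_per_minute: float,
                   ci_minutes: float,
                   jobs_per_push: int = 1,
                   target_utilization: float = 0.7) -> int:
    """Estimate the runner count for a given peak push rate.

    pushes_per_minute * jobs_per_push * ci_minutes is the average number of
    runners busy at once; dividing by target_utilization leaves headroom so
    bursts do not immediately queue.
    """
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    busy_runners = pushes_per_minute * jobs_per_push * ci_minutes
    return math.ceil(busy_runners / target_utilization)


if __name__ == "__main__":
    # Illustrative peak: 12 agents, each pushing about once every 3 minutes.
    peak_pushes_per_minute = 12 / 3
    print(runners_needed(peak_pushes_per_minute, ci_minutes=2))
```

Feeding in the peak agent push rate rather than the daily average is the point of the exercise; an average-based estimate will under-size the pool for bursts.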

How Different Roles See It

Bob, Head of Engineering

Bob's team has reached 5-minute CI and the improvement in agent productivity is visible: developers report getting more done in agent sessions and PR throughput has increased 40% over the past quarter. Now he wants to push to 2 minutes to unlock the full Optimized-level capability. The challenge is that reaching 2 minutes requires runner infrastructure investment and potentially adopting Bazel, both of which require budget and platform team support.

Bob should build a business case framing 2-minute CI as agent infrastructure investment. With CI as the limiting step, his team's agents can complete at most 12 iterations per hour at 5-minute CI. At 2 minutes, that ceiling rises to 30 iterations per hour. If 10 developers each run 2 one-hour agent sessions per day, that is 20 agent-hours per day, or roughly 400 per month at 20 working days, and each of those hours gains up to 18 additional iterations. Bob should estimate what that's worth in terms of features delivered (using historical velocity data) and compare to the cost of dedicated runners and the platform team's time for the Bazel migration. The ROI calculation is often compelling: faster CI infrastructure can pay for itself within 1-2 months when teams are running AI agents at scale.

Sarah, Productivity Lead

Sarah's CI feedback latency dashboard now shows consistent 5-minute CI, and she's tracking agent iteration rate as a derived metric. The data shows that at 5 minutes, the team's most productive agent users are hitting a ceiling: their iteration rate plateaus after about 6 per hour, suggesting the CI wait is still the bottleneck even at 5 minutes. The developers describe it as "I start the next task while waiting for CI, then I'm context-switched and CI takes me a while to re-engage with."

Sarah should add a metric for "CI re-engagement latency" - how long after CI completes does it take the developer to respond to results? If this is consistently 3-5 minutes (the re-context-switch overhead), then the effective feedback loop is 8-10 minutes even with 5-minute CI. This finding makes the case for 2-minute CI more precisely: it's not just about raw speed, it's about whether the feedback arrives within the developer's attention window. Sarah should present this insight with data and propose 2-minute CI as the target that would keep feedback within the active attention window for most developers.
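
The re-engagement metric can be sketched as follows, assuming each observation is a pair of timestamps: when CI finished and when the developer next acted on the result (for example a new push or a comment). Both the event source and the function names are assumptions made for illustration.

```python
from datetime import datetime, timedelta
from statistics import median
from typing import List, Sequence, Tuple

Event = Tuple[datetime, datetime]  # (ci_finished_at, next_developer_action_at)


def reengagement_minutes(events: Sequence[Event]) -> List[float]:
    """Minutes between CI finishing and the developer's next action, per event."""
    return [(action - finished).total_seconds() / 60 for finished, action in events]


def effective_loop_minutes(ci_minutes: float, events: Sequence[Event]) -> float:
    """CI duration plus the median re-engagement latency."""
    return ci_minutes + median(reengagement_minutes(events))


if __name__ == "__main__":
    # Illustrative events with 3, 4, and 5 minutes of re-engagement latency.
    finished = datetime(2024, 1, 8, 10, 0)
    events = [(finished, finished + timedelta(minutes=m)) for m in (3, 4, 5)]
    print(effective_loop_minutes(5, events))
```

Reporting the median rather than the mean keeps a few long gaps, such as lunch breaks, from dominating the metric.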

Victor, Staff Engineer - AI Champion

Victor has been experimenting with Bazel on a side branch for a month. He's implemented BUILD files for the core library modules and set up EngFlow for remote caching. On those modules, test execution time has dropped from 3 minutes to 45 seconds. He's confident the approach works but needs to show a clear path to team-wide adoption before proposing it.

Victor should turn his Bazel experiment into a concrete migration proposal: which modules to migrate first (start with the most frequently changed modules for maximum impact), what the BUILD file maintenance burden looks like (estimated hours per month to keep BUILD files current), and what the expected CI time improvement is (based on actual measurements from his experiment). He should propose a phased approach: migrate three modules in sprint 1, measure the impact on CI time, and use the data to decide on expanding the migration. This measured approach reduces risk and builds confidence. Victor should also flag the EngFlow option explicitly - many teams that need remote caching for Bazel don't want to operate the caching infrastructure themselves, and a managed service such as EngFlow's removes that operational burden.
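
Picking the first modules to migrate can be grounded in change frequency. The sketch below assumes the input is a list of commits, each given as the list of file paths it changed (for example parsed from git log --name-only output), and that a module is identified by the first path component; both are simplifying assumptions.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple


def module_of(path: str, depth: int = 1) -> str:
    """Map a file path to its module, taken here as the first `depth` path components."""
    return "/".join(path.split("/")[:depth])


def rank_modules(commits: Iterable[Sequence[str]], top_n: int = 3) -> List[Tuple[str, int]]:
    """Return the `top_n` modules ordered by the number of commits that touched them."""
    counts: Counter = Counter()
    for changed_paths in commits:
        # Count each module once per commit, however many files changed in it.
        counts.update({module_of(path) for path in changed_paths})
    return counts.most_common(top_n)


if __name__ == "__main__":
    # Illustrative history; real input would come from version control.
    history = [
        ["core/a.py", "core/b.py"],
        ["core/a.py", "api/x.py"],
        ["api/y.py"],
        ["core/c.py"],
    ]
    print(rank_modules(history, top_n=2))
```

Counting commits rather than changed files avoids over-weighting a module because of one large refactor.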