CI < 5 minutes

CI under 5 minutes is the Systematic (L3) milestone where CI speed becomes a first-class engineering concern, not a background project.

·CI completes in under 5 minutes (median)
·Remote caching is implemented (Bazel remote cache, EngFlow, Gradle Enterprise, now named Develocity)
·Incremental builds run only changed modules or fragments

·P95 CI duration is under 8 minutes
·Build system supports hermetic builds (reproducible outputs regardless of machine)

Evidence

·CI run duration dashboard showing median under 5 minutes
·Remote cache configuration and cache hit rate metrics
·Build configuration showing incremental/changed-only targeting
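The median and P95 targets above can be checked from exported run durations. The following is a minimal sketch, assuming you already have a list of CI run durations in seconds from your CI provider; the function names and the nearest-rank percentile method are illustrative choices, not part of any particular dashboard tool.

```python
import statistics

MEDIAN_LIMIT_S = 5 * 60  # median target from the criteria above
P95_LIMIT_S = 8 * 60     # P95 target from the criteria above


def p95_nearest_rank(durations_s):
    """Return the 95th percentile using the nearest-rank method."""
    ordered = sorted(durations_s)
    # Integer ceiling of 0.95 * n, giving a 1-based rank.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def summarize(durations_s):
    """Compare CI run durations (in seconds) against both targets."""
    if not durations_s:
        raise ValueError("need at least one CI run duration")
    median = statistics.median(durations_s)
    p95 = p95_nearest_rank(durations_s)
    return {
        "median_s": median,
        "p95_s": p95,
        "median_ok": median < MEDIAN_LIMIT_S,
        "p95_ok": p95 < P95_LIMIT_S,
    }


if __name__ == "__main__":
    # Hypothetical sample: 18 runs at 4 minutes plus two slow outliers.
    sample = [240] * 18 + [450, 600]
    print(summarize(sample))
```

Reporting both numbers matters because a healthy median can hide a slow tail: the sample above passes on median while its P95 sits close to the 8-minute limit.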

What It Is

CI under 5 minutes is the Systematic (L3) milestone where CI speed becomes a first-class engineering concern, not a background project. At this level, teams have moved beyond basic caching and parallelization to systematic optimization: incremental builds that only rebuild changed modules, fine-grained test selection that only runs tests affected by changed code, and CI infrastructure that treats fast feedback as a hard requirement with enforcement mechanisms.

The 5-minute threshold qualitatively changes what is possible with AI agents. At 10 minutes, agents can iterate at most 6 times per hour. At 5 minutes, they can iterate up to 12 times per hour - enough to tackle a multi-step feature, encounter several failures, and work through them within a single focused session. The difference between 10 and 5 minutes is not just 2x speed; it's the difference between "agents can work" and "agents can work fluidly." Developers who experience 5-minute CI describe it as the point where using agents stops feeling like waiting and starts feeling like having a conversation.

Reaching sub-5-minute CI requires more than configuration changes to an existing pipeline. It requires architectural decisions: which tests are truly unit tests (fast, isolated, no I/O) versus integration tests (slower, with dependencies)? Is the build system capable of incremental compilation, or does every change trigger a full rebuild? Are test data fixtures loaded from disk on every test run or maintained in a test harness state? These questions touch the architecture of both the application and the test suite, not just the CI pipeline configuration.

At the Systematic level, teams treat CI time as a metric with the same rigor as application performance metrics. There are alerts when CI time exceeds 5 minutes. There is a designated owner for CI infrastructure. New PRs that would increase CI time by more than 30 seconds require explicit justification. The 5-minute target is defended, not just achieved once.
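The two rules described above (a 5-minute ceiling and a 30-second growth budget per PR) could be enforced with a small check like the one below. This is a sketch under stated assumptions: `check_ci_budget` is a hypothetical helper, and how you obtain the baseline duration (for example, the recent median on main) and the PR's duration is left to your CI system.

```python
def check_ci_budget(baseline_s, candidate_s, max_increase_s=30, hard_limit_s=300):
    """Return a list of budget violations for a PR's CI run.

    baseline_s:  typical CI duration on the main branch, in seconds
    candidate_s: CI duration measured for the PR, in seconds
    """
    violations = []
    increase = candidate_s - baseline_s
    if increase > max_increase_s:
        violations.append(
            f"CI time grew by {increase:.0f}s (budget: {max_increase_s}s); "
            "explicit justification required"
        )
    if candidate_s > hard_limit_s:
        violations.append(
            f"CI time is {candidate_s:.0f}s (limit: {hard_limit_s}s)"
        )
    return violations


if __name__ == "__main__":
    # Hypothetical numbers: main takes 250s, the PR's run took 290s.
    for message in check_ci_budget(250, 290):
        print("VIOLATION:", message)
```

Returning messages instead of raising lets the caller decide whether a violation fails the build or only posts a comment asking for justification.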

Why It Matters

Agent iteration reaches "fluid" threshold - at 5 minutes, agents can complete up to 12 iteration cycles per hour, enabling them to tackle and resolve multi-step implementation challenges within a single session
Developers stay in flow - a 5-minute wait is short enough that developers stay on the same task; they read results, make decisions, and resubmit without losing context
Merge queue throughput enables AI-scale PR volume - a 5-minute pipeline can process 12 PRs per hour per runner, supporting the 10-50 PRs per day that AI-assisted teams generate
Test quality improves alongside speed - reaching 5-minute CI forces the discipline of isolating unit tests from integration tests, which consistently improves test reliability as a side effect
CI becomes competitive pressure - once one team achieves 5-minute CI, it becomes the benchmark for other teams; speed normalizes as an expectation

Getting Started

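Implement test impact analysis - Tools like Jest's --changedSince and --findRelatedTests, the pytest-testmon plugin, or dedicated test selection tools (Launchable, BuildPulse) run only the tests affected by changed code. Note that rerun-failed flags such as Jest's --onlyFailures and pytest's --lf (last failed) are not impact analysis: they rerun tests that failed previously, regardless of what changed. Test impact analysis is the single highest-leverage change for large test suites, often cutting test execution time by 60-80% on typical branch changes.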
Enable incremental compilation in your build system - For JVM projects: Gradle's build cache and incremental compilation reduce rebuild time on partial changes from minutes to seconds. For TypeScript: use project references and --incremental. For Go: Go's package-level caching is incremental by default, but ensure GOCACHE is preserved across CI runs.
Move integration tests to a post-merge pipeline - Integration tests that require real databases, message queues, or external services don't need to run on every commit. Move them to a post-merge pipeline that validates main but doesn't block branches. This alone often cuts the blocking CI time in half.
Implement CI time regression detection - Add a CI timing check that fails the build if total CI time exceeds a threshold (e.g., 5 minutes). In GitHub Actions, use the timeout-minutes key on each job and an explicit step that reads run duration and fails if it exceeds a limit. Treat CI time regressions the same as performance regressions in the application.
Profile test execution with granularity - Use your test framework's timing output (Jest's --verbose, pytest's -v --durations=10, JUnit's Surefire reports) to identify the slowest 10 tests. These outliers often account for 30-50% of total test time. Investigate each: are they testing real behavior, or testing implementation details that could be unit-tested faster?
Consolidate and right-size runners - At 5-minute CI, runner startup time becomes meaningful (a 30-second runner startup is 10% of a 5-minute build). Use persistent runners or runner pools with warm containers to eliminate cold-start overhead. GitHub Actions larger runners (for example, a 16-core Ubuntu runner such as ubuntu-latest-16-cores) can also reduce the need to split work across many jobs, which removes the per-job startup cost that parallelization adds.
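The profiling step above can be sketched as a short script that reads a JUnit-style XML report and lists the slowest tests along with their share of total test time. It assumes the report contains `testcase` elements with `classname`, `name`, and `time` attributes, which is the shape produced by Surefire and by pytest's `--junitxml` option; `slowest_tests` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def slowest_tests(junit_xml, top_n=10):
    """Return (top_n slowest tests, their share of total test time)."""
    root = ET.fromstring(junit_xml)
    timings = []
    # iter() finds testcase elements under either <testsuites> or <testsuite>.
    for case in root.iter("testcase"):
        name = f'{case.get("classname", "")}::{case.get("name", "")}'
        timings.append((name, float(case.get("time", 0))))
    timings.sort(key=lambda item: item[1], reverse=True)
    total = sum(seconds for _, seconds in timings)
    top = timings[:top_n]
    share = sum(seconds for _, seconds in top) / total if total else 0.0
    return top, share


if __name__ == "__main__":
    # Hypothetical report content, inlined to keep the sketch self-contained.
    report = """
    <testsuite>
      <testcase classname="orders" name="test_checkout" time="6.0"/>
      <testcase classname="orders" name="test_totals" time="1.0"/>
      <testcase classname="auth" name="test_login" time="3.0"/>
    </testsuite>
    """
    top, share = slowest_tests(report, top_n=1)
    for name, seconds in top:
        print(f"{seconds:6.2f}s  {name}")
    print(f"share of total test time: {share:.0%}")
```

Tracking the share as well as the raw timings shows whether a handful of outliers really dominate the run before anyone invests in rewriting them.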

Tip

Test impact analysis tools require an initial investment in mapping tests to code paths, but most modern tools do this automatically by observing test execution. The payoff is disproportionate: on a typical codebase change, only 15-30% of tests are affected. Running only those tests while maintaining full confidence in the suite is the highest-leverage optimization available.
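The core idea behind test impact analysis can be illustrated in a few lines. This is a deliberately simplified sketch, not how any named tool works internally: it assumes a hypothetical `coverage_map` recorded from earlier runs (test name to the set of source files that test executed) and treats any test without recorded data as affected, so new tests are never skipped.

```python
def select_tests(coverage_map, changed_files, all_tests):
    """Select tests whose recorded footprint overlaps the changed files.

    coverage_map:  dict of test name -> set of source files it executed
    changed_files: iterable of file paths changed on the branch
    all_tests:     iterable of every known test name
    """
    changed = set(changed_files)
    selected = set()
    for test in all_tests:
        touched = coverage_map.get(test)
        # No recorded data (e.g. a new test): run it to stay on the safe side.
        if touched is None or changed & set(touched):
            selected.add(test)
    return sorted(selected)


if __name__ == "__main__":
    # Hypothetical mapping and change set.
    coverage_map = {
        "test_cart": {"cart.py", "pricing.py"},
        "test_auth": {"auth.py"},
    }
    all_tests = ["test_cart", "test_auth", "test_new_feature"]
    print(select_tests(coverage_map, ["pricing.py"], all_tests))
```

File-level mapping like this misses dependencies that are not visible in execution traces, such as configuration files or generated code, which is one reason a periodic full-suite run remains necessary.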


Common Pitfalls

Over-parallelizing without measuring job startup overhead. Splitting into 20 parallel jobs doesn't give you 20x speedup. If each job takes 45 seconds to start, spin up, and restore cache, a 3-minute test run split 20 ways leaves about 9 seconds of testing behind 45 seconds of overhead, so test execution is only about 17% of each job's wall time and the rest is paid for on every job. Find the parallelism level where the ratio of startup overhead to test execution is acceptable (usually 8-12 parallel jobs for most pipelines).
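The trade-off can be made concrete with a small model. This sketch assumes tests split evenly across jobs and a fixed per-job startup cost, which real pipelines only approximate; the function names are illustrative.

```python
def job_wall_time_s(total_test_s, jobs, startup_s):
    """Wall-clock time of one job: fixed startup plus its slice of the tests."""
    return startup_s + total_test_s / jobs


def execution_share(total_test_s, jobs, startup_s):
    """Fraction of a job's wall time spent actually running tests."""
    per_job_test_s = total_test_s / jobs
    return per_job_test_s / (startup_s + per_job_test_s)


def runner_seconds(total_test_s, jobs, startup_s):
    """Total compute consumed across all parallel jobs."""
    return jobs * job_wall_time_s(total_test_s, jobs, startup_s)


if __name__ == "__main__":
    # Numbers from the pitfall above: 3 minutes of tests, 45s startup per job.
    for jobs in (1, 4, 8, 12, 20):
        wall = job_wall_time_s(180, jobs, 45)
        share = execution_share(180, jobs, 45)
        cost = runner_seconds(180, jobs, 45)
        print(f"{jobs:>2} jobs: wall {wall:6.1f}s, "
              f"testing {share:4.0%} of job, {cost:6.0f} runner-seconds")
```

Under this model, wall time keeps shrinking as jobs are added, but each extra job saves less while total runner cost grows linearly with the startup overhead.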

Pruning tests that should not be pruned. Test impact analysis is powerful but can have false negatives - tests that are affected by a change but not caught by the analysis. Teams that tune their analysis too aggressively can miss failures. Validate your test selection tool's accuracy before relying on it for gate decisions. Run the full suite weekly or on merge to catch anything the incremental selection misses.

Moving integration tests post-merge and losing visibility. If integration tests only run on merge, failures on main are discovered later and blame analysis is harder. Mitigate this by running integration tests on PRs targeting main (not on every feature branch commit) and ensuring the team has a clear process for responding to post-merge failures - someone must own the fix within an SLA.

Achieving 5 minutes once and not defending it. CI pipelines accumulate slowness organically. Each new test added, each new dependency installed, each new CI step added takes a little time. Without active defense - a timing check, a time budget per job, a designated owner - the pipeline drifts back to 10 minutes within a few months.

Confusing a fast path with a complete path. A "fast feedback" job that runs in 2 minutes but only checks 40% of the codebase gives false confidence. The 5-minute target must represent a meaningful quality gate, not just a subset of checks. Document exactly what passes in 5 minutes and what runs separately, so developers understand the signal they're getting from CI.


How Different Roles See It

Bob, Head of Engineering

Bob's team has hit 9-minute CI through basic caching and parallelization. The next milestone - 5 minutes - feels harder; the obvious wins are gone. He's seeing that the remaining time is in the integration test suite, which requires real infrastructure (a PostgreSQL database and a Redis instance) and can't simply be parallelized further without test isolation work.

Bob should treat the integration test infrastructure as a platform investment, not a cleanup task. He should fund a dedicated "test infrastructure" sprint where one engineer moves the integration tests to Docker Compose-based isolation (each CI job gets its own containerized database, spun up fresh per run), enabling true parallelization. This work typically takes 3-5 days and unlocks both the 5-minute target and improved test reliability. Bob should frame the investment in terms of agent iteration rate: at 9-minute CI the team's agents are currently bottlenecked at 6-7 iterations per hour; after this work they'll be at 12. That's close to a 2x improvement in agent productivity from one sprint of infrastructure work.

Sarah, Productivity Lead

Sarah has the data to make a compelling case: her CI feedback latency dashboard shows that agent iteration rate correlates directly with CI speed. The team's most productive agent users are the ones who've found ways to get faster local feedback - running just the relevant test file before pushing. But those workarounds don't work for everyone, and they don't capture the full CI quality signal.

Sarah should compute the "agent iteration ceiling" at the current CI speed and compare it to the ceiling at 5 minutes. At 9-minute CI, agents can iterate 6-7 times per hour. At 5 minutes, they can iterate 12 times per hour. If the team has 10 active agent users who each run 3 agent sessions per day for 2 hours each, that is 60 agent-hours per day, and 5-minute CI raises the ceiling by about 5.3 iterations per agent-hour, or roughly 320 additional iterations per day across the team. Sarah should present this as expected throughput improvement, then validate it after the optimization. The before-and-after data becomes the most compelling argument for further CI investment.
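The iteration-ceiling arithmetic can be written out as a short calculation. It is an upper bound under a simplifying assumption: each iteration costs exactly one CI run and the agent's own working time between runs is ignored. The function names are illustrative.

```python
def iterations_per_hour(ci_minutes):
    """Upper bound on agent iterations per hour when CI is the only wait."""
    return 60 / ci_minutes


def extra_iterations_per_day(users, sessions_per_user, hours_per_session,
                             ci_before_min, ci_after_min):
    """Additional iterations per day unlocked by a faster pipeline."""
    agent_hours = users * sessions_per_user * hours_per_session
    gain_per_hour = (iterations_per_hour(ci_after_min)
                     - iterations_per_hour(ci_before_min))
    return agent_hours * gain_per_hour


if __name__ == "__main__":
    # Scenario from the text: 10 users, 3 sessions/day, 2 hours each, 9 -> 5 min.
    before = iterations_per_hour(9)
    after = iterations_per_hour(5)
    extra = extra_iterations_per_day(10, 3, 2, 9, 5)
    print(f"ceiling before: {before:.1f}/hour, after: {after:.1f}/hour")
    print(f"additional iterations per day: {extra:.0f}")
```

Because real agents spend time reading results and writing code between runs, measured gains will land below this ceiling, which is why validating against before-and-after data matters.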

Victor, Staff Engineer - AI Champion

Victor has been running test impact analysis locally using a custom script that reads the git diff and maps changed files to test files using a manually maintained mapping. It works but requires maintenance every time the codebase structure changes. He knows the right answer is a proper test impact analysis tool integrated into CI, but he hasn't had time to evaluate and implement one.

Victor should spend two days evaluating Launchable and BuildPulse for their codebase - both offer free trials with existing CI systems. He should instrument a week of CI runs with both tools to compare their coverage accuracy (how many tests do they select vs. how many they should select) and false negative rate (how often do they miss a failure that the full suite would catch). The evaluation should be quantitative: compare the selected test count and run time against the full suite, and track any failures that slip through. With that data, Victor can make a concrete recommendation to the team: here is the tool, here is the expected speedup, here is the accuracy track record over 200 CI runs.
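The quantitative comparison described above could be computed from shadow runs, where the selection tool's choice is recorded but the full suite still runs. This is a sketch under assumptions: `ShadowRun` and `evaluate_selection` are hypothetical names, and a run counts as a false negative when the full suite failed but none of the failing tests were in the selected subset (so a selection-only gate would have passed).

```python
from dataclasses import dataclass


@dataclass
class ShadowRun:
    selected: set        # tests the tool chose to run
    full_failures: set   # tests that failed when the full suite ran
    total_tests: int     # size of the full suite for this run


def evaluate_selection(runs):
    """Summarize how much a selection tool prunes and how often it misses."""
    if not runs:
        raise ValueError("need at least one recorded run")
    failing = [r for r in runs if r.full_failures]
    # Missed: the full suite failed, but no failing test was selected.
    missed = [r for r in failing if not (r.full_failures & r.selected)]
    mean_fraction = sum(len(r.selected) / r.total_tests for r in runs) / len(runs)
    return {
        "runs": len(runs),
        "failing_runs": len(failing),
        "missed_runs": len(missed),
        "false_negative_rate": len(missed) / len(failing) if failing else 0.0,
        "mean_selected_fraction": mean_fraction,
    }


if __name__ == "__main__":
    # Hypothetical data for three CI runs against a 10-test suite.
    runs = [
        ShadowRun({"a", "b"}, {"a"}, 10),   # failure caught by the selection
        ShadowRun({"c"}, {"d"}, 10),        # failure missed by the selection
        ShadowRun({"a"}, set(), 10),        # green run
    ]
    print(evaluate_selection(runs))
```

A low false negative rate over only a handful of failing runs says little, so the number of failing runs is reported alongside the rate.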