CI < 5 minutes
CI under 5 minutes is the Systematic (L3) milestone where CI speed becomes a first-class engineering concern, not a background project.
- CI completes in under 5 minutes (median)
- Remote caching is implemented (Bazel remote cache, EngFlow, Gradle Enterprise)
- Incremental builds run only changed modules or fragments
- P95 CI duration is under 8 minutes
- Build system supports hermetic builds (reproducible outputs regardless of machine)
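The two duration criteria above can be checked mechanically from a list of recent CI run times. The following is a minimal sketch, assuming durations have already been exported in minutes from whatever CI system is in use; the function names and the nearest-rank percentile definition are illustrative choices, not part of the milestone definition.

```python
import math
import statistics


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def check_ci_targets(durations_min, median_limit=5.0, p95_limit=8.0):
    """Compare recent CI durations (in minutes) against the median and P95 targets."""
    median = statistics.median(durations_min)
    p95 = percentile(durations_min, 95)
    return {
        "median": median,
        "p95": p95,
        "median_ok": median < median_limit,
        "p95_ok": p95 < p95_limit,
    }


if __name__ == "__main__":
    # Hypothetical sample: four fast runs and one slow outlier.
    print(check_ci_targets([3.0, 4.0, 4.5, 4.8, 9.0]))
```

In this sample the median passes but the single 9-minute run pushes P95 over the 8-minute limit, which is the kind of tail behavior the P95 criterion exists to catch.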
Evidence
- CI run duration dashboard showing median under 5 minutes
- Remote cache configuration and cache hit rate metrics
- Build configuration showing incremental/changed-only targeting
What It Is
CI under 5 minutes is the Systematic (L3) milestone where CI speed becomes a first-class engineering concern, not a background project. At this level, teams have moved beyond basic caching and parallelization to systematic optimization: incremental builds that only rebuild changed modules, fine-grained test selection that only runs tests affected by changed code, and CI infrastructure that treats fast feedback as a hard requirement with enforcement mechanisms.
The 5-minute threshold qualitatively changes what is possible with AI agents. At 10 minutes, agents can iterate 6 times per hour. At 5 minutes, they can iterate 12 times per hour - enough to tackle a multi-step feature, encounter several failures, and work through them within a single focused session. The difference between 10 and 5 minutes is not just 2x speed; it's the difference between "agents can work" and "agents can work fluidly." Developers who experience 5-minute CI describe it as the point where using agents stops feeling like waiting and starts feeling like having a conversation.
Reaching sub-5-minute CI requires more than configuration changes to an existing pipeline. It requires architectural decisions: which tests are truly unit tests (fast, isolated, no I/O) versus integration tests (slower, with dependencies)? Is the build system capable of incremental compilation, or does every change trigger a full rebuild? Are test data fixtures loaded from disk on every test run or maintained in a test harness state? These questions touch the architecture of both the application and the test suite, not just the CI pipeline configuration.
At the Systematic level, teams treat CI time as a metric with the same rigor as application performance metrics. There are alerts when CI time exceeds 5 minutes. There is a designated owner for CI infrastructure. New PRs that would increase CI time by more than 30 seconds require explicit justification. The 5-minute target is defended, not just achieved once.
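The enforcement rules described above (an alert past 5 minutes, explicit justification for a 30-second regression) could be expressed as a small gate that runs on each PR. This is a sketch under assumptions: the function name and the idea of comparing a PR's CI duration against a baseline from the main branch are illustrative, and how the two durations are obtained depends on the CI system.

```python
def evaluate_ci_budget(baseline_s, candidate_s, max_increase_s=30, hard_limit_s=300):
    """Classify a PR's CI duration (seconds) against the main-branch baseline.

    needs_justification: the PR adds more than max_increase_s to CI time.
    alert: the PR's CI run exceeds the hard limit (5 minutes by default).
    """
    increase = candidate_s - baseline_s
    return {
        "increase_s": increase,
        "needs_justification": increase > max_increase_s,
        "alert": candidate_s > hard_limit_s,
    }


if __name__ == "__main__":
    # Hypothetical numbers: baseline 4m00s, PR run 4m35s.
    print(evaluate_ci_budget(240, 275))
```

A single run is noisy, so a real gate would more plausibly compare medians over several runs; the thresholds are kept as parameters for that reason.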
Why It Matters
- Agent iteration reaches "fluid" threshold - at 5 minutes, agents can complete 12+ iteration cycles per hour, enabling them to tackle and resolve multi-step implementation challenges within a single session
- Developers stay in flow - 5-minute feedback is short enough that developers can stay on the same task while they wait; they read results, make decisions, and resubmit without losing context
- Merge queue throughput enables AI-scale PR volume - a 5-minute pipeline can process 12 PRs per hour per runner, supporting the 10-50 PRs per day that AI-assisted teams generate
- Test quality improves alongside speed - reaching 5-minute CI forces the discipline of isolating unit tests from integration tests, which consistently improves test reliability as a side effect
- CI becomes competitive pressure - once one team achieves 5-minute CI, it becomes the benchmark for other teams; speed normalizes as an expectation
How Different Roles See It
Bob's team has hit 9-minute CI through basic caching and parallelization. The next milestone - 5 minutes - feels harder; the obvious wins are gone. He's seeing that the remaining time is in the integration test suite, which requires real infrastructure (a PostgreSQL database and a Redis instance) and can't simply be parallelized further without test isolation work.
Bob should treat the integration test infrastructure as a platform investment, not a cleanup task. He should fund a dedicated "test infrastructure" sprint where one engineer moves the integration tests to Docker Compose-based isolation (each CI job gets its own containerized database, spun up fresh per run), enabling true parallelization. This work typically takes 3-5 days and unlocks both the 5-minute target and improved test reliability. Bob should frame the investment in terms of agent iteration rate: at 9-minute CI the team's agents are capped at 6-7 iterations per hour; after this work they'll be at 12. That's close to a 2x improvement in agent iteration rate from one sprint of infrastructure work.
Sarah has the data to make a compelling case: her CI feedback latency dashboard shows that agent iteration rate correlates directly with CI speed. The team's most productive agent users are the ones who've found ways to get faster local feedback - running just the relevant test file before pushing. But those workarounds don't work for everyone, and they don't capture the full CI quality signal.
Sarah should compute the "agent iteration ceiling" at the current CI speed and compare it to the ceiling at 5 minutes. At 9-minute CI, agents can iterate 6-7 times per hour. At 5 minutes, they can iterate 12 times per hour, a gain of 5-6 iterations per agent-hour. If the team has 10 active agent users who each run 3 agent sessions per day for 2 hours each, that is 60 agent-hours per day, so 5-minute CI unlocks roughly 320 additional iterations per day across the team. This is a ceiling that assumes CI wait time is the only limit on iteration. Sarah should present this as expected throughput improvement, then validate it after the optimization. The before-and-after data becomes the most compelling argument for further CI investment.
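The iteration-ceiling arithmetic is simple enough to keep as a small script that can be rerun as CI times change. This is a sketch that assumes each iteration costs exactly one CI run and nothing else, so the result is an upper bound; the function names are illustrative.

```python
def iterations_per_hour(ci_minutes):
    """Ceiling on agent iterations per hour if each iteration waits on one CI run."""
    return 60 / ci_minutes


def additional_iterations_per_day(current_ci_min, target_ci_min,
                                  users, sessions_per_day, hours_per_session):
    """Extra iterations per day across the team when CI drops from current to target."""
    agent_hours = users * sessions_per_day * hours_per_session
    gain_per_hour = iterations_per_hour(target_ci_min) - iterations_per_hour(current_ci_min)
    return agent_hours * gain_per_hour


if __name__ == "__main__":
    # Scenario from the text: 9-minute CI today, 5-minute target,
    # 10 users x 3 sessions x 2 hours = 60 agent-hours per day.
    print(round(iterations_per_hour(9), 1))                        # 6.7
    print(iterations_per_hour(5))                                  # 12.0
    print(round(additional_iterations_per_day(9, 5, 10, 3, 2)))    # 320
```

Because agents also spend time generating code between CI runs, the measured gain after the optimization will be lower than this ceiling, which is why the before-and-after validation matters.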
Victor has been running test impact analysis locally using a custom script that reads the git diff and maps changed files to test files using a manually maintained mapping. It works but requires maintenance every time the codebase structure changes. He knows the right answer is a proper test impact analysis tool integrated into CI, but he hasn't had time to evaluate and implement one.
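A script of the kind Victor maintains might look like the following. This is a minimal sketch, not his actual implementation: the `TEST_MAP` entries and paths are hypothetical, and falling back to the full suite for unmapped files is an assumed safety policy.

```python
import fnmatch
import subprocess

# Hypothetical, manually maintained mapping from source path patterns to test files.
TEST_MAP = {
    "src/billing/*": ["tests/test_billing.py"],
    "src/auth/*": ["tests/test_auth.py", "tests/test_sessions.py"],
}


def changed_files(base="origin/main"):
    """List files changed on the current branch relative to its merge base with `base`."""
    out = subprocess.run(
        ["git", "diff", "--name-only", f"{base}...HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def select_tests(files, test_map):
    """Return the sorted test files to run, or None to signal 'run the full suite'."""
    selected = set()
    for path in files:
        matched = False
        for pattern, tests in test_map.items():
            if fnmatch.fnmatchcase(path, pattern):
                selected.update(tests)
                matched = True
        if not matched:
            # Unmapped file: the mapping cannot vouch for it, so run everything.
            return None
    return sorted(selected)


if __name__ == "__main__":
    print(select_tests(["src/billing/invoice.py"], TEST_MAP))
```

The maintenance burden shows up in `TEST_MAP`: every directory move or new module needs a new entry, and a stale entry silently narrows the selection.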
Victor should spend two days evaluating Launchable and BuildPulse against the team's codebase - both offer free trials with existing CI systems. He should instrument a week of CI runs with both tools to compare their coverage accuracy (how many tests each tool selects versus how many it should select) and false negative rate (how often it misses a failure that the full suite would catch). The evaluation should be quantitative: compare the selected test count and run time against the full suite, and track any failures that slip through. With that data, Victor can make a concrete recommendation to the team: here is the tool, here is the expected speedup, here is the accuracy track record over 200 CI runs.
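The two evaluation metrics can be computed from per-run records once each CI run has been executed both ways (selected subset and full suite). This is a sketch under assumptions: the `RunRecord` shape is hypothetical and does not reflect either tool's export format, and a run counts as a false negative only when none of the tests that failed in the full suite were selected.

```python
from dataclasses import dataclass


@dataclass
class RunRecord:
    total_tests: int    # size of the full suite for this run
    selected: set       # test names the tool chose to run
    full_failed: set    # test names that failed in the full suite


def summarize(runs):
    """Aggregate selection ratio and run-level false negative rate over CI runs."""
    total = sum(r.total_tests for r in runs)
    chosen = sum(len(r.selected) for r in runs)
    failing_runs = [r for r in runs if r.full_failed]
    # A failing run is missed when no failing test was in the selected subset,
    # meaning the subset-only CI run would have passed.
    missed = [r for r in failing_runs if not (r.full_failed & r.selected)]
    return {
        "selection_ratio": chosen / total if total else 0.0,
        "failing_runs": len(failing_runs),
        "false_negative_rate": len(missed) / len(failing_runs) if failing_runs else 0.0,
    }


if __name__ == "__main__":
    runs = [
        RunRecord(100, {"a", "b"}, {"a"}),   # failure caught by the selection
        RunRecord(100, {"c"}, {"d"}),        # failure missed by the selection
        RunRecord(100, {"e"}, set()),        # clean run
    ]
    print(summarize(runs))
```

A low selection ratio is only useful alongside a low false negative rate, so the two numbers should be reported together for each tool.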