CI Feedback Latency Tracking

CI Feedback Latency is the time from when an agent pushes a commit to when CI produces a result (pass or fail) that the agent can act on.

·ITS (Iterations-to-Success) is tracked with a target of 1-3
·CPI (Cost-per-Iteration) is tracked with a target under $0.50
·CI feedback latency is tracked as a metric (time from push to CI result)

·Metrics are broken down per team and per repository
·Cost tracking includes model API costs, CI compute costs, and runner costs per iteration

Evidence

·ITS dashboard showing iteration count distribution per PR
·CPI dashboard showing cost per CI iteration
·CI feedback latency chart with P50, P95, P99 breakdowns

What It Is

CI Feedback Latency is the time from when an agent pushes a commit to when CI produces a result (pass or fail) that the agent can act on. For humans, a 15-minute CI run is annoying but manageable - a developer can context-switch to something else and return when CI is done. For agents, a 15-minute CI feedback loop is an architectural problem. An agent waiting for CI feedback is either idle (wasting time) or guessing at fixes without feedback (producing lower quality code). Either outcome degrades the agent workflow significantly.

CI feedback latency matters far more for agents than for humans because of the iteration structure. A human developer runs CI once, waits, and takes action. An agent running at ITS of 2-3 iterates: push, wait for CI, read result, push fix, wait for CI again. If each CI run takes 15 minutes and the agent iterates 3 times, that's 45 minutes of latency before a PR is ready for review. If CI takes 2 minutes, the same workflow completes in 6 minutes. That 7.5x latency difference translates directly into agent throughput.
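The arithmetic in this paragraph can be written as a small helper. This is a minimal sketch that assumes one full CI run per iteration, no overlapping runs, and no time spent by the agent between runs; the function name `ci_wait_minutes` is hypothetical.

```python
def ci_wait_minutes(ci_latency_min: float, iterations: int) -> float:
    """Total time one PR spends waiting on CI.

    Assumes one CI run per iteration and that runs never overlap.
    """
    if ci_latency_min < 0 or iterations < 1:
        raise ValueError("latency must be >= 0 and iterations >= 1")
    return ci_latency_min * iterations


slow = ci_wait_minutes(15, 3)  # 15-minute CI at ITS of 3
fast = ci_wait_minutes(2, 3)   # 2-minute CI at ITS of 3
ratio = slow / fast            # how many times longer the slow setup waits
```

With the numbers from the text, `slow` is 45 minutes, `fast` is 6 minutes, and `ratio` is 7.5.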

At L3, tracking CI feedback latency separately from other metrics is important because it's a controllable variable that has an outsized impact on everything else. Teams that invest in CI speed - through better caching, test parallelization, incremental test selection, and faster runner hardware - see immediate, measurable improvements in ITS, in agent throughput, and in the development experience for human developers as a side benefit.

The target at L3 is to understand the current distribution and drive toward the L4 target of under 5 minutes. The measurement itself - tracking p50, p90, and p99 CI run duration for agent-triggered runs - is the starting point. Teams often discover that their CI latency distribution is bimodal: a fast path for small changes (2-3 minutes) and a slow path for changes touching shared infrastructure (20-30 minutes). The slow path is where agent workflows get stuck, and it's the highest-priority optimization target.
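As an illustration of the measurement described here, the sketch below computes p50, p90, and p99 with the nearest-rank method using only the standard library. The sample durations are made up to mimic a bimodal distribution, and the helper names are hypothetical; a real pipeline would read durations from the CI platform instead.

```python
import math
from typing import Dict, Sequence


def percentile(values: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def latency_summary(durations_min: Sequence[float]) -> Dict[str, float]:
    """Return the p50, p90, and p99 of a list of CI durations in minutes."""
    return {f"p{p}": percentile(durations_min, p) for p in (50, 90, 99)}


# Made-up sample: eight fast-path runs and two slow-path runs.
sample = [2, 2, 3, 3, 2, 3, 2, 25, 3, 28]
summary = latency_summary(sample)
```

For this sample the median is 3 minutes while p90 and p99 are 25 and 28 minutes, which is the kind of gap between the fast path and the slow path that the paragraph describes.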

Why It Matters

Directly multiplies agent throughput - every minute reduction in CI latency is a minute reduction in agent iteration time, multiplied by ITS; at the same ITS, teams with 5-minute CI spend one third as long waiting on CI per agent PR as teams with 15-minute CI
Reduces agent errors from guessing - agents that wait a long time for CI feedback are more likely to make additional changes while waiting, compounding errors; fast CI keeps the agent's reasoning tightly coupled to real test feedback
Makes high ITS less costly - even if ITS is 4-5 on complex tasks, fast CI means the total iteration time is still acceptable; tracking both metrics together reveals the actual throughput ceiling
Improves human developer experience as a side effect - CI speed investments benefit human developers equally; the business case for CI infrastructure investment is stronger when it serves both human and agent workflows
Reveals test suite health problems - slow CI latency often traces to specific slow tests or test dependencies; tracking latency by test suite segment identifies which tests are the bottleneck and should be candidates for optimization or parallelization

Getting Started

Instrument CI run duration for agent-triggered runs - Most CI platforms provide timestamps for each workflow run (queued, started, completed). Track the time from commit push to first CI result for runs triggered by agent commits (identifiable by commit author tag). Store this as a time series.
Compute the distribution, not just the average - CI latency is a right-skewed distribution: most runs complete quickly, but a tail of slow runs dominates the developer experience. Track p50 (median), p90, and p99 latency separately. The p90 is where most agent pain lives.
Segment by change type - CI latency varies by what the commit touches. Changes to core libraries may trigger full test suites. Changes to isolated modules may trigger partial suites. Segment latency by "change type" or "affected test scope" to identify which types of changes are slowest.
Identify the slow tail - What characterizes the slowest 10% of CI runs? Is it a specific set of integration tests? Flaky tests that require retries? Resource contention from concurrent runs? Run an analysis to identify the top 3 causes of slow CI runs on agent-triggered commits.
Track CI queue wait time separately - CI latency has two components: time waiting in the queue (no runners available) and time running. Queue wait time is an infrastructure capacity problem; run time is a test efficiency problem. Track both separately and address them with different interventions.
Set a monthly latency reduction target - If current p90 CI latency is 18 minutes, set a target of 12 minutes for next month. This forces the team to make a concrete improvement: add runners, parallelize tests, add caching. Track progress weekly.
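The instrumentation, segmentation, and queue-versus-run steps above can be sketched as follows. This is an illustrative outline, not a platform integration: the `CiRun` record, its field names, and the `change_type` labels are assumptions, and the timestamps would come from whatever the CI platform exposes for push, start, and completion.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Dict, Iterable


@dataclass(frozen=True)
class CiRun:
    pushed_at: datetime     # when the agent pushed the commit
    started_at: datetime    # when a runner picked up the job
    completed_at: datetime  # when the pass/fail result became available
    change_type: str        # hypothetical label, e.g. "core" or "isolated"

    @property
    def queue_wait(self) -> timedelta:
        return self.started_at - self.pushed_at

    @property
    def run_time(self) -> timedelta:
        return self.completed_at - self.started_at

    @property
    def feedback_latency(self) -> timedelta:
        return self.completed_at - self.pushed_at


def _minutes(delta: timedelta) -> float:
    return delta.total_seconds() / 60


def median_minutes_by_change_type(runs: Iterable[CiRun]) -> Dict[str, Dict[str, float]]:
    """Median queue wait, run time, and total latency (minutes) per change type."""
    grouped = defaultdict(list)
    for run in runs:
        grouped[run.change_type].append(run)
    return {
        change_type: {
            "queue_wait": median(_minutes(r.queue_wait) for r in group),
            "run_time": median(_minutes(r.run_time) for r in group),
            "feedback_latency": median(_minutes(r.feedback_latency) for r in group),
        }
        for change_type, group in grouped.items()
    }


# Made-up runs for illustration only.
t0 = datetime(2024, 1, 1, 12, 0)
m = lambda n: timedelta(minutes=n)
runs = [
    CiRun(t0, t0 + m(1), t0 + m(4), "isolated"),
    CiRun(t0, t0 + m(2), t0 + m(4), "isolated"),
    CiRun(t0, t0 + m(10), t0 + m(30), "core"),
]
stats = median_minutes_by_change_type(runs)
```

Keeping queue wait and run time as separate fields makes it possible to tell a runner-capacity problem from a test-efficiency problem in the same report.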

Tip

For agent workflows specifically, consider a two-tier CI strategy: a "fast path" suite (unit tests, linting, type checking) that completes in under 2 minutes and runs on every agent commit, and a "full path" suite that runs on PR merge. Agents get near-instant feedback from the fast path, which dramatically reduces ITS on most task types. The full path runs after the agent is done, not between every iteration.

Common Pitfalls

Measuring average CI latency and missing the tail. A mean or p50 of 4 minutes looks great. A p99 of 45 minutes is catastrophic for agent workflows. Always track the tail. The average is dominated by the many fast runs and hides the worst agent experiences.

Conflating CI latency with CI reliability. A slow CI run that always produces correct results is a different problem from a fast CI run that's flaky. Latency tracking tells you how fast CI is. TORS tells you how reliable it is. Fix reliability before optimizing latency - reducing the time to a wrong answer doesn't help.

Optimizing CI speed by removing important tests. The tempting but wrong path to faster CI is deleting slow tests. The correct path is parallelization, caching, and incremental test selection. Removing tests may reduce latency but increases defect rates. Measure defect rate alongside CI latency to ensure that speed improvements don't come at a quality cost.

Not accounting for CI queue wait time. A team might invest heavily in test parallelization and get individual run duration from 15 minutes to 4 minutes - but if agents are waiting 10 minutes in the queue before the run starts, end-to-end latency only drops from 25 minutes to 14, far less than the run-duration numbers suggest. Track queue wait time explicitly and add runner capacity if queue wait time is significant.

Not treating CI latency as an agent-specific problem. Human developers can tolerate 10-minute CI. Agents cannot. If your platform team is prioritizing CI improvements based on human developer feedback alone, agent workflow latency will be underweighted. Explicitly represent agent workflows in CI improvement prioritization - show the math: 10-minute CI at ITS of 3 = 30 minutes per agent PR; 3-minute CI = 9 minutes per agent PR. The throughput impact is compelling.

How Different Roles See It

Bob, Head of Engineering

Bob is seeing that his team's agent workflows are slower than expected. Individual agents take 30-45 minutes to produce a PR ready for review, when he expected 10-15 minutes. The agents are running but the throughput isn't materializing.

What Bob should do: Bob should pull CI latency data for agent-triggered runs over the last two weeks. He'll likely find that CI runs are averaging 15-20 minutes, which means at ITS of 2-3, agents are spending 30-60 minutes just waiting for CI feedback. Bob should put CI latency on the engineering platform team's Q3 roadmap as a top-3 priority. The specific ask: reduce p90 CI latency for agent-triggered runs from 20 minutes to 5 minutes through a combination of parallel test execution and a fast-path test suite. Bob should connect the investment to the productivity metric: "Every minute we cut from CI latency increases agent throughput by 3-5%." The math makes the infrastructure investment easy to justify.

Sarah, Productivity Lead

Sarah is investigating why some developers' agents are more productive than others. She's collected ITS data and notices that the developers with high ITS tend to work in a specific area of the codebase. She suspects CI latency might be part of the story.

What Sarah should do: Sarah should join ITS data with CI latency data segmented by codebase area (inferred from which files the PR touches). Her hypothesis is that the high-ITS developers are working in an area with slow CI runs - which means their agents iterate more times and each iteration takes longer, compounding the problem. If the hypothesis is confirmed, the fix is targeted: optimize the test suite for that specific codebase area, not the whole suite. This kind of targeted intervention is only possible because Sarah tracked both ITS and CI latency. The data points to the specific problem rather than leaving engineering to optimize the entire CI pipeline blindly.
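A join like the one Sarah needs might look like the sketch below. It assumes three hypothetical inputs keyed by PR id: iterations to success, a list of CI latencies per run, and an area label inferred from the touched files. The function name, the area names, and all numbers are made up for illustration.

```python
from collections import defaultdict
from statistics import mean, median
from typing import Dict, List


def its_and_latency_by_area(
    pr_iterations: Dict[int, int],
    pr_latencies_min: Dict[int, List[float]],
    pr_area: Dict[int, str],
) -> Dict[str, Dict[str, float]]:
    """Join per-PR ITS and CI latency on PR id, then aggregate by codebase area."""
    its = defaultdict(list)
    latency = defaultdict(list)
    for pr_id, area in pr_area.items():
        runs = pr_latencies_min.get(pr_id)
        if pr_id not in pr_iterations or not runs:
            continue  # inner join: skip PRs missing from either data set
        its[area].append(pr_iterations[pr_id])
        latency[area].extend(runs)
    return {
        area: {
            "mean_its": mean(its[area]),
            "median_latency_min": median(latency[area]),
        }
        for area in its
    }


# Made-up data: two PRs in a fast area, two in a slow one.
pr_iterations = {101: 1, 102: 2, 103: 4, 104: 5}
pr_latencies_min = {
    101: [3],
    102: [2, 4],
    103: [18, 22, 20, 25],
    104: [21, 19, 30, 24, 20],
}
pr_area = {101: "web", 102: "web", 103: "billing", 104: "billing"}
report = its_and_latency_by_area(pr_iterations, pr_latencies_min, pr_area)
```

In this made-up data the "billing" area shows both a higher mean ITS (4.5 versus 1.5) and a higher median CI latency (21 versus 3 minutes), which is the pattern Sarah's hypothesis predicts.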

Victor, Staff Engineer - AI Champion

Victor has a 2-minute CI fast path for his agent workflows and runs full CI only on PR merge. His agents get nearly instant feedback and consistently achieve ITS of 1-2 on well-specified tasks. He considers fast CI as important as good context for agent productivity.

What Victor should do: Victor should document his two-tier CI strategy as a reusable template for the team's agent workflow configuration. The fast path is not complex to implement - it's a separate GitHub Actions workflow that runs the unit tests and linting in parallel, with no integration tests. Victor should share the exact workflow configuration file and the criteria he uses to decide which tests belong in the fast path vs. the full path. He should also quantify the impact: "My agents produce PRs in an average of 8 minutes. Before the two-tier CI setup, it was 32 minutes." Specific numbers attached to a specific technical decision are the kind of advocacy that moves platform teams to prioritize infrastructure improvements.
