Maturity Matrix

CPI (Cost-per-Iteration): target < $0.50

Cost-per-Iteration (CPI) measures what a single agent CI attempt costs - in model API costs, CI compute, and related infrastructure.

  • ·ITS (Iterations-to-Success) is tracked with a target of 1-3
  • ·CPI (Cost-per-Iteration) is tracked with a target under $0.50
  • ·CI feedback latency is tracked as a metric (time from push to CI result)
  • ·Metrics are broken down per team and per repository
  • ·Cost tracking includes model API costs, CI compute costs, and runner costs per iteration

Evidence

  • ·ITS dashboard showing iteration count distribution per PR
  • ·CPI dashboard showing cost per CI iteration
  • ·CI feedback latency chart with P50, P95, P99 breakdowns

May 2026 Update

Cost telemetry stopped being optional. ccusage (13.2k GitHub stars, ccusage.com) prints token spend per Claude Code session from local JSONL files - cache-aware, offline-capable, MCP-integrated. Claude-Code-Usage-Monitor adds live charts and "when will I hit my limit" predictions. Both /usage and /context shipped as built-in commands. Reddit documented multiple "agentic fork bomb" incidents - including a $3,800 overnight bill - that turned per-session spend caps into baseline governance.

The bigger picture: Pawel Dolega's AI subscriptions are on borrowed time (Apr 26) makes the structural case that CPI thresholds will move. A $20 Pro plan currently burns $50-100 of compute; total enterprise LLM spend doubled in six months despite per-token prices falling (Jevons paradox); Anthropic pulled Claude Code from Pro and GitHub paused Copilot signups. Track CPI now, set per-session caps now, and assume re-pricing within 12 months.

What It Is

Cost-per-Iteration (CPI) measures what a single agent CI attempt costs - in model API costs, CI compute, and related infrastructure. When an agent submits a commit and CI runs, that's one iteration. The cost of that iteration includes: the token cost of the agent's reasoning and code generation to produce the commit, plus the CI runner cost for executing the test suite. The target at L3 is below $0.50 per iteration.

The $0.50 target is not arbitrary. It's derived from the math of agent economics: at $0.50/iteration and an ITS (Iterations-to-Success) target of 1-3, the total agent cost per PR is $0.50-$1.50. A typical engineer-hour of review and direction costs $50-100 (fully loaded). An agent that costs $1.50 to produce a PR and needs 15 minutes of human review is delivering enormous leverage. But if CPI is $3-5 per iteration and ITS is 5-8, a single PR can cost $15-40 in raw compute - approaching the cost of the human time it was meant to save.

CPI has two components that must be optimized separately. The AI token cost depends on model choice, context window size, and output length. Claude Sonnet is significantly cheaper per token than Opus; using the right model for the task type can cut AI costs by 5-10x without sacrificing quality for well-specified tasks. The CI compute cost depends on CI pipeline efficiency, test suite duration, and runner instance type. A test suite that takes 20 minutes to run on a large instance costs dramatically more per iteration than a 2-minute suite on a standard runner.

At L3, teams that track CPI for the first time frequently discover that 20% of their agent tasks are consuming 80% of their iteration costs. These high-cost tasks typically share two characteristics: high ITS (many failed iterations before success) and large context windows (agents consuming excessive tokens trying to understand complex requirements). Fixing these high-cost outliers - through better context management and task specification - produces dramatic improvements in overall CPI.

Why It Matters

  • Prevents agent cost from becoming a budget blocker - without CPI tracking, agent costs grow silently as the team adds more agents and more tasks; the first monthly cloud bill that shocks leadership can cause an overcorrection that cuts the entire agent program
  • Creates incentive for CI speed investment - CI pipeline cost is directly visible in CPI; teams that track CPI have a clear financial argument for CI infrastructure investment: "cutting CI from 20 minutes to 5 minutes saves $0.30/iteration and at 500 iterations/week, that's $6K/month"
  • Drives model right-sizing - not every task needs the most capable (and expensive) model; CPI tracking reveals that many tasks can use a cheaper model without quality loss, creating a data-driven argument for model tiering
  • Enables per-task-type cost optimization - some task types (test generation) have inherently low CPI; others (complex feature implementation) have inherently high CPI; knowing this allows teams to structure agent workflows to maximize value per dollar
  • Provides the unit economics for scaling - before expanding agent usage from 10 to 100 concurrent agents, you need to know what that costs; CPI * ITS * weekly PR volume gives you the monthly cost projection for any scale level

Getting Started

6 steps to get from here to the next level

Common Pitfalls

Mistakes teams actually make at this stage - and how to avoid them

How Different Roles See It

B
BobHead of Engineering

Bob approved an expansion of the agent program last quarter and is now seeing unexpectedly high cloud bills. The AI API costs are three times what he projected. He doesn't know which agents are consuming the budget or why.

What Bob should do - role-specific action plan

S
SarahProductivity Lead

Sarah is building the quarterly AI productivity report and wants to include unit economics alongside throughput metrics. She wants to show: "Here is the cost of producing each agent PR and here is the value it delivers."

What Sarah should do - role-specific action plan

V
VictorStaff Engineer - AI Champion

Victor tracks CPI for all his agent workflows and has achieved sub-$0.30 CPI through a combination of model tiering (Haiku for simple tasks, Sonnet for complex tasks, Opus reserved for architectural reasoning), optimized context windows (only the most relevant files included, not the whole codebase), and a fast CI pipeline (2-minute test runs via incremental test selection).

What Victor should do - role-specific action plan