"How much did we save?" - silence

"How much did we save with AI?" is the question every engineering leader eventually faces from finance, from the CTO, or from the board.

·Delivery is tracked with at least basic metrics
·Standard delivery metrics are in place (AI-specific metrics come later)

·Team acknowledges the need for AI-specific metrics beyond traditional DORA
·Basic deployment frequency is at least known (even if not dashboarded)

Evidence

·Absence of metrics dashboard or inconsistent/manual tracking
·No AI-specific fields in existing metrics systems

What It Is

"How much did we save with AI?" is the question every engineering leader eventually faces from finance, from the CTO, or from the board. At L1, the answer is silence. Not "we saved $X" and not "we don't know yet" - just an uncomfortable pause followed by anecdotes, vague claims about "developer happiness," and a promise to get better data. The silence is the defining symptom of L1 metrics: the investment has been made, the tools have been deployed, but no one built the measurement infrastructure to answer the ROI question.

The silence has predictable consequences. Without ROI data, AI tool budgets become targets during cost-cutting cycles. Without ROI data, decisions about which AI tools to invest in are made on vendor demos and developer preference rather than evidence. Without ROI data, engineering leaders cannot make the case for expanding AI usage to skeptical business stakeholders. The silence is not just an embarrassment - it's a strategic liability.

The root cause of the silence is not malice or negligence. It's a sequencing error. Teams adopt AI tools because they seem promising and developers want them. They focus on adoption: getting tools installed, getting developers trained, getting the workflows working. Measurement is deferred until "after we're settled in." But "after we're settled in" never creates a natural moment to build measurement infrastructure, so it never happens. The deferral becomes permanent, and the team is left trying to reconstruct impact from incomplete data months or years later.

Understanding what "we saved" actually means is also harder than it sounds. Saved compared to what? Compared to hiring two more developers? Compared to the velocity you would have had without AI tools? Compared to a competitor who didn't adopt AI? The ROI question requires a counterfactual, and without baseline data, the counterfactual is unanswerable. This is why the measurement infrastructure must be built before or at the moment of AI tool adoption, not after.

Why It Matters

·Silence kills the budget - When finance asks for AI ROI and gets silence, the default assumption is "we don't know if this is working," which justifies cutting the budget. Teams with data protect their tools; teams without data lose them.
·Silence prevents scaling - You can't make the case for expanding AI usage from 10 developers to 100 if you can't show what the 10 developers gained. Measurement is the prerequisite for organizational scaling.
·Silence erodes leadership credibility - Engineering leaders who champion AI investment and can't produce ROI data look like they made decisions on hype. One bad budget cycle can set AI adoption back a year.
·The window for baseline data closes - Every month without measurement is a month of baseline data that's permanently lost. When you eventually instrument AI impact, you can never fully reconstruct the pre-AI comparison.
·Silence gets filled by proxy metrics - In the absence of real ROI data, organizations default to proxy metrics like PR count or lines of code. These metrics are gameable and misleading, and they incentivize the wrong behaviors.

Getting Started

·Identify the ROI question you'll be asked - Before building measurement infrastructure, understand what question leadership will ask. Is it "how much faster are developers shipping?" (throughput ROI), "how much less are we spending on headcount to achieve the same output?" (cost ROI), or "how much faster are we moving compared to competitors?" (competitive ROI)? Different questions require different metrics.
·Establish pre-AI baselines immediately - If you haven't done this yet, do it now. Pull historical data from your version control and CI/CD systems: average PRs per developer per week over the last 6 months, average time from PR creation to merge, and average time from code commit to production deploy. These are your counterfactual benchmarks.
·Create a simple ROI calculation template - Build a one-page calculation: monthly AI tooling cost (licenses + infrastructure) vs. estimated value of throughput improvement. Even a rough calculation based on hourly developer rates and PR throughput improvement is better than silence. Be explicit about assumptions.
·Track AI tool costs with the same rigor as engineering headcount costs - Add AI tool spend to the engineering cost model. Break it down by team and by tool. Knowing the cost side of the equation precisely makes the ROI calculation cleaner.
·Run a before/after productivity study for a pilot cohort - Select 5-10 developers, measure their PR throughput and cycle time for 4 weeks before AI tool adoption, then for 4 weeks after. This is about the cleanest ROI signal available at this stage. Because there is no control group, the results won't generalize perfectly, but they provide the first real data point.
·Build a quarterly AI ROI report - Create a standing quarterly report that shows AI tool cost, active usage rate, throughput metrics (before vs. current), and a rough ROI estimate. Present it to engineering leadership. The discipline of producing the report forces the measurement work.
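
The baseline step above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a definitive implementation: the record layout (`author`, `created`, `merged`), the sample rows, and the `baseline` helper are all made up for illustration, and a real export from your version control host will look different.

```python
from datetime import datetime
from statistics import mean

# Hypothetical PR records; field names and values are assumptions for this sketch.
PRS = [
    {"author": "dev_a", "created": "2024-01-02T09:00", "merged": "2024-01-03T09:00"},
    {"author": "dev_a", "created": "2024-01-09T09:00", "merged": "2024-01-09T21:00"},
    {"author": "dev_b", "created": "2024-01-04T09:00", "merged": "2024-01-06T09:00"},
    {"author": "dev_b", "created": "2024-01-10T09:00", "merged": "2024-01-11T21:00"},
]


def baseline(prs, weeks):
    """Return (PRs per developer per week, mean hours from PR creation to merge)."""
    authors = {pr["author"] for pr in prs}
    hours_to_merge = [
        (
            datetime.fromisoformat(pr["merged"])
            - datetime.fromisoformat(pr["created"])
        ).total_seconds() / 3600
        for pr in prs
    ]
    prs_per_dev_week = len(prs) / (len(authors) * weeks)
    return prs_per_dev_week, mean(hours_to_merge)


# The sample rows span two calendar weeks.
throughput, cycle_hours = baseline(PRS, weeks=2)
print(f"{throughput:.2f} PRs/dev/week, {cycle_hours:.1f} h creation-to-merge")
```

Run the same function over the 6 months before adoption and store the result; those two numbers are the counterfactual benchmark every later comparison depends on.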

Tip

A simple ROI calculation: if AI tools increase a developer's effective output by 20%, and you have 50 developers at a fully loaded cost of $200K a year each, that's $2M a year in value from throughput gain. Even at 10% improvement, AI tool costs typically represent a fraction of that value. The math almost always works - the problem is that you need measurement to know what percentage improvement you're actually achieving.
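
The arithmetic in this tip can be written out as a small sketch. The 50 developers, the $200K fully loaded cost, and the 20% and 10% improvements come from the example above. The $400K annual tool cost is an assumed figure for illustration (the same number the CFO quotes in Bob's scenario), and the function names are hypothetical.

```python
def annual_throughput_value(developers, loaded_cost, improvement):
    """Value of a throughput gain, priced at fully loaded developer cost."""
    return developers * loaded_cost * improvement


def roi_multiple(value, tool_cost):
    """How many dollars of estimated value each dollar of tool spend returns."""
    return value / tool_cost


TOOL_COST = 400_000  # assumed annual AI tool spend, for illustration only

value_20 = annual_throughput_value(50, 200_000, 0.20)  # about $2M a year
value_10 = annual_throughput_value(50, 200_000, 0.10)  # about $1M a year

for label, value in (("20%", value_20), ("10%", value_10)):
    print(f"{label} gain: ${value:,.0f} value, {roi_multiple(value, TOOL_COST):.1f}x tool cost")
```

The improvement percentage is the only input that cannot be looked up on an invoice, which is why it has to come from measurement rather than from a guess.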

Common Pitfalls

Trying to calculate ROI with lines of code. Lines of code written is not a productivity metric - it's an activity metric that AI tools will inflate dramatically (agents write verbosely). ROI calculations based on lines of code will show huge "gains" that don't correspond to any real business value. Use outcome metrics: features shipped, bugs fixed, PR cycle time, deployment frequency.

Relying entirely on developer self-report. Asking developers "how much faster do you work with AI?" produces overestimates from enthusiasts and underestimates from skeptics. Use self-report as qualitative color, but pair it with behavioral data (actual PR throughput, actual cycle time) to get an honest signal.

Waiting for a perfect attribution model. Teams sometimes delay ROI reporting because they can't cleanly attribute productivity improvements to AI tools vs. other factors (better CI, reduced tech debt, new team members). Perfect attribution is impossible. Report with caveats: "We believe AI tools contributed to this improvement, though other factors also played a role." Honest uncertainty is better than silence.

Conflating savings with headcount reduction. AI ROI is most honestly framed as throughput improvement, not headcount reduction. "We can now do the work of 60 developers with 50" is a politically fraught framing that terrifies engineers. "Our 50 developers can now produce 20% more features in the same time" is accurate, value-positive, and doesn't threaten anyone's job. Frame AI ROI as capacity expansion, not as a way to cut headcount.

Not connecting ROI to business outcomes. Throughput metrics are better than activity metrics, but business outcomes are better than throughput metrics. "We shipped the payment feature 3 weeks faster than projected" is more compelling to finance than "our PR throughput increased 25%." Connect the throughput improvement to specific business outcomes - faster time to market, faster bug resolution, more features per quarter.

How Different Roles See It

Bob, Head of Engineering

Bob is in a quarterly business review when the CFO asks: "We're spending $400K a year on AI coding tools. What's the return?" Bob knows the tools are valuable - his developers love them - but he has no data to cite. He mentions that developers seem more productive, that he's heard positive feedback, and that he'll get better data for the next review. The CFO makes a note.

What Bob should do: Bob has 90 days until the next quarterly review. He needs to produce a credible ROI estimate, not a perfect one. The fastest path: pull PR throughput data for the past year, identify the inflection point where AI tool adoption ramped up (when did 50%+ of developers become active users?), and compare pre-adoption and post-adoption throughput. Even with noise in the data, this analysis will show whether throughput increased after AI adoption. Pair it with a cost calculation: what would it have cost to achieve the same throughput increase through hiring? The gap between that hiring cost and the $400K tool spend is a conservative ROI estimate. Bob should present this at the next quarterly review with honest caveats and a commitment to better instrumentation going forward.
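
Bob's analysis might look roughly like the sketch below. Everything in it is an assumption for illustration: the weekly PR counts are invented, `ADOPTION_WEEK` stands in for the point where 50%+ of developers became active users, and the team size (50) and fully loaded cost ($200K) are borrowed from the tip earlier in this section rather than from Bob's scenario. The hiring comparison also assumes throughput scales linearly with headcount, which is generous to hiring.

```python
from statistics import mean

# Hypothetical team-wide merged-PR counts per week (invented values).
WEEKLY_PRS = [40, 42, 38, 40, 48, 50, 46, 48]
ADOPTION_WEEK = 4  # assumed index of the first week with 50%+ active users


def throughput_change(weekly, adoption_index):
    """Fractional change in mean weekly PRs after the adoption point."""
    before = mean(weekly[:adoption_index])
    after = mean(weekly[adoption_index:])
    return (after - before) / before


def hiring_equivalent_cost(headcount, loaded_cost, change):
    """Annual cost of getting the same increase by hiring, assuming
    throughput scales linearly with headcount."""
    return headcount * change * loaded_cost


change = throughput_change(WEEKLY_PRS, ADOPTION_WEEK)
hiring_cost = hiring_equivalent_cost(50, 200_000, change)
gap = hiring_cost - 400_000  # hiring cost minus the $400K tool spend

print(f"throughput change: {change:.0%}")
print(f"hiring equivalent: ${hiring_cost:,.0f}, gap vs. tools: ${gap:,.0f}")
```

A simple mean comparison like this cannot separate AI impact from other changes in the same period, so the caveats belong on the slide next to the number.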

Sarah, Productivity Lead

Sarah has been asked to produce a report on AI tool ROI for the annual engineering review. She starts pulling data and realizes she has almost nothing: some license invoices, some developer survey responses from 8 months ago, and GitHub commit history. There's no systematic measurement of AI impact anywhere.

What Sarah should do: Sarah should produce a retrospective ROI estimate using the data that exists, while simultaneously proposing the measurement infrastructure that would make future estimates more precise. For the retrospective estimate: use GitHub commit history to approximate when individual developers started using AI tools heavily (commit patterns change with AI usage), then compare their PR throughput in the 90 days before and after. This is imperfect but it's data. For the forward-looking proposal: Sarah should outline a minimal measurement stack (usage tracking, PR labeling, quarterly throughput reports) with a cost estimate and a timeline. The report says: "Here's the best estimate we can produce with what we have. Here's what we need to build to answer this question properly."
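
The retrospective comparison could be sketched as follows. It is a sketch under stated assumptions: the merge dates are invented, the adoption date is an estimate Sarah would have to infer herself, and `window_counts` is a hypothetical helper rather than part of any tool.

```python
from datetime import date, timedelta


def window_counts(merge_dates, adoption, days=90):
    """Count merged PRs in the `days` before and the `days` after adoption."""
    start = adoption - timedelta(days=days)
    end = adoption + timedelta(days=days)
    before = sum(1 for d in merge_dates if start <= d < adoption)
    after = sum(1 for d in merge_dates if adoption <= d < end)
    return before, after


# Hypothetical merge dates for one developer (invented for illustration).
MERGES = [
    date(2023, 12, 1),   # outside the 90-day window, ignored
    date(2024, 1, 15), date(2024, 2, 10), date(2024, 3, 20),
    date(2024, 4, 5), date(2024, 5, 1), date(2024, 5, 20), date(2024, 6, 15),
    date(2024, 8, 1),    # outside the 90-day window, ignored
]
ADOPTION = date(2024, 4, 1)  # assumed: estimated start of heavy AI usage

before, after = window_counts(MERGES, ADOPTION)
change = (after - before) / before
print(f"before={before}, after={after}, change={change:.0%}")
```

Centering the window on each developer's own adoption date, rather than on one team-wide date, keeps early and late adopters from blurring each other's signal.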

Victor, Staff Engineer - AI Champion

Victor wants to advocate for expanding the team's AI investment - more agent workflows, dedicated CI infrastructure, team-wide MCP server setup. But every time he makes the case, leadership asks "but what's the ROI on what we've already invested?" and he doesn't have a clean answer.

What Victor should do: Victor should build the ROI case from his own data. He should pull his personal GitHub commit history and PR data for the 12 months before and 12 months after adopting CLI agents and parallel workflows. He should calculate his personal throughput increase and extrapolate: if 10 developers adopted similar workflows, what would the team-level impact be? Victor should present this at the next engineering all-hands as a case study with specific numbers. The advocacy is more powerful when it comes with data. And once Victor has built the personal ROI calculation framework, he can help the team instrument the same calculation at scale - turning his personal case study into the team's measurement infrastructure.
