No AI-specific metrics
At L1, engineering teams that have adopted AI tools - GitHub Copilot, Cursor, Claude Code - are using those tools without any AI-specific metrics.
- DORA metrics (deployment frequency, lead time, change failure rate, MTTR) are not tracked, or are tracked inconsistently
- No AI-specific metrics exist
- Team acknowledges the need for AI-specific metrics beyond traditional DORA metrics
- Basic deployment frequency is at least known (even if not dashboarded)
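The DORA baseline in the first criterion is simple arithmetic once deployment records exist. Below is a minimal Python sketch under stated assumptions: the `Deployment` record and its fields are hypothetical, and time to restore is measured from the failing deploy rather than from incident detection, which simplifies the DORA definition.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    # Hypothetical record; field names are illustrative assumptions.
    deployed_at: datetime
    first_commit_at: datetime                # earliest commit included in this deploy
    failed: bool = False                     # did this deploy cause a production failure?
    restored_at: Optional[datetime] = None   # when service was restored, if it failed


def dora_metrics(deploys: list[Deployment], window_days: int) -> dict:
    """Compute the four DORA metrics over a window of `window_days` days."""
    if not deploys:
        raise ValueError("no deployments in window")
    failures = [d for d in deploys if d.failed]
    # Simplification: restore time is counted from the deploy, not from detection.
    restores = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        "deploys_per_week": len(deploys) / (window_days / 7),
        "median_lead_time": median(d.deployed_at - d.first_commit_at for d in deploys),
        "change_failure_rate": len(failures) / len(deploys),
        "mean_time_to_restore": (
            sum(restores, timedelta()) / len(restores) if restores else None
        ),
    }


t0 = datetime(2024, 1, 1)
deploys = [
    Deployment(deployed_at=t0 + timedelta(days=1), first_commit_at=t0),
    Deployment(deployed_at=t0 + timedelta(days=3), first_commit_at=t0 + timedelta(days=1)),
    Deployment(
        deployed_at=t0 + timedelta(days=5),
        first_commit_at=t0 + timedelta(days=4),
        failed=True,
        restored_at=t0 + timedelta(days=5, hours=2),
    ),
    Deployment(deployed_at=t0 + timedelta(days=8), first_commit_at=t0 + timedelta(days=6)),
]
m = dora_metrics(deploys, window_days=14)
print(m)
```

Even this rough version moves a team from "deployment frequency is known" to "deployment frequency is computed," which is the foundation AI-specific metrics later build on.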
Evidence
- Absence of a metrics dashboard, or inconsistent/manual tracking
- No AI-specific fields in existing metrics systems
What It Is
At L1, engineering teams that have adopted AI tools - GitHub Copilot, Cursor, Claude Code - are using those tools without any AI-specific metrics. They know how many licenses they've purchased. They do not know how many are actively used, what percentage of production code was AI-assisted, whether AI-generated code has a higher or lower defect rate than human-written code, or whether AI adoption has changed developer throughput in any measurable way. The AI tools are running, but there's no instrumentation to see what they're doing.
This is the normal starting state. Teams adopt AI tools because developers request them or because leadership mandates them, and the focus is on adoption and tooling setup. Measurement is treated as something to figure out later, once the tools are embedded in the workflow. The problem is that "later" often never arrives, and the team ends up 12 months into AI tool usage with no ability to answer the question every engineering leader eventually faces: "What did we get for this investment?"
The absence of AI-specific metrics is different from the absence of DORA metrics. DORA metrics measure delivery performance, a concern that predates AI. AI-specific metrics measure a new phenomenon: the degree to which AI is changing how code is produced. They answer different questions. How many iterations does it take an agent to produce a passing CI result? What percentage of merged PRs were primarily AI-authored? How does the defect rate of AI-generated code compare to that of human-written code in the same codebase? These questions have no analog in traditional software metrics, and answering them requires instrumentation that most teams at L1 haven't built.
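The three questions above become simple calculations once each merged PR carries a few extra fields. A minimal Python sketch follows; the `MergedPR` record, its field names, and how each field gets populated (a PR label, a commit trailer, a link from an incident ticket) are all assumptions for illustration, not features of any particular tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MergedPR:
    # Hypothetical per-PR record. How these fields are populated is an assumption.
    number: int
    primarily_ai_authored: bool      # e.g. from a PR label or commit trailer
    agent_ci_runs: Optional[int]     # CI runs until first green, if an agent drove the PR
    caused_defect: bool              # later linked to a bug or incident


def ai_share(prs: list[MergedPR]) -> float:
    """Fraction of merged PRs that were primarily AI-authored."""
    return sum(pr.primarily_ai_authored for pr in prs) / len(prs)


def mean_iterations_to_green(prs: list[MergedPR]) -> Optional[float]:
    """Average number of CI runs an agent needed before CI passed."""
    runs = [pr.agent_ci_runs for pr in prs if pr.agent_ci_runs is not None]
    return sum(runs) / len(runs) if runs else None


def defect_rate_by_cohort(prs: list[MergedPR]) -> dict[str, Optional[float]]:
    """Share of PRs later linked to a defect, split into AI and human cohorts."""
    rates: dict[str, Optional[float]] = {}
    for name, is_ai in (("ai", True), ("human", False)):
        cohort = [pr for pr in prs if pr.primarily_ai_authored == is_ai]
        rates[name] = sum(pr.caused_defect for pr in cohort) / len(cohort) if cohort else None
    return rates


prs = [
    MergedPR(101, True, 3, False),
    MergedPR(102, True, 1, True),
    MergedPR(103, False, None, False),
    MergedPR(104, False, None, False),
    MergedPR(105, True, 2, False),
]
print(ai_share(prs), mean_iterations_to_green(prs), defect_rate_by_cohort(prs))
```

The cohort split is only meaningful within one codebase and with enough PRs in each cohort; with five PRs, as here, the rates are illustrative, not evidence.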
The gap between "AI tools installed" and "AI impact measured" is where most organizations live. It's a dangerous place because decisions about AI investment are being made on impressions and anecdotes rather than data. Some developers love the tools and exaggerate their productivity gains. Others are skeptical and underreport their usage. Without AI-specific metrics, there's no way to cut through the noise and understand what's actually happening.
Why It Matters
- Invisible AI usage means invisible impact - if you don't know which PRs were AI-assisted and which were human-written, you can't compare defect rates, review time, or test coverage between the two cohorts; the signal is permanently lost
- License spend without usage data is waste - at L1, teams often discover that 30-40% of Copilot licenses are paid for but never actively used; without usage metrics, this waste is invisible until someone does a license audit
- AI tool selection requires evidence - deciding whether to use Claude Code vs Cursor vs Copilot should be data-driven; teams without AI-specific metrics make these decisions based on demos and developer preferences rather than actual productivity impact in their specific codebase
- Regulatory and compliance risk - as AI-generated code becomes subject to audit requirements (EU AI Act, SOC 2 extensions, etc.), teams that didn't track AI usage have no way to produce the records they need; the measurement gap becomes a compliance gap
- Cannot learn what works - some AI usage patterns produce much better outcomes than others; without metrics, there's no way to identify which patterns are working and systematize them across the team
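The license-waste point needs only one data source: when each licensed user last used the tool. A minimal Python sketch, assuming that timestamp can be exported from the vendor's admin tooling; the input format and the 30-day activity window are assumptions, not a standard definition of "active."

```python
from datetime import datetime, timedelta
from typing import Optional


def license_utilization(
    last_activity: dict[str, Optional[datetime]],
    now: datetime,
    active_window: timedelta = timedelta(days=30),
) -> tuple[float, list[str]]:
    """Return (fraction of seats active within the window, sorted idle seat holders).

    `last_activity` maps each licensed user to their most recent AI-tool
    activity, or None if they have never used the tool (assumed format).
    """
    if not last_activity:
        raise ValueError("no licensed seats")
    idle = sorted(
        user
        for user, seen in last_activity.items()
        if seen is None or now - seen > active_window
    )
    active = len(last_activity) - len(idle)
    return active / len(last_activity), idle


now = datetime(2024, 6, 30)
seats = {
    "alice": datetime(2024, 6, 28),
    "bob": datetime(2024, 4, 2),
    "carol": None,
    "dave": datetime(2024, 6, 15),
}
rate, idle = license_utilization(seats, now)
print(f"{rate:.0%} of seats active; idle: {idle}")
```

Returning the idle seat holders, not just the rate, turns the number into an action list: each idle seat is either a reclaimable license or a developer who may need onboarding.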
How Different Roles See It
Bob has approved AI tool purchases for his team over the past year and is now being asked to present the ROI to the CTO. He pulls together license costs, queries a few developers about their experience, and realizes he has no consistent data to show. Some developers love the tools, some barely use them, and he has no way to quantify the impact on delivery.
Sarah is tasked with improving developer productivity and suspects that AI tools are having an impact - she just can't prove it. She's heard anecdotes from some developers about being dramatically more productive and from others about the tools being distracting. She needs data to separate signal from noise.
Victor runs sophisticated agent workflows and is convinced the productivity gains are real. But when he advocates for deeper AI investment in architecture reviews, people ask for data he doesn't have. His personal experience is compelling but not generalizable.