DORA metrics (if at all)

At L1 (Ad-hoc), most engineering teams track DORA metrics inconsistently or not at all.

·Delivery is tracked with at least basic metrics
·Standard delivery metrics are in place (AI-specific metrics come later)

·Team acknowledges the need for AI-specific metrics beyond traditional DORA
·Basic deployment frequency is at least known (even if not dashboarded)

Evidence

·Absence of metrics dashboard or inconsistent/manual tracking
·No AI-specific fields in existing metrics systems

What It Is

At L1 (Ad-hoc), most engineering teams track DORA metrics inconsistently or not at all. The DORA metrics - the four key metrics from the DevOps Research and Assessment program - are deployment frequency, lead time for changes, change failure rate, and mean time to restore. They were designed to measure software delivery performance for traditional human-driven development. Teams at L1 may have heard of them and may even have a dashboard that pulls some of the data, but the metrics are rarely reviewed in retrospectives, rarely acted on, and often incomplete because the underlying tooling isn't set up to capture them reliably.
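To make the four definitions concrete, here is a minimal sketch of how they could be computed from a list of deployment records. The `Deployment` record, its field names, and the `dora_summary` function are hypothetical, and the sketch assumes that a failure starts at the moment of deployment; your own tooling will likely store this data differently.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean, median
from typing import Optional


@dataclass
class Deployment:
    # Hypothetical record shape, not tied to any specific tool.
    committed_at: datetime                  # first commit of the change
    deployed_at: datetime                   # change running in production
    caused_failure: bool = False            # did it degrade production?
    restored_at: Optional[datetime] = None  # service restored, if it failed


def hours(start: datetime, end: datetime) -> float:
    """Elapsed time between two timestamps, in hours."""
    return (end - start).total_seconds() / 3600


def dora_summary(deploys: list[Deployment], window_days: int) -> dict:
    """Compute the four DORA metrics over one reporting window."""
    if not deploys:
        raise ValueError("no deployments recorded in this window")
    failures = [d for d in deploys if d.caused_failure]
    # Assumption: time to restore is measured from the failing deployment.
    restore_hours = [
        hours(d.deployed_at, d.restored_at)
        for d in failures
        if d.restored_at is not None
    ]
    return {
        "deploys_per_week": len(deploys) / (window_days / 7),
        "median_lead_time_hours": median(
            hours(d.committed_at, d.deployed_at) for d in deploys
        ),
        "change_failure_rate": len(failures) / len(deploys),
        "mean_time_to_restore_hours": mean(restore_hours) if restore_hours else None,
    }
```

The median is used for lead time because a single stalled change would otherwise dominate the number; that is a design choice of this sketch, not a requirement of the DORA definitions.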

The "if at all" qualifier is honest. Many engineering teams at this level are in one of two positions: either they've never implemented DORA tracking because they're focused on feature delivery and haven't had the bandwidth for measurement infrastructure, or they have a partial DORA dashboard that was set up by a past initiative and is now ignored. Both situations share the same consequence - no reliable signal about whether the team is delivering software effectively.

DORA metrics were conceived before AI-assisted development existed. They measure the output of human developers working in a traditional code-review-and-merge cycle. They remain useful at L2 and L3 as a baseline, but at L1 their absence is a symptom of a broader problem: the team doesn't yet have a culture of measurement. Without measurement, there's no feedback loop for improvement, and without a feedback loop, AI adoption will be unguided and its impact invisible.

The goal at L1 is not to achieve perfect DORA tracking - that's a distraction. The goal is to recognize that the absence of measurement is itself a significant problem, and to take the first step toward establishing any consistent signal about delivery performance. Even imperfect DORA data is vastly better than no data.

Why It Matters

·No measurement means no improvement signal - teams that don't track deployment frequency don't know if they're deploying more or less than they were six months ago; AI adoption decisions are made on gut feel rather than evidence
·DORA baselines enable before/after comparisons - when AI tools are introduced, teams with existing DORA data can immediately measure impact; teams without baselines can never prove (or disprove) that the investment paid off
·Deployment frequency is a leading indicator - teams that deploy frequently tend to have shorter lead times, lower failure rates, and faster recovery; DORA is a proxy for overall development health, not just a CI/CD metric
·Leadership will ask for ROI - at some point, the CTO or CFO will ask what the AI tooling investment produced; teams with DORA baselines have an answer; teams without them have nothing to show
·Measurement creates accountability - tracking lead time makes slow review cycles visible; tracking change failure rate makes test quality visible; measurement surfaces problems that were previously invisible and therefore unaddressed

Getting Started

·Audit what data you currently have - Before building new instrumentation, check what your existing tools already capture. GitHub, GitLab, and Jira all have APIs that can surface deployment frequency and lead time data. You may have more data than you think.
·Define "deployment" precisely - The most common DORA measurement mistake is inconsistent definitions. Decide: does a deployment mean merging to main, deploying to staging, or deploying to production? Document the definition and apply it consistently everywhere.
·Start with deployment frequency only - Resist the urge to instrument all four DORA metrics simultaneously. Deployment frequency is the easiest to measure accurately and the most predictive of overall delivery health. Get it working and reviewed weekly before adding the others.
·Put the metric on a visible dashboard - A metric that nobody looks at is not a metric. Add deployment frequency to your team's engineering dashboard, your weekly standup, or your sprint review. Visibility creates the social pressure that drives behavior change.
·Set a realistic baseline target - Don't compare your L1 team to elite performers deploying multiple times per day. Set a 90-day target that represents meaningful improvement over your current state. "Increase from 2 deployments per month to 8" is a concrete, achievable goal.
·Schedule a monthly DORA review - Establish a recurring meeting - even 30 minutes - where the team looks at the DORA numbers together. What moved? What didn't? What's blocking improvement? The meeting creates the accountability loop that makes measurement meaningful.
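As a rough illustration of the "deployment frequency only" starting point, the sketch below counts production deployments per calendar month and checks the latest month against a target. The function names are hypothetical, and it assumes you have already exported one date per production deployment from whatever tool records them.

```python
from collections import Counter
from datetime import date


def deployments_per_month(deploy_dates: list[date]) -> dict[tuple[int, int], int]:
    """Count production deployments per (year, month), including empty months."""
    counts = Counter((d.year, d.month) for d in deploy_dates)
    if not counts:
        return {}
    (year, month), last = min(counts), max(counts)
    per_month = {}
    while (year, month) <= last:
        # Months with no deployments are reported as 0 rather than skipped,
        # so gaps stay visible in the monthly review.
        per_month[(year, month)] = counts.get((year, month), 0)
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return per_month


def meets_target(per_month: dict[tuple[int, int], int], target_per_month: int) -> bool:
    """True if the most recent month reached the agreed target."""
    return bool(per_month) and list(per_month.values())[-1] >= target_per_month
```

For the example goal above, the team would call `meets_target(per_month, 8)` at each monthly review.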

Tip

Linear, Jira, and most project management tools already track cycle time implicitly. The first DORA metric you can often measure without any new tooling is "ticket created to merged PR" as a proxy for lead time. Start there before investing in dedicated DORA tooling.


Common Pitfalls

Treating DORA as a compliance checkbox. Teams often set up a DORA dashboard because an initiative or a manager asked for it, then never look at it again. A metric that exists but is never reviewed is worse than no metric - it creates the illusion of measurement without the reality. DORA only works if someone is accountable for acting on what the data shows.

Measuring the wrong thing. A common mistake is measuring "time a PR is open" instead of "lead time from code committed to running in production." The DORA definition is precise: lead time measures the full cycle from commit to production deploy. Shortcuts produce misleading numbers that look better than reality.
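The gap between the two measurements is easy to see on a single change. This is a small sketch with a hypothetical `Change` record; the four timestamps are assumed to come from your version control and deployment tooling.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Change:
    # Hypothetical record; field names are illustrative only.
    first_commit_at: datetime
    pr_opened_at: datetime
    pr_merged_at: datetime
    deployed_at: datetime


def pr_open_hours(change: Change) -> float:
    """The shortcut: how long the pull request was open."""
    return (change.pr_merged_at - change.pr_opened_at).total_seconds() / 3600


def lead_time_hours(change: Change) -> float:
    """Lead time for changes: first commit to running in production."""
    return (change.deployed_at - change.first_commit_at).total_seconds() / 3600


# Made-up example: a quick review, but slow work before and after it.
example = Change(
    first_commit_at=datetime(2024, 3, 4, 9, 0),
    pr_opened_at=datetime(2024, 3, 5, 9, 0),
    pr_merged_at=datetime(2024, 3, 5, 15, 0),
    deployed_at=datetime(2024, 3, 7, 15, 0),
)
```

Here the PR was open for 6 hours, while the full lead time is 78 hours. The shortcut hides the day of work before the PR was opened and the two days the merged change waited for a production deploy.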

Waiting until the data is perfect. At L1, the data will never be perfect. Deployment events will be missing, definitions will be inconsistent across teams, and historical data will have gaps. Start publishing imperfect numbers, explain the caveats, and improve the instrumentation over time. Waiting for perfect data means waiting forever.

Using DORA metrics to compare teams. DORA metrics are designed to track a team's improvement over time, not to compare Team A against Team B. Using them for inter-team comparison creates perverse incentives - teams optimize for the metric rather than for delivery quality. Keep DORA as an internal improvement tool, not a ranking system.

Not connecting DORA to the AI investment. If your team is using AI tools and you're tracking DORA, the two need to be connected. When deployment frequency improves, can you attribute any of that to AI-assisted development? When lead time decreases, is that partly because agents are writing first drafts faster? Track AI tool adoption alongside DORA so you can build the causal story over time.


How Different Roles See It

Bob, Head of Engineering

Bob leads a 40-person engineering organization where some teams have informal DORA tracking and most don't. He's invested in GitHub Copilot licenses and is getting pressure from leadership to show impact. When he asks his engineering managers what the team's lead time is, he gets different answers from every manager - because they're all measuring different things.

What Bob should do: Bob needs to standardize before he can measure impact. The first step is a cross-team alignment on definitions: what counts as a deployment, how is lead time calculated, what's the definition of a change failure. This is a 2-hour workshop, not a month-long project. After alignment, Bob should designate one engineer or engineering manager as responsible for DORA instrumentation - not a committee, a single owner. The goal for Q1 is a single dashboard that shows deployment frequency for all teams with a consistent definition. In Q2, Bob adds lead time. By Q3, he has a baseline he can use to measure the AI investment impact. The ROI conversation with leadership becomes: "Here's where we were before AI tooling, here's where we are now."


Sarah, Productivity Lead

Sarah has been asked to evaluate whether the team's AI tool investments are working. She pulls the data she can find - PR counts, commit frequency, some velocity metrics from Jira - but nothing is connected to outcomes. She can tell that some developers are using AI tools more than others, but she can't tell whether that usage is translating into faster delivery.

What Sarah should do: Sarah should recognize that the measurement infrastructure needs to come before the impact measurement. Without DORA baselines, any AI impact analysis is speculative. Sarah should propose a 60-day measurement sprint: instrument deployment frequency and lead time for all teams, establish baselines, then overlay AI tool usage data to see if there's a correlation. The correlation won't be causal proof, but it will be the first data-driven signal about whether AI tool adoption is associated with improved delivery performance. That signal is enough to justify continued investment and more rigorous measurement at L2.
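The overlay step could be as simple as a Pearson correlation between per-team AI usage and per-team deployment frequency. The sketch below is one possible way to do it; the `pearson` helper and the two input lists are hypothetical, and with only a handful of teams the result should be read as a rough signal, not a finding.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("a series with no variation has no defined correlation")
    return covariance / (spread_x * spread_y)


# Hypothetical inputs, one entry per team (made-up numbers):
# share of developers actively using AI tools, and deployments per month.
ai_usage_share = [0.10, 0.35, 0.50, 0.80]
deploys_per_month = [2.0, 3.0, 6.0, 7.0]
r = pearson(ai_usage_share, deploys_per_month)
```

A value of `r` near 1 means teams with higher AI usage also deploy more often; a value near 0 means no linear relationship. Neither outcome says anything about cause.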


Victor, Staff Engineer - AI Champion

Victor is the team's most advanced AI user and is frustrated that he can't make the case for the AI patterns he's pioneered. He can feel the productivity improvement in his own work - he ships more and context-switches less - but he has no data to back up the claim when he advocates for team-wide adoption.

What Victor should do: Victor should instrument his own DORA metrics as a case study. Track his personal deployment frequency, lead time from first commit to production, and PR cycle time for three months before and three months after adopting parallel agents and CLI-first workflows. Even a single-person case study with clean data is more persuasive than a team-wide claim with no data. Victor can present this at the next engineering all-hands: "Here is my delivery performance before and after. Here is what changed in how I work. Here is what would need to be true for the whole team to see similar results." A concrete case study with numbers is the strongest possible argument for the AI investment.
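A before/after comparison like Victor's could be sketched as follows. The `before_after` function and its input shape (one deploy date plus lead time in hours per change) are hypothetical; the adoption date is whatever day Victor considers the start of the new workflow.

```python
from datetime import date
from statistics import median


def before_after(lead_times: list[tuple[date, float]], adopted_on: date) -> dict:
    """Compare median lead time (hours) before and after an adoption date."""
    before = [hrs for day, hrs in lead_times if day < adopted_on]
    after = [hrs for day, hrs in lead_times if day >= adopted_on]
    if not before or not after:
        raise ValueError("need at least one change on each side of the adoption date")
    median_before = median(before)
    median_after = median(after)
    return {
        "median_before_hours": median_before,
        "median_after_hours": median_after,
        # Negative means lead time got shorter after adoption.
        "change_pct": (median_after - median_before) * 100 / median_before,
    }
```

A single person's numbers are noisy, so it helps to report the count of changes on each side alongside the medians when presenting the result.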
