PR throughput per dev

·DORA metrics are tracked consistently with a dashboard
·AI tool license count vs. active usage rate is measured
·PR throughput per developer is tracked

·AI acceptance rate (% of AI suggestions accepted) is measured per tool
·Metrics are reviewed in team retrospectives at least monthly

Evidence

·DORA metrics dashboard with current data
·License utilization report (licenses purchased vs. active users)
·PR throughput chart showing per-developer breakdown

What It Is

PR throughput per developer is the first meaningful per-developer productivity metric for AI-assisted development: how many pull requests does a developer merge per week, on average? At L2 (Guided), this becomes the primary signal for tracking whether AI tools are changing developer output. It's not a perfect metric - PRs vary enormously in size and complexity - but it's the most directly measurable output signal available without sophisticated instrumentation.

Before AI tools, typical developer throughput in a well-functioning team is 3-6 PRs per week, depending on the codebase, review culture, and task complexity. With active AI tool usage at L2, teams consistently see this number move to 6-10 PRs per week for high-usage developers. At L3 and L4 with agent-heavy workflows, throughput can reach 15-30 PRs per week per developer. These numbers aren't universal - they depend heavily on PR size convention, codebase maturity, and CI speed - but the direction and magnitude of the shift are consistent across teams that have tracked it carefully.

PR throughput is the gateway metric to a richer productivity picture. By itself, it tells you output rate but not output quality. A developer running agents that produce 20 PRs per week of mediocre, heavily-edited code might have lower net output than a developer producing 8 carefully crafted PRs that merge cleanly. This is why PR throughput is always paired with PR cycle time (how long from PR open to merge?), review burden (how much effort do reviewers spend on each PR?), and defect rate (how often does the code break in production?). Together, these four metrics give a much more complete picture.

The reason PR throughput is the right starting metric at L2 is that it requires no new instrumentation. GitHub, GitLab, and Bitbucket all expose this data directly. A simple query or dashboard plugin will give you per-developer weekly PR counts going back as far as the platform retains your pull request records. This makes it possible to compute a pre-AI baseline from historical data even if you didn't instrument anything before AI adoption began.
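
The per-developer weekly counts described above can be sketched in a few lines of Python. This is a minimal illustration, assuming merged-PR records have already been exported as dictionaries; the field names `author` and `merged_at`, the helper names, and the sample records are hypothetical and not tied to any platform's API.

```python
from collections import Counter
from datetime import date, datetime, timedelta


def week_start(merged_at: str) -> date:
    """Return the Monday of the week containing an ISO-8601 timestamp."""
    day = datetime.fromisoformat(merged_at.replace("Z", "+00:00")).date()
    return day - timedelta(days=day.weekday())


def weekly_pr_counts(prs: list[dict]) -> Counter:
    """Count merged PRs per (author, week-start) pair."""
    return Counter((pr["author"], week_start(pr["merged_at"])) for pr in prs)


# Illustrative records only, not real data.
sample = [
    {"author": "alice", "merged_at": "2024-03-04T10:00:00Z"},
    {"author": "alice", "merged_at": "2024-03-06T15:30:00Z"},
    {"author": "alice", "merged_at": "2024-03-12T09:00:00Z"},
    {"author": "bob", "merged_at": "2024-03-05T11:00:00Z"},
]
counts = weekly_pr_counts(sample)
```

Grouping by the Monday of each week keeps the buckets stable across year boundaries. Weeks in which a developer merged nothing do not appear in the counter, so they need to be filled in as zeros before averaging.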

Why It Matters

Directly observable - PR throughput requires zero new tooling to track; it's the most accessible productivity signal available and can be computed retroactively from the PR history your hosting platform already stores
Reveals the AI impact immediately - teams that add AI tools and then measure PR throughput almost always see an inflection point at the month when adoption ramped up; this is the clearest data point available for AI ROI conversations
Per-developer granularity - reporting average throughput across the team hides enormous variance; per-developer tracking reveals who has adopted effective AI workflows (high throughput) and who hasn't (unchanged throughput), enabling targeted intervention
Creates healthy benchmarks - once the team knows the distribution of PR throughput, the benchmark becomes the high-performer's output; developers can see that 10 PRs/week is achievable for their codebase and are motivated to find the AI workflows that get them there
Leading indicator for capacity planning - if developers are producing PRs faster, the bottleneck moves to review, to CI, or to planning; tracking PR throughput early reveals where the next constraint is before it becomes a crisis

Getting Started

Pull historical PR throughput data - Query your version control API for the last 6-12 months of merged PRs, grouped by author and week. This gives you the pre-AI baseline. In GitHub, this is a simple GraphQL query against the repository PRs; LinearB, Jellyfish, and most engineering analytics platforms provide this out of the box.
Normalize for PR size - Raw PR count is a noisy metric if PR size varies widely. Add a second dimension: PR size in lines changed. Track both "PRs per week" and a size-normalized throughput (e.g., total lines changed across merged PRs per week, so that many tiny PRs do not count for more than one substantial PR). The size-normalized metric is more stable and more comparable across developers working on different areas of the codebase.
Identify the pre-AI baseline - Find the date when each developer's AI tool usage began and compute their average throughput in the 8 weeks before. This is their personal baseline. Post-adoption throughput is measured against this baseline, not against team averages.
Build a distribution view, not just averages - Don't report average PR throughput; report the distribution. Show the 25th, 50th, and 75th percentile. The distribution reveals the spread: some developers have dramatically higher throughput, others haven't moved. The gap between percentiles is where the adoption story lives.
Track PR cycle time alongside throughput - A developer who pushes 20 PRs per week that each wait 3 days in review is not actually faster than a developer who pushes 8 PRs that merge in 4 hours. Pair throughput with cycle time (PR creation to merge) to get a more accurate productivity picture.
Set a team throughput target for the quarter - "Increase median PR throughput from 5/week to 8/week by end of Q2" is a concrete, trackable goal. Assign this goal to the team's productivity lead or engineering manager. Track progress monthly. Celebrate when the team hits the target - it's a real, measurable productivity improvement.
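
The baseline, distribution, and cycle-time steps above can be sketched as follows. This is a rough illustration under stated assumptions: weekly counts are already available as a list per developer, the adoption week is known as an index into that list, and all function names and sample numbers are hypothetical.

```python
from datetime import datetime
from statistics import mean, quantiles


def baseline_and_change(weekly_counts, adoption_index, window=8):
    """Mean weekly PRs in the `window` weeks before adoption vs. all weeks after."""
    before = weekly_counts[max(0, adoption_index - window):adoption_index]
    after = weekly_counts[adoption_index:]
    base, post = mean(before), mean(after)
    change = (post - base) / base if base else None  # undefined for a zero baseline
    return base, post, change


def throughput_quartiles(per_dev_avg):
    """25th, 50th and 75th percentile of per-developer weekly averages."""
    return tuple(quantiles(per_dev_avg, n=4, method="inclusive"))


def cycle_time_hours(created_at, merged_at):
    """Hours between PR creation and merge, from ISO-8601 timestamps."""
    parse = lambda s: datetime.fromisoformat(s.replace("Z", "+00:00"))
    return (parse(merged_at) - parse(created_at)).total_seconds() / 3600


# Illustrative numbers only, not measurements.
weekly = [4, 5, 4, 5, 4, 5, 4, 5, 6, 8, 7, 7]  # adoption starts at index 8
base, post, change = baseline_and_change(weekly, adoption_index=8)
q1, q2, q3 = throughput_quartiles([3, 5, 6, 8, 12])
hours = cycle_time_hours("2024-03-04T09:00:00Z", "2024-03-04T13:00:00Z")
```

Comparing each developer against their own eight-week baseline, rather than against the team average, matches the personal-baseline step; the quartiles are what a distribution view would plot.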

Tip

GitHub's GraphQL API lets you pull merged PR data with dates and author information with a single paginated query (results come back in pages of up to 100 PRs). If you're not a data engineer, ask an AI agent to write the query for you - it's a straightforward task that produces the exact data you need. Prompt: "Write a GitHub GraphQL query to fetch all merged PRs for repository X in the last 6 months, including author login, merged date, and additions/deletions."
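
For reference, the kind of query that prompt tends to produce looks roughly like the sketch below. It assumes a personal access token with read access to the repository; the Python helper names (`build_request`, `rows_from_response`, `fetch_merged_prs`) and the output field names are illustrative choices, not part of GitHub's API.

```python
import json
import urllib.request

GITHUB_GRAPHQL_URL = "https://api.github.com/graphql"

MERGED_PRS_QUERY = """
query($owner: String!, $name: String!, $cursor: String) {
  repository(owner: $owner, name: $name) {
    pullRequests(states: [MERGED], first: 100, after: $cursor,
                 orderBy: {field: CREATED_AT, direction: DESC}) {
      pageInfo { hasNextPage endCursor }
      nodes { author { login } createdAt mergedAt additions deletions }
    }
  }
}
"""


def build_request(owner, name, token, cursor=None):
    """Build the POST request for one page of merged PRs."""
    payload = {"query": MERGED_PRS_QUERY,
               "variables": {"owner": owner, "name": name, "cursor": cursor}}
    return urllib.request.Request(
        GITHUB_GRAPHQL_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
    )


def rows_from_response(body):
    """Flatten one GraphQL response page into plain dictionaries."""
    nodes = body["data"]["repository"]["pullRequests"]["nodes"]
    return [{
        "author": (n["author"] or {}).get("login"),  # author is null for deleted accounts
        "created_at": n["createdAt"],
        "merged_at": n["mergedAt"],
        "lines_changed": n["additions"] + n["deletions"],
    } for n in nodes]


def fetch_merged_prs(owner, name, token):
    """Follow pagination cursors until every merged PR has been read."""
    rows, cursor = [], None
    while True:
        with urllib.request.urlopen(build_request(owner, name, token, cursor)) as resp:
            body = json.load(resp)
        rows.extend(rows_from_response(body))
        page = body["data"]["repository"]["pullRequests"]["pageInfo"]
        if not page["hasNextPage"]:
            return rows
        cursor = page["endCursor"]
```

Calling `fetch_merged_prs("your-org", "your-repo", token)` returns every merged PR; restricting to the last six months is then a client-side filter on `merged_at`. On very large repositories it may be worth stopping pagination early instead of reading the full history.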

Common Pitfalls

Treating PR throughput as an individual performance metric. Displaying per-developer PR throughput on a leaderboard creates perverse incentives: developers will split large PRs into many small ones, merge half-finished work, and focus on maximizing PR count rather than shipping valuable features. Use throughput for team-level analysis and improvement identification, not individual performance evaluation.

Comparing developers on different work types. A developer maintaining a large legacy codebase with complex dependencies will have lower PR throughput than a developer working on a greenfield service, regardless of AI tool usage. Throughput comparisons are only valid within cohorts doing similar work. Cross-cohort comparisons are misleading and demotivating.

Not accounting for PR size conventions. Some teams have a culture of small, atomic PRs (5-50 lines each). Others merge large feature PRs (500-2000 lines). PR throughput numbers mean completely different things in these two cultures. If your team doesn't have a consistent PR size convention, establish one before using throughput as a benchmark.

Ignoring the review burden side. Agents that produce more PRs increase the review burden on human reviewers. If agent-authored PR throughput doubles but review cycle time also doubles (because reviewers are overwhelmed), net delivery speed hasn't improved. Track PR cycle time and review queue depth alongside throughput to see if the review process is keeping up.

Using throughput without outcome correlation. A team that ships 30 PRs per week but sees an increase in production incidents has higher throughput with worse outcomes. Throughput is a means, not an end. Always correlate PR throughput with outcome metrics: production defect rate, customer-reported bugs, and deployment success rate. If throughput goes up but quality goes down, something in the agent workflow needs fixing.
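
One simple way to check the throughput-versus-outcome relationship described above is a Pearson correlation between weekly merged PRs and weekly production incidents. The sketch below is only an illustration: the function name and the sample series are hypothetical, and a correlation over a handful of weeks is a prompt for investigation, not proof of cause.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Illustrative weekly series only, not measurements.
team_prs_per_week = [20, 24, 28, 30]
incidents_per_week = [1, 2, 3, 5]
r = pearson(team_prs_per_week, incidents_per_week)
```

A value of `r` close to 1 on real data would suggest that incidents are rising along with throughput, which is the signal to look at the agent workflow and review process.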

How Different Roles See It

Bob, Head of Engineering

Bob's team has been using AI tools for six months. He wants to show the CTO that the tools are producing results, but his engineering managers are giving him conflicting reports about productivity impact. Some say the team is shipping faster, others say they're not sure.

What Bob should do: Bob should pull the PR throughput data himself. Query GitHub for the last 12 months of merged PRs, compute per-developer weekly averages, and find the inflection point at the month when the majority of the team adopted AI tools. If there's a visible increase in median throughput at that inflection point - even a 20-30% increase - that's the ROI data point Bob needs. He should present this analysis in the next engineering leadership review with the caveat that it's observational but directionally significant. Bob should also ask his engineering managers: "For the developers whose throughput hasn't increased, what's different about their setup or workflow?" The non-improvers are the adoption gap that needs attention.

Sarah, Productivity Lead

Sarah has been asked to design a developer productivity scorecard. She's been given several options: PR throughput, lines of code, story points, commit frequency. She needs to pick the right metrics for a fair and useful scorecard.

What Sarah should do: Sarah should build a productivity index that weights PR throughput (40%), PR cycle time (30%), and a qualitative self-assessment score from monthly developer surveys (30%). The three-metric index avoids the pitfalls of relying on a single number: throughput alone incentivizes gaming, cycle time alone penalizes developers working on hard problems, and qualitative alone is too subjective. Sarah should also add a quarterly calibration step where the team reviews whether the index is capturing the right behaviors. If developers are optimizing for the metric rather than for actual productivity, the index needs adjustment. The goal is a scorecard that helps developers see where they're working well and where they have room to improve, not a leaderboard.
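
The 40/30/30 weighting can be written down as a small function. This sketch assumes each component has first been scaled to a 0-1 score where higher is better; the scaling ranges, the inversion of cycle time, and all names here are assumptions for illustration, not part of the scorecard as described.

```python
WEIGHTS = {"throughput": 0.4, "cycle_time": 0.3, "survey": 0.3}


def normalize(value, low, high, invert=False):
    """Scale `value` into 0-1 over [low, high]; invert when lower is better."""
    if high <= low:
        raise ValueError("high must be greater than low")
    score = min(max((value - low) / (high - low), 0.0), 1.0)
    return 1.0 - score if invert else score


def productivity_index(throughput_score, cycle_time_score, survey_score):
    """Weighted sum of three 0-1 scores using the 40/30/30 split."""
    return (WEIGHTS["throughput"] * throughput_score
            + WEIGHTS["cycle_time"] * cycle_time_score
            + WEIGHTS["survey"] * survey_score)


# Illustrative inputs only: 8 PRs/week on a 0-10 scale, a 12-hour median
# cycle time on a 0-48 hour scale (lower is better), survey score 3 on a 1-5 scale.
index = productivity_index(
    normalize(8, 0, 10),
    normalize(12, 0, 48, invert=True),
    normalize(3, 1, 5),
)
```

Keeping the weights in one dictionary makes the quarterly calibration step easier: the team can adjust a single place if the index starts rewarding the wrong behavior.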

Victor, Staff Engineer - AI Champion

Victor's personal PR throughput has tripled since he adopted parallel agent workflows. He produces 20-25 PRs per week, up from 7-8. He wants to help the rest of the team reach similar levels, but he knows that simply showing his numbers will seem unbelievable or intimidating.

What Victor should do: Victor should run a transparent "productivity audit" of his own workflow and publish it as a technical write-up. The write-up should break down where his PRs come from: what percentage are agent-authored, what percentage are human-written, how much time he spends on review vs. implementation, and what his CI pass rate looks like. The goal is to show that his throughput isn't magic - it's the result of specific, learnable workflow choices that any developer can adopt. Victor should also identify one other developer on the team who is willing to adopt his workflow for a 4-week experiment, document the before/after, and publish that as a case study. Two data points are twice as convincing as one.
