Review shifts: from writing to evaluating code
When most code in a PR is agent-generated, the reviewer's job changes fundamentally.
- A Platform Engineer role with AI tooling responsibility exists on the platform team
- Context Engineer is a full, dedicated role (not part-time and not combined with other duties)
- The team's primary activity has shifted from writing code to evaluating and reviewing AI-generated code
- Role definitions are updated to reflect AI-augmented responsibilities
- Hiring criteria include AI tool proficiency
Evidence
- Platform Engineer job description that includes AI tooling responsibilities
- Context Engineer role as a dedicated position (headcount or full-time allocation)
- Time tracking showing that the majority of developer time goes to review/evaluation rather than writing
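The time-tracking evidence comes down to simple arithmetic. A minimal sketch, assuming time entries are already tagged with an activity label; the `TimeEntry` type, the `"review"`/`"writing"` labels, and the choice to compare review hours against review-plus-writing hours only (ignoring meetings and other time) are hypothetical, not a prescribed method.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class TimeEntry:
    developer: str
    activity: str  # hypothetical labels: "review", "writing", or anything else
    hours: float


def review_share(entries: Sequence[TimeEntry]) -> float:
    """Share of review + writing hours spent on review/evaluation.

    Assumption: time in other activities (meetings, etc.) is excluded
    from both the numerator and the denominator.
    """
    review = sum(e.hours for e in entries if e.activity == "review")
    writing = sum(e.hours for e in entries if e.activity == "writing")
    total = review + writing
    return review / total if total else 0.0


def majority_on_review(entries: Sequence[TimeEntry]) -> bool:
    # "Majority" is read here as strictly more than half of review + writing time.
    return review_share(entries) > 0.5


entries = [
    TimeEntry("dev-a", "review", 6.0),
    TimeEntry("dev-a", "writing", 3.0),
    TimeEntry("dev-b", "review", 4.0),
    TimeEntry("dev-b", "writing", 5.0),
    TimeEntry("dev-b", "meetings", 2.0),  # excluded from the ratio
]
print(f"{review_share(entries):.2f}")  # 0.56 (10 of 18 hours)
print(majority_on_review(entries))     # True
```

Whether "other" time belongs in the denominator is a judgment call; including it makes the bar harder to clear.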
What It Is
When most code in a PR is agent-generated, the reviewer's job changes fundamentally. In a world where every line was written by a human who thought carefully about the implementation, review is primarily about finding bugs and improving the code. In a world where most code was generated by an agent working from a task specification, review is primarily about evaluating whether the agent understood the intent and produced something that fits the system - and secondarily about finding the specific failure modes of AI-generated code.
The shift is from microreview (line-level correctness) to macrofit evaluation (system-level fit). Review of human-written code focuses on individual lines: is this the right algorithm? Is this null check necessary? Is this name clear? Review of AI-generated code focuses on the system level: did the agent understand what this feature is supposed to do? Does the implementation fit how the rest of the system is structured? Does it handle the edge cases this domain requires? Line-level concerns still matter, but they account for a smaller share of the review's value.
This shift requires new review skills that aren't obvious from traditional code review practice. The reviewer needs to hold two things in mind simultaneously: the original intent (what the task specification said to build) and the actual implementation (what the agent built). A significant class of AI-generated defects is not "the code is wrong" - it's "the code is a plausible implementation of a misunderstanding of the spec." These are harder to catch than traditional bugs because the code runs, passes tests, and looks reasonable. Catching them requires the reviewer to independently reason about whether the implementation achieves the intended purpose, not just whether it is correct code.
The practical change is that reviewers should spend more time reading the task specification and less time tracing the implementation. A reviewer who starts with the code and works their way back to the intent is less effective than one who starts with the intent and checks whether the code achieves it. This may feel counterintuitive for developers trained in traditional code review, where the code is the primary artifact.
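One way to make intent-first review concrete is to start from the task specification's acceptance criteria and record, for each one, whether and where the diff achieves it. The sketch below is a minimal illustration, assuming the specification lists its acceptance criteria as separate items; `CriterionCheck`, `build_checklist`, `unresolved`, and the example criteria are hypothetical, not part of any tool.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CriterionCheck:
    criterion: str                    # copied from the task specification
    satisfied: Optional[bool] = None  # None means "not evaluated yet"
    evidence: str = ""                # where in the diff the intent is met


def build_checklist(acceptance_criteria: List[str]) -> List[CriterionCheck]:
    # The review starts from the spec: one check per stated criterion.
    return [CriterionCheck(c) for c in acceptance_criteria]


def unresolved(checklist: List[CriterionCheck]) -> List[str]:
    # Anything not explicitly confirmed (unmet or unevaluated) still blocks approval.
    return [c.criterion for c in checklist if c.satisfied is not True]


checklist = build_checklist([
    "Export includes archived orders",
    "Timestamps use the user timezone",
])
checklist[0].satisfied = True
checklist[0].evidence = "orders/export.py: archived filter removed"
print(unresolved(checklist))  # ['Timestamps use the user timezone']
```

The point of the structure is ordering: the reviewer cannot mark anything done without first reading the intent it is checked against.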
Why It Matters
This shift in review approach produces better outcomes than applying traditional review methods to AI-generated code:
- Catches intent misalignment before it reaches production - the most expensive AI-generated defects are spec misunderstandings that pass tests; intent-first review catches these where line-by-line review misses them
- Makes review faster - evaluating whether 200 lines of code achieve the stated intent is often faster than checking every line for correctness; reviewers who make this shift report that review becomes less cognitively exhausting
- Improves agent task specification quality - when reviewers consistently identify that agents misunderstood the spec, it creates evidence for improving the spec template; review becomes a feedback channel for prompt quality
- Enables higher throughput - L3 teams produce more code volume than L1 or L2 teams; traditional line-by-line review becomes a bottleneck at this volume; the shift to intent-based review is what makes the throughput sustainable
- Develops a new high-value skill - developers who become expert at evaluating AI-generated code fitness are developing a skill that becomes more valuable as AI code volume increases; this is the foundation of the senior-reviewer role at L4
A good opening question when reviewing AI-generated code is: "What would the agent have had to misunderstand to write this?" If you can answer it, you know what to check first. For any given codebase, most AI-generated defects tend to fall into 3-5 recurring misunderstanding categories - learn them and look for them explicitly.
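Learning these categories is easier when reviewers tag each finding with a misunderstanding category and tally the tags. A minimal sketch, assuming findings are recorded as free-form category labels (the labels in the example are hypothetical); `category_coverage` reports how much of the total the top k categories explain, which is one way to check whether a small set of categories really dominates in your codebase. The tallies can also feed back into the spec template, as described above.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_misunderstandings(findings: Iterable[str], n: int = 5) -> List[Tuple[str, int]]:
    # Most frequent misunderstanding categories, highest count first.
    return Counter(findings).most_common(n)


def category_coverage(findings: Iterable[str], k: int) -> float:
    # Fraction of all findings explained by the k most frequent categories.
    counts = Counter(findings)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    covered = sum(count for _, count in counts.most_common(k))
    return covered / total


# Hypothetical category labels a team might use when tagging review findings.
findings = [
    "misread-scope", "ignored-existing-helper", "misread-scope",
    "missed-domain-edge-case", "misread-scope", "ignored-existing-helper",
    "wrong-error-semantics",
]
print(top_misunderstandings(findings, 2))
# [('misread-scope', 3), ('ignored-existing-helper', 2)]
print(f"{category_coverage(findings, 2):.2f}")  # 0.71 (5 of 7 findings)
```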
How Different Roles See It
Bob's team is producing more code volume than ever, but review has become a bottleneck. His senior developers are drowning in PR reviews and complaining about the volume. The junior developers, who are generating most of the AI-assisted code, are frustrated by the slow review cycle. Bob suspects the review process is not adapting to the new reality.
Sarah's developer survey shows a split: AI tool users rate code review as "more frustrating" than non-AI users, despite generating code faster. When she digs into the comments, the pattern is clear: AI-generated PRs are getting the same level of detailed stylistic feedback as hand-written PRs, and developers feel the standard is unreasonably high. She needs to recalibrate the review culture without reducing quality.
Victor reviews more AI-generated code than anyone on the team and has developed strong intuitions for what to look for. He can scan a 200-line AI-generated PR in 10 minutes and find the one or two issues that actually matter. He can also spot when an AI has misunderstood the spec before reading a single line of code, just by comparing the diff size with the spec complexity. He has not yet articulated how he does this.
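Victor's diff-size intuition can be written down as a first-pass heuristic. The sketch below is one hypothetical way to do it, not his actual method: it approximates spec complexity by the number of acceptance criteria, computes lines changed per criterion, and compares that with the median of the team's past PRs. The `size_check` name, the `(diff_lines, criteria)` tuple, and the 0.5x/2x thresholds are all assumptions.

```python
from statistics import median
from typing import List, Tuple

# (lines changed in the diff, number of acceptance criteria in the spec)
PR = Tuple[int, int]


def lines_per_criterion(pr: PR) -> float:
    diff_lines, criteria = pr
    return diff_lines / max(criteria, 1)  # treat a spec with no listed criteria as one


def size_check(pr: PR, history: List[PR], low: float = 0.5, high: float = 2.0) -> str:
    """Compare a PR's lines-per-criterion with the team's historical median.

    history must be non-empty; low/high are hypothetical thresholds to tune.
    """
    baseline = median(lines_per_criterion(p) for p in history)
    if baseline == 0:
        return "no baseline"
    ratio = lines_per_criterion(pr) / baseline
    if ratio < low:
        return "suspiciously small for the spec"  # parts of the spec may be missing
    if ratio > high:
        return "suspiciously large for the spec"  # the agent may have over-built
    return "proportionate"


history = [(100, 2), (150, 3), (200, 4)]  # median: 50 lines per criterion
print(size_check((60, 3), history))   # suspiciously small for the spec
print(size_check((400, 2), history))  # suspiciously large for the spec
print(size_check((110, 2), history))  # proportionate
```

A flag is a prompt to reread the spec before reading the code, not a verdict on the PR.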