Development / L2 Guided / Code Review & Quality

Diff awareness - reviewer knows it's AI code

When reviewers know which parts of a PR are AI-generated, they can calibrate review depth to match the actual risk - spending more time on business logic and less on syntax.

  • AI-assisted review tool (CodeRabbit, Qodo, or equivalent) is active on all repositories
  • Linter rules are configured and run in CI on every PR
  • PRs clearly indicate whether code is AI-generated or AI-assisted (labels, tags, or commit metadata)
  • AI review suggestions are triaged (accepted or rejected) rather than ignored
  • Linter configuration is committed to the repository and versioned
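The label requirement above can be enforced mechanically in CI. A minimal sketch, assuming a three-label convention (`ai-generated` / `ai-assisted` / `human-authored`) that is our illustration, not a standard:

```python
# Provenance-label check, e.g. run in CI against the PR's label list.
# The label names are an assumed convention, not part of any specific tool.
PROVENANCE_LABELS = {"ai-generated", "ai-assisted", "human-authored"}

def check_provenance_labels(pr_labels):
    """Return (ok, message): a PR must carry exactly one provenance label."""
    found = PROVENANCE_LABELS.intersection(pr_labels)
    if len(found) == 1:
        return True, f"provenance declared: {found.pop()}"
    if not found:
        return False, "missing provenance label (ai-generated / ai-assisted / human-authored)"
    return False, f"conflicting provenance labels: {sorted(found)}"
```

Failing the build on a missing label is optional at L2; even a warning comment keeps the disclosure habit visible without blocking merges.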

Evidence

  • AI review tool configuration in CI pipeline
  • Linter configuration file in repository
  • PR labels or commit metadata distinguishing AI-generated code

What It Is

Diff awareness is the practice of making AI code provenance visible to reviewers at review time - so a reviewer looking at a pull request knows which parts were AI-generated versus hand-crafted. At L2 (Guided), this is implemented through lightweight conventions: a PR label (ai-assisted), a note in the PR description ("The database query in UserRepository.kt was generated by Claude and reviewed by me"), or a commit message convention that distinguishes AI-generated changes.
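The commit-message variant of this convention can use a git-style trailer (a `Key: value` line in the final paragraph of the message). A sketch, assuming a hypothetical `AI-Assisted:` trailer name:

```python
def ai_assisted_trailer(commit_message):
    """Return the value of a hypothetical 'AI-Assisted:' trailer, or None.

    Git trailers live in the last paragraph of the message, so only the
    final blank-line-separated block is scanned.
    """
    last_block = commit_message.strip().split("\n\n")[-1]
    for line in last_block.splitlines():
        if line.lower().startswith("ai-assisted:"):
            return line.split(":", 1)[1].strip()
    return None

# Example commit message using the assumed convention:
msg = """Add retry logic to UserRepository

Reviewed the generated query by hand before committing.

AI-Assisted: yes (Claude, query generation only)
"""
```

Because trailers survive in `git log`, this variant also leaves a searchable trail once the team later wants to analyze provenance retroactively.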

This awareness is a stepping stone from the L1 state (no distinction between AI and human code) to the L3 state (systematic tracking and policy enforcement). At L2, the mechanism is social and manual - the developer discloses, the reviewer uses the information - rather than automated or enforced. But even manual disclosure provides immediate value: it changes how reviewers allocate their attention.

Knowing a function was AI-generated tells a reviewer something important: the code is likely syntactically correct and stylistically consistent, but may be semantically wrong for the specific business context. AI code tends to be good at the "how" and inconsistent at the "what" - it produces well-structured code that does the wrong thing, or that's missing the crucial edge case that exists only in your specific system's behavior. This is the exact opposite of the typical human-generated code failure mode (correct in intent, inconsistent in execution), and it calls for a different review focus.

Diff awareness doesn't mean AI code gets more review - it means it gets differently focused review. The reviewer can skim the syntax (the AI handled that) and focus on whether the logic matches the business requirement.

Why It Matters

Review calibration based on provenance makes review more efficient and more effective simultaneously:

  • More efficient - Reviewers don't need to read AI-generated boilerplate as carefully as human-written logic. Knowing a function is AI-generated allows skimming the implementation and focusing on the interface and contract.
  • More effective - AI-generated code has characteristic failure modes: plausible-looking but wrong business logic, missing error cases that aren't present in training data analogues, over-engineering for generic cases. Knowing code is AI-generated prompts reviewers to test it against these failure modes specifically.
  • Cultural normalization - When developers disclose AI use, it becomes normal. This is healthier than the alternative (silent AI use), where reviewers are reviewing AI code without knowing it and applying inappropriate review heuristics.
  • Data foundation - PR-level disclosure (even informal) is the seed of the provenance data you'll need for L3 systematic measurement. You can't analyze AI code quality patterns if you don't know which code is AI-generated.
  • Author self-review trigger - The act of writing "this was AI-generated" in a PR description prompts the author to consciously review the AI code before submitting. This alone catches a meaningful number of issues.
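The "data foundation" point can be made concrete with a small tally over exported PR records. A sketch, assuming hypothetical records shaped as `{"labels": [...]}` (your PR export's actual shape will differ):

```python
from collections import Counter

def provenance_tally(prs):
    """Count PRs by provenance label; 'undisclosed' when none (or several) apply.

    Each record is a hypothetical dict with a 'labels' list.
    """
    known = {"ai-generated", "ai-assisted", "human-authored"}
    counts = Counter()
    for pr in prs:
        found = known.intersection(pr.get("labels", []))
        counts[found.pop() if len(found) == 1 else "undisclosed"] += 1
    return counts
```

A high `undisclosed` count is itself useful: it shows how far the disclosure habit has actually spread before you invest in L3 tooling.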

The key insight is that AI code is not uniformly higher or lower quality than human code - it's differently distributed in its failure modes. Review practices that calibrate to those specific failure modes get more value from the same review effort.

Tip

Establish a lightweight convention now rather than a heavyweight one. A single PR label and a sentence in the description is enough to start. Teams that try to track AI code at line-level granularity before they have automated tooling burn out quickly. Get the habit established first; improve the precision later.

Getting Started


Common Pitfalls


How Different Roles See It

Bob - Head of Engineering

Bob's team has been using Copilot for six months. In retrospective discussions, he's heard senior engineers say they sometimes feel like they're reviewing code without knowing how it was produced - some looks AI-generated but they're not sure, which makes them uncertain about review depth. There's no disclosure convention, so every review is flying blind on provenance.


Sarah - Productivity Lead

Sarah has noticed that her PR cycle time metrics show high variance: some PRs close in 2 hours, others take 3 days. She suspects the long-tail PRs involve a lot of revision cycles (review comments, fixes, re-review). She wants to understand whether AI-generated code is correlated with more or fewer revision cycles.
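Once PRs carry a provenance label, Sarah's question becomes a simple grouped comparison. A sketch over hypothetical records shaped as `{"labels": [...], "cycle_hours": float}`:

```python
from statistics import median

def cycle_time_by_provenance(prs):
    """Median PR cycle time (hours) per provenance group.

    Records are hypothetical: {'labels': [...], 'cycle_hours': float}.
    PRs without an 'ai-assisted' label fall into 'other'.
    """
    groups = {}
    for pr in prs:
        key = "ai-assisted" if "ai-assisted" in pr["labels"] else "other"
        groups.setdefault(key, []).append(pr["cycle_hours"])
    return {k: median(v) for k, v in groups.items()}
```

Median rather than mean keeps the long-tail PRs Sarah already sees from dominating the comparison.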


Victor - Staff Engineer, AI Champion

Victor reviews 15-20 PRs per week. He's started recognizing Copilot-generated code by its patterns - a certain way of structuring error handling, characteristic variable names, boilerplate that's slightly too complete to be hand-typed. He's getting good at it, but it feels like guesswork. He wants explicit disclosure so he can calibrate efficiently.
