Development - L1 Ad-hoc - Code Review & Quality

No distinction between AI-generated and human-written code

At L1, teams can't see how much of their codebase is AI-generated - making it impossible to measure adoption, calibrate review depth, or understand quality patterns.

  • All code is reviewed by a human before merge
  • No automated review tooling beyond basic CI checks
  • Code review turnaround is tracked (even if slow)
  • Team is aware that AI-generated code has higher defect rates (1.7x more issues, 2.74x more security vulnerabilities)

Evidence

  • PR approval records showing human reviewer on every merged PR
  • Average review turnaround time in PR analytics
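Gathering this evidence can be as simple as a small script over exported PR data. A minimal sketch, assuming each PR record carries ISO-8601 `opened_at` and `approved_at` timestamps (field names are illustrative; adapt them to whatever your PR analytics export actually uses):

```python
from datetime import datetime

def avg_review_turnaround_hours(prs):
    """Average hours between a PR being opened and receiving approval.

    PRs with no approval yet (approved_at is None) are skipped.
    """
    deltas = [
        (datetime.fromisoformat(pr["approved_at"])
         - datetime.fromisoformat(pr["opened_at"])).total_seconds() / 3600
        for pr in prs
        if pr.get("approved_at")
    ]
    return sum(deltas) / len(deltas) if deltas else None

prs = [
    {"opened_at": "2026-04-01T09:00:00", "approved_at": "2026-04-01T15:00:00"},
    {"opened_at": "2026-04-02T10:00:00", "approved_at": "2026-04-03T10:00:00"},
    {"opened_at": "2026-04-03T08:00:00", "approved_at": None},  # still in review
]
print(avg_review_turnaround_hours(prs))  # (6 + 24) / 2 = 15.0
```

Even a crude number like this satisfies the "turnaround is tracked, even if slow" criterion and gives a baseline to improve against.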

What It Is

At L1, AI-generated code and human-written code are indistinguishable in the repository. A developer uses Copilot to generate a function body, accepts the suggestion, and commits it. The commit looks identical to a commit where the developer typed every character manually. There are no labels, no metadata, no annotations - nothing in the version history reveals the proportion of code that came from an AI model.

This lack of distinction is the natural starting point. When developers first adopt autocomplete or chat-based AI, they're adding a tool to their personal workflow without changing how they interact with the team's shared systems. The code looks the same, passes the same CI checks, gets reviewed the same way. There's no reason - yet - to distinguish the provenance.

The problem emerges when AI code becomes a significant fraction of commits. As adoption grows from one enthusiast to half the team, the organization develops a meaningful blind spot: it can't measure AI adoption at all. It can't answer "what percentage of our new code is AI-generated?" It can't tell whether AI-generated code has different defect rates than human-written code. It can't calibrate review depth for changes where AI did most of the work versus changes that required careful human reasoning. All of this information is invisible because no one added provenance tracking when adoption was small.

The solution isn't restriction - it's instrumentation. Teams don't need to prevent AI code, they need to see it.
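Instrumentation can start as small as an agreed commit-message trailer. A minimal sketch of measuring adoption that way; the `AI-assisted:` trailer name is an assumption, not an existing standard, so any consistent convention your team agrees on works equally well:

```python
def ai_assisted_share(commit_messages):
    """Fraction of commits carrying an 'AI-assisted: yes' trailer.

    Trailers are plain lines in the commit message body, so this works
    on the output of `git log --format=%B` with no extra tooling.
    """
    def is_ai(msg):
        return any(
            line.strip().lower() == "ai-assisted: yes"
            for line in msg.splitlines()
        )
    if not commit_messages:
        return 0.0
    return sum(is_ai(m) for m in commit_messages) / len(commit_messages)

messages = [
    "Add retry logic to uploader\n\nAI-assisted: yes",
    "Fix typo in README",
]
print(ai_assisted_share(messages))  # 0.5
```

The trailer costs developers a few keystrokes per commit, and once it exists, the "what percentage of our new code is AI-generated?" question becomes a one-liner over the git history rather than guesswork.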

Why It Matters

The invisibility of AI-generated code creates several compounding problems as teams move up the maturity ladder:

  • Adoption is unmeasurable - If you can't see how much of the codebase is AI-generated, you can't measure whether your AI tooling investment is being used. ROI calculations become guesswork.
  • Quality patterns are invisible - If AI-generated code has different defect rates (in either direction), you can't discover that pattern without provenance data. You can't improve what you can't see.
  • Review calibration is impossible - A reviewer who knows a PR is 90% AI-generated might (correctly) spend more time on business logic correctness and less on syntax. Without that signal, review effort is allocated uniformly regardless of how the code was produced.
  • Compliance risk accumulates silently - Some industries and clients require disclosure of AI-generated content. Without tracking, compliance is effectively impossible.
  • Context is lost for future analysis - Code that seems fine today may turn out to have been generated by a model with a known weakness. Without provenance, you can't audit which parts of your codebase might be affected.

As of April 2026, the cost of not distinguishing AI from human code is no longer theoretical - it's quantifiable. Studies show AI-generated code has 1.7x more issues than human-written code, with 2.74x more security vulnerabilities. In March 2026 alone, 35 new CVEs were attributed to AI-generated code. Not distinguishing AI code provenance is now a measurable security risk, not just a process gap. Organizations without attribution cannot audit which parts of their codebase may be affected by model-specific weaknesses, cannot prioritize security review for AI-heavy modules, and cannot demonstrate compliance with emerging AI code disclosure requirements.

The transition from L1 to L2 on this dimension is the shift from "we know AI is being used somewhere" to "we can see where AI code is in our repository and how it behaves." That visibility is the foundation for L3 systematic measurement and L4 policy-based automation.

Tip

Start tracking AI code origin at the PR level - a simple label or PR template checkbox is enough to begin. You don't need line-level attribution to get actionable data. Even coarse-grained data ("this PR was primarily AI-generated") is far more useful than no data at all.
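Once PRs carry the label, the adoption number falls out directly. A sketch, assuming a label named `ai-assisted` and PR records with `merged` and `labels` fields (both names are illustrative placeholders for whatever your platform's export provides):

```python
AI_LABEL = "ai-assisted"  # assumed label name; use whatever your team picks

def ai_pr_share(prs):
    """Percentage of merged PRs carrying the AI label."""
    merged = [pr for pr in prs if pr.get("merged")]
    if not merged:
        return 0.0
    tagged = sum(AI_LABEL in pr.get("labels", []) for pr in merged)
    return 100.0 * tagged / len(merged)

prs = [
    {"merged": True, "labels": ["ai-assisted"]},
    {"merged": True, "labels": []},
    {"merged": False, "labels": ["ai-assisted"]},  # unmerged PRs excluded
]
print(ai_pr_share(prs))  # 50.0
```

Pairing this with defect data per PR is the natural next step: the same label lets you compare issue rates between AI-tagged and untagged PRs, which is the quality-pattern visibility L2 is about.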

Getting Started

6 steps to get from here to the next level

Common Pitfalls

Mistakes teams actually make at this stage - and how to avoid them

How Different Roles See It

Bob - Head of Engineering

Bob's leadership team is asking him to report on AI adoption. He approved Copilot licenses for 50 engineers six months ago. He knows usage is uneven but can't answer basic questions: How many developers are using it weekly? What percentage of new code is AI-generated? Is the code quality different? He has no data.

What Bob should do - role-specific action plan

Sarah - Productivity Lead

Sarah is preparing a quarterly review of the company's AI tooling investment. The CFO wants to know the return on the Copilot licenses. Sarah has no data showing AI impact on productivity because the team has never tracked AI code provenance. She's looking at usage metrics from GitHub's dashboard but they don't connect to outcomes.

What Sarah should do - role-specific action plan

Victor - Staff Engineer, AI Champion

Victor is a heavy Copilot user and has been encouraging his teammates to use it. But in code review, he's noticed he has no way to know which parts of a PR were AI-generated. Sometimes he reviews AI-generated code as if it were carefully hand-crafted, only to discover (from a conversation with the author) that it was a Copilot completion accepted without much scrutiny. He thinks this should be visible.

What Victor should do - role-specific action plan