AI code vs human code distinction in VCS
- Full provenance tracking per change: model version, prompt context, agent session ID, iteration count
- Automated compliance checks run without manual intervention on every merge
- AI-generated code is distinguishable from human-written code in version control (metadata, labels, or attribution)
- Provenance data is queryable (e.g., "show all changes made by model X in the last 30 days")
- Compliance check results are aggregated into a governance dashboard
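The queryability criterion can be prototyped with git commit trailers: a command like `git log --since='30 days ago' --format='%H%x09%(trailers:key=AI-Model,valueonly)'` emits one commit per line with the model trailer, which a short script can filter. A minimal sketch, assuming a trailer convention of `AI-Model: <name>` (the trailer name is an organizational convention, not a git standard):

```python
def parse_ai_commits(log_output: str) -> list[tuple[str, str]]:
    """Parse tab-separated `git log` output (SHA, AI-Model trailer value)
    into (sha, model) pairs, keeping only AI-attributed commits."""
    results = []
    for line in log_output.splitlines():
        sha, _, model = line.partition("\t")
        if model.strip():  # empty trailer value means no AI attribution
            results.append((sha, model.strip()))
    return results
```

From here, "all changes made by model X in the last 30 days" is a filter over the parsed pairs.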
Evidence
- Provenance metadata on commits/PRs showing the full attribution chain
- Automated compliance check configuration with zero manual steps
- VCS query showing the AI-vs-human code distinction
What It Is
AI code vs. human code distinction in version control means that the repository's history explicitly tags which lines, commits, or files were generated by AI systems versus written by humans. It goes beyond the MVAT's PR-level metadata to create line-level or commit-level attribution that makes AI authorship queryable across the entire codebase. The question "what percentage of the payment processing module was AI-generated?" has a concrete, computable answer.
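The "concrete, computable answer" reduces to counting attributed lines. A minimal sketch, assuming per-line commit SHAs (e.g., extracted from `git blame`) and a set of SHAs known to carry an AI-attribution trailer:

```python
def ai_line_percentage(line_authors: list[str], ai_commits: set[str]) -> float:
    """Percentage of lines whose last-touching commit is AI-attributed.

    line_authors: one commit SHA per line of a file or module.
    ai_commits:   SHAs of commits flagged as AI-generated.
    """
    if not line_authors:
        return 0.0
    ai = sum(1 for sha in line_authors if sha in ai_commits)
    return 100.0 * ai / len(line_authors)
```

Running this over every file in, say, the payment processing module and aggregating gives the module-level answer.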
This distinction matters at L4 (Optimized) because the scale of AI code generation has crossed a threshold where aggregate statistics become operationally relevant. When 30-50% of new code is AI-generated, the questions change: which parts of the codebase have the highest AI-code concentration? Are AI-dense areas of the codebase correlated with higher defect rates? Do changes to AI-generated code by human developers introduce more bugs than changes to human-written code, or vice versa? These questions require line-level attribution, not just PR-level flags.
The technical approach to AI/human distinction in VCS combines several mechanisms. Git trailers on commits flag AI-generated commits, as established at L3. At L4, this is extended to git blame-compatible attribution: each line in the codebase has an authorship record indicating whether it was generated by an AI session or written by a human. One implementation is a custom git attribute (via .gitattributes) combined with a wrapper around git blame that reads the commit-level AI flags and presents them alongside the normal blame output.
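The blame-wrapper idea can be sketched as a parser over `git blame --line-porcelain` output, which prefixes each source line (tab-indented) with a header whose first token is the commit SHA. This sketch assumes the AI-flagged SHAs have already been collected from commit trailers:

```python
import re

SHA_HEADER = re.compile(r"^[0-9a-f]{40} ")  # "<sha> <orig-line> <final-line> ..."

def annotate_blame(porcelain: str, ai_commits: set[str]) -> list[tuple[str, str, str]]:
    """Turn `git blame --line-porcelain` output into (flag, short_sha, line)
    tuples, flagging lines whose commit is in ai_commits."""
    rows, sha = [], None
    for line in porcelain.splitlines():
        if line.startswith("\t"):  # the actual source line of a blame group
            rows.append(("AI" if sha in ai_commits else "  ", sha[:8], line[1:]))
        elif SHA_HEADER.match(line):  # start of a new blame group
            sha = line.split()[0]
    return rows
```

A thin CLI around this gives developers `blame`-style output with an AI column, without changing anything about how git itself stores history.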
There's a philosophical question embedded in the AI/human distinction: what does it mean for code to be "AI-generated"? A line that an AI suggested, a human modified slightly, and then committed - is that AI-generated or human-written? The pragmatic answer at L4 is: code that originated in an AI session is AI-attributed unless a human substantially modified it before commit (where "substantially" means more than whitespace, variable renaming, or formatting changes). Organizations that try to achieve perfect categorization end up in a definitional swamp. A practical threshold, consistently applied, is better than a theoretically pure definition that cannot be implemented.
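The "substantially modified" threshold can be approximated by comparing token sequences, which ignores whitespace and formatting changes by construction. A minimal sketch; the 0.9 cutoff is illustrative, and a production version would also canonicalize identifiers so that pure variable renames do not count as substantial:

```python
import difflib
import re

def substantially_modified(ai_suggestion: str, committed: str,
                           threshold: float = 0.9) -> bool:
    """Heuristic for the attribution flip: if the committed code's token
    sequence is less than `threshold` similar to the AI suggestion,
    treat it as substantially (human-)modified."""
    def tokens(src: str) -> list[str]:
        # Words and punctuation; whitespace and layout drop out entirely.
        return re.findall(r"\w+|[^\w\s]", src)
    ratio = difflib.SequenceMatcher(
        None, tokens(ai_suggestion), tokens(committed)).ratio()
    return ratio < threshold
```

Reformatting alone leaves the token sequence unchanged, so it never crosses the threshold; restructuring the logic does.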
Why It Matters
- Enables codebase-level AI attribution analytics - without line-level attribution, you can only ask "was AI used in this PR?" With line-level attribution, you can ask "what percentage of the authentication module is AI-generated, and what is its defect density compared to the human-written portions?"
- Satisfies EU AI Act documentation requirements for high-risk systems - for organizations building high-risk AI systems (under EU AI Act Annex III), documentation of how AI was used in developing the system is required; VCS-level attribution is the most credible form of this documentation
- Enables license risk assessment - AI-generated code may carry different license obligations depending on the model and the training data. Knowing which code is AI-generated enables targeted legal review of AI-attributed code sections
- Supports differential review policies - organizations can implement a policy where AI-generated code requires stricter review (a second reviewer, mandatory security scan, specific reviewer qualification) by detecting AI attribution at the line or file level
- Feeds model performance analysis - correlating AI attribution with downstream defect rates, performance issues, and security vulnerabilities gives the most granular signal available about which AI-generated patterns are problematic
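The differential-review bullet above can be sketched as a policy function evaluated in CI: given the lines a PR touches and the AI-attributed lines per file (from the blame attribution), decide how many approvals to require. Function and parameter names here are illustrative, not a real CI API:

```python
def required_reviews(changed_lines_by_file: dict[str, set[int]],
                     ai_lines_by_file: dict[str, set[int]],
                     base: int = 1, ai_extra: int = 1) -> int:
    """Differential review policy: one extra reviewer whenever the change
    touches any AI-attributed line. Inputs map file paths to line numbers."""
    touches_ai = any(
        changed & ai_lines_by_file.get(path, set())
        for path, changed in changed_lines_by_file.items()
    )
    return base + (ai_extra if touches_ai else 0)
```

The same shape extends to other policy outcomes (mandatory security scan, reviewer qualification) by returning a richer policy object instead of a count.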
Getting Started
6 steps to get from here to the next level
Common Pitfalls
Mistakes teams actually make at this stage - and how to avoid them
How Different Roles See It
Bob's organization has been generating code with AI agents for 18 months; a significant fraction of the production codebase is now AI-generated, but nobody knows exactly what percentage or which areas have the highest concentration. A new enterprise customer is asking for an AI content disclosure as part of their vendor security assessment.
What Bob should do - role-specific action plan
Sarah wants to test the hypothesis that AI-generated code has a different defect profile than human-written code. She has 18 months of production defect data and can correlate it with AI attribution data from the PR disclosure records (imperfect, but available). This analysis would be the first empirical evaluation of AI code quality in the organization's own codebase.
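Sarah's comparison boils down to defect density per attribution class. A minimal sketch of the final arithmetic, assuming the repo-specific join of defect-fix commits back to the lines they touched has already produced per-class counts (that join is not shown):

```python
def defect_density(defect_lines: int, attributed_lines: int) -> float:
    """Defects per 1,000 lines for one attribution class (AI or human)."""
    return 1000.0 * defect_lines / attributed_lines if attributed_lines else 0.0

def compare_defect_profiles(ai_defects: int, ai_lines: int,
                            human_defects: int, human_lines: int) -> dict[str, float]:
    """Side-by-side defect density for AI-attributed vs human-written code."""
    return {
        "ai_per_kloc": defect_density(ai_defects, ai_lines),
        "human_per_kloc": defect_density(human_defects, human_lines),
    }
```

Because the PR-level attribution is imperfect, the honest version of this analysis reports the comparison alongside an estimate of attribution coverage, not as a clean ratio.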
What Sarah should do - role-specific action plan
Victor's workflow produces commits where the AI-Coverage classification is genuinely ambiguous: he often takes an AI-generated implementation, restructures it significantly, adds edge case handling, and updates it based on code review feedback. By the time the code is committed, it's a collaboration between him and the AI that's hard to classify as "full" or "none."
What Victor should do - role-specific action plan
Further Reading
5 resources worth reading - hand-picked, not scraped
From the Field
Recent releases, projects, and discussions relevant to this maturity level.