Full provenance tracking per change

Full provenance tracking means that for every change that reaches production, you can reconstruct the complete lineage: the business requirement that originated the work, the ticke

·Full provenance tracking per change: model version, prompt context, agent session ID, iteration count
·Automated compliance checks run without manual intervention on every merge
·AI-generated code is distinguishable from human-written code in version control (metadata, labels, or attribution)

·Provenance data is queryable (e.g., "show all changes made by model X in the last 30 days")
·Compliance check results are aggregated into a governance dashboard

Evidence

·Provenance metadata on commits/PRs showing full attribution chain
·Automated compliance check configuration with zero manual steps
·VCS query showing AI-vs-human code distinction

What It Is

Full provenance tracking means that for every change that reaches production, you can reconstruct the complete lineage: the business requirement that originated the work, the ticket that specified it, the AI sessions that generated the code (with model versions and prompts), the human review that approved it, the CI pipeline that validated it, and the deployment event that released it. Provenance is not just an audit trail - it is a complete, queryable graph of causality from intent to production.

At L4 (Optimized), provenance tracking is automated and comprehensive rather than partial and manual. The L3 minimum viable audit trail captures four fields per AI-assisted commit. L4 provenance tracking captures the full context: linked tickets, linked requirements, the sequence of AI sessions including prompts and responses, reviewer identities and their roles, test results, security scan results, deployment metadata, and the configuration of every tool in the pipeline that handled the change. This is the difference between knowing "an AI helped write this" and knowing "here is the exact sequence of decisions and actions that produced this artifact."

The technical infrastructure for full provenance typically follows the SLSA (Supply Chain Levels for Software Artifacts) framework. SLSA defines provenance as a structured attestation: a signed document that describes what inputs went into a build, what process produced it, and what the outputs were. At SLSA Level 3, provenance is generated by the CI system (not by the developer) and is signed with a key that the developer cannot access - making it tamper-resistant. SLSA Level 4 adds hermetic, reproducible builds. For AI-assisted development, SLSA-style attestations can be extended to include AI session metadata as part of the build inputs.

As of June 2026 the bar for agent provenance is cryptographic tamper-evidence, not just a stored log. Dapr 1.18 introduced Verifiable Execution on June 11: tamper-evident workflow traces, built on SPIFFE identity, that record which agent did what, provably. The same period made it clear that "which model" is no longer enough - provenance must also record the runtime an action ran in (hosted vendor API vs self-hosted in your VPC) and the inference region, because a model can be pulled or silently re-routed between runs. Capture model id, model version, runtime, and region alongside each AI session in the attestation.

The graph structure is what distinguishes full provenance from a sequential audit trail. A change may have been influenced by multiple AI sessions across multiple days, with human decisions interleaved. A graph representation can capture this non-linear causality: the initial implementation came from session A, the security review comment came from session B, the fix for that comment came from session C, and the final reviewer was human. This graph is queryable in ways that a sequential log is not: "find all changes where an AI session generated the security fix rather than the initial implementation."

Why It Matters

Enables complete incident root cause analysis - when a production incident involves AI-generated code, full provenance lets you trace back through every decision point: what was the requirement, what did the AI generate, what did the human reviewer see, was the AI's output the issue or was the human's modification the issue?
Satisfies emerging regulatory documentation requirements - the EU AI Act's Article 12 record-keeping requirements and forthcoming technical standards will require documentation of AI system involvement at a level of detail that the minimum viable audit trail does not meet; full provenance does
Enables AI system performance analysis - with provenance data, you can ask: which AI model versions have the highest downstream incident rates? Which types of tasks have the highest human modification rates? This analysis drives AI tool adoption and configuration decisions
Creates the foundation for autonomous agent governance - as AI agents operate more autonomously (L4-L5), the human who "reviews" a change may be reviewing a summary rather than the full diff. Full provenance captures what the agent actually did, not just what the human saw when they approved
Supports supply chain security - SLSA provenance prevents the substitution or tampering of build artifacts; AI-assisted code with provenance attestations creates the same supply chain integrity guarantees for AI-generated code as SLSA provides for traditional builds

Getting Started

Implement SLSA Level 2 or 3 for your build pipeline - start with the standard SLSA provenance attestation before extending it for AI. SLSA Level 2 requires provenance from the CI system (not developer-generated); Level 3 adds signing. The official SLSA GitHub Actions generator (slsa-framework/slsa-github-generator) makes this straightforward for GitHub Actions pipelines.
Define the AI provenance schema extension - extend the SLSA provenance subject to include AI metadata: ai_sessions: [{model_id, model_version, runtime, region, session_id, prompt_hash, response_hash, timestamp, human_approver}] (where runtime distinguishes a hosted vendor API from a self-hosted deployment, and region records where inference ran). Using hashes of prompts and responses rather than the full text keeps the attestation compact while preserving the ability to verify the specific interaction.
Integrate Claude Code session export - Claude Code can export session logs in structured format. Build a post-session hook that exports the session log, computes a hash, and stages the hash for inclusion in the next commit's provenance attestation. This creates the link between the AI session and the specific commit it influenced.
Build the provenance graph store - individual attestations are useful; a queryable graph is transformative. Store provenance attestations in a graph database (Neo4j, or a PostgreSQL jsonb schema with graph query extensions) that links tickets, sessions, commits, PRs, reviews, and deployments as nodes with edges. The graph enables the investigations and analyses that a flat log cannot.
Connect to the ticket system - provenance is most valuable when it links to the business requirement. Build a bi-directional link: the ticket includes the commit hashes that implement it, and the commit provenance includes the ticket ID. This "from business requirement to production artifact" traversal is the full chain that regulators and auditors are asking for.
Validate provenance on deployment - before deploying an artifact, verify that its provenance attestation is present, valid, and signed by the expected CI system. A deployment without a valid provenance attestation should fail or require manual override with audit logging. This is the control that prevents untracked changes from reaching production.

Tip

Prompt hashing in provenance attestations is tricky because prompts that are semantically identical may have superficial differences (whitespace, punctuation). Use a canonical form of the prompt before hashing - strip whitespace, lowercase, and apply a stable JSON serialization - so that equivalent prompts produce the same hash. This makes deduplication and correlation across sessions practical.

Common Pitfalls

Trying to capture everything and making the system unusable. Full provenance does not mean infinite provenance. Define the provenance boundary: what's inside the graph (tickets, AI sessions, commits, reviews, deployments) and what's outside (developer internal reasoning, pair programming conversations, documentation research). The boundary should be driven by what questions you need to answer, not by what's technically capturable.

Not addressing the privacy implications of prompt logging. AI session prompts may contain proprietary code, internal architecture details, or (accidentally) personal data. Before storing provenance that includes prompt hashes or prompt content, assess the data classification and access control requirements. Who can query the provenance graph? Under what circumstances can full prompt content be retrieved? This policy needs to exist before the system is built.

Provenance that's generated by the developer. Provenance that the developer generates can be falsified by the developer. For compliance purposes, provenance needs to be generated by the CI system using information that the developer cannot manipulate. SLSA addresses this with builder signing; AI session provenance should follow the same principle by having the Claude Code session log exported directly to the provenance system rather than passing through a developer-controlled step.

Graph stores that aren't queried. A provenance graph that's built but never queried provides compliance evidence but no operational value. The system needs to be queried regularly - during incident investigations, during model version evaluations, during compliance audits. Build example queries into the documentation and use them in real scenarios to validate that the graph structure supports the questions you actually need to answer.

Ignoring provenance for automated agent PRs. If an autonomous agent creates a PR directly (without a developer initiating the session), the provenance model changes: there may be no human who chose to run an AI session - the agent ran itself. Ensure your provenance model has a valid representation for autonomous agent actions where the "human initiator" is an automation trigger (a schedule, a webhook, a merge event).

How Different Roles See It

BobHead of Engineering

Bob is preparing for a major enterprise customer's security review, which includes a software supply chain assessment. The customer is asking for evidence that every change to the software that runs their data has complete provenance - from ticket to production. Bob has the MVAT audit trail from L3, but the customer wants to see the ticket linkage and the deployment record, not just the commit metadata.

What Bob should do: Bob should scope a provenance graph MVP that connects the three systems the customer cares about: the issue tracker (Jira or Linear), the code repository (GitHub), and the deployment platform (Kubernetes with deployment records). The MVP is a script that, given a ticket ID, traverses the graph and outputs: the ticket, the PRs that closed it, the commits in those PRs with AI provenance data, and the deployment record. This traversal report is the supply chain evidence the customer wants. Bob doesn't need to build the full graph database in the first iteration - a script that queries three APIs and joins the results is enough to demonstrate provenance for the audit.

SarahProductivity Lead

Sarah has been tracking AI adoption metrics but wants to build a more sophisticated analysis: what types of tasks are AI most effective at, and does effectiveness correlate with any provenance attributes (model version, human modification rate, task type)? The provenance graph gives her the data to answer these questions systematically.

What Sarah should do: Sarah should build an analysis pipeline on top of the provenance graph. Starting with a cohort of the last 200 AI-assisted PRs, she can compute: percentage of AI-generated code that was modified during human review (high modification = AI less effective for this task type), time from AI session to PR merge (efficiency measure), downstream defect rate by task type and model version. The goal is a "what works and what doesn't" map that guides how the team uses AI tools. Sarah should present this analysis quarterly and use it to update the guidance on which tasks are best suited for AI assistance.

VictorStaff Engineer - AI Champion

Victor's workflow involves complex multi-agent orchestration: a planner agent creates a task breakdown, multiple worker agents implement the tasks in parallel, and Victor reviews the combined output. The provenance graph needs to capture this hierarchical agent structure, not just flat session-to-commit links.

What Victor should do: Victor should propose and implement a hierarchical provenance model that captures agent orchestration. The graph extension adds: a "workflow" node that represents the top-level task, "session" nodes for each agent session, and parent/child edges between orchestrator and worker sessions. The commit provenance then links to the workflow node rather than individual sessions. This model makes it possible to answer: "what was the top-level task that motivated this change?" and "which agent sessions contributed to this PR and in what roles?" Victor should prototype this with his own workflows, validate that the resulting graph is queryable and useful, and then propose it as the standard for multi-agent provenance.

From the Field

Recent releases, projects, and discussions relevant to this maturity level.

articleL4

news.ycombinator.comViatoris: Signed receipts and audit trails for AI agentsViatoris establishes cryptographic accountability for AI agents by replacing standard logs with signed receipts, facilitating systematic enterprise governance. news.ycombinator.com

articleL4

github.comShow HN: Lmscan – Detect AI text and fingerprint which LLM wrote it (zero deps)lmscan is a zero-dependency Python tool designed for local detection and fingerprinting of LLM-generated text, identifying specific model signatures without extgithub.com

discoveredL4

regent-vcs/re_gentGit for AI coding agents.re_gent (rgt) implements automated, tool-level version control for AI agents, specifically optimized for Claude Code workflows. Written in Go, the tool replacesgithub.com

releaseL4

mem0ai/mem0Mem0 v0.1.1 formalizes OpenCode Plugin distribution via a Bun-based CI/CD pipeline (opencode-plugin-cd.yml) and OIDC-backed npm provenance for supply chain secugithub.com

Where does your team actually sit on this?

This guide describes one level of one area. Run the assessment to place your team across all 16 areas, see which gates you have passed, and get a report you can take to your stakeholders.

Start the assessment

Governance & Compliance

Compliance gates in CI Automated compliance checks; audited MCP/skill supply chain (PolicyLayer State of MCP: 42% of servers expose a destructive tool; NVIDIA-Verified Skills)