Vercel SDI model: infra recommends code changes


·Production anomaly detection auto-creates tickets and triggers agent investigation
·Self-healing for known patterns: agent detects known error pattern, applies known fix, deploys, and verifies
·Infrastructure recommends code changes based on production data (Vercel SDI model)

·Auto-created tickets include full context (traces, logs, affected users, similar past incidents)
·Self-healing success rate is tracked (% of auto-fixes that resolve the issue without human intervention)

Evidence

·Auto-ticket creation logs triggered by production anomalies
·Self-healing event logs showing detection, fix, deploy, and verification steps
·Infrastructure recommendation pipeline configuration (production data to code change suggestions)

What It Is

The Vercel Software-Defined Infrastructure (SDI) model describes a paradigm where infrastructure does not just run code - it actively analyzes production behavior and surfaces specific, actionable code changes back to developers. The infrastructure becomes a feedback mechanism: it observes what the code does in production, identifies patterns that indicate performance problems, reliability risks, or optimization opportunities, and generates concrete recommendations in the form of code diffs or PR suggestions. The flow is bidirectional: code defines how infrastructure behaves, and infrastructure informs how code should be written.

The Vercel implementation is the clearest public example of this pattern. Vercel's infrastructure analyzes deployment size, cold start frequency, bundle composition, edge caching behavior, and runtime performance for every deployment. When it detects that a specific JavaScript import is adding 800KB to a serverless function bundle, it surfaces that information as a recommendation: "this import can be replaced with this smaller alternative, which would reduce your cold start time by 340ms." When it detects that a route is being rendered server-side when it could be statically generated, it proposes the change. The infrastructure is not just a passive execution environment - it is an opinionated advisor that knows the production implications of code decisions and communicates them directly.

At L4, implementing this model requires connecting three systems: the observability stack (which captures production behavior), a code analysis layer (which links production behavior to specific code), and a developer workflow integration (which surfaces recommendations as PRs, comments, or notifications). The observability stack captures that request X is slow. The code analysis layer identifies that request X executes this specific function, which has this N+1 query pattern. The developer workflow integration creates a PR suggesting the fix. The infrastructure moves from "here is what is happening" (monitoring) to "here is what you should change" (prescription).

The key technical enabler is the linkage between production traces and source code. OpenTelemetry traces capture which functions were called during a request, how long they took, and what database queries they triggered. When these traces are analyzed against the source code (via a code intelligence layer), the production behavior can be attributed to specific lines of code. A trace showing 47 database queries for a single request can be attributed to the ORM call on line 234 of user_service.py that generates an N+1 query pattern. The infrastructure recommendation is then not abstract ("you have an N+1 query problem") but specific ("change line 234 to use select_related('profile') to reduce this to 1 query").
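As a minimal sketch of this attribution step, the snippet below groups the database spans of one trace by the source location recorded on each span. The span dictionaries, the make_span helper, and the use of db.statement to recognize database spans are assumptions made for illustration; a real pipeline would read the same fields from exported OpenTelemetry spans.

```python
from collections import Counter


def make_span(statement, filepath, lineno):
    """Build a hypothetical, simplified span record (layout is an assumption)."""
    return {
        "trace_id": "t1",
        "attributes": {
            "db.statement": statement,
            "code.filepath": filepath,
            "code.lineno": lineno,
        },
    }


# One request: 1 query for the user list plus 46 per-user profile lookups.
spans = [make_span("SELECT * FROM users", "user_service.py", 230)]
spans += [
    make_span(f"SELECT * FROM profile WHERE user_id = {i}", "user_service.py", 234)
    for i in range(46)
]


def queries_by_source(spans, trace_id):
    """Count database spans in one trace, grouped by (file, line) that issued them."""
    counts = Counter()
    for span in spans:
        attrs = span["attributes"]
        if span["trace_id"] != trace_id or "db.statement" not in attrs:
            continue  # skip other traces and non-database spans
        counts[(attrs.get("code.filepath"), attrs.get("code.lineno"))] += 1
    return counts


hotspots = queries_by_source(spans, "t1")
top_location, top_count = hotspots.most_common(1)[0]
```

Here top_location points at the single line responsible for most of the request's queries, which is the piece of information a recommendation needs in order to be specific rather than abstract.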

Why It Matters

Infrastructure that recommends code changes closes the loop between production reality and development decisions:

Fast production feedback for developers - instead of discovering performance problems months later when users complain, developers receive specific recommendations based on production behavior within hours of deployment
Recommendations are specific to the production context - unlike static analysis tools that flag potential issues, production-informed recommendations are triggered by actual runtime behavior under real traffic patterns and data distributions
Eliminates the "I did not know it would behave that way in production" failure mode - developers who cannot predict production behavior from development testing get real production data instead
Infrastructure expertise encoded as recommendations - not every developer knows optimal database query patterns, bundle composition, or caching strategies; production-informed recommendations democratize this expertise
Creates a continuous improvement cycle - each recommendation accepted and merged improves production performance, which refines the signal for subsequent recommendations; the system gets better at identifying valuable optimizations over time

Getting Started

Build production-to-code attribution - This is the hardest step and the prerequisite for everything else. You need to link a specific slow trace to the specific lines of source code that produced it. Custom OpenTelemetry spans can embed source location in traces through the code.function, code.filepath, and code.lineno attributes from the OpenTelemetry semantic conventions (newer releases of the conventions rename these to code.function.name, code.file.path, and code.line.number, so check which names your SDK emits). For database queries, some ORM integrations can include the source file and line number in the query trace. Build or adopt a system that maps trace spans to source code locations.
Define the recommendation categories - Before generating recommendations, catalog the specific patterns you want to detect and fix: N+1 database queries, missing database indexes for observed query patterns, large bundle imports that could be replaced, synchronous I/O in hot paths, missing cache headers for cacheable responses. Each category needs detection logic (which trace pattern indicates the problem) and a recommendation template (which code change addresses it).
Build a production performance regression detector - Track P99 latency and error rate per endpoint, per function, and per database query pattern over time. When a deployment introduces a regression, attribute it to the code changes in that deployment. This is the "infra notices a regression and tells you where" flow: after deployment, the system compares current production metrics to baseline and generates "this endpoint got 40% slower after your last deploy; the new code path on line 89 of checkout.py added a synchronous database call."
Integrate recommendations into the developer workflow - Recommendations that appear only in a dashboard will be ignored. Surface them where developers work: as PR comments on the code that caused the production issue, as Slack notifications when a new performance regression is detected, or as GitHub Actions annotations on deployment PRs that show projected performance impact. The recommendation needs to reach the developer at the moment they are most likely to act on it.
Use agents to generate the recommendation code - The detection logic identifies the problem and the code location; an AI agent generates the actual fix. "This ORM call generates N+1 queries; add .select_related('profile') to fix it" can be generated as a full code diff by an agent that receives the ORM call context. The agent creates a PR with the fix, assigns it to the developer who wrote the original code, and explains the production evidence for the change.
Track recommendation acceptance rate - Measure what fraction of generated recommendations are accepted, modified and accepted, or rejected. A low acceptance rate indicates that recommendations are not targeting real problems or are proposing incorrect fixes. A high acceptance rate (above 60%) indicates the system is generating high-quality, actionable recommendations. Tune the detection logic based on this feedback.
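The acceptance-rate tracking in the last step can be sketched as follows. The outcome labels and the (category, outcome) record shape are assumptions for illustration; the rate counts both "accepted" and "modified and accepted" as acceptance, and breaking it down by category shows which detection logic needs tuning.

```python
from collections import Counter

# Hypothetical outcome labels for the three outcomes named in the text.
ACCEPTED = "accepted"
MODIFIED = "modified_and_accepted"
REJECTED = "rejected"


def acceptance_rate(outcomes):
    """Fraction of recommendations merged as-is or after modification."""
    if not outcomes:
        return 0.0
    counts = Counter(outcomes)
    return (counts[ACCEPTED] + counts[MODIFIED]) / len(outcomes)


def rates_by_category(records):
    """Map each recommendation category to its acceptance rate.

    records: iterable of (category, outcome) pairs.
    """
    grouped = {}
    for category, outcome in records:
        grouped.setdefault(category, []).append(outcome)
    return {category: acceptance_rate(o) for category, o in grouped.items()}


records = [
    ("n_plus_one", ACCEPTED), ("n_plus_one", ACCEPTED), ("n_plus_one", ACCEPTED),
    ("n_plus_one", MODIFIED), ("n_plus_one", REJECTED),
    ("bundle_size", ACCEPTED), ("bundle_size", REJECTED),
    ("bundle_size", REJECTED), ("bundle_size", REJECTED),
]
rates = rates_by_category(records)
```

In this made-up data the N+1 category clears the 60% bar while the bundle-size category does not, which would point tuning effort at the bundle-size detection logic.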

Tip

Start the SDI model with one high-value, low-risk recommendation category: N+1 database query detection. N+1 queries are easy to detect from traces (47 queries where 1 was expected), the fix is well-understood (add eager loading), and the production impact is significant. Building and validating this one category end-to-end gives you the architecture for all subsequent recommendation categories.
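A minimal sketch of N+1 detection from the SQL statements of a single trace, assuming the statements have already been extracted from query spans. The normalization regex and the threshold of 10 repeats are illustrative assumptions, not tuned values.

```python
import re
from collections import Counter

# Matches quoted strings and standalone integers so literals can be masked.
_LITERAL = re.compile(r"'[^']*'|\b\d+\b")


def normalize(sql):
    """Replace literals with '?' so queries that differ only in values compare equal."""
    return _LITERAL.sub("?", sql).strip().lower()


def find_n_plus_one(statements, threshold=10):
    """Return {query_shape: count} for shapes repeated at least `threshold` times."""
    counts = Counter(normalize(s) for s in statements)
    return {shape: n for shape, n in counts.items() if n >= threshold}


# One request: 1 list query followed by 46 per-row lookups (47 queries total).
statements = ["SELECT * FROM users WHERE active = 1"] + [
    f"SELECT * FROM profile WHERE user_id = {i}" for i in range(1, 47)
]
suspects = find_n_plus_one(statements)
```

Only the repeated profile lookup is flagged; the single list query stays below the threshold. Combined with source-location attributes on the spans, the flagged shape can be tied to the ORM call that needs eager loading.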


Common Pitfalls

Recommendations that are technically correct but ignore business constraints. Infrastructure might recommend replacing a feature with a cached static version for performance, without knowing that the feature requires real-time data for regulatory reasons. Recommendations need to be reviewed for business context before automatic PR creation. Building a human review step into the recommendation pipeline keeps automation from proposing changes that conflict with business, regulatory, or architectural constraints.

Attribution errors sending recommendations to the wrong developer. When multiple developers have modified the relevant code recently, the system may attribute a production issue to the wrong person's change. Poor attribution creates noise and erodes trust in the recommendation system. Build attribution from git blame data and deployment causality data together, not just one or the other.
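One way to combine the two signals, sketched under assumptions: blame data is modeled as a list of hypothetical BlameLine records (as might be parsed from git blame output), and deployment causality as the set of commit hashes shipped in the regressing deployment. A developer is notified only when both signals agree.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlameLine:
    """One line of blame output (hypothetical, simplified record)."""
    lineno: int
    commit: str
    author: str


def attribute(blame, regressed_lines, deployed_commits):
    """Return authors whose commits touch a regressed line AND shipped in the
    deployment that introduced the regression. An empty list means the two
    signals disagree and the case should go to human triage."""
    owners = {
        entry.author
        for entry in blame
        if entry.lineno in regressed_lines and entry.commit in deployed_commits
    }
    return sorted(owners)


blame = [
    BlameLine(88, "a1", "dana"),
    BlameLine(89, "c3", "lee"),
    BlameLine(90, "b2", "sam"),  # last edited long before this deployment
]
owners = attribute(blame, regressed_lines={89, 90}, deployed_commits={"c3", "d4"})
```

Blame alone would have notified both lee and sam; requiring the commit to be part of the regressing deployment narrows it to lee.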

Recommendation volume overwhelming developers. A system that generates 50 recommendations per week creates exactly the kind of alert fatigue you were trying to eliminate. Apply a value filter: only generate recommendations that are estimated to have significant production impact (above some latency or error rate threshold). Five high-impact, high-confidence recommendations per week are more valuable than 50 low-confidence suggestions.
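A minimal sketch of such a value filter. The Recommendation fields, the thresholds, and the ranking score (estimated latency saved multiplied by confidence) are all assumptions chosen for illustration; the cap of five matches the figure above.

```python
from dataclasses import dataclass


@dataclass
class Recommendation:
    title: str
    latency_saved_ms: float  # estimated production impact (assumed metric)
    confidence: float        # detector confidence in the range 0.0-1.0


def select_weekly(recs, min_latency_ms=50.0, min_confidence=0.8, limit=5):
    """Keep only high-impact, high-confidence recommendations, best first."""
    eligible = [
        r for r in recs
        if r.latency_saved_ms >= min_latency_ms and r.confidence >= min_confidence
    ]
    eligible.sort(key=lambda r: r.latency_saved_ms * r.confidence, reverse=True)
    return eligible[:limit]


candidates = [
    Recommendation("eager-load profile", 340.0, 0.95),
    Recommendation("add index on orders", 120.0, 0.90),
    Recommendation("micro-optimization", 5.0, 0.99),   # impact too small
    Recommendation("speculative cache", 400.0, 0.30),  # confidence too low
]
selected = [r.title for r in select_weekly(candidates)]
```

Everything below either threshold is dropped rather than queued, so a quiet week produces fewer than five recommendations instead of padding the list with weak ones.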

Not closing the feedback loop on accepted recommendations. When a developer accepts a recommendation and merges the fix, the system should verify in production that the fix achieved the expected improvement. If the recommended change on line 234 did not actually reduce the query count, the detection logic is wrong. Measuring post-fix production behavior and comparing to the predicted improvement is how you validate and improve the recommendation engine over time.
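The post-merge check can be sketched as a comparison between the predicted and the observed improvement. The function name, the three result labels, and the 25% tolerance are assumptions for illustration; the metric could be query count, latency, or error rate, as long as lower is better.

```python
def verify_fix(before, predicted_after, observed_after, tolerance=0.25):
    """Classify a merged recommendation by how much of the predicted gain appeared.

    All three values are measurements of one metric where lower is better,
    for example queries per request.
    """
    predicted_gain = before - predicted_after
    observed_gain = before - observed_after
    if predicted_gain <= 0:
        raise ValueError("a recommendation must predict an improvement")
    if observed_gain <= 0:
        return "no_improvement"  # the detection logic was likely wrong
    ratio = observed_gain / predicted_gain
    return "confirmed" if ratio >= 1 - tolerance else "partial"


# Predicted: 47 queries per request drop to 1 after adding eager loading.
outcome_ok = verify_fix(before=47, predicted_after=1, observed_after=1)
outcome_bad = verify_fix(before=47, predicted_after=1, observed_after=47)
outcome_partial = verify_fix(before=47, predicted_after=1, observed_after=24)
```

Feeding "no_improvement" and "partial" results back to the owners of the detection logic is what turns acceptance data into an actual quality signal.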

Treating recommendations as orders. The SDI model works best when recommendations are framed as suggestions backed by evidence, not mandates. "Your deployment caused this endpoint to become 40% slower; here is the evidence and here is a suggested fix - let us know if you disagree" is collaborative. "Fix this or your deployment will be blocked" destroys the trust that makes the system useful.


How Different Roles See It

Bob, Head of Engineering

Bob wants production performance to be a first-class engineering concern, not something discovered in quarterly performance reviews. He has observed that performance regressions often go undetected for weeks because there is no mechanism connecting production metrics to the developers who write the code.

What Bob should do: Bob should frame the SDI model as the closing of the feedback loop between production and development. His goal is simple: no performance regression should persist for more than 24 hours without the responsible developer being notified with specific evidence. Bob should work with Victor to build the production-to-code attribution layer and define the top 3 recommendation categories for their system. Bob should also set a team expectation: performance recommendations from the production system are treated with the same priority as test failures in CI. A recommendation saying "your code is causing a 40% latency regression in production" should be investigated the same day it arrives.


Sarah, Productivity Lead

Sarah wants developers to feel connected to the production impact of their code. Currently, developers write code, it passes CI, it gets merged, and it disappears into the production black box. They receive no signal about whether their changes performed well or poorly in production unless something breaks badly enough to trigger an incident.

What Sarah should do: Sarah should introduce the SDI recommendation system as a developer experience improvement, not a surveillance mechanism. The framing is: "you now have a production data window that shows you exactly how your code performs under real conditions." Sarah should track developer engagement with recommendations: are developers clicking through to the evidence, reading the suggested fixes, and providing feedback? Low engagement indicates the recommendations are not landing effectively - perhaps they are arriving at the wrong time, in the wrong format, or with insufficient context. Sarah should also ensure the recommendation system is blameless: the goal is to improve code quality collectively, not to create a record of who introduced regressions.


Victor, Staff Engineer - AI Champion

Victor is building the technical infrastructure for the SDI model. He has OTel traces, Prometheus metrics, and git history. He needs to connect these to create a system that generates specific, actionable code change recommendations.

What Victor should do: Victor should build the attribution pipeline in three stages. Stage 1: link traces to deployments (already done with deployment annotations). Stage 2: link deployment code changes to trace performance (compare P99 latency distributions before and after each deployment, attributing changes to the diff). Stage 3: link performance regressions to specific code constructs (find the spans that got slower in the trace diff, look up the source code for those spans using the OTel code.function attribute, feed the code to an analysis agent that identifies the problematic pattern). Victor should then build a GitHub App that creates PR comments when a deployment introduces a regression, with the evidence (before/after trace distributions) and the code suggestion (agent-generated fix). This closes the loop from production signal to developer action within the existing workflow developers already use.
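Stage 2 can be sketched with the standard library alone: compute P99 latency for the samples collected before and after a deployment and flag a regression when the relative change exceeds a threshold. The 20% threshold and the sample data are assumptions; a production version would also need enough samples per window and some protection against traffic-mix changes.

```python
from statistics import quantiles


def p99(samples):
    """99th percentile of a list of latency samples (at least two required)."""
    return quantiles(samples, n=100, method="inclusive")[98]


def detect_regression(before_ms, after_ms, threshold=0.2):
    """Compare P99 latency before and after a deployment."""
    baseline, current = p99(before_ms), p99(after_ms)
    change = (current - baseline) / baseline
    return {
        "baseline_p99": baseline,
        "current_p99": current,
        "change": change,
        "regressed": change > threshold,
    }


# Hypothetical per-request latencies for one endpoint, in milliseconds.
before = [100.0] * 100
after = [140.0] * 100
report = detect_regression(before, after)
```

The resulting report carries the numbers needed for the evidence section of the PR comment; Stage 3 then narrows the regression down to the spans and code constructs that got slower.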
