Vercel SDI model: infra recommends code changes
- Production anomaly detection auto-creates tickets and triggers agent investigation
- Self-healing for known patterns: agent detects known error pattern, applies known fix, deploys, and verifies
- Infrastructure recommends code changes based on production data (Vercel SDI model)
- Auto-created tickets include full context (traces, logs, affected users, similar past incidents)
- Self-healing success rate is tracked (% of auto-fixes that resolve the issue without human intervention)
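The success-rate metric in the last item can be computed from self-healing event records. The following is a minimal sketch, assuming each event is a dictionary with the hypothetical boolean fields `auto_fix_applied`, `verified`, and `human_intervened`; none of these names come from a specific tool.

```python
from typing import Iterable, Optional


def self_healing_success_rate(events: Iterable[dict]) -> Optional[float]:
    """Return the % of auto-fixes that resolved the issue without human help.

    Field names are hypothetical; adapt them to your own event log schema.
    Returns None when no auto-fix was attempted, so that "no data" is not
    confused with a 0% success rate.
    """
    attempts = [e for e in events if e.get("auto_fix_applied")]
    if not attempts:
        return None
    resolved = sum(
        1
        for e in attempts
        if e.get("verified") and not e.get("human_intervened")
    )
    return 100.0 * resolved / len(attempts)
```

Counting only events where a fix was actually applied keeps the denominator honest: anomalies that were detected but escalated straight to a human do not dilute or inflate the rate.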
Evidence
- Auto-ticket creation logs triggered by production anomalies
- Self-healing event logs showing detection, fix, deploy, and verification steps
- Infrastructure recommendation pipeline configuration (production data to code change suggestions)
What It Is
The Vercel Software-Defined Infrastructure (SDI) model describes a paradigm where infrastructure does not just run code - it actively analyzes production behavior and surfaces specific, actionable code changes back to developers. The infrastructure becomes a feedback mechanism: it observes what the code does in production, identifies patterns that indicate performance problems, reliability risks, or optimization opportunities, and generates concrete recommendations in the form of code diffs or PR suggestions. The flow is bidirectional: code defines how infrastructure behaves, and infrastructure informs how code should be written.
The Vercel implementation is the clearest public example of this pattern. Vercel's infrastructure analyzes deployment size, cold start frequency, bundle composition, edge caching behavior, and runtime performance for every deployment. For example, when it detects that a specific JavaScript import adds 800KB to a serverless function bundle, it surfaces a recommendation along the lines of "this import can be replaced with this smaller alternative, which would reduce your cold start time by 340ms." When it detects that a route is rendered server-side but could be statically generated, it proposes the change. The infrastructure is not a passive execution environment; it is an opinionated advisor that knows the production implications of code decisions and communicates them directly.
At L4, implementing this model requires connecting three systems: the observability stack (which captures production behavior), a code analysis layer (which links production behavior to specific code), and a developer workflow integration (which surfaces recommendations as PRs, comments, or notifications). The observability stack captures that request X is slow. The code analysis layer identifies that request X executes this specific function, which has this N+1 query pattern. The developer workflow integration creates a PR suggesting the fix. The infrastructure moves from "here is what is happening" (monitoring) to "here is what you should change" (prescription).
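The three-system flow described above can be sketched as a small pipeline. This is an illustrative outline only: the types `Observation`, `CodeFinding`, and `Recommendation`, the `locate` callback, and the 500 ms threshold are all assumptions made for the example, not part of any vendor API.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Observation:
    """Output of the observability stack (hypothetical shape)."""
    route: str
    p95_ms: float


@dataclass
class CodeFinding:
    """Output of the code analysis layer (hypothetical shape)."""
    file: str
    line: int
    pattern: str


@dataclass
class Recommendation:
    """Payload handed to the developer workflow integration."""
    title: str
    body: str


def recommend(
    obs: Observation,
    locate: Callable[[str], Optional[CodeFinding]],
    slow_ms: float = 500.0,  # assumed threshold for "slow"
) -> Optional[Recommendation]:
    # Stage 1: monitoring tells us *what* is happening.
    if obs.p95_ms < slow_ms:
        return None
    # Stage 2: code analysis links the behavior to specific code.
    finding = locate(obs.route)
    if finding is None:
        return None
    # Stage 3: build the prescription a PR or comment would carry.
    return Recommendation(
        title=f"{finding.pattern} at {finding.file}:{finding.line}",
        body=f"Route {obs.route} has p95 {obs.p95_ms:.0f} ms; "
             f"suspected cause: {finding.pattern}.",
    )
```

Passing `locate` in as a callback keeps the stages decoupled, so the code analysis layer can be swapped out without touching the monitoring or workflow side.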
The key technical enabler is the linkage between production traces and source code. OpenTelemetry traces capture which functions were called during a request, how long they took, and what database queries they triggered. When these traces are analyzed against the source code (via a code intelligence layer), the production behavior can be attributed to specific lines of code. A trace showing 47 database queries for a single request can be attributed to the ORM call on line 234 of user_service.py that generates an N+1 query pattern. The infrastructure recommendation is then not abstract ("you have an N+1 query problem") but specific ("change line 234 to use select_related('profile') to reduce this to 1 query").
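One way to make this attribution concrete is to group database spans by trace, query shape, and source location, then flag groups that repeat many times. The sketch below assumes spans have already been exported as plain dictionaries. The attribute keys `db.statement`, `code.filepath`, and `code.lineno` are modeled on OpenTelemetry semantic convention names, which vary between versions, and the `Span` type and threshold are hypothetical.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Span:
    """Simplified exported span (hypothetical shape, not the OTel SDK class)."""
    trace_id: str
    attributes: dict = field(default_factory=dict)


def normalize(statement: str) -> str:
    """Replace literal values with '?' so repeated queries share one shape."""
    statement = re.sub(r"'[^']*'", "?", statement)
    return re.sub(r"\b\d+\b", "?", statement)


def find_n_plus_one(spans, threshold=10):
    """Flag query shapes issued >= threshold times from one line in one trace."""
    counts = defaultdict(int)
    for span in spans:
        attrs = span.attributes
        statement = attrs.get("db.statement")
        if statement is None:
            continue  # not a database span
        key = (
            span.trace_id,
            normalize(statement),
            attrs.get("code.filepath"),
            attrs.get("code.lineno"),
        )
        counts[key] += 1
    return [
        {"trace_id": t, "query": q, "file": f, "line": n, "count": c}
        for (t, q, f, n), c in counts.items()
        if c >= threshold
    ]
```

A finding such as 46 repetitions of one query shape from `user_service.py` line 234 is what lets the recommendation name a file and line instead of only reporting that an N+1 problem exists somewhere.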
Why It Matters
Infrastructure that recommends code changes closes the loop between production reality and development decisions:
- Fast production feedback for developers - instead of discovering performance problems months later when users complain, developers receive specific recommendations based on production behavior within hours of deployment
- Recommendations are specific to the production context - unlike static analysis tools that flag potential issues, production-informed recommendations are triggered by actual runtime behavior under real traffic patterns and data distributions
- Eliminates the "I did not know it would behave that way in production" failure mode - developers who cannot predict production behavior from development testing get real production data instead
- Infrastructure expertise encoded as recommendations - not every developer knows optimal database query patterns, bundle composition, or caching strategies; production-informed recommendations democratize this expertise
- Creates a continuous improvement cycle - each recommendation accepted and merged improves production performance, which refines the signal for subsequent recommendations; the system gets better at identifying valuable optimizations over time
How Different Roles See It
Bob wants production performance to be a first-class engineering concern, not something discovered in quarterly performance reviews. He has observed that performance regressions often go undetected for weeks because there is no mechanism connecting production metrics to the developers who write the code.
Sarah wants developers to feel connected to the production impact of their code. Currently, developers write code, it passes CI, it gets merged, and it disappears into the production black box. They receive no signal about whether their changes performed well or poorly in production unless something breaks badly enough to trigger an incident.
Victor is building the technical infrastructure for the SDI model. He has OTel traces, Prometheus metrics, and git history. He needs to connect these to create a system that generates specific, actionable code change recommendations.