Ephemeral devboxes: 10s spin-up (Stripe benchmark)


·Ephemeral devboxes spin up in under 10 seconds (Stripe benchmark)
·Devboxes come pre-loaded with codebase, dependencies, and MCP tools
·Kernel-level policy enforcement restricts agent actions (syscall filtering, resource limits)

·Devbox spin-up P99 latency is under 30 seconds
·Firecracker microVMs or equivalent provide VM-level isolation with container-level startup speed

Evidence

·Devbox spin-up latency dashboard showing P50 under 10 seconds
·Devbox snapshot configuration showing pre-loaded codebase, deps, and MCP tools
·Kernel policy configuration (seccomp profiles, cgroup limits)

What It Is

The 10-second devbox spin-up is the performance target that Stripe's agent infrastructure team set as the benchmark for production-grade agent environments. It covers the full sequence: environment provisioned, codebase present at the correct commit, dependencies installed, MCP tools configured, and the agent process started and ready to receive the task, all in under 10 seconds. This target is not a theoretical aspiration. Stripe reports that its "Minions" agent fleet achieves it in production, using infrastructure techniques like those described here.

Reaching 10 seconds requires combining multiple techniques. First, pre-warmed container pools (environments waiting in a ready state) eliminate the need to initialize from scratch on every task. Second, Firecracker microVMs or optimized Docker with snapshot support enable fast restore from a checkpointed state rather than full initialization. Third, content-addressable storage for the codebase means that cloning a large repository at a specific commit is a local operation against a pre-populated cache, not a network transfer from GitHub. Fourth, pre-installed and pre-authenticated MCP servers mean the agent does not spend time connecting and authenticating to tools when it starts.
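
The pre-warmed pool idea can be sketched in a few lines. This is a minimal illustration, not Stripe's implementation: `WarmPool`, `Devbox`, and the `provision` callback are hypothetical names, and `provision` stands in for whatever slow path your platform uses (booting or restoring a VM, syncing the repository, starting MCP servers). What it shows is that handing out a ready environment is a queue pop, while the expensive work of replacing it runs in a background thread.

```python
import queue
import threading
from dataclasses import dataclass
from typing import Callable


@dataclass
class Devbox:
    """Hypothetical handle for a fully initialized environment."""
    box_id: int


class WarmPool:
    """Keeps about `target_size` ready devboxes and refills in the background."""

    def __init__(self, provision: Callable[[int], Devbox], target_size: int) -> None:
        # `provision` is the slow path: boot or restore, sync code, start tools.
        self._provision = provision
        self._ready: "queue.Queue[Devbox]" = queue.Queue()
        self._lock = threading.Lock()
        self._next_id = 0
        self.cold_starts = 0
        for _ in range(target_size):  # pre-warm before any task arrives
            self._ready.put(self._make())

    def _make(self) -> Devbox:
        with self._lock:
            box_id = self._next_id
            self._next_id += 1
        return self._provision(box_id)

    def _refill(self) -> None:
        self._ready.put(self._make())

    def acquire(self) -> Devbox:
        try:
            box = self._ready.get_nowait()  # fast path: a warm box is waiting
        except queue.Empty:
            with self._lock:
                self.cold_starts += 1  # pool exhausted: init lands on the critical path
            return self._make()
        # Replace the box we just handed out, off the critical path.
        threading.Thread(target=self._refill, daemon=True).start()
        return box


# Usage: in a real system `provision` would take seconds to minutes.
pool = WarmPool(provision=lambda i: Devbox(i), target_size=2)
box = pool.acquire()
print(box.box_id, pool.cold_starts)  # 0 0
```

When the queue is empty, `acquire` falls back to provisioning on the caller's time and counts a cold start; that counter is worth exporting as a metric alongside startup latency.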

Why does 10 seconds matter? Below roughly 30 seconds of startup time, developers experience the devbox as "instant": they submit a task, and by the time they have documented what they just did or reviewed the previous task, the new environment is ready. Above 30 seconds, there is a perceptible wait that breaks the developer's flow and adds mental overhead to every dispatch decision. A typical startup of 10 seconds leaves enough headroom that even slower-than-usual starts stay under that 30-second threshold, which is why the 10-second benchmark is a usability target, not just a technical achievement.

The gap between a naive implementation (3-5 minute cold start) and the 10-second benchmark is mostly infrastructure investment in pre-warming and caching. The actual agent runtime and codebase initialization have to happen either way - the question is whether they happen on the critical path (cold start) or in the background before the task arrives (pre-warming). Getting to 10 seconds means moving as much initialization work as possible off the critical path.
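
A small model makes the critical-path argument concrete. The step names and durations below are hypothetical, chosen only to land in the 3-5 minute cold-start range. The sketch assumes the steps run sequentially, and that only steps depending on the task itself (syncing to its commit, injecting its credentials, starting the agent with it) must wait for the task to arrive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InitStep:
    name: str
    seconds: float
    prewarmable: bool  # True if the step does not depend on the specific task


def critical_path_seconds(steps: list[InitStep], prewarmed: bool) -> float:
    """Seconds the developer waits between submitting a task and agent-ready."""
    # Cold start: everything runs after submit. Pre-warmed: only task-specific steps do.
    return sum(s.seconds for s in steps if not (prewarmed and s.prewarmable))


# Hypothetical durations, for illustration only.
STEPS = [
    InitStep("boot environment", 20, prewarmable=True),
    InitStep("clone repository", 120, prewarmable=True),
    InitStep("install dependencies", 120, prewarmable=True),
    InitStep("start MCP servers", 15, prewarmable=True),
    InitStep("sync to task commit", 3, prewarmable=False),
    InitStep("inject task credentials", 1, prewarmable=False),
    InitStep("start agent with task", 4, prewarmable=False),
]

print(critical_path_seconds(STEPS, prewarmed=False))  # 283
print(critical_path_seconds(STEPS, prewarmed=True))   # 8
```

The total work is the same in both cases; pre-warming only changes which steps the developer waits for.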

Why It Matters

10-second startup makes parallel agent dispatch practical - when each devbox starts in 10 seconds, a batch of 10 parallel agent tasks is ready in about 10 seconds; at a 3-minute cold start, the same batch waits 3 minutes even when fully parallel, and 30 minutes if limited provisioning capacity serializes the cold starts, which pushes developers toward dispatching one task at a time
Sub-30-second startup unlocks quick tasks - developers will only dispatch an agent task if the expected task duration is substantially longer than the startup overhead; at 10-second startup, tasks as short as 2 minutes become worth dispatching to an agent
Startup time is a lever on iteration rate - an agent that can start a new task in 10 seconds enables faster iteration cycles; the feedback loop between task completion and the next task dispatch tightens when startup is effectively invisible
Fast startup matters for CI agents too - CI pipelines that provision agent environments for automated test remediation, automated PR review, or automated dependency updates need startup times that do not add significant latency to the overall pipeline, and a 10-second startup meets that bar
Pre-warming infrastructure for devboxes overlaps with pre-warming for CI - the investment in fast devbox spin-up improves the startup time for automated pipeline agents simultaneously, compounding the return on infrastructure investment
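
The arithmetic behind the first two points can be sketched as follows. `batch_startup_wait` assumes devboxes start in waves limited by how many can be provisioned at once, and `worth_dispatching` encodes an assumed rule of thumb (a task should be at least 10 times longer than its startup overhead). Neither rule comes from the benchmark itself, so adjust both to your environment.

```python
import math


def batch_startup_wait(n_tasks: int, startup_s: float, concurrent_starts: int) -> float:
    """Wall-clock seconds until every task in the batch has a ready devbox.

    Devboxes start in waves of `concurrent_starts`; with enough capacity the
    whole batch is ready after a single startup interval.
    """
    waves = math.ceil(n_tasks / concurrent_starts)
    return waves * startup_s


def worth_dispatching(task_s: float, startup_s: float, min_ratio: float = 10.0) -> bool:
    """Assumed rule of thumb: dispatch only if the task is at least
    `min_ratio` times longer than the startup overhead."""
    return task_s >= min_ratio * startup_s


print(batch_startup_wait(10, startup_s=10, concurrent_starts=10))   # 10
print(batch_startup_wait(10, startup_s=180, concurrent_starts=10))  # 180
print(batch_startup_wait(10, startup_s=180, concurrent_starts=1))   # 1800 (30 minutes)
print(worth_dispatching(task_s=120, startup_s=10))   # True
print(worth_dispatching(task_s=120, startup_s=180))  # False
```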

Getting Started

Baseline your current startup time - Time the full initialization sequence for your existing devbox setup and identify the slowest steps. Common slow steps: git clone (2-5 minutes for large repos), npm install / pip install / go mod download (1-5 minutes), Docker image pull (30-120 seconds). The bottlenecks tell you where to invest.
Implement Firecracker for microVM-speed isolation - If you need VM-level isolation (stronger than containers) without VM-level startup overhead, Firecracker microVMs are a common choice. Firecracker can boot a guest in as little as 125 milliseconds and supports snapshotting: saving a running VM's state and restoring it later. Set up Firecracker, take a snapshot of the agent runtime after initialization completes, and measure the restore time.
Switch to snapshot-based initialization - Instead of running git clone and npm install in every container, build a Docker image that embeds the repository at a recent commit with dependencies already installed. Container initialization then runs an incremental git fetch and checks out the task's commit, transferring only the commits added since the image was built rather than performing a full clone.
Deploy content-addressable object storage for source - A content-addressable store, the same model behind Git's object database and Bazel's remote cache, can serve repository contents at specific commits from a cache close to the devbox. Instead of cloning from GitHub over the network, the container fetches the files it needs from a local cache server, and only those not already cached.
Pre-authenticate MCP servers in the base snapshot - Include MCP server processes in the snapshot in a running, authenticated state. This requires a memory snapshot (such as a Firecracker snapshot), since a plain container image stores files, not running processes. When the environment restores from the snapshot, MCP servers are already running and connected, so the agent does not spend time starting or authenticating tools.
Measure the full end-to-end time from task submit to agent-ready - Do not optimize container startup time in isolation. Measure the time from when the developer submits a task to when the agent sends its first message. This end-to-end time includes pool assignment, credential injection, and agent process startup. Target 10 seconds for this full sequence.
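
The content-addressable storage step relies on one property: content is stored under the hash of its bytes, so materializing a newer commit only needs the blobs that changed. The sketch below is a simplified in-memory model under that assumption; `ContentStore`, `materialize`, and the path-to-digest manifest are hypothetical and do not reproduce Git's object format or Bazel's remote cache protocol.

```python
import hashlib


class ContentStore:
    """Simplified content-addressable store: blobs keyed by SHA-256 digest."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self._blobs.setdefault(digest, data)  # identical content is stored once
        return digest

    def get(self, digest: str) -> bytes:
        return self._blobs[digest]

    def __contains__(self, digest: str) -> bool:
        return digest in self._blobs


def materialize(manifest: dict[str, str], local: ContentStore,
                remote: ContentStore) -> tuple[dict[str, bytes], int]:
    """Build one commit's file tree, fetching only blobs missing from `local`.

    `manifest` maps file path -> blob digest (a hypothetical per-commit index).
    Returns the files and how many blobs had to come from `remote`.
    """
    fetched = 0
    files: dict[str, bytes] = {}
    for path, digest in manifest.items():
        if digest not in local:
            local.put(remote.get(digest))  # network transfer, only on a cache miss
            fetched += 1
        files[path] = local.get(digest)
    return files, fetched


remote = ContentStore()
v1 = {"app.py": remote.put(b"print('v1')\n"), "README": remote.put(b"docs\n")}
v2 = {"app.py": remote.put(b"print('v2')\n"), "README": v1["README"]}

cache = ContentStore()
_, n1 = materialize(v1, cache, remote)      # cold cache: both blobs fetched
files, n2 = materialize(v2, cache, remote)  # newer commit: only app.py fetched
print(n1, n2)  # 2 1
```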

Tip

Separate the "what is slow" question from the "what should I optimize" question. Profile the initialization sequence step by step before optimizing. Teams that try to optimize without profiling often improve the wrong step. The biggest win is usually the dependency installation step, which is also the easiest to pre-warm.
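
Profiling step by step, as the tip recommends, needs little more than a timer around each step. `StartupProfile` below is a hypothetical helper, and the `time.sleep` calls stand in for the real commands (repository sync, dependency install, agent start) you would wrap. The sum of step times approximates end-to-end latency only if the steps run sequentially and cover everything, so submit-to-first-message time is still worth measuring separately.

```python
import time
from contextlib import contextmanager
from typing import Iterator


class StartupProfile:
    """Records the wall-clock duration of each named initialization step."""

    def __init__(self) -> None:
        self.durations: dict[str, float] = {}

    @contextmanager
    def step(self, name: str) -> Iterator[None]:
        start = time.perf_counter()
        try:
            yield
        finally:  # record the time even if the step raises
            self.durations[name] = time.perf_counter() - start

    def total(self) -> float:
        return sum(self.durations.values())

    def slowest(self, n: int = 3) -> list[tuple[str, float]]:
        """The n longest steps, longest first: where to invest."""
        return sorted(self.durations.items(), key=lambda kv: kv[1], reverse=True)[:n]


profile = StartupProfile()
with profile.step("sync repository"):
    time.sleep(0.02)  # stand-in for the real command
with profile.step("install dependencies"):
    time.sleep(0.10)
with profile.step("start agent"):
    time.sleep(0.01)

for name, secs in profile.slowest():
    print(f"{name}: {secs:.2f}s ({secs / profile.total():.0%} of total)")
```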


Common Pitfalls

Optimizing container startup but not pool availability. A container that restores in 5 seconds only helps if one is waiting in the pre-warmed pool; when demand exceeds pool capacity, new tasks fall through to a cold start. Size the pool for peak demand, not average demand. For the tasks that hit the capacity limit, the fast path does not exist, and they wait as long as they would with a slow container.
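
One way to size the pool is Little's law: at a sustained arrival rate, the number of environments being replenished at any moment averages arrival rate times refill time, and the pool must hold at least that many to avoid running dry. The sketch below applies it with an assumed 25% headroom for bursts; the function name, rates, and refill times are hypothetical. It also shows why faster refill (snapshot restore instead of full provisioning) shrinks the pool you need.

```python
import math


def min_pool_size(peak_tasks_per_min: float, refill_seconds: float,
                  headroom: float = 0.25) -> int:
    """Lower bound on warm-pool size to avoid cold starts at peak load.

    By Little's law, boxes in the middle of being replenished average
    arrival_rate * refill_time. The pool must cover that in-flight deficit,
    plus `headroom` (an assumed safety margin) for bursts above the peak rate.
    """
    in_flight = (peak_tasks_per_min / 60.0) * refill_seconds
    return math.ceil(in_flight * (1.0 + headroom))


print(min_pool_size(peak_tasks_per_min=30, refill_seconds=240))  # 150
print(min_pool_size(peak_tasks_per_min=30, refill_seconds=10))   # 7
```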

Snapshotting at the wrong initialization stage. Snapshotting the container before dependency installation saves the image pull time but not the install time. Snapshot after all initialization steps are complete: repository cloned, dependencies installed, MCP tools running, agent runtime configured. The snapshot should be the fully-ready state, not a partially-initialized state.

Not invalidating stale snapshots. A snapshot that was created 5 days ago has a codebase that is 5 days behind main. Tasks that work on recent commits will spend time applying a large diff, negating the startup benefit. Refresh snapshots at least daily for active repositories, and rebuild them sooner when commits land on monitored branches.
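
A refresh policy combining both triggers can be a single predicate. This is a sketch under assumptions: `Snapshot`, `needs_rebuild`, and the 50-commit threshold are hypothetical, and `commits_behind` would come from comparing the snapshot's commit with the monitored branch (for example with `git rev-list --count`).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Snapshot:
    built_at: datetime
    commits_behind: int  # commits on the monitored branch since the snapshot's commit


def needs_rebuild(snap: Snapshot, now: datetime,
                  max_age: timedelta = timedelta(days=1),
                  max_commits_behind: int = 50) -> bool:
    """Rebuild if the snapshot is older than a day or has drifted too far.

    `max_commits_behind` is an assumed threshold; tune it to how long
    applying the accumulated diff takes in your repository.
    """
    return now - snap.built_at > max_age or snap.commits_behind > max_commits_behind


now = datetime(2025, 1, 10, 12, tzinfo=timezone.utc)
fresh = Snapshot(now - timedelta(hours=3), commits_behind=12)
old = Snapshot(now - timedelta(days=5), commits_behind=12)
busy = Snapshot(now - timedelta(hours=3), commits_behind=400)
print(needs_rebuild(fresh, now), needs_rebuild(old, now), needs_rebuild(busy, now))
# False True True
```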

Treating 10 seconds as a set-and-forget achievement. Infrastructure evolves: codebases grow, new dependencies are added, tool configurations change. A startup time that is 10 seconds today may be 30 seconds in 6 months without maintenance. Track startup time as a continuous metric and investigate when it exceeds the target.

Not benchmarking the tail latency. Average startup time of 10 seconds can coexist with a P99 startup time of 90 seconds if pool exhaustion causes occasional cold starts. The developer experience is defined by the worst cases, not the average. Monitor P50, P95, and P99 startup times and investigate the tail.
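
To see how an average can hide the tail, compute percentiles directly. This sketch uses the nearest-rank definition (monitoring systems often interpolate instead, so their numbers can differ slightly), and the sample data is hypothetical: 98 warm-pool starts at 8 seconds and 2 cold starts at 90 seconds after pool exhaustion.

```python
import math
from statistics import mean


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest value with at least p% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical startup times in seconds.
startup_s = [8.0] * 98 + [90.0] * 2
print(f"mean={mean(startup_s):.2f}")  # mean=9.64
for p in (50, 95, 99):
    print(f"P{p}={percentile(startup_s, p)}")  # P50=8.0, P95=8.0, P99=90.0
```

The mean stays under 10 seconds while one developer in fifty waits 90 seconds, which is exactly the case a P50 or mean dashboard never shows.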


How Different Roles See It

Bob, Head of Engineering

Bob's team has devboxes running but the startup time is 4 minutes. Developers are using them but grumbling about the wait, and Bob knows the team is not getting the full parallelism benefit because the startup cost makes it hard to justify dispatching tasks for short work. Bob has heard about the 10-second benchmark and wants to achieve it but does not know whether it requires a full infrastructure rewrite.

What Bob should do: Bob should commission a startup time analysis sprint: one infrastructure engineer spends one week profiling the current initialization sequence and identifying the three slowest steps. The output is a prioritized optimization plan with estimated effort and expected impact for each step. Many teams find that two steps, dependency installation and codebase cloning, account for most of the startup time, and both can be addressed with pre-warming and snapshot techniques. Bob should then allocate a two-week infrastructure sprint specifically for startup time optimization, with under 30 seconds (already a major improvement over 4 minutes) as the first milestone and 10 seconds as the stretch goal.


Sarah, Productivity Lead

Sarah has measured that developers dispatch agent tasks less frequently than expected based on the scale of work they are doing. When she follows up, developers consistently say that they factor in the startup time when deciding whether to use an agent: "it is not worth waiting 4 minutes for a task that will take 8 minutes." If the startup were under 30 seconds, developers would dispatch many more short tasks to agents. Sarah sees startup time reduction as a direct productivity lever.

What Sarah should do: Sarah should estimate the value of the suppressed agent tasks. If developers are passing on 3 agent tasks per day because the startup cost is too high, and each of those tasks would take 15 minutes manually, that is 45 minutes of daily agent leverage not being captured. For a team of 20 developers, that is 900 person-minutes of productivity per day. The math will justify a significant infrastructure investment in startup time reduction. Sarah should use this calculation to fund the infrastructure work and set a success metric: after startup time drops below 30 seconds, measure whether the agent task dispatch rate increases. The before/after comparison becomes the ROI evidence for the investment.
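
Sarah's estimate is simple multiplication, but writing it down makes the inputs explicit and reusable for the before/after comparison. The function names and the dispatch-rate figures in the example are hypothetical; the 20 developers, 3 skipped tasks, and 15 minutes come from the scenario above.

```python
def suppressed_minutes_per_day(developers: int, skipped_tasks_per_dev: float,
                               manual_minutes_per_task: float) -> float:
    """Daily manual effort that could be delegated to agents but is not."""
    return developers * skipped_tasks_per_dev * manual_minutes_per_task


def dispatch_rate_change(before_per_dev_day: float, after_per_dev_day: float) -> float:
    """Relative change in agent dispatch rate, for the before/after comparison."""
    return (after_per_dev_day - before_per_dev_day) / before_per_dev_day


minutes = suppressed_minutes_per_day(developers=20, skipped_tasks_per_dev=3,
                                     manual_minutes_per_task=15)
print(minutes, minutes / 60)  # 900 15.0

# Hypothetical dispatch rates (tasks per developer per day) before and after.
print(f"{dispatch_rate_change(2.0, 3.5):.0%}")  # 75%
```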


Victor, Staff Engineer - AI Champion

Victor has been tracking Stripe's engineering blog and read the Minions post about 10-second devbox spin-up. He has replicated several of the techniques in his personal setup: a pre-warmed Docker container with the codebase at HEAD and dependencies installed, achieving ~15-second startup times. He is confident that 10 seconds is achievable with Firecracker snapshots and a proper pool manager.

What Victor should do: Victor should propose a concrete architecture for the team's path to 10-second devbox startup: switch from Docker cold starts to pre-warmed Docker pools as an immediate improvement (a sub-30-second target, which his ~15-second prototype suggests is realistic), then implement Firecracker with snapshots as the next step (10-second target). Victor should write the proposal as a one-page architecture document with estimated implementation effort for each phase. The proposal should include the expected startup times at each phase, grounded in his own prototype measurements. Victor should also identify which team members have the skills to implement Firecracker integration (it requires Linux systems knowledge) and propose a staffing plan.
