Infrastructure - L4 Optimized - Agent Runtime & Sandboxing

Ephemeral devboxes: 10s spin-up (Stripe benchmark)

The 10-second devbox spin-up is the performance target that Stripe's agent infrastructure team set as the benchmark for production-grade agent environments.

  • Ephemeral devboxes spin up in under 10 seconds (Stripe benchmark)
  • Devboxes come pre-loaded with codebase, dependencies, and MCP tools
  • Kernel-level policy enforcement restricts agent actions (syscall filtering, resource limits)
  • Devbox spin-up P99 latency is under 30 seconds
  • Firecracker microVMs or equivalent provide VM-level isolation with container-level startup speed
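
The two latency criteria in the list above can be checked mechanically against recorded spin-up times. The following is a minimal sketch, assuming latencies are collected as a plain list of seconds; the function names and the nearest-rank percentile method are illustrative choices, not part of any Stripe tooling.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("at least one latency sample is required")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def meets_spinup_targets(samples, p50_target=10.0, p99_target=30.0):
    """Compare measured spin-up latencies (seconds) with the P50 and P99 targets."""
    p50 = percentile(samples, 50)
    p99 = percentile(samples, 99)
    return {"p50": p50, "p99": p99, "ok": p50 < p50_target and p99 < p99_target}


if __name__ == "__main__":
    # Hypothetical measurements in seconds, for illustration only.
    latencies = [6, 7, 8, 9, 9.5, 11, 12, 14, 20, 28]
    print(meets_spinup_targets(latencies))
```

With only a handful of samples the P99 value is simply the slowest run, so a real dashboard would want a much larger window before the tail figure means anything.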

Evidence

  • Devbox spin-up latency dashboard showing P50 under 10 seconds
  • Devbox snapshot configuration showing pre-loaded codebase, deps, and MCP tools
  • Kernel policy configuration (seccomp profiles, cgroup limits)
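
As an illustration of what the kernel policy evidence might look like, the sketch below builds a deny-by-default seccomp profile and a set of resource-limit flags. It assumes the devbox runs as a Docker container, which is only one of the options this page mentions; the syscall allowlist, the limit values, and the image name `devbox:latest` are hypothetical, and a real agent workload would need a far longer allowlist.

```python
import json

# Illustrative allowlist only; real workloads need many more syscalls.
ALLOWED_SYSCALLS = ["read", "write", "openat", "close", "exit_group"]


def build_seccomp_profile(allowed):
    """Deny-by-default profile in the JSON layout Docker accepts via --security-opt seccomp=<file>."""
    return {
        "defaultAction": "SCMP_ACT_ERRNO",  # any syscall not listed fails with an error
        "syscalls": [{"names": sorted(allowed), "action": "SCMP_ACT_ALLOW"}],
    }


def docker_run_args(image, profile_path, memory="4g", cpus="2", pids=512):
    """Build a docker run command line; the limit flags are enforced through cgroups."""
    return [
        "docker", "run", "--rm",
        "--security-opt", f"seccomp={profile_path}",
        "--memory", memory,
        "--cpus", cpus,
        "--pids-limit", str(pids),
        image,
    ]


if __name__ == "__main__":
    print(json.dumps(build_seccomp_profile(ALLOWED_SYSCALLS), indent=2))
    # "devbox:latest" and the profile path are hypothetical names.
    print(" ".join(docker_run_args("devbox:latest", "/etc/devbox/seccomp.json")))
```

The sketch only generates configuration; it does not start a container. A Firecracker-based setup would express the same policy through different mechanisms.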

What It Is

The 10-second devbox spin-up is the performance target that Stripe's agent infrastructure team set as the benchmark for production-grade agent environments. The target covers the full sequence (environment provisioned, codebase present at the correct commit, dependencies installed, MCP tools configured, agent process started and ready to receive the task) in under 10 seconds. This target is not a theoretical aspiration; Stripe's "Minions" agent fleet achieves it in production with the infrastructure techniques described here.

Reaching 10 seconds requires combining multiple techniques. First, pre-warmed container pools (environments waiting in a ready state) eliminate the need to initialize from scratch on every task. Second, Firecracker microVMs or optimized Docker with snapshot support enable fast restore from a checkpointed state rather than full initialization. Third, content-addressable storage for the codebase means that cloning a large repository at a specific commit is a local operation against a pre-populated cache, not a network transfer from GitHub. Fourth, pre-installed and pre-authenticated MCP servers mean the agent does not spend time connecting and authenticating to tools when it starts.
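
The first technique, a pre-warmed pool, can be sketched in a few lines. This is a minimal in-memory model under stated assumptions: `WarmPool`, `Devbox`, and `_provision` are hypothetical names, provisioning is a stand-in that does no real work, and refilling is triggered explicitly rather than by a background worker.

```python
from collections import deque
from dataclasses import dataclass
from itertools import count


@dataclass
class Devbox:
    box_id: int
    commit: str
    warm: bool  # True if the box was prepared before a task asked for it


class WarmPool:
    """Keeps a fixed number of ready devboxes so a task rarely waits for a cold start."""

    def __init__(self, target_size, commit):
        self.target_size = target_size
        self.commit = commit
        self._ids = count(1)
        self._ready = deque()
        self.refill()

    def _provision(self, warm):
        # Stand-in for the expensive work: restore a snapshot, check out the
        # codebase, install dependencies, connect MCP tools.
        return Devbox(next(self._ids), self.commit, warm)

    def refill(self):
        """Top the pool back up to its target size (off the critical path)."""
        while len(self._ready) < self.target_size:
            self._ready.append(self._provision(warm=True))

    def acquire(self):
        """Hand out a ready devbox, or fall back to a cold start if the pool is empty."""
        if self._ready:
            return self._ready.popleft()
        return self._provision(warm=False)

    def ready_count(self):
        return len(self._ready)
```

In a real pool manager `refill` would run continuously in the background and size the pool from observed demand; the explicit call here just keeps the sketch deterministic.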

Why does 10 seconds matter? Below roughly 30 seconds of startup time, developers experience the devbox as "instant" - they submit a task and by the time they have documented what they just did or reviewed the previous task, the new environment is ready. Above 30 seconds, there is a perceptible wait that breaks the developer's flow and adds mental overhead to the task dispatch decision. The 10-second benchmark is a usability threshold, not just a technical achievement.

The gap between a naive implementation (3-5 minute cold start) and the 10-second benchmark is mostly infrastructure investment in pre-warming and caching. The actual agent runtime and codebase initialization have to happen either way - the question is whether they happen on the critical path (cold start) or in the background before the task arrives (pre-warming). Getting to 10 seconds means moving as much initialization work as possible off the critical path.
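
The critical-path argument above comes down to simple arithmetic. The sketch below uses hypothetical step durations chosen to sum to a 4-minute cold start; none of these numbers are measurements from Stripe or any other system, and `restore_seconds` is an assumed cost for handing a pre-warmed environment to the task.

```python
# Hypothetical step durations in seconds, for illustration only.
COLD_STEPS = {
    "provision_environment": 60,
    "checkout_codebase": 90,
    "install_dependencies": 80,
    "configure_mcp_tools": 8,
    "start_agent_process": 2,
}


def critical_path_seconds(steps, prewarmed, restore_seconds=0):
    """Time the task waits: steps not done ahead of time, plus any restore cost."""
    remaining = sum(seconds for name, seconds in steps.items() if name not in prewarmed)
    return remaining + restore_seconds


if __name__ == "__main__":
    cold = critical_path_seconds(COLD_STEPS, prewarmed=set())
    warm = critical_path_seconds(
        COLD_STEPS,
        prewarmed={
            "provision_environment",
            "checkout_codebase",
            "install_dependencies",
            "configure_mcp_tools",
        },
        restore_seconds=5,
    )
    print(f"cold start: {cold}s, pre-warmed: {warm}s")
```

The total work is the same in both cases; only the portion that runs after the task arrives changes.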

Why It Matters

  • 10-second startup makes parallel agent dispatch practical - when each devbox starts in 10 seconds, dispatching 10 parallel agent tasks adds about 10 seconds of startup overhead; with a 3-minute cold start, the same batch accumulates up to 30 minutes of startup time if the environments are provisioned one after another, effectively forcing sequential execution
  • Sub-30-second startup unlocks quick tasks - developers will only dispatch an agent task if the expected task duration is substantially longer than the startup overhead; at 10-second startup, tasks as short as 2 minutes become worth dispatching to an agent
  • Startup time is a lever on iteration rate - an agent that can start a new task in 10 seconds enables faster iteration cycles; the feedback loop between task completion and task dispatch tightens when startup is transparent
  • 10 seconds sets the standard that CI systems can meet - CI pipelines that provision agent environments for automated test remediation, automated PR review, or automated dependency updates need startup times that do not add significant latency to the overall pipeline
  • Pre-warming infrastructure for devboxes overlaps with pre-warming for CI - the investment in fast devbox spin-up improves the startup time for automated pipeline agents simultaneously, compounding the return on infrastructure investment
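
The startup arithmetic behind the first two points in the list above can be made explicit. This is a small sketch with hypothetical function names; it separates the case where environments start in parallel from the case where they are provisioned one after another, and expresses startup cost as a fraction of task duration.

```python
def startup_overhead_seconds(task_count, startup_seconds, parallel=True):
    """Startup cost for a batch: one startup if boxes start together, the sum if they queue."""
    return startup_seconds if parallel else task_count * startup_seconds


def overhead_ratio(startup_seconds, task_seconds):
    """Startup time as a fraction of the task's own duration."""
    return startup_seconds / task_seconds


if __name__ == "__main__":
    # 10 tasks, 10-second startup, all started in parallel.
    print(startup_overhead_seconds(10, 10))
    # 10 tasks, 3-minute cold start, provisioned one at a time: 1800 s (30 minutes).
    print(startup_overhead_seconds(10, 180, parallel=False))
    # A 2-minute task behind a 10-second startup.
    print(round(overhead_ratio(10, 120), 3))
```

A low ratio is what makes a short task worth dispatching; as startup time approaches the task's own duration, developers stop bothering.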

How Different Roles See It

Bob, Head of Engineering

Bob's team has devboxes running but the startup time is 4 minutes. Developers are using them but grumbling about the wait, and Bob knows the team is not getting the full parallelism benefit because the startup cost makes it hard to justify dispatching tasks for short work. Bob has heard about the 10-second benchmark and wants to achieve it but does not know whether it requires a full infrastructure rewrite.


Sarah, Productivity Lead

Sarah has measured that developers dispatch agent tasks less frequently than expected based on the scale of work they are doing. When she follows up, developers consistently say that they factor in the startup time when deciding whether to use an agent: "it is not worth waiting 4 minutes for a task that will take 8 minutes." If the startup were under 30 seconds, developers would dispatch many more short tasks to agents. Sarah sees startup time reduction as a direct productivity lever.


Victor, Staff Engineer - AI Champion

Victor has been tracking Stripe's engineering blog and read the Minions post about 10-second devbox spin-up. He has replicated several of the techniques in his personal setup: a pre-warmed Docker container with the codebase at HEAD and dependencies installed, achieving ~15-second startup times. He is confident that 10 seconds is achievable with Firecracker snapshots and a proper pool manager.
