Pre-loaded services, code, MCP tools

· Ephemeral devboxes spin up in under 10 seconds (Stripe benchmark)
· Devboxes come pre-loaded with codebase, dependencies, and MCP tools
· Kernel-level policy enforcement restricts agent actions (syscall filtering, resource limits)

· Devbox spin-up P99 latency is under 30 seconds
· Firecracker microVMs or equivalent provide VM-level isolation with container-level startup speed

Evidence

· Devbox spin-up latency dashboard showing P50 under 10 seconds
· Devbox snapshot configuration showing pre-loaded codebase, deps, and MCP tools
· Kernel policy configuration (seccomp profiles, cgroup limits)

What It Is

A pre-loaded devbox is one where everything the agent needs to do its work is already running and available when the task starts - not just the codebase, but the dependent services, the MCP servers, the observability tooling, and the test infrastructure. The agent does not spend time starting a local database, connecting to the code search service, authenticating MCP tools, or waiting for a background service to become healthy. It starts and everything is already there.

The "services" in a pre-loaded devbox are the local instances of dependencies the agent needs to run tests and validate its work: a database seeded with test data, a message broker with test queues, a stub for the external payment service, a local caching layer. These are the same services a developer would spin up with docker compose up before starting work, but in a pre-loaded devbox they are already running at environment creation time.

The "MCP tools" dimension is equally important. An agent working in a pre-loaded devbox has its MCP servers already running and authenticated: the GitHub MCP server connected to the repository, the code search MCP server with the codebase indexed, the Jira MCP server authenticated to the project, the observability MCP server connected to the staging metrics endpoint. The agent can call any tool immediately without setup overhead. For agents that use 5-10 MCP tools, the startup time saved by pre-loading them is significant.

This pattern is the full realization of the devbox concept. A devbox that only has the codebase pre-loaded saves codebase initialization time. A devbox that has the codebase, services, and MCP tools pre-loaded reduces the total time-to-productive for an agent task to the bare minimum: the time to read the task description and start executing.

Why It Matters

· Agent time-to-first-action drops to near zero - an agent in a fully pre-loaded environment can execute its first meaningful action within seconds of receiving the task, rather than spending minutes on initialization.
· Service startup failures do not block agent tasks - in environments where agents start their own services, startup failures (database not healthy, port conflict, configuration error) block the task entirely. With pre-loading, these failures surface at devbox creation time, before any task is assigned, so a task only ever lands on a devbox whose services are already running and healthy.
· MCP tool pre-authentication eliminates auth failures at task start - MCP tools that are pre-loaded and pre-authenticated do not fail because a token had expired before the task began or because an auth flow is not supported in the non-interactive environment.
· Consistent service state across all tasks - a pre-loaded database seeded with the same test data ensures that every agent task starts from the same service state, making results reproducible and test failures comparable.
· Enables complex multi-service tasks - tasks that require multiple services to be running and healthy simultaneously (e.g., an agent testing a microservice that depends on a database, a cache, and a downstream API stub) become reliable when all services are pre-loaded.

Getting Started

1. Catalogue the services your agents need - For each agent use case, list every service the agent needs running: databases, caches, message brokers, stub services, local API proxies. This catalogue is the manifest of what needs to be pre-loaded in the devbox.
2. Create a docker-compose file for the devbox service stack - Define all dependent services in a docker-compose file that starts with the devbox. Include health checks for each service. The devbox environment is not considered ready until all services report healthy.
3. Build the MCP server pre-load configuration - Create a static MCP server configuration file (for Claude Code, a project-scoped .mcp.json at the repository root) that defines all MCP servers the agent will use, with pre-configured connection parameters and credential references. When the devbox starts, include a step that starts all configured MCP servers and verifies they are healthy.
4. Seed the database with test data - Create a database seed script that populates the test database with the data fixtures the agent's tasks require. Run the seed script as part of devbox initialization, before the environment is marked ready. Version the seed data alongside the service configuration.
5. Implement a readiness check - Before marking a devbox as available in the pool, run a readiness check that verifies that all services are running and healthy, all MCP servers are responding, the codebase is at the correct commit, and the agent process starts without errors. Only devboxes that pass all readiness checks enter the available pool.
6. Monitor service health within running devboxes - Services in a long-running devbox can die or become unhealthy during an agent task. Implement a sidecar health monitor that restarts unhealthy services and notifies the devbox manager if a service cannot be recovered (which may warrant aborting and recreating the devbox).
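The readiness check in step 5 can be sketched as a small gate that runs a set of named probes and admits the devbox to the pool only if every probe passes. This is a minimal illustration, not a reference implementation: the function names (tcp_port_open, run_readiness_checks) are hypothetical, and a TCP probe is only the weakest kind of check a real gate would run.

```python
import socket
from typing import Callable, Dict, List, Tuple

# A check is any zero-argument callable that returns True when healthy.
Check = Callable[[], bool]


def tcp_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Connectivity-only probe: True if a TCP connection can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def run_readiness_checks(checks: Dict[str, Check]) -> Tuple[bool, List[str]]:
    """Run every check and return (ready, names_of_failed_checks).

    A check that raises is treated as failed, so one broken probe can
    neither crash the gate nor let a devbox slip into the pool.
    """
    failed: List[str] = []
    for name, check in checks.items():
        try:
            ok = bool(check())
        except Exception:
            ok = False
        if not ok:
            failed.append(name)
    return (not failed, failed)
```

A devbox manager might call run_readiness_checks with entries such as "postgres": lambda: tcp_port_open("127.0.0.1", 5432) (host and port are placeholders) and log the list of failed names whenever a devbox is rejected. Running all checks rather than stopping at the first failure gives a complete picture of what is broken in a single pass.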

Tip

Pre-loading services adds complexity to the devbox definition and can increase the devbox's own startup time. Start by pre-loading only the services that agents most frequently need, and measure whether the improvement justifies the complexity. Add services incrementally based on measured impact, not on theoretical completeness.

Common Pitfalls

Including connections to shared environments (staging or production) in the pre-load configuration. A pre-loaded devbox that connects to staging databases or staging API endpoints at startup exposes those services to the agent and increases the risk of polluting the staging environment; a production connection carries the same risk with higher stakes. Pre-loaded services should be local, in-devbox instances - not connections to shared environments.

Not version-controlling the service configuration. The docker-compose file, MCP server configuration, and database seed data that define the pre-loaded devbox should live in the repository alongside the code. When services or their configurations change, the devbox configuration should be updated in the same PR. A devbox definition that drifts from the current codebase will start failing in hard-to-diagnose ways.

Treating service startup errors as devbox startup errors. A devbox that fails to start because a service's Docker image tag does not exist, or because a seed script has a bug, is not available for tasks. Track service startup failure rates separately from devbox startup failures. Service startup failures often indicate a problem with the configuration that needs to be fixed independently of the agent runtime.
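One way to keep the two failure rates apart is to record, for each devbox creation attempt, which stage failed (if any) and then compute a rate per stage. The sketch below assumes a hypothetical logging convention in which each attempt is either None (success) or a stage name such as "vm" or "service"; the stage names are illustrative.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def failure_rates_by_stage(attempts: Iterable[Optional[str]]) -> Dict[str, float]:
    """Return failures per stage as a fraction of all creation attempts.

    Each attempt is None on success, or the name of the stage that failed.
    """
    attempts = list(attempts)
    if not attempts:
        return {}
    # Count only failed attempts, grouped by the stage that failed.
    counts = Counter(stage for stage in attempts if stage is not None)
    total = len(attempts)
    return {stage: n / total for stage, n in counts.items()}
```

With ten attempts of which two failed at "service" and one at "vm", the function reports 0.2 and 0.1 respectively, which makes it visible that configuration problems, not the VM layer, dominate.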

Not testing that pre-loaded services actually work for agent tasks. A database that starts and passes a health check (TCP connection succeeds) may still fail when the agent runs queries against it (schema version mismatch, missing data). Health checks should verify functional readiness, not just connectivity. For databases, a health check that runs a sample query against the test tables is more reliable than a basic connection check.
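The difference between a connectivity check and a functional check can be illustrated with Python's built-in sqlite3 module standing in for the real test database. The table name and row threshold are hypothetical; a real check would query whichever seeded tables the agent's tasks depend on.

```python
import sqlite3


def connectivity_check(conn: sqlite3.Connection) -> bool:
    """Only proves the database answers; says nothing about schema or data."""
    try:
        conn.execute("SELECT 1")
        return True
    except sqlite3.Error:
        return False


def functional_check(conn: sqlite3.Connection, table: str, min_rows: int) -> bool:
    """Proves the seeded table exists and holds at least min_rows rows."""
    try:
        # Table names cannot be bound as SQL parameters, so `table` must
        # come from trusted devbox configuration, never from user input.
        row = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()
        return row[0] >= min_rows
    except sqlite3.Error:
        # A missing table or schema mismatch lands here.
        return False
```

Against an empty in-memory database, connectivity_check passes while functional_check fails, which is exactly the gap described above: the service is reachable but not usable for the agent's tasks.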

MCP server configuration that includes global/shared credentials. MCP servers pre-loaded in a devbox should use task-scoped credentials, not shared team credentials. An MCP server that is pre-authenticated with a shared team GitHub token gives the agent whatever access that token carries, which can be org-wide. Pre-load MCP servers with per-devbox credentials generated at creation time.
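A rough sketch of per-devbox credential generation follows. Everything here is an assumption for illustration: mint_task_token is a stand-in for a call to a real credential broker that would issue a short-lived, narrowly scoped token; the "mcpServers" layout mirrors a common MCP client configuration shape but should be checked against the client actually in use; and the server command name is hypothetical.

```python
import json
import secrets
from typing import Dict


def mint_task_token(devbox_id: str) -> str:
    """Placeholder for a credential broker call; returns a random token."""
    return f"{devbox_id}-{secrets.token_hex(16)}"


def build_mcp_config(devbox_id: str, servers: Dict[str, Dict]) -> str:
    """Render an MCP config in which every server gets its own fresh token.

    Each server spec names the environment variable its token goes in
    ("token_env"); that key is consumed here and not written to the output.
    """
    rendered = {}
    for name, spec in servers.items():
        env = dict(spec.get("env", {}))
        env[spec["token_env"]] = mint_task_token(devbox_id)
        rendered[name] = {
            "command": spec["command"],
            "args": list(spec.get("args", [])),
            "env": env,
        }
    return json.dumps({"mcpServers": rendered}, indent=2)
```

Calling build_mcp_config once per devbox at creation time means no two devboxes share a token, so revoking one devbox's credentials does not affect any other running task.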

How Different Roles See It

Bob, Head of Engineering

Bob's team has ephemeral devboxes working, but agent tasks frequently fail in the first 5 minutes with service-related errors: database connection failures, MCP server authentication errors, missing test data. Developers spend time debugging these environment issues before the agent can do any actual work. Bob is starting to wonder whether the devbox overhead is worth it.

What Bob should do: Bob should classify the failure modes before deciding. He should ask developers to log their last 10 agent task failures along with the first error message of each. If the majority of failures are "service not available," "MCP auth failed," or "database not seeded," the fix is better pre-loading, not abandoning devboxes. Bob should assign an infrastructure engineer to build out the service pre-loading stack: docker-compose for services, seed scripts for test data, and MCP server pre-authentication. That investment should substantially shrink the share of failures attributable to environment setup. If environment failures are not the majority, the investment should go toward whichever failure category is largest.
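The classification Bob asks for can be approximated with a few keyword rules applied to each logged first error message. The categories below come from the text; the keyword lists are hypothetical and would need tuning against the error messages a team actually sees.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical keyword rules: (category, substrings that indicate it).
RULES = (
    ("service not available", ("connection refused", "service not available")),
    ("MCP auth failed", ("mcp auth", "token expired", "unauthorized")),
    ("database not seeded", ("not seeded", "no such table", "does not exist")),
)


def classify(first_error: str) -> str:
    """Map a first error message to a failure category, or 'other'."""
    text = first_error.lower()
    for category, needles in RULES:
        if any(needle in text for needle in needles):
            return category
    return "other"


def failure_distribution(first_errors: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (category, count) pairs, most frequent first."""
    return Counter(classify(e) for e in first_errors).most_common()
```

If the three environment categories together account for most of the logged failures, that supports investing in pre-loading; if "other" dominates, the largest bucket inside it deserves a closer look first.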

Sarah, Productivity Lead

Sarah is observing that agents spend a significant portion of each task session on setup: starting services, authenticating tools, waiting for databases to be ready. In her monitoring, the average agent task spends 8-12 minutes on initialization before doing anything productive. This is 20-30% of the typical task duration. Pre-loading services and tools is a direct way to reclaim that time.

What Sarah should do: Sarah should instrument agent sessions to measure the time between task start and first substantive action (first file edit, first test run, first API call). The gap between task start and first substantive action is the "initialization overhead" that pre-loading targets. Sarah should set a target (initialization overhead under 60 seconds) and track it as a pipeline efficiency metric. When pre-loading is implemented, the before/after comparison of initialization overhead times is the ROI metric that justifies the investment.
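Initialization overhead as defined here is simply the time from task start to the first substantive action. A minimal sketch, assuming session events are available as (timestamp in seconds, event type) pairs and that the event type names shown are hypothetical labels for the actions named above:

```python
from typing import Iterable, Optional, Tuple

# Hypothetical event labels for "substantive" actions.
SUBSTANTIVE = {"file_edit", "test_run", "api_call"}


def initialization_overhead(events: Iterable[Tuple[float, str]]) -> Optional[float]:
    """Seconds from task_start to the first substantive action.

    Returns None if the session has no task_start or never reaches a
    substantive action.
    """
    start = None
    for ts, kind in sorted(events):  # process events in time order
        if kind == "task_start" and start is None:
            start = ts
        elif start is not None and kind in SUBSTANTIVE:
            return ts - start
    return None
```

Comparing the result for each session against the 60-second target, before and after pre-loading is rolled out, yields the before/after metric described above.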

Victor, Staff Engineer - AI Champion

Victor has built a fully pre-loaded devbox configuration for his team's main service: database with 50 seed records, Redis cache, GitHub MCP server, code search MCP server, and the LLM API connection. When an agent task starts in his environment, it can immediately run tests, search the codebase, and push commits without any setup. His agent sessions start executing within 15 seconds of task dispatch.

What Victor should do: Victor should package his devbox configuration as a template for the team. The template should include: the docker-compose file with all services, the MCP server configuration, the database seed script, the readiness check script, and documentation for how to extend the template for new services. Victor should run a working session with the infrastructure team to walk through the template and explain the design decisions. He should also document the places where his configuration is team-specific (the seed data, the specific MCP servers) vs. the parts that are generic infrastructure that any team can reuse. The goal is a configuration that other teams can adopt with minimal customization.
