Basic sandboxing (Docker)

Basic Docker sandboxing wraps the agent's execution environment in a container that is isolated from the host system.

·Dedicated development environments exist for agent execution (separate from the developer's primary workspace)
·Basic sandboxing via Docker or equivalent containers is implemented
·Agent credentials are scoped per project (not a single org-wide key)

·Container images for agent environments are versioned and reproducible
·Credential rotation schedule exists for agent-scoped keys

Evidence

·Docker or container configuration files for agent environments
·Credential management configuration showing per-project scoping
·Environment provisioning documentation or scripts

May 2026 Update

Lighter-weight wrappers have emerged, particularly for Claude Code. cc-mini wraps each command in bubblewrap, giving per-command sandboxing without starting a Docker container. Separately, the Claude Code source-map leak revealed the production permission layers (granular, per-tool, per-domain) that vendors are now exposing as configuration. Use these lighter wrappers when Docker overhead is too high for fast inner-loop work but you still need isolation from the host filesystem and network.
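cc-mini's internals are not described here, so the following is only a hypothetical Python sketch of the kind of bubblewrap command line such a wrapper might build for a single command: system directories mounted read-only, the project directory mounted read-write, the home directory simply absent, and all namespaces (including network) unshared unless networking is explicitly allowed. The function name build_bwrap_argv and the list of system directories are assumptions; adjust them for your distribution.

```python
from typing import List

# System directories the command needs in order to run; bound read-only.
# This list is an assumption and varies by distribution.
READ_ONLY_SYSTEM_DIRS = ["/usr", "/bin", "/lib", "/lib64", "/etc"]


def build_bwrap_argv(project_dir: str, command: List[str], allow_network: bool = False) -> List[str]:
    """Hypothetical helper: build a bwrap argv that runs one command in a sandbox.

    The home directory (~/.ssh, ~/.aws, ...) is never bound, so it is
    invisible to the sandboxed command.
    """
    argv = ["bwrap"]
    for path in READ_ONLY_SYSTEM_DIRS:
        # --ro-bind-try skips directories that do not exist on this host.
        argv += ["--ro-bind-try", path, path]
    argv += [
        "--dev", "/dev",                       # minimal /dev
        "--proc", "/proc",                     # fresh /proc for the new PID namespace
        "--tmpfs", "/tmp",                     # private, throwaway /tmp
        "--bind", project_dir, project_dir,    # the only writable host path
        "--chdir", project_dir,
        "--unshare-all",                       # new user/pid/ipc/uts/net/cgroup namespaces
        "--die-with-parent",                   # kill the sandbox if the wrapper exits
        "--new-session",                       # new session; blocks TIOCSTI terminal injection
    ]
    if allow_network:
        # Re-enable the host network that --unshare-all removed.
        argv.append("--share-net")
    return argv + command
```

For example, subprocess.run(build_bwrap_argv(os.getcwd(), ["pytest", "-q"]), check=True) would run the test suite with no network access and no view of the home directory.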

Pair the runtime sandbox with a per-session token spend cap. In April the community documented multiple "agentic fork bomb" incidents, including a $3,800 overnight bill. Process isolation limits what an agent can touch, not how much it can spend.
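A per-session cap can be as simple as a running total checked after every model call. The sketch below is hypothetical: the class names and per-million-token prices are placeholders, not any vendor's real pricing or API, and it assumes your agent loop can report input and output token counts after each call. Because usage is only known once a call completes, the cap bounds spend to roughly the limit plus one call's cost.

```python
from dataclasses import dataclass


class SpendCapExceeded(RuntimeError):
    """Raised once a session's estimated spend passes its cap."""


@dataclass
class SessionSpendCap:
    cap_usd: float
    usd_per_million_input: float   # placeholder price, not a real rate card
    usd_per_million_output: float  # placeholder price, not a real rate card
    spent_usd: float = 0.0

    def record(self, input_tokens: int, output_tokens: int) -> float:
        """Add one call's estimated cost; raise if the session is now over its cap."""
        cost = (
            input_tokens * self.usd_per_million_input
            + output_tokens * self.usd_per_million_output
        ) / 1_000_000
        self.spent_usd += cost
        if self.spent_usd > self.cap_usd:
            raise SpendCapExceeded(
                f"session spent ${self.spent_usd:.2f}, cap is ${self.cap_usd:.2f}"
            )
        return self.spent_usd
```

In the agent loop, call record() after each model response and let SpendCapExceeded end the session rather than letting it run overnight.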

What It Is

Basic Docker sandboxing wraps the agent's execution environment in a container that is isolated from the host system. The agent process runs inside the container with access only to what has been explicitly mounted and injected - a specific project directory, specific environment variables for credentials, and specific network access. The host's home directory, SSH keys, cloud credentials, and other projects remain invisible to the container.

Docker sandboxing is the most accessible form of agent isolation. Most developers on modern teams already have Docker installed, and writing a Dockerfile for an agent runtime typically takes an afternoon. The result is a meaningful security boundary: the agent can modify files in the mounted project directory and make network calls to permitted hosts, but nothing else on the developer's machine is in scope.

The mechanics are simple. You build a Docker image that contains the agent's runtime dependencies (language runtimes, CLI tools, the agent software itself). To run an agent task, you start a container from that image with a volume mount for the project directory and environment variables for the specific credentials the task needs. When run with --rm, the container is deleted as soon as it exits, so any state not in the mounted volume is discarded. This ephemerality is a feature: the agent cannot accumulate persistent state outside the project directory.
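The run step can be sketched in Python as a function that assembles the docker run arguments. The function name, the image name agent-runtime:2026-05-01, and the GITHUB_TOKEN variable are illustrative assumptions. Passing -e NAME without a value tells Docker to copy that variable's value from the host environment, so only the names you list reach the container, and --rm removes the container when it exits.

```python
import os
from typing import List


def build_docker_run_argv(image: str, project_dir: str, env_names: List[str], command: List[str]) -> List[str]:
    """Assemble a docker run command for one agent task (illustrative helper)."""
    project_dir = os.path.abspath(project_dir)
    argv = [
        "docker", "run",
        "--rm",                              # delete the container on exit
        "-v", f"{project_dir}:/workspace",   # mount only the project directory
        "-w", "/workspace",                  # start the agent inside the project
    ]
    for name in env_names:
        # "-e NAME" forwards the host's value of NAME and nothing else.
        argv += ["-e", name]
    return argv + [image] + command
```

Usage: subprocess.run(build_docker_run_argv("agent-runtime:2026-05-01", ".", ["GITHUB_TOKEN"], ["git", "status"]), check=True).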

Docker sandboxing does not solve every problem. Containers are not full VMs - a container running as root with broad Linux capabilities provides weaker isolation than a VM. Docker's default network mode allows outbound connections to any host. And Docker's startup time (5-30 seconds depending on image size) is acceptable for interactive developer use but becomes a bottleneck when spinning up hundreds of agent tasks. Basic Docker sandboxing is the right L2 tool; stronger isolation models (seccomp profiles, network policy, Firecracker VMs) come at L4.
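Some of these gaps can be narrowed with standard docker run options even before moving to L4 tooling. The hypothetical helper below returns such flags: dropping all Linux capabilities, blocking privilege escalation through setuid binaries, a read-only root filesystem with a writable tmpfs at /tmp, a process-count limit against runaway forks, and no network by default. The pids limit of 512 is an assumption, and a read-only root filesystem may break tools that write outside /tmp and the mounted project, so test it against real tasks.

```python
from typing import List


def hardening_flags(network: str = "none", pids_limit: int = 512) -> List[str]:
    """Illustrative docker run flags that tighten a basic agent container."""
    return [
        "--cap-drop", "ALL",                    # no Linux capabilities
        "--security-opt", "no-new-privileges",  # setuid binaries cannot gain privileges
        "--read-only",                          # root filesystem is read-only
        "--tmpfs", "/tmp",                      # writable scratch space only at /tmp
        "--pids-limit", str(pids_limit),        # caps the number of processes
        "--network", network,                   # "none" = no network at all
    ]
```

These flags go before the image name. To permit only specific hosts, pass the name of a user-defined network (for example a hypothetical agent-egress network whose only route out is an allowlisting proxy) instead of "none"; the allowlist itself lives in the proxy or firewall rules, not in these flags.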

Why It Matters

Credential isolation without workflow disruption - mounting project-specific credentials as environment variables means agents get exactly what they need for the task, without inheriting the developer's full credential set
Repeatable agent execution environment - the Dockerfile defines exactly what tools the agent has available, eliminating the "agent worked on my machine but not in CI" class of problems
Safe blast radius for mistakes - an agent running in a container with a mounted project directory can corrupt files in that directory but cannot touch other projects, SSH keys, or cloud configurations
Fast path to compliance - running agents in containers gives you a documented access boundary and, combined with versioned images and session records, the audit trail that compliance auditors expect to see
Foundation for CI integration - Docker containers are the native execution unit of most CI systems; an agent that runs in a container locally can run the same way in CI with little or no modification

Getting Started

Write a minimal agent Dockerfile - Start with the base image for your project's language runtime, add the agent tool (Claude Code or similar), add any required CLI tools (git, the package manager), and set a working directory. Keep the image small: it will be started frequently, and image size directly affects pull time and cold-start time.
Define a run script - Create a shell script (run-agent.sh) that runs the container with the correct volume mounts and environment variables. The mount should cover only the project directory (-v "$(pwd)":/workspace, quoted so paths containing spaces still work), and only the specific credentials the task needs should be injected as environment variables.
Remove host credential access - Explicitly verify that the agent container cannot access ~/.aws, ~/.ssh, or ~/.config/gcloud. Run the container, check that env | grep AWS returns nothing, and confirm that those directories do not exist inside it. This verification step is easy to skip but important to do.
Test the container with a real task - Run a task the agent commonly performs (fix a failing test, add a linter rule) inside the container and verify that the result matches what the agent produces locally. Fix any missing tools or permissions that the container lacks.
Configure network restrictions - By default, Docker containers can reach any host the developer's machine can reach. Use Docker network modes or iptables rules to restrict outbound access to a permitted list (GitHub, the package registry, the project's staging environment). This step is often deferred but is important for security.
Integrate with your IDE agent workflow - Configure Claude Code or Cursor so that the editor stays on the host while agent commands execute inside the container. A dev container configuration is a common way to set this up.
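The "Remove host credential access" check can be automated with a small script run inside the container during setup. This is a sketch with hypothetical function names; the directories it checks are the ones named above, while the environment-variable prefixes (AWS_, plus GOOGLE_ as an illustrative addition) should be extended for the providers your team uses.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional, Sequence

# Examples only; extend for the providers your team uses.
SENSITIVE_ENV_PREFIXES = ("AWS_", "GOOGLE_")
SENSITIVE_HOME_DIRS = (".aws", ".ssh", ".config/gcloud")


def find_host_credential_leaks(
    environ: Optional[Mapping[str, str]] = None,
    home: Optional[str] = None,
    check_dirs: Sequence[str] = SENSITIVE_HOME_DIRS,
) -> List[str]:
    """Describe every host credential visible in this environment."""
    environ = os.environ if environ is None else environ
    home_path = Path(home) if home is not None else Path.home()
    leaks = [f"env:{name}" for name in environ if name.startswith(SENSITIVE_ENV_PREFIXES)]
    for rel in check_dirs:
        candidate = home_path / rel
        if candidate.exists():
            leaks.append(f"path:{candidate}")
    return leaks


def main() -> int:
    """Print any leaks; return a non-zero exit code if anything is exposed."""
    leaks = find_host_credential_leaks()
    for leak in leaks:
        print("exposed:", leak)
    return 1 if leaks else 0
```

Calling sys.exit(main()) from the container's setup step makes the check fail loudly. Run it against the baseline container: a task that deliberately receives a scoped AWS_ variable will, by design, trip it.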

Tip

Keep a docker-compose.yml in the repository root that defines the agent runtime environment. This makes it trivial for any developer to run agents in the same sandboxed environment, and it documents the intended execution context for new team members.

Common Pitfalls

Running the container as root. Unless the image sets a non-root USER, Docker runs the container process as root, so files the agent writes to the mounted volume are owned by root on the host. This creates permission problems and also gives the agent root-level access within the container, bypassing any file-level restrictions. Add --user $(id -u):$(id -g) to the docker run command to run as the current user.
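In a Python launcher the same flag can be derived from the current process. This hypothetical helper also sets HOME=/tmp, which is an assumption worth stating: an arbitrary host UID usually has no home directory inside the image, and some tools fail without a writable HOME. os.getuid and os.getgid exist only on POSIX hosts.

```python
import os
from typing import List


def run_as_host_user_flags() -> List[str]:
    """docker run flags equivalent to --user $(id -u):$(id -g), plus a writable HOME."""
    return [
        "--user", f"{os.getuid()}:{os.getgid()}",  # files on the mount stay owned by you
        "-e", "HOME=/tmp",  # assumption: the image has no home dir for this UID
    ]
```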

Using fat base images that slow down spin-up. A Docker image that includes everything the agent might ever need (multiple language runtimes, all cloud CLIs, every possible tool) will be 3-5 GB and take 30+ seconds to start on a cold pull. Build lean, task-specific images and use multi-stage builds to minimize image size. Target under 500 MB for a fast spin-up experience.

Mounting the home directory instead of the project directory. A common mistake is mounting /Users/developer instead of /Users/developer/projects/myapp. This re-exposes all the credential files the sandboxing was supposed to protect. Always mount the minimum necessary directory, never the home directory.
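A run script can refuse to start when the mount source is the home directory or any directory above it. A minimal sketch with a hypothetical function name; it resolves symlinks first so that a link pointing at the home directory is also caught.

```python
from pathlib import Path
from typing import Optional


def validate_mount_source(path: str, home: Optional[str] = None) -> Path:
    """Return the resolved mount source, or raise if it would expose the home directory."""
    source = Path(path).expanduser().resolve()
    home_dir = Path(home).resolve() if home is not None else Path.home().resolve()
    # Mounting home itself, or any ancestor of it (e.g. /Users or /), exposes ~/.ssh, ~/.aws, ...
    if source == home_dir or source in home_dir.parents:
        raise ValueError(f"refusing to mount {source}: it contains the home directory {home_dir}")
    return source
```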

Forgetting to version the agent image. An image referenced by a latest tag that is periodically overwritten is not reproducible. Tag agent images with the date or a semantic version and record which image version was used for each agent session. This makes debugging much easier and creates the audit trail that security teams need.
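The versioning rule can be enforced at the point where sessions are recorded. In this sketch (function names and the log format are assumptions), an image reference counts as pinned if it carries an explicit tag other than latest or a sha256 digest, and each session is written as one JSON line naming the exact image it used.

```python
import json
from datetime import datetime


def is_pinned(image_ref: str) -> bool:
    """True if the reference names a specific tag (not latest) or a digest."""
    if "@sha256:" in image_ref:
        return True
    # Only the last path segment can carry a tag; an earlier ':' is a registry port.
    name = image_ref.rsplit("/", 1)[-1]
    if ":" not in name:
        return False  # no tag means Docker falls back to latest
    return name.split(":", 1)[1] != "latest"


def session_record(session_id: str, image_ref: str, started_at: datetime) -> str:
    """One JSON line tying an agent session to the exact image it ran in."""
    if not is_pinned(image_ref):
        raise ValueError(f"image {image_ref!r} is not pinned to a version")
    return json.dumps(
        {"session_id": session_id, "image": image_ref, "started_at": started_at.isoformat()},
        sort_keys=True,
    )
```

Appending each returned line to a per-project log file gives a simple record of which image served which session.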

Not building the image in CI. If the agent image is only built on developer machines, it will drift over time and different developers will have different images. Build the agent image in CI, push it to a registry, and have all developers pull from the registry. This ensures a consistent, controlled execution environment.

How Different Roles See It

Bob, Head of Engineering

Bob's team has been running agents in the developer's IDE with no isolation. A recent security audit flagged this as a risk and Bob has been asked to show a remediation plan. He knows Docker is the right tool but does not want to disrupt developer workflows while they are in the middle of a product sprint.

What Bob should do: Bob should assign one infrastructure engineer to build the basic Docker agent runtime image over a two-week sprint. The deliverable is a Docker image, a run script, and a one-page setup guide that any developer can follow in 30 minutes. Bob should then run a voluntary pilot with 3-4 interested developers who work with the infrastructure engineer to make the container experience feel seamless. After the pilot, Bob presents to the security team: "we have basic containment in place for agent execution, and here are the developers already using it." This gives the security audit something to point to while the full rollout follows over the next quarter.

Sarah, Productivity Lead

Sarah is concerned that requiring Docker containers for agent execution will slow developers down. She has heard anecdotes that "the container setup is complex" and "agents run slower in containers." She wants to support better security practices but not at the cost of developer experience.

What Sarah should do: Sarah should measure the actual overhead before assuming it is prohibitive. Run a controlled comparison: 10 agent tasks locally vs. the same 10 tasks in the Docker container. Measure wall-clock time for the developer (not just agent execution time). If the overhead is under 5%, it is negligible - developer perception of overhead often exceeds reality. If the overhead is significant, Sarah should work with infrastructure to optimize the image and run script to eliminate it. The goal is a container experience that is transparent to the developer: run the same commands, get the same results, with isolation happening invisibly underneath.
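The comparison reduces to simple arithmetic on the two sets of wall-clock timings. A minimal sketch with hypothetical function names, assuming each list holds the developer-visible seconds for the same ten tasks and using the 5% threshold from the text. Comparing totals means long tasks weigh more than short ones, which matches what a developer feels over a working session.

```python
from typing import Sequence


def overhead_percent(local_seconds: Sequence[float], container_seconds: Sequence[float]) -> float:
    """Percentage by which total container wall-clock time exceeds the local total."""
    if not local_seconds or len(local_seconds) != len(container_seconds):
        raise ValueError("need the same, non-empty set of tasks in both runs")
    local_total = sum(local_seconds)
    return (sum(container_seconds) - local_total) / local_total * 100


def is_negligible(overhead: float, threshold_percent: float = 5.0) -> bool:
    """Apply the 5% rule of thumb from the text."""
    return overhead < threshold_percent
```

For example, if ten tasks take 1,000 seconds locally and 1,030 seconds in the container, the overhead is 3%, under the 5% threshold.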

Victor, Staff Engineer - AI Champion

Victor has been thinking about Docker sandboxing for months and has been building a prototype on his own. He has a working Docker image for the team's main service with Claude Code installed and the project's dependencies pre-installed. Startup time is under 10 seconds. He has been running all his agent sessions in this container for two weeks without any workflow disruption.

What Victor should do: Victor should write up the prototype as a "getting started with agent sandboxing" guide with the actual Dockerfile, run script, and VS Code configuration he uses. He should present it at the next engineering all-hands as "this is what agent security looks like in practice - I have been running it for two weeks and it does not slow me down." The combination of concrete implementation and lived experience from a respected engineer is far more persuasive than a theoretical security proposal. Victor should offer to pair with every developer who wants to set it up, targeting full team adoption within 30 days.
