Infrastructure · L5 · Autonomous Agent Runtime & Sandboxing

Agent fleet on dedicated compute


  • Dedicated compute infrastructure exists for the agent fleet (not shared with developer workstations or production)
  • Agent fleet auto-scales with load (agents scale up during business hours, scale down off-hours)
  • Each agent runs in a fully isolated environment (Cursor approach: one machine per agent, or smart resource management)
  • Cost per agent-hour is tracked and optimized
  • Fleet scaling responds to demand within 60 seconds
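The scaling-responsiveness criterion above can be checked mechanically from autoscaler event logs. A minimal sketch, assuming a hypothetical log format of `(timestamp, event_kind)` pairs — real autoscaler events will need parsing into this shape first:

```python
from datetime import datetime, timedelta

def scaling_response_times(events):
    """Pair each scale-up request with the time the new capacity
    became ready, and return the lag for each pair in seconds."""
    lags = []
    pending = []  # timestamps of unmatched scale-up requests
    for ts, kind in sorted(events):
        if kind == "scale_up_requested":
            pending.append(ts)
        elif kind == "capacity_ready" and pending:
            lags.append((ts - pending.pop(0)).total_seconds())
    return lags

def meets_60s_target(events, threshold=60.0):
    """True when every observed scale-up completed within the target."""
    lags = scaling_response_times(events)
    return bool(lags) and max(lags) <= threshold
```

Running this over a rolling window of scaling event logs (the second evidence item above) gives a pass/fail signal for the 60-second criterion rather than a one-off spot check.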

Evidence

  • Infrastructure allocation showing dedicated agent compute (separate from dev and prod)
  • Auto-scaling configuration and scaling event logs
  • Agent fleet dashboard showing per-agent isolation and resource utilization

What It Is

An agent fleet on dedicated compute is the infrastructure pattern where AI agent workloads run on a distinct, purpose-built compute layer that is separate from developer laptops, CI infrastructure, and application workloads. Instead of agents sharing resources with other systems, there is a compute cluster specifically sized, configured, and optimized for running agent processes. The fleet is managed as a first-class infrastructure concern with its own capacity planning, monitoring, autoscaling, and operational runbooks.

At L5, agent workloads are substantial enough to warrant dedicated compute. A team running 50-100 simultaneous agent tasks, 24 hours a day, 7 days a week, needs compute that is always available, appropriately sized, and not subject to resource contention from unrelated workloads. Sharing this with CI (which has peak-and-trough load patterns) or with application servers (which have different resource profiles) creates interference that degrades agent reliability and response time.

The dedicated compute layer is typically Kubernetes-based, because Kubernetes provides the primitives needed for agent fleet management: pod isolation, resource quotas, namespace separation, and integration with autoscaling systems. Agent devboxes run as Kubernetes pods in a dedicated namespace, with node selectors that pin them to nodes in the agent node pool. The agent node pool uses instance types optimized for agent workloads: high memory (agents are memory-intensive), fast local NVMe storage (disk I/O is a key bottleneck for concurrent agent workloads), and high-bandwidth networking (large codebases move significant data).
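The pinning described above can be expressed directly in a pod spec. A minimal sketch that builds the manifest as a plain dict — names such as the `agent-fleet` namespace, the `node-pool: agents` label, and the `agent-devbox:latest` image are illustrative assumptions, not prescribed values:

```python
def agent_devbox_pod(name, mem_gib=16, cpu=4):
    """Build a pod manifest that pins an agent devbox to the
    dedicated agent node pool with explicit resource limits."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "namespace": "agent-fleet"},
        "spec": {
            # nodeSelector pins the pod to nodes labeled for the agent pool
            "nodeSelector": {"node-pool": "agents"},
            "containers": [{
                "name": "devbox",
                "image": "agent-devbox:latest",  # illustrative image name
                "resources": {
                    # requests == limits: predictable scheduling, no overcommit
                    "requests": {"memory": f"{mem_gib}Gi", "cpu": str(cpu)},
                    "limits": {"memory": f"{mem_gib}Gi", "cpu": str(cpu)},
                },
                # scratch space backed by the node's fast local NVMe disk
                "volumeMounts": [{"name": "scratch", "mountPath": "/workspace"}],
            }],
            "volumes": [{"name": "scratch", "emptyDir": {}}],
        },
    }
```

Setting requests equal to limits gives each agent pod a guaranteed slice of the node, which is the point of the dedicated pool: agent task latency should not depend on what its neighbors are doing.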

The dedicated fleet is also the level at which hardware specialization becomes relevant. Cursor's engineering team documented that disk I/O is the hidden bottleneck when running hundreds of agents: each agent reads large numbers of files, and concurrent file reads on a shared disk system create I/O contention that serializes what should be parallel work. The solution at this scale is dedicated NVMe SSDs with high IOPS ratings, or a distributed filesystem optimized for concurrent reads.
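The I/O contention argument can be made concrete with a back-of-envelope sizing check. Every number below is an illustrative assumption (per-agent read rates and disk IOPS ratings vary widely); the point is the shape of the calculation, not the specific figures:

```python
def io_headroom(agents, reads_per_agent_per_s, disk_iops):
    """Return aggregate IOPS demand and the disk utilization
    fraction; a value above 1.0 means reads queue and nominally
    parallel agent work serializes behind the disk."""
    demand = agents * reads_per_agent_per_s
    return demand, demand / disk_iops

# A shared general-purpose disk (~10k IOPS, assumed) saturates quickly:
demand, util = io_headroom(agents=200, reads_per_agent_per_s=150, disk_iops=10_000)
# demand = 30000, util = 3.0 -> 3x oversubscribed

# A local NVMe SSD (~500k random-read IOPS, assumed) leaves headroom:
demand, util = io_headroom(agents=200, reads_per_agent_per_s=150, disk_iops=500_000)
# util = 0.06
```

Measuring the real per-agent read rate (e.g. with standard block-level I/O statistics) and substituting it into this check is a cheap way to validate whether disk is the binding constraint before buying hardware.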

Why It Matters

  • No resource contention from unrelated workloads - CI pipelines, application servers, and agent workloads have different resource profiles; mixing them on shared compute creates unpredictable performance characteristics that are hard to diagnose and fix
  • Compute can be optimized for agent-specific workloads - agent processes are memory-heavy, disk-read-heavy, and episodically network-intensive; dedicated compute uses instance types and storage configurations matched to this profile rather than to general-purpose workloads
  • Capacity planning becomes straightforward - with dedicated agent compute, you can measure utilization, forecast growth, and make capacity decisions independently of other workloads; shared compute makes capacity planning for agents nearly impossible
  • Fleet-level observability enables optimization - when all agent workloads run on dedicated compute, fleet-level metrics (tasks per node, disk IOPS per task, memory per task, network bandwidth per task) become meaningful for optimization; this data does not exist when agents are scattered across shared infrastructure
  • Reliability becomes a compute infrastructure SLO - a dedicated fleet can have an SLO (e.g., 99.9% task start within 30 seconds, 99.5% task completion without infrastructure failure) that is distinct from application SLOs; this SLO commitment drives infrastructure investment and is auditable
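The SLO framing in the last bullet is auditable precisely because it can be computed from per-task records. A minimal sketch, assuming a hypothetical record shape with a queue time and an infrastructure-failure flag — field names are illustrative:

```python
def fleet_slo_report(tasks, start_target_s=30.0):
    """Compute fleet SLO attainment from task records. Each record
    is a dict with 'queued_s' (seconds from submit to task start)
    and 'infra_failure' (True if infrastructure killed the task)."""
    total = len(tasks)
    started_in_time = sum(1 for t in tasks if t["queued_s"] <= start_target_s)
    completed_ok = sum(1 for t in tasks if not t["infra_failure"])
    return {
        "start_within_target": started_in_time / total,
        "completion_without_infra_failure": completed_ok / total,
    }
```

Comparing these two ratios against the SLO targets (99.9% and 99.5% in the example above) over each reporting window turns "reliability" from an impression into a number the infrastructure team is accountable for.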

Getting Started

6 steps to get from here to the next level

Common Pitfalls

Mistakes teams actually make at this stage - and how to avoid them

How Different Roles See It

Bob - Head of Engineering

Bob's organization has reached the scale where agent workloads are competing with CI and application workloads for shared compute. CI times are getting longer, agent tasks are getting slower, and the operations team is complaining about resource contention they cannot explain. Bob needs to make the case for dedicated agent compute but is having trouble quantifying the business case.
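One way to quantify the business case is to price the time lost to peak-hour contention against the monthly cost of a dedicated node pool. A hedged sketch with entirely illustrative numbers; it assumes lost agent time translates into blocked developer time at a blended engineering rate, which is the strongest (and most contestable) assumption:

```python
def contention_cost_per_month(tasks_per_day, peak_fraction, slowdown,
                              avg_task_minutes, eng_cost_per_hour):
    """Estimate the monthly cost of peak-hour slowdown: extra wait
    time on peak-hour tasks, priced at blended engineering cost.
    Assumes ~22 working days per month."""
    peak_tasks = tasks_per_day * peak_fraction * 22
    extra_minutes = peak_tasks * avg_task_minutes * slowdown
    return extra_minutes / 60 * eng_cost_per_hour

# Illustrative: 400 tasks/day, 60% at peak, 35% slower at peak,
# 20-minute average task, $100/hour blended rate
cost = contention_cost_per_month(400, 0.6, 0.35, 20, 100)
# ~ $61,600/month of contention-induced waiting
```

If that figure exceeds the quoted monthly cost of a dedicated agent node pool, the case makes itself; if it does not, Bob has still replaced "contention we cannot explain" with a measured quantity.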

What Bob should do - role-specific action plan

Sarah - Productivity Lead

Sarah has been tracking agent task throughput and has noticed that performance degrades during peak hours - afternoons when most of the engineering team is online and running agents simultaneously. The per-task completion time is 30-40% longer during peak hours than off-peak hours. This degradation is directly impacting developer productivity: developers are running agents off-peak (evenings, early mornings) to get reasonable performance, which is not a sustainable workflow.

What Sarah should do - role-specific action plan

Victor - Staff Engineer, AI Champion

Victor has been watching the agent infrastructure evolve and is ahead of the curve on the dedicated compute question. He has been advocating for dedicated agent compute for three months based on his own performance analysis. He has data showing that disk IOPS are the binding constraint at the current shared infrastructure scale: when more than 20 agents run concurrently on the shared nodes, disk wait time accounts for 40% of agent task duration.
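Victor's analysis - disk wait share as a function of agent concurrency - can be reproduced with a simple grouping over per-task samples. A sketch assuming a hypothetical sample shape of `(concurrent_agents, disk_wait_s, total_duration_s)`:

```python
from collections import defaultdict

def disk_wait_by_concurrency(samples):
    """Group task samples by concurrent-agent count and return the
    mean fraction of task duration spent in disk wait per group.
    A fraction that climbs with concurrency indicates I/O contention
    is the binding constraint, as in Victor's data."""
    groups = defaultdict(list)
    for n, wait, total in samples:
        groups[n].append(wait / total)
    return {n: sum(fracs) / len(fracs) for n, fracs in sorted(groups.items())}
```

A table like `{10: 0.12, 20: 0.18, 30: 0.41}` from this function is the kind of evidence that turns Victor's three months of advocacy into a one-slide argument for the dedicated pool.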

What Victor should do - role-specific action plan