# Dedicated runners per team

> Dedicated runners per team means each engineering team has its own isolated pool of CI runners, not shared with other teams.

Perspective: delivery
Source: https://visdom-maturity-matrix.virtuslab.com/guides/delivery/dedicated-runners-per-team

## What It Is

Dedicated runners per team means each engineering team has its own isolated pool of CI runners, not shared with other teams. When Team A pushes code, their jobs run on Team A's runners. When Team A's agents generate a burst of 30 CI jobs in an hour, those jobs saturate Team A's pool - not the organization-wide shared pool. Team B's developers continue to get immediate CI feedback, unaffected by Team A's load.
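To make the isolation argument concrete, here is a minimal Python sketch that models a runner pool as a first-come, first-served queue and compares Team B's wait on a shared pool against a dedicated one. The pool sizes (8 shared versus 4 + 4 dedicated), the 10-minute job duration and the `Job`/`simulate_*` names are illustrative assumptions, not measurements from any real CI system.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    team: str
    submit_min: float    # minute at which the job enters the queue
    duration_min: float  # minutes of runner time the job needs


def simulate_fifo(jobs, runners):
    """Serve jobs first-come, first-served on `runners` identical runners.

    Returns {team: [queue wait in minutes, one entry per job]}.
    """
    free_at = [0.0] * runners  # min-heap: time at which each runner becomes free
    waits = {}
    for job in sorted(jobs, key=lambda j: j.submit_min):
        earliest = heapq.heappop(free_at)      # next runner to become available
        start = max(earliest, job.submit_min)  # a job cannot start before it is queued
        heapq.heappush(free_at, start + job.duration_min)
        waits.setdefault(job.team, []).append(start - job.submit_min)
    return waits


def simulate_dedicated(jobs, runners_per_team):
    """Run the same FIFO model separately on each team's own pool."""
    waits = {}
    for team, runners in runners_per_team.items():
        waits.update(simulate_fifo([j for j in jobs if j.team == team], runners))
    return waits


# Hypothetical scenario: Team A's agents queue 30 ten-minute jobs at once,
# then a Team B developer pushes one minute later.
jobs = [Job("team-a", 0.0, 10.0) for _ in range(30)] + [Job("team-b", 1.0, 10.0)]
shared = simulate_fifo(jobs, runners=8)
dedicated = simulate_dedicated(jobs, {"team-a": 4, "team-b": 4})
print("Team B wait, shared pool:", max(shared["team-b"]), "min")        # 29.0 min
print("Team B wait, dedicated pool:", max(dedicated["team-b"]), "min")  # 0.0 min
```

Team B's wait drops from 29 minutes to zero. The trade-off is visible as well: with only 4 runners, Team A's own burst now drains more slowly, which is why each dedicated pool still needs to be sized (and ideally autoscaled) for its team's load.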

The technical implementation varies by platform. In GitHub Actions, this means registering self-hosted runners with a team-specific label (for example, `team-a`) and configuring the team's workflows to request that label. In Buildkite, it means creating a separate queue per team and assigning agents to team-specific queues. In CircleCI, it means creating a self-hosted runner resource class per team and pointing each team's jobs at its own resource class. The operational model also varies: some organizations let teams manage their own runner pools, while others have a platform team provision and manage pools on the teams' behalf.
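Label-based routing in GitHub Actions follows a simple rule: a self-hosted runner picks up a job only if it carries every label listed in the job's `runs-on`. The sketch below models that rule as a subset check; the runner names and label sets are hypothetical.

```python
def eligible_runners(runs_on, runners):
    """Return runners whose labels include every label the job requests.

    Models the rule that a self-hosted runner only picks up a job if it has
    all of the labels listed in the job's `runs-on`.
    """
    required = set(runs_on)
    return sorted(name for name, labels in runners.items() if required <= set(labels))


# Hypothetical runner fleet: one shared runner plus team-labelled runners.
runners = {
    "shared-1": ["self-hosted", "linux", "x64"],
    "team-a-1": ["self-hosted", "linux", "x64", "team-a"],
    "team-a-2": ["self-hosted", "linux", "x64", "team-a"],
    "team-b-1": ["self-hosted", "linux", "x64", "team-b"],
}

print(eligible_runners(["self-hosted", "team-a"], runners))
# ['team-a-1', 'team-a-2']
print(eligible_runners(["self-hosted", "linux"], runners))
# ['shared-1', 'team-a-1', 'team-a-2', 'team-b-1']
```

The second call shows that labels route jobs but do not enforce isolation on their own: any workflow that requests only generic labels can land on a team's runners. GitHub's runner groups, which restrict which repositories may use a given set of runners, close that gap.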

Beyond load isolation, dedicated runners enable per-team customization. Team A's runners can have pre-installed dependencies specific to Team A's services. Team B's runners can have a pre-warmed Docker cache for Team B's base images. A team with security requirements can run on runners with additional compliance controls. Shared runners must be configured for the lowest common denominator - or configured with a complex matrix that's hard to maintain. Dedicated runners can be configured precisely for each team's needs.

For organizations scaling AI agent adoption, dedicated runners are an infrastructure prerequisite for sustainable agent use. Without isolation, agent adoption is a zero-sum game: each team's agents consume shared capacity at the expense of other teams. With dedicated runners, the extra CI load from agents stays within the adopting team's pool, and each team can decide independently how many runners to provision for its level of agent usage.

## Why It Matters

- **Agent load is contained to the generating team** - bursts of agent-driven CI activity don't spill over to affect other teams, removing the organizational friction that shared queues create
- **Predictable CI performance per team** - with a dedicated pool, a team can size its runners for its actual load (human + agent) and keep queue times consistently low
- **Enables per-team CI customization** - pre-installed toolchains, pre-warmed caches, security controls, and resource sizes can be tuned per team rather than averaged across the organization
- **Clear cost accountability** - each team's CI costs are visible and attributable; teams that adopt agents heavily see their CI costs increase and can make informed capacity decisions
- **Foundation for agent-specific runner pools** - dedicated team runners are the stepping stone to the next level: dedicated agent sandbox pools where each agent gets its own ephemeral environment

## Getting Started

1. **Audit current shared runner usage by team** - Use your CI platform's analytics API or the organization admin console to pull per-repository CI minutes consumed over the last 30 days. Identify the top 3 teams by CI consumption: they're the highest-priority candidates for dedicated runners and the ones most likely to be creating queue pressure for others today.
2. **Register dedicated runners for the top-load teams first** - For GitHub Actions: provision VMs (AWS EC2, GCP Compute Engine, Azure VMs) with the GitHub Actions runner agent installed, register them with `--labels team-a`, and update Team A's workflow files to specify `runs-on: [self-hosted, team-a]`. For Buildkite: create a team-specific queue and start its agents with the matching queue tag (`queue=team-a`).
3. **Right-size the runner pool for team load** - Look at peak concurrent CI jobs for each team over the past week. The dedicated runner pool should have enough capacity to handle the team's expected peak load (human push frequency + agent push frequency) without significant queuing. Start with 2-3 runners per team and scale based on actual queue time data.
4. **Configure runner autoscaling** - Static runner pools either waste money (over-provisioned for off-hours) or cause queuing (under-provisioned for peak hours). Implement autoscaling: for GitHub Actions, common options are Actions Runner Controller (on Kubernetes) or VM-based autoscaling such as AWS Auto Scaling Groups; for Buildkite, the Elastic CI Stack for AWS provides autoscaling agents. Target: runners scale up within 60 seconds when queue depth grows and scale down after 10 minutes of inactivity.
5. **Update CI workflows to use team-specific runners** - This is the most mechanical step but requires coordination. Update each team's workflow files to specify the team-specific runner label. Do this as a single PR per team to minimize coordination overhead. Keep the shared pool as a fallback during the transition.
6. **Establish a queue-time SLO and monitoring** - Define a target, for example: 95% of CI jobs start within 60 seconds of being queued. Set up monitoring for queue depth and wait time per team pool, and alert when queue times breach the SLO. This turns CI performance from a passive complaint into an actively monitored service.
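Steps 1, 3 and 6 all reduce to simple arithmetic over per-job timestamps. The sketch below assumes you have exported job records (team, queued, started, finished) from your CI platform; the `JobRecord` shape and the helper names are hypothetical, not any platform's API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class JobRecord:
    team: str
    queued_s: float    # when the job entered the queue (seconds)
    started_s: float   # when a runner picked it up
    finished_s: float  # when it completed


def by_team(records):
    groups = defaultdict(list)
    for r in records:
        groups[r.team].append(r)
    return dict(groups)


def top_teams_by_minutes(records, n=3):
    """Step 1: rank teams by runner minutes consumed."""
    minutes = defaultdict(float)
    for r in records:
        minutes[r.team] += (r.finished_s - r.started_s) / 60
    return sorted(minutes, key=minutes.get, reverse=True)[:n]


def peak_concurrency(records):
    """Step 3: most jobs running at the same moment (sweep over start/finish events)."""
    events = [(r.started_s, 1) for r in records] + [(r.finished_s, -1) for r in records]
    events.sort()  # at equal timestamps, finishes (-1) are processed before starts (+1)
    running = peak = 0
    for _, delta in events:
        running += delta
        peak = max(peak, running)
    return peak


def slo_compliance(records, max_wait_s=60.0):
    """Step 6: fraction of jobs that waited at most max_wait_s in the queue."""
    if not records:
        return 1.0
    on_time = sum(1 for r in records if r.started_s - r.queued_s <= max_wait_s)
    return on_time / len(records)


# Hypothetical export for two teams.
records = [
    JobRecord("team-a", 0, 5, 605),
    JobRecord("team-a", 0, 90, 690),
    JobRecord("team-b", 100, 110, 400),
    JobRecord("team-b", 400, 420, 700),
]
print(top_teams_by_minutes(records))  # ['team-a', 'team-b']
for team, recs in sorted(by_team(records).items()):
    meets_slo = slo_compliance(recs) >= 0.95
    print(team, peak_concurrency(recs), slo_compliance(recs), meets_slo)
# team-a 2 0.5 False
# team-b 1 1.0 True
```

Running the same functions per team over a week of data gives the peak concurrency to size against in step 3 and the compliance figure to alert on in step 6.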

> **Tip**: Start with autoscaling configured conservatively (scale up at queue depth > 2, scale down after 15 minutes of idle) and tune based on observed behavior. Teams that over-provision static runners pay for unused capacity 24/7. Autoscaling reduces costs while maintaining performance.
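The decision logic behind the conservative thresholds in the tip can be sketched in a few lines. Real autoscalers such as Actions Runner Controller or the Elastic CI Stack have their own configuration; the `PoolState` type, the `desired_runners` function and its default limits (0 to 8 runners) are hypothetical and only illustrate the rule.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PoolState:
    runners: int                          # runners currently provisioned
    queue_depth: int                      # jobs waiting for a runner
    idle_minutes: tuple[float, ...] = ()  # how long each idle runner has been idle


def desired_runners(state, *, min_runners=0, max_runners=8,
                    scale_up_above=2, idle_threshold_min=15.0):
    """Grow when the queue backs up; shrink only runners idle past the threshold."""
    if state.queue_depth > scale_up_above:
        # Add one runner per waiting job, capped at the pool's maximum.
        return min(max_runners, state.runners + state.queue_depth)
    if state.queue_depth == 0:
        stale = sum(1 for m in state.idle_minutes if m >= idle_threshold_min)
        return max(min_runners, state.runners - stale)
    return state.runners  # small backlog: let existing runners absorb it


print(desired_runners(PoolState(runners=2, queue_depth=5)))   # 7
print(desired_runners(PoolState(runners=2, queue_depth=2)))   # 2
print(desired_runners(PoolState(runners=6, queue_depth=0, idle_minutes=(20.0, 16.0, 3.0))))  # 4
print(desired_runners(PoolState(runners=8, queue_depth=20)))  # 8
```

Scaling down only runners that have individually sat idle past the threshold avoids tearing down capacity in the middle of a burst that has briefly emptied the queue.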

## Common Pitfalls

**Provisioning dedicated runners without autoscaling.** A static pool of 4 runners is well utilized at 9 AM when agents are running, but wastes money at 2 AM when nothing is happening. Implement autoscaling from the start or plan to migrate to it immediately; the cost savings often pay for the implementation effort within weeks.

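**Forgetting to update branch protection rules and environment configurations.** When you migrate from shared runners to dedicated runners, required status checks in branch protection rules match checks by name; a job whose name includes its runner label (a matrix job keyed on the `runs-on` value, for example) will report under a new name, leaving the old required check pending and blocking merges. Environment configurations (secrets, environment-specific runner requirements) may also be tied to the old runner labels. Audit every workflow file, required status check, and environment configuration that references runner labels before migrating, to avoid blocked merges and broken deployments.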
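A first pass of that audit can be scripted. The sketch below is a line-based regex scan of workflow files, not a YAML parser: it flags every `runs-on:` line (including ones whose label list continues on the following lines) for manual review. Branch protection rules and environments live in repository settings, so they still need a separate check. The function names are hypothetical.

```python
import re
from pathlib import Path

# Matches a `runs-on:` key at any indentation; group 1 is whatever follows on that line.
RUNS_ON = re.compile(r"^\s*runs-on:(.*)$")


def find_runs_on(text):
    """Return (line_number, value) for every `runs-on:` line in a workflow file."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        match = RUNS_ON.match(line)
        if match:
            value = match.group(1).strip() or "<list continues on following lines>"
            hits.append((number, value))
    return hits


def audit_workflows(repo_root="."):
    """Print every runner reference under .github/workflows for manual review."""
    for path in sorted(Path(repo_root, ".github", "workflows").glob("*.y*ml")):
        for number, value in find_runs_on(path.read_text()):
            print(f"{path}:{number}: runs-on: {value}")


sample = """jobs:
  build:
    runs-on: [self-hosted, team-a]
  lint:
    runs-on:
      - self-hosted
"""
print(find_runs_on(sample))
# [(3, '[self-hosted, team-a]'), (5, '<list continues on following lines>')]
```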

**Creating team-specific runners without per-team cost reporting.** The cost visibility benefit of dedicated runners only materializes if per-team costs are reported. Set up per-team cost dashboards at the same time as provisioning the runners: tag cloud resources by team so self-hosted runner spend can be attributed, and use your CI platform's usage reports (for GitHub Actions, billing reports broken down by repository) for any hosted minutes.
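Once runner instances carry a team tag, the per-team roll-up is a simple group-by over a billing export. This sketch assumes you can export (team tag, instance type, hours) rows, for example via cloud cost allocation tags; the instance type names and hourly rates are hypothetical. Untagged resources are surfaced explicitly rather than silently dropped.

```python
from collections import defaultdict


def cost_by_team(usage, hourly_rate):
    """Sum runner cost per team.

    usage: iterable of (team_tag, instance_type, hours); an empty tag means the
    resource was never tagged and is reported as "untagged".
    hourly_rate: {instance_type: cost per hour}.
    """
    totals = defaultdict(float)
    for team, instance_type, hours in usage:
        totals[team or "untagged"] += hours * hourly_rate[instance_type]
    return dict(totals)


# Hypothetical instance types and rates; substitute your own billing export.
rates = {"runner-small": 0.25, "runner-large": 0.50}
usage = [
    ("team-a", "runner-small", 100),
    ("team-a", "runner-large", 40),
    ("team-b", "runner-small", 60),
    ("", "runner-small", 8),
]
print(cost_by_team(usage, rates))
# {'team-a': 45.0, 'team-b': 15.0, 'untagged': 2.0}
```

A non-zero "untagged" bucket is worth alerting on: it is spend that no team's dashboard will show.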

**Under-provisioning for agent peak load.** Teams provision runners for human push frequency and then add agents. Once agents are running, they can generate 5-10x the CI load of the human team, and runners that handled human load fine are quickly saturated. When provisioning dedicated runners, size for anticipated agent load, not current human load.
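A quick way to size for anticipated agent load is Little's law applied to the runners themselves: the average number of busy runners equals the job arrival rate times the average job duration. The sketch below uses that estimate with a headroom factor; the 12 jobs per hour, 10-minute jobs and 1.5x headroom are hypothetical figures, and the 5x and 10x multipliers come from the range above.

```python
from math import ceil


def runners_needed(jobs_per_hour, avg_job_minutes, headroom=1.5):
    """Estimate pool size from offered load.

    By Little's law, the average number of busy runners equals the job arrival
    rate times the average job duration. Headroom covers bursts above the
    average and keeps utilization below the point where queues grow quickly.
    """
    avg_busy = jobs_per_hour * avg_job_minutes / 60
    return ceil(avg_busy * headroom)


# Hypothetical team: 12 jobs/hour at peak from humans, 10-minute jobs.
human = 12
print(runners_needed(human, 10))       # 3
print(runners_needed(human * 5, 10))   # 15  (agents at 5x the human load)
print(runners_needed(human * 10, 10))  # 30  (agents at 10x)
```

Because agent traffic tends to arrive in bursts rather than at a steady rate, treat this as a floor for the autoscaling maximum rather than a fixed pool size.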

**Neglecting shared runner maintenance during migration.** During the migration to dedicated runners, some workflows still run on the shared pool. Don't let the shared pool degrade during the transition - it's still serving teams that haven't migrated yet. Maintain the shared pool's capacity until all teams have dedicated runners.

## Bob - Head of Engineering

Bob's team has been blamed for slowing down the organization's shared CI with its agent usage. He knows he needs to move his team to dedicated runners, but he's not sure who owns runner provisioning: the platform team or his team. There's a two-week coordination delay while he waits for the platform team to prioritize the work.

Bob should accelerate by provisioning dedicated runners himself using GitHub Actions' self-hosted runner feature and his team's AWS account. The platform team's involvement is needed for the long-term operational ownership, but the immediate isolation can be done in a day without waiting for the platform team. Bob should provision 4 EC2 runners tagged with his team's name, register them with GitHub, update his team's workflows to use them, and send a note to the platform team: "We've provisioned dedicated runners for our team to resolve the shared pool saturation issue. Here are the instance IDs and the configuration we used - can you take operational ownership?" This approach resolves the immediate problem and gives the platform team a working reference implementation, rather than waiting for them to start from scratch.

## Sarah - Productivity Lead

Sarah's CI feedback latency dashboard has been showing degraded queue times for the last three weeks, and the pattern is clear: queue times spike during business hours when agents are running and recover overnight. The shared pool is chronically undersized for AI-era load. Multiple teams have complained; none have taken action because runner provisioning is perceived as the platform team's responsibility.

Sarah should present the queue time data at the next engineering leadership meeting and propose a clear policy: teams that adopt AI agents must provision dedicated runner pools within 30 days of beginning agent use. The policy makes runner provisioning a precondition for agent adoption, preventing the "tragedy of the commons" pattern where everyone's agent use degrades shared infrastructure. Sarah should also define the standard: an autoscaling pool with capacity for at least 4 runners per team, using Actions Runner Controller or an equivalent, and a queue time SLO of 95% of jobs starting within 60 seconds. The policy gives teams a clear target and the platform team a standard to implement once and reuse across teams.
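A policy stated this concretely can be checked mechanically, for example in a weekly report. The sketch below encodes the proposed standard (30-day grace period, capacity for at least 4 runners, autoscaling, 95% of jobs starting within 60 seconds); the `TeamCI` record, its field names and the example teams are hypothetical and would be populated from your own inventory and monitoring.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class TeamCI:
    name: str
    agents_since: date | None  # first day of agent use; None if the team uses no agents
    max_runners: int           # autoscaling ceiling of the dedicated pool; 0 = shared pool only
    autoscaling: bool
    slo_compliance: float      # fraction of jobs that started within 60 seconds


def policy_violations(team, today, grace=timedelta(days=30),
                      min_runners=4, slo_target=0.95):
    """Check one team against the proposed standard; returns a list of problems."""
    if team.agents_since is None or today - team.agents_since <= grace:
        return []  # no agents yet, or still inside the 30-day window
    problems = []
    if team.max_runners < min_runners:
        problems.append(f"dedicated pool needs capacity for at least {min_runners} runners")
    if not team.autoscaling:
        problems.append("dedicated pool is not autoscaling")
    if team.slo_compliance < slo_target:
        problems.append(f"only {team.slo_compliance:.0%} of jobs start within 60 s")
    return problems


today = date(2025, 3, 1)
teams = [
    TeamCI("payments", date(2025, 1, 10), 0, False, 0.71),  # 50 days into agent use
    TeamCI("search", date(2025, 2, 20), 0, False, 0.80),    # 9 days in: still in grace
    TeamCI("mobile", date(2024, 11, 1), 8, True, 0.97),     # meets the standard
]
for t in teams:
    print(t.name, policy_violations(t, today))
```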

## Victor - Staff Engineer - AI Champion

Victor's team already has dedicated runners with autoscaling configured using Actions Runner Controller on a shared Kubernetes cluster. His runners scale from 0 to 8 instances within 45 seconds of a queue spike, and scale back to 0 during off-hours, making his team's CI costs near-zero outside business hours.

Victor should package his Actions Runner Controller configuration as a Helm chart or Terraform module that other teams can deploy with minimal customization (just a team name and runner label). A self-service runner provisioning system - where a team opens a PR against the infrastructure repo with a two-line config addition, and a GitHub Action automatically provisions their dedicated runner pool - is the L5 version of this capability. Victor is the right person to build it: he has the Kubernetes access, the Actions Runner Controller expertise, and the credibility to get platform team buy-in on the automation approach. The ROI is clear: each team that self-provisions saves the platform team 2-3 hours of provisioning work.
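The self-service flow boils down to turning a two-line team config into deployable settings. The sketch below assumes a hypothetical `team:` / `label:` file format and renders Helm values whose keys (`githubConfigUrl`, `githubConfigSecret`, `runnerScaleSetName`, `minRunners`, `maxRunners`) follow Actions Runner Controller's runner scale set chart as we understand it; verify them against the chart version you deploy. The organization URL and secret name are placeholders. In ARC's scale-set mode, workflows target the scale set by name in `runs-on`.

```python
import json

# Defaults mirror the 0-to-8 scaling range described above; adjust per team.
DEFAULTS = {"minRunners": 0, "maxRunners": 8}


def parse_team_config(text):
    """Parse the hypothetical two-line `key: value` file a team adds in its PR."""
    config = {}
    for line in text.splitlines():
        if line.strip() and not line.lstrip().startswith("#"):
            key, _, value = line.partition(":")
            config[key.strip()] = value.strip()
    return config


def render_values(config, org_url, secret_name):
    """Build Helm values for one team's runner scale set."""
    return {
        "githubConfigUrl": org_url,
        "githubConfigSecret": secret_name,
        "runnerScaleSetName": config.get("label", config["team"]),
        **DEFAULTS,
    }


config = parse_team_config("team: payments\nlabel: payments-ci\n")
values = render_values(config, "https://github.com/example-org", "payments-arc-secret")
# JSON is valid YAML, so this output can be saved as a values file for `helm install -f`.
print(json.dumps(values, indent=2))
```

Keeping the team-facing input to two keys, with everything else defaulted centrally, is what lets the platform team review a provisioning PR in minutes instead of hand-building each pool.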

## Links

- [GitHub Actions - Self-hosted runners](https://docs.github.com/en/actions/hosting-your-own-runners)
- [Actions Runner Controller - Kubernetes-native GitHub Actions runners](https://github.com/actions/actions-runner-controller)
- [Buildkite - Elastic CI Stack for AWS](https://github.com/buildkite/elastic-ci-stack-for-aws)
- [CircleCI - Self-hosted runner overview](https://circleci.com/docs/runner-overview/)
- [GitLab CI - Autoscaling GitLab Runner](https://docs.gitlab.com/runner/configuration/autoscale.html)
