Span of control = how many agents you can effectively supervise

·Developer role is formally defined as "manager of agent fleet"
·Span of control is measured: how many parallel agents each developer effectively supervises
·Performance evaluation includes agent supervision effectiveness (not just personal code output)

·Span of control target is defined per role (e.g., 3-5 agents for standard developers, 5-10 for senior)
·Agent supervision training is part of standard developer onboarding

Evidence

·Updated role descriptions defining developer as agent supervisor
·Span of control metrics dashboard
·Performance review criteria including agent supervision effectiveness

What It Is

Span of control is a management concept from organizational science: how many direct reports can one manager effectively supervise? The answer for human teams is typically 5-9, with 7 as a common optimum. The same concept applies to agent fleet management: how many AI agents can one developer effectively supervise simultaneously? The answer turns out to fall in a similar, slightly lower range - typically 3-7 - and for the same underlying reasons.

The parallel between human team management and agent fleet management is not superficial. In both cases, the limiting factor is the supervisor's attention and cognitive bandwidth. A manager with 12 direct reports cannot give each person the direction, feedback, and support they need; quality degrades and people go off track. A developer with 8 parallel agents cannot review each agent's output carefully, catch errors promptly, and course-correct quickly enough; quality degrades and agents go off track. The cognitive load of maintaining a mental model of "where is each unit, what do they need, what decision needs to be made next" scales with the number of units, and eventually exceeds the supervisor's capacity.

The 3-7 range for agents specifically reflects two constraints. The lower bound (3) represents the minimum parallelism that actually delivers a throughput multiplier worth the coordination overhead. Running 2 agents has limited advantage over sequential work once you account for the attention cost. The upper bound (7) reflects the maximum number of simultaneous agent states that most developers can hold in working memory while maintaining adequate supervision quality. Some developers with highly developed task management systems and high-context, well-defined tasks can operate at 7-10; most developers working on moderately complex tasks with moderate context quality are most effective at 4-6.

The practical span of control for a given developer on a given day depends on several variables: task complexity (simple, well-defined tasks support higher spans), context quality (agents with excellent context make fewer disruptive decisions), review cycle discipline (structured 15-minute cycles support higher spans than ad-hoc monitoring), and the developer's experience with fleet management (experienced fleet managers can handle more agents than those new to the practice).

Why It Matters

Understanding span of control provides a principled framework for structuring AI-augmented work:

·Sets realistic throughput expectations - teams that expect developers to run unlimited parallel agents will be disappointed; teams that plan around 4-6 agents per developer get accurate velocity projections
·Identifies where productivity gains plateau - the throughput benefit of adding agents is not linear; the first 3 provide high additional value, the next 2-3 provide moderate additional value, beyond 7 the marginal value typically falls below the marginal supervision cost
·Guides task design decisions - if you want to increase a developer's effective span, the most direct intervention is increasing context quality and task specificity so that agents need less supervision per task; this is a better investment than trying to increase raw supervision capacity
·Informs hiring and team structure - an organization of 10 developers with spans of 5 agents per developer can sustain 50 simultaneous agent workstreams; this changes how you think about the relationship between headcount and development capacity
·Creates a growth metric for developers - increasing your effective span of control is a concrete, measurable developer skill; a developer who moves from a span of 3 to a span of 6 has demonstrably become a more effective fleet manager; this is a new dimension of developer growth that the traditional role structure doesn't capture

Tip

Track your actual span of control over a two-week period. Log how many agents you started, how many you were supervising simultaneously at peak, and how often you had to clean up agent work that went significantly wrong. The peak you sustained without frequent cleanup is your real span of control, as opposed to your theoretical maximum. Start from this baseline rather than from aspirational targets.
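One way to turn such a log into numbers is sketched below. It is a minimal illustration, not a prescribed tool: the `AgentRun` record, its field names, and the sample timestamps are all hypothetical, and it assumes you note a start time, an end time, and a cleanup flag for each agent session.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class AgentRun:
    """One logged agent session. All field names are hypothetical."""
    started: datetime
    ended: datetime
    needed_cleanup: bool  # True if the work went significantly wrong


def peak_concurrency(runs: List[AgentRun]) -> int:
    """Largest number of agents that were running at the same moment."""
    events = []
    for run in runs:
        events.append((run.started, 1))   # an agent starts
        events.append((run.ended, -1))    # an agent finishes
    # At equal timestamps, process finishes (-1) before starts (+1) so that
    # back-to-back sessions are not counted as overlapping.
    events.sort(key=lambda e: (e[0], e[1]))
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


def cleanup_rate(runs: List[AgentRun]) -> float:
    """Fraction of logged runs that needed significant cleanup."""
    if not runs:
        return 0.0
    return sum(r.needed_cleanup for r in runs) / len(runs)


# Illustrative sample log (made-up times, not real data).
runs = [
    AgentRun(datetime(2025, 1, 6, 9, 0), datetime(2025, 1, 6, 10, 0), False),
    AgentRun(datetime(2025, 1, 6, 9, 15), datetime(2025, 1, 6, 11, 0), True),
    AgentRun(datetime(2025, 1, 6, 9, 30), datetime(2025, 1, 6, 9, 45), False),
    AgentRun(datetime(2025, 1, 6, 10, 0), datetime(2025, 1, 6, 10, 30), False),
]
print(peak_concurrency(runs))  # 3
print(cleanup_rate(runs))      # 0.25
```

Reading the two numbers together is the point: a high peak paired with a high cleanup rate suggests the nominal span exceeds the effective one.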

Getting Started

1. Establish your current effective span - run at your current natural comfort level for two weeks and observe: at what number of parallel agents does your review quality decline? At what number do you start missing agent problems at check-in? This is your current span ceiling. Most developers discover their effective span is 3-4 before they develop fleet management discipline.
2. Identify your span-limiting factor - is your limit cognitive (too many agent states to hold in working memory)? Logistical (your terminal setup makes it hard to see multiple agents at once)? Or a review bottleneck (you can manage the agents but reviewing their outputs takes too long)? Different limiting factors have different solutions.
3. Address logistics before cognitive load - the easiest span expansion is tooling. A well-designed terminal layout showing all agents simultaneously (tmux split panes, multiple iTerm2 windows, a status dashboard) reduces the cognitive load of maintaining mental state about where each agent is. Set up your environment before trying to push your span higher.
4. Increase context quality to expand span - the biggest multiplier for span is agent context quality. Agents that are better contextualized make better decisions, need less supervision, and produce cleaner outputs. As a rough rule of thumb, a 20% improvement in context quality can raise your effective span by 1-2 agents, because you spend less supervision time per agent. Invest in context before pushing span higher.
5. Expand span incrementally - don't jump from 3 to 6. Add one agent at a time and observe the effect on quality and review burden. If quality holds when you add a fourth agent, keep the four running for a week and observe again. If review burden becomes unsustainable, go back to three and identify the bottleneck before expanding.
6. Use task homogeneity to support higher spans - running 6 agents on similar, well-understood task types is easier than running 4 agents on highly varied complex tasks. When you want to operate at higher spans, batch similar work. When you have heterogeneous, complex tasks, operate at lower spans with more supervision time per agent.
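The incremental-expansion step above can be read as a simple weekly decision rule. The sketch below is one possible interpretation, not a fixed procedure: the function name `next_span` and its two boolean inputs are hypothetical, and it assumes you judge "quality held" and "review sustainable" yourself at the end of each week.

```python
def next_span(current: int, quality_held: bool, review_sustainable: bool,
              floor: int = 1) -> int:
    """Suggest next week's span from this week's observations.

    Assumption: step up by one agent only when both quality and review
    burden held for a full week; otherwise step back by one (never below
    `floor`) and look for the bottleneck before expanding again.
    """
    if quality_held and review_sustainable:
        return current + 1
    return max(floor, current - 1)


print(next_span(3, quality_held=True, review_sustainable=True))   # 4
print(next_span(4, quality_held=True, review_sustainable=False))  # 3
```

Changing the span by only one agent per observation period keeps the cause of any quality change easy to attribute.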

Common Pitfalls

Treating span as a competition. Some developers try to maximize span as a status signal ("I run 10 agents at once"). This is the wrong mental model. The goal is effective span - the maximum number of agents you can supervise while maintaining quality and not creating a cleanup workload that exceeds the throughput benefit. An effective span of 5 is better than a nominal span of 10 with poor quality.

Ignoring the review bottleneck. Many developers find that their span is not limited by supervision capacity but by review capacity. They can monitor 8 agents but can't review 8 completed PRs per day at the quality standard the team requires. Span of control must account for the full developer bandwidth cost: supervision plus review. If review is the bottleneck, increasing span creates a PR backlog that eliminates the throughput benefit.
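A back-of-the-envelope check for this pitfall is sketched below. It assumes, purely for illustration, that each agent produces a roughly constant number of PRs per day and that each review takes a roughly constant time; the function and parameter names are hypothetical.

```python
def effective_span(supervision_limit: int,
                   review_hours_per_day: float,
                   review_hours_per_pr: float,
                   prs_per_agent_per_day: float) -> int:
    """Effective span is capped by whichever runs out first:
    supervision capacity or review capacity."""
    if review_hours_per_pr <= 0 or prs_per_agent_per_day <= 0:
        raise ValueError("review time and PR rate must be positive")
    # How many PRs the developer can review per day at the team's standard.
    reviewable_prs = review_hours_per_day / review_hours_per_pr
    # How many agents' worth of output that review capacity can absorb.
    review_limit = int(reviewable_prs // prs_per_agent_per_day)
    return min(supervision_limit, review_limit)


# Illustrative numbers: can monitor 8 agents, but only 3 hours a day are
# available for review, each review takes 45 minutes, and each agent
# produces about one PR per day.
print(effective_span(8, 3.0, 0.75, 1.0))  # 4
```

In this made-up case the developer can monitor 8 agents but only absorb the output of 4, so running more than 4 would just grow the PR backlog.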

Not accounting for variation in task complexity. A developer who calibrates a span of 6 for simple, well-defined tasks will be overwhelmed if they try to run 6 agents on complex, ambiguous tasks. Task complexity is a first-order variable in effective span. Calibrate span to the tasks at hand, not to a general personal limit.

Failing to decrease span when context quality degrades. If context infrastructure is under-maintained - CLAUDE.md files become stale, MCP servers have outages, documentation falls behind code changes - the effective span will decrease because agents make more errors per task. Organizations that don't maintain context quality will see effective spans decrease as codebases evolve.

Missing the organizational span of control question. Individual span of control is one question; the organizational equivalent is another. How many total agents can the organization's infrastructure support simultaneously? How many agents can the review process absorb? These organizational span limits are distinct from individual developer limits and need to be planned for separately.

How Different Roles See It

Bob, Head of Engineering

Bob is doing sprint planning and trying to estimate velocity for the upcoming quarter. His team has 8 developers operating at various levels of AI fluency. He's trying to figure out how many parallel agent workstreams the team can sustain and what the actual throughput multiplier is compared to pure human development.

What Bob should do: Bob should survey each developer's current effective span and use this to estimate total concurrent agent capacity. If six developers are effectively running 4-5 agents each and two are at 2-3 agents, the team has roughly 28-36 concurrent agent workstreams available. This is a more accurate planning number than "we have 8 developers with AI tools." Bob should also distinguish between developer time (thinking, specifying, reviewing) and agent time (executing tasks). A developer managing 5 agents is still working a full day - they're just doing different work. The throughput multiplier is in the agent execution, not in the developer headcount. Bob should communicate this distinction to stakeholders who ask "how many developers does this replace."
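The capacity estimate in Bob's plan is a sum of per-developer span ranges. A minimal sketch of that arithmetic follows; the function name `team_capacity` and the tuple representation of a span range are assumptions made for illustration.

```python
from typing import List, Tuple


def team_capacity(spans: List[Tuple[int, int]]) -> Tuple[int, int]:
    """Sum per-developer (low, high) effective spans into a team range."""
    low = sum(lo for lo, _ in spans)
    high = sum(hi for _, hi in spans)
    return low, high


# Six developers at 4-5 agents each, two developers at 2-3 agents each.
spans = [(4, 5)] * 6 + [(2, 3)] * 2
print(team_capacity(spans))  # (28, 36)
```

Keeping the result as a range rather than a single number reflects that effective span shifts with task complexity and context quality.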

Sarah, Productivity Lead

Sarah is designing the organizational metrics dashboard for AI adoption. She wants to include a "productivity" metric but is struggling to define what to measure. She's been asked to demonstrate that the AI tooling investment is delivering value, and stakeholder expectations are high.

What Sarah should do: Sarah should build the metrics around effective span of control because it captures the right thing: how effectively are developers using the AI capacity available to them? She should track, per developer per sprint: number of parallel agents run, number of tasks completed by agents versus directly, and review-to-agent ratio (how many PRs did the developer review versus write). The trend over time should show increasing effective span as developers develop fleet management skills, which translates to increasing throughput per developer. This metric is honest - it measures actual agent utilization, not just tool installation - and it provides actionable data for improving AI practices.
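The per-developer, per-sprint record Sarah tracks could be shaped roughly as follows. This is a sketch under assumptions: the class `SprintRecord`, its field names, and the sample values are hypothetical, and the ratio is computed as PRs reviewed divided by PRs written, following the parenthetical definition above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SprintRecord:
    """One developer's numbers for one sprint (hypothetical schema)."""
    developer: str
    peak_parallel_agents: int
    tasks_by_agents: int
    tasks_direct: int
    prs_reviewed: int
    prs_written: int

    @property
    def agent_task_share(self) -> float:
        """Share of completed tasks that were done by agents."""
        total = self.tasks_by_agents + self.tasks_direct
        return self.tasks_by_agents / total if total else 0.0

    @property
    def review_ratio(self) -> Optional[float]:
        """PRs reviewed per PR written; None if the developer wrote none."""
        if self.prs_written == 0:
            return None
        return self.prs_reviewed / self.prs_written


# Illustrative values only, not measured data.
record = SprintRecord("dev-a", peak_parallel_agents=3, tasks_by_agents=6,
                      tasks_direct=4, prs_reviewed=9, prs_written=3)
print(record.agent_task_share)  # 0.6
print(record.review_ratio)      # 3.0
```

Comparing these records sprint over sprint gives the trend line described above without reducing it to a single composite score.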

Victor, Staff Engineer - AI Champion

Victor effectively runs at a span of 6-7 agents and has developed the most sophisticated fleet management workflow on the team. He's been asked to help two other developers increase their effective span from 3 to 5. He's not sure how to coach this transition because most of what he does has become intuitive.

What Victor should do: Victor should work backwards from his current practice to make the implicit explicit. He should spend a day narrating his decisions aloud as he manages his agent fleet: "I'm launching three agents now instead of five because these tasks are all in the same service and I want to sequence them to avoid conflicts. I'm checking in on agent two first because it's working in an area with unclear requirements. I'm not interrupting agent four because it's on track with a well-defined task." This narration reveals the judgment calls that experienced fleet management requires. Victor should record this or take notes, then work with the two developers he's coaching to help them make the same judgments consciously before they become automatic.
