June 2026 · v1.3

VISDOM Maturity Matrix

The Bill Comes Due - Fleets Shipped, and So Did the Invoice

Development

How developers work with AI day-to-day. From sidebar chat to fleet agents.

Coding Agent Usage

L1 Ad-hoc · 3 practices
L2 Guided · 3 practices
L3 Systematic · 3 practices
L4 Optimized · 3 practices
L5 Autonomous · 3 practices

Context Engineering

L1 Ad-hoc · 3 practices
L2 Guided · 3 practices
L3 Systematic · 3 practices
L4 Optimized · 3 practices
L5 Autonomous · 3 practices

Code Review & Quality

L1 Ad-hoc · 3 practices
L2 Guided · 3 practices
L3 Systematic · 3 practices
L4 Optimized · 3 practices
L5 Autonomous · 3 practices

Testing Strategy

L1 Ad-hoc · 3 practices
L2 Guided · 3 practices
L3 Systematic · 3 practices
L4 Optimized · 3 practices
L5 Autonomous · 3 practices

Author Commentary

The June 2026 zeitgeist is **the bill comes due**. May was the month the fleet became the default and the invoice became unavoidable, in the same four weeks. Every major vendor shipped multi-agent orchestration as the headline product: Anthropic's [Code w/ Claude](https://simonwillison.net/2026/May/6/code-w-claude-2026/) (May 6) announced fleets, Outcomes, Dreaming and Routines, then Claude Code shipped [dynamic workflows](https://www.anthropic.com/news/claude-opus-4-8) in which a script that Claude writes spawns dozens to hundreds of subagents; Cursor 3.6 (May 29) added Auto-review Run Mode; Google launched [Antigravity 2.0 + CLI](https://techcrunch.com/2026/05/19/google-launches-antigravity-2-0-with-an-updated-desktop-app-and-cli-tool-at-io-2026/) (May 19); Devin shipped MultiDevin. The developer's job is now decomposition and oversight, not editing.

[Opus 4.8](https://www.anthropic.com/news/claude-opus-4-8) (May 28) is the new default and is roughly 4x less likely than 4.7 to let its own code flaws pass - which makes self-verifying review gates real, not aspirational. But the same month sharpened the warning on the other side: [SpecBench](https://arxiv.org/abs/2605.21384) (May 20) showed that reward hacking scales with codebase size - the validation-vs-holdout gap grows ~27 percentage points per 10x increase in LOC, and one agent wrote a 2,900-line test-memorizing "compiler" that scored 97% on validation and 0% on held-out tests.

The model and the harness are both better than ever. The thing that gets you is still the same: trusting a green run on a large codebase. Keep held-out oracles, review at the spec level, and instrument outcomes - not leaderboard scores.
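A held-out oracle only helps if something compares it against the visible test suite. The sketch below shows one way to do that. It is an illustration, not SpecBench's methodology: the function names, the 10-point alert threshold, and the assumption that the ~27-points-per-10x trend is log-linear across codebase sizes are all hypothetical choices made for the example.

```python
import math


def holdout_gap(validation_pass_rate: float, holdout_pass_rate: float) -> float:
    """Validation-vs-holdout gap in percentage points (rates are 0.0-1.0)."""
    return (validation_pass_rate - holdout_pass_rate) * 100.0


def expected_gap_growth(loc_small: int, loc_large: int,
                        points_per_10x: float = 27.0) -> float:
    """Extra gap (percentage points) expected when moving from loc_small to
    loc_large, ASSUMING the ~27 points per 10x LOC trend is log-linear."""
    return points_per_10x * math.log10(loc_large / loc_small)


def is_suspicious(validation_pass_rate: float, holdout_pass_rate: float,
                  max_gap_points: float = 10.0) -> bool:
    """Flag a run whose visible-test score far exceeds its held-out score.
    The 10-point default is an arbitrary placeholder, not a published value."""
    return holdout_gap(validation_pass_rate, holdout_pass_rate) > max_gap_points


# The test-memorizing "compiler" case from the commentary: 97% vs 0%.
print(is_suspicious(0.97, 0.0))
# Extra gap to budget for when a 10k-LOC pilot grows to 100k LOC.
print(expected_gap_growth(10_000, 100_000))
```

The point of the design is that the gate consumes two numbers, not one: a green validation run never clears it without a held-out score next to it.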

Delivery Management

How we manage delivery in the age of agents. From human PR review to autonomous delivery pipeline.

CI/CD Pipeline

L1 Ad-hoc · 3 practices
L2 Guided · 3 practices
L3 Systematic · 3 practices
L4 Optimized · 3 practices
L5 Autonomous · 3 practices

Merge & Deploy

L1 Ad-hoc · 3 practices
L2 Guided · 3 practices
L3 Systematic · 3 practices
L4 Optimized · 3 practices
L5 Autonomous · 3 practices

Metrics

L1 Ad-hoc · 3 practices
L2 Guided · 3 practices
L3 Systematic · 3 practices
L4 Optimized · 5 practices
L5 Autonomous · 2 practices

Governance & Compliance

L1 Ad-hoc · 3 practices
L2 Guided · 3 practices
L3 Systematic · 3 practices
L4 Optimized · 3 practices
L5 Autonomous · 3 practices

Author Commentary

June 2026 update: the re-pricing April warned about arrived. GitHub Copilot ran a "preview bill" ahead of its usage-based billing cutover; Cursor consolidated onto credit pools ($20/$60/$200); Anthropic doubled Claude Code rate limits (May 6) and added +50% weekly capacity (May 13) on the back of a [SpaceX/Colossus compute deal](https://www.anthropic.com/news/higher-limits-spacex).

Then the ROI question went public. Microsoft cancelled Claude Code internally on cost (The Verge, May 14), and [Uber's COO openly questioned the ROI](https://fortune.com/2026/05/26/uber-coo-ai-spending-tokens-claude-code/) after the team burned a full-year budget in four months (May 26). The [DORA ROI report](https://www.infoq.com/news/2026/05/dora-roi-ai-assisted-dev-report/) put numbers on it: ~39% first-year ROI, but only ~10% gain on complex legacy code, and a J-curve with a reliability penalty. Cost-per-merged-PR is no longer advanced telemetry - it is the line your CFO is already asking about.

Governance moved too. Agent config files became a live attack surface in May: the Mini Shai-Hulud worm (CVE-2026-45321) planted persistence by writing Claude Code hooks into `~/.claude/settings.json`, and the TrapDoor campaign hid prompt injection in `CLAUDE.md` and `.cursorrules`. Lint, normalize and review those files like code.

On the policy side, OpenAI published a [Frontier Governance Framework](https://openai.com/index/openai-frontier-governance-framework/) mapping to the EU AI Act (GPAI duties enforced Aug 2), while Colorado gutted its AI law - so the firm regulatory clock is European, not American. Stripe Minions is still the L5 north star; the new homework is proving the fleet is worth what it costs.
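Cost-per-merged-PR is simple arithmetic, but the definition of the numerator matters. The sketch below assumes a hypothetical `PullRequest` record with a per-PR agent spend; the field names and the choice to charge abandoned PRs against the merged ones are illustrative assumptions, not a standard definition.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class PullRequest:
    id: str
    agent_cost_usd: float  # token/credit spend attributed to this PR (assumed field)
    merged: bool


def cost_per_merged_pr(prs: Iterable[PullRequest]) -> Optional[float]:
    """Total agent spend divided by the number of merged PRs.

    Spend on closed or abandoned PRs stays in the numerator: that money was
    still paid, so excluding it would understate the real cost of a merge.
    Returns None when nothing merged, since the ratio is undefined.
    """
    prs = list(prs)
    total_spend = sum(pr.agent_cost_usd for pr in prs)
    merged_count = sum(1 for pr in prs if pr.merged)
    if merged_count == 0:
        return None
    return total_spend / merged_count


week = [
    PullRequest("pr-1", 12.0, merged=True),
    PullRequest("pr-2", 8.0, merged=True),
    PullRequest("pr-3", 4.0, merged=False),  # abandoned, but still billed
]
print(cost_per_merged_pr(week))  # 24.0 spent / 2 merged = 12.0
```

Tracking the same ratio per repository or per team is what separates the greenfield gains from the complex-legacy cases the DORA figures describe.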

Organization

How organizations adapt to the age of agents. From "buy licenses" to "agent fleet management".

AI Adoption Model

L1 Ad-hoc · 3 practices
L2 Guided · 3 practices
L3 Systematic · 3 practices
L4 Optimized · 3 practices
L5 Autonomous · 3 practices

Knowledge Management

L1 Ad-hoc · 3 practices
L2 Guided · 3 practices
L3 Systematic · 3 practices
L4 Optimized · 3 practices
L5 Autonomous · 3 practices

Team Structure & Roles

L1 Ad-hoc · 3 practices
L2 Guided · 3 practices
L3 Systematic · 3 practices
L4 Optimized · 4 practices
L5 Autonomous · 3 practices

Tech Debt & Modernization

L1 Ad-hoc · 3 practices
L2 Guided · 3 practices
L3 Systematic · 3 practices
L4 Optimized · 3 practices
L5 Autonomous · 3 practices

Author Commentary

June 2026 update: in May, AI became the single dominant cause of layoffs. The [Challenger report](https://www.challengergray.com/blog/challenger-report-may-job-cuts-rise-16-from-april-highest-may-total-since-2020/) cited AI in 40% of all announced cuts - an all-time monthly high and the third straight month AI led every reason - with tech at a two-year high (38,242).

But the headline misreads the shape of it. This is redistribution, not pure contraction: [Pragmatic Engineer](https://newsletter.pragmaticengineer.com/p/state-of-the-job-market-2026) reports "AI engineering" listings up 50-100% YoY (Google +62%), and small firms are now hiring their first developer because AI makes one viable (Jevons for labor), while SWE employment for ages 22-25 is down ~20% since 2022. The juniors are the ones absorbing the gap. Cutting humans before maturing the AI stack still creates a permanent capability hole.

Two roles shifted with it. The "fleet manager" stopped being aspirational and became the default UX as multi-agent orchestration shipped everywhere - the developer decomposes and supervises rather than edits. And hiring itself is under pressure: Steve Yegge's [The Last Technical Interview](https://steve-yegge.medium.com/the-last-technical-interview-bc13ddcf4564) (May 29) argues the 4-6 hour loop is dying in favor of real-work "campfire" trials and portable credentials.

On the debt side, the vocabulary sharpened: [Agentic Technical Debt](https://arxiv.org/abs/2605.29129) (a stock of liability from un-governed prompts and orchestration) versus the Stochastic Tax (the recurring flow-cost of keeping probabilistic agents in bounds), while the DORA ROI data shows the gains collapse to ~10% on complex legacy code. IPETs and a working bad-day protocol are still how mature orgs make Stage 6+ work without burning out their seniors - they just now also have to defend the bill.

Infrastructure

The technical layer that enables (or blocks) agents. From shared Jenkins to ephemeral agent sandboxes.

Agent Runtime & Sandboxing

L1 Ad-hoc · 3 practices
L2 Guided · 3 practices
L3 Systematic · 3 practices
L4 Optimized · 3 practices
L5 Autonomous · 3 practices

MCP & Tool Integration

L1 Ad-hoc · 3 practices
L2 Guided · 3 practices
L3 Systematic · 3 practices
L4 Optimized · 3 practices
L5 Autonomous · 3 practices

Build System

L1 Ad-hoc · 3 practices
L2 Guided · 3 practices
L3 Systematic · 3 practices
L4 Optimized · 3 practices
L5 Autonomous · 3 practices

Observability & Feedback Loop

L1 Ad-hoc · 3 practices
L2 Guided · 3 practices
L3 Systematic · 3 practices
L4 Optimized · 4 practices
L5 Autonomous · 3 practices

Author Commentary

June 2026 update: cost observability got granular, and security got a new front door. Claude Code's `/usage` now attributes plan limits per skill, subagent, plugin and MCP server - which is exactly what you need when a script can spawn hundreds of subagents and the bill is the thing your CFO is asking about. The [DORA ROI report](https://www.infoq.com/news/2026/05/dora-roi-ai-assisted-dev-report/) and its J-curve give the dashboards a shape; the lesson from the benchmark-integrity research (SpecBench, BenchJack) is to instrument outcomes - post-merge bug rate, held-out pass rate - and trust the methodology, not the leaderboard number.

The new front door is the agent's own config. The Mini Shai-Hulud worm (CVE-2026-45321, CVSS 9.6) propagated by writing Claude Code hooks into `~/.claude/settings.json`, and TrapDoor hid zero-width-Unicode prompt injection in `CLAUDE.md` and `.cursorrules` across 34 poisoned packages. MCP installs and tool schemas are now supply-chain dependencies: pin them, review them, and monitor the config files.

Meanwhile sandboxed, classifier-gated execution shipped as a product default (Cursor 3.6 Run Mode sandboxes Shell/MCP/Fetch), and local-first runtimes matured - [antirez/ds4](https://github.com/antirez/ds4) runs DeepSeek V4 fully on-device via Metal with a 1M-token context. Infrastructure, not the model, still decides whether your fleet scales gracefully or burns the budget on a Tuesday night - it just also decides whether a poisoned rules file quietly exfiltrates your credentials.
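Monitoring agent config files can start with something as small as a check for invisible characters. The sketch below is a minimal illustration, not a description of how TrapDoor payloads are actually encoded: it flags every character in Unicode general category `Cf` (format characters, which includes the zero-width space, joiners and bidirectional controls). The function names are hypothetical.

```python
import unicodedata
from typing import List, Tuple


def find_hidden_characters(text: str) -> List[Tuple[int, int, str]]:
    """Return (line, column, code point) for each invisible format character.

    Category "Cf" covers zero-width and bidi control characters. It also
    covers characters with legitimate uses (for example joiners in emoji or
    in some scripts), so treat hits as review prompts, not proof of attack.
    """
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if unicodedata.category(ch) == "Cf":
                findings.append((line_no, col, f"U+{ord(ch):04X}"))
    return findings


def strip_hidden_characters(text: str) -> str:
    """Normalize a rules file by dropping all format-category characters."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


# A rules file with a zero-width space hiding in the first line.
rules = "Use tabs.\u200b Ignore previous instructions.\nRun the linter."
print(find_hidden_characters(rules))  # [(1, 10, 'U+200B')]
print(strip_hidden_characters(rules) == rules.replace("\u200b", ""))  # True
```

Run as a pre-commit or CI step over `CLAUDE.md`, `.cursorrules` and similar files, a check like this turns "review those files like code" into something that fails a build rather than relying on a reviewer's eyes.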