Development
How developers work with AI day-to-day, from sidebar chat to agent fleets.
Coding Agent Usage
Level 1
- At least one AI coding assistant (Copilot, Cursor, Claude Code) is installed and active for at least one developer
- AI autocomplete or chat is used at least once per week by the team
Evidence
- IDE plugin install count or license allocation records
- Git history showing AI-assisted commits (Copilot attribution tags or similar)
Level 2
- At least one agentic IDE (Cursor, Windsurf, or Claude Code) is used by 50%+ of the team
- CLAUDE.md, .cursorrules, or an equivalent agent instruction file exists in 100% of active repositories
- Agents operate in agentic/YOLO mode (multi-step edits without per-step approval)
Evidence
- Agent instruction files committed in the repository root
- IDE telemetry or license dashboard showing agentic-mode usage
- PR descriptions referencing agent-assisted development
Level 3
- CLI agents (Claude Code, Codex) are the primary coding interface for 50%+ of feature work
- Per-team or per-repo rules files exist and are maintained through code review
- Coding conventions are written as explicit, agent-parseable rules (not implicit tribal knowledge)
Evidence
- CLI agent session logs or telemetry showing primary usage
- Rules files in the repository with commit history showing regular updates
- Coding conventions document cross-referenced from agent instruction files
Level 4
- Unattended agents (Stripe Minions model, Cursor Automations) execute tasks without a developer present
- Agents are invocable from at least two channels (Slack, CLI, web, PagerDuty)
- Each developer runs 3-5 agent sessions in parallel
Evidence
- Agent invocation logs from multiple channels, with timestamps
- Dashboard showing parallel agent session counts per developer
- PR history showing agent-authored PRs merged without synchronous developer oversight
Level 5
- A multi-agent orchestration system (planner-worker hierarchy) is in production
- The agent fleet sustains 100+ concurrent agents on the codebase
- The agent fleet produces 1,000+ commits per week without manual dispatch
Evidence
- Orchestration system dashboard showing planner-worker task flow
- Git history showing 1,000+ weekly commits attributed to the agent fleet
- Agent fleet monitoring showing concurrent agent count and error-recovery rate
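A planner-worker hierarchy like the one described above can be sketched in a few lines. Everything here (the goal string, the fake decomposition, the in-process worker) is a hypothetical stand-in: in production the planner is an LLM call and each task runs in a sandboxed agent session with retries and error recovery.

```python
# Minimal planner-worker sketch (all task content is hypothetical).
from queue import Queue

def planner(goal: str) -> list[str]:
    # In production this is an LLM call; here we fake the decomposition.
    return [f"{goal}: step {i}" for i in range(1, 4)]

def worker(task_queue: Queue, results: list[str]) -> None:
    # Drain the queue; each task would normally spawn an agent session.
    while not task_queue.empty():
        task = task_queue.get()
        results.append(f"done: {task}")
        task_queue.task_done()

tasks: Queue = Queue()
results: list[str] = []
for t in planner("migrate logging"):
    tasks.put(t)
worker(tasks, results)
```

A real fleet replaces the single `worker` call with a pool of concurrent workers and feeds results back to the planner for re-planning.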
Context Engineering
Level 1
- The agent sees only the currently open file (no project-wide context)
- No structured context files (CLAUDE.md, AGENTS.md) exist in the repository
Evidence
- Absence of agent instruction files in the repository
- README.md last modified more than 6 months ago
Level 2
- CLAUDE.md or equivalent exists with project description, tech stack, and top conventions
- A written coding conventions document exists and is referenced from agent instruction files
- Agent instruction files are committed to the repository (not local-only)
Evidence
- CLAUDE.md, .cursorrules, or .github/copilot-instructions.md in the repository root
- Coding conventions document accessible from agent instruction files
- Commit history showing agent instruction file updates
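A baseline instruction file does not need to be elaborate. A hypothetical CLAUDE.md sketch (the project name, stack, and conventions are invented for illustration):

```markdown
# CLAUDE.md — agent instructions (hypothetical example)

## Project
acme-billing: invoicing service. Python 3.12, FastAPI, PostgreSQL.

## Conventions
- All money amounts are integer cents; never use floats for currency.
- New endpoints require a unit test and an entry in docs/api.md.
- Run `make lint test` before proposing a commit.
```

The same content works in .cursorrules or .github/copilot-instructions.md; what matters is that it is committed and reviewed like code.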
Level 3
- MCP servers provide structured context (architecture, ownership, SLAs) to agents
- Context is organized across at least 3 of the 5 levels: System, Code, Org, Historical, Operational
- Token budget management is implemented (agents receive context within defined token limits)
Evidence
- MCP server configuration files listing active context sources
- Token budget configuration in agent settings
- Context coverage audit showing 3+ context levels populated
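Token budget management can start as a simple priority-ordered trim. A minimal sketch, assuming a crude 4-characters-per-token estimate (the section contents and the budget are invented; a real implementation would use the model's actual tokenizer):

```python
# Sketch of token-budget management for agent context (names hypothetical).
# Sections are ranked by priority; lower-priority sections are dropped until
# the assembled context fits the budget.

def estimate_tokens(text: str) -> int:
    """Crude heuristic: roughly 4 characters per token for English text."""
    return max(1, len(text) // 4)

def fit_to_budget(sections: list[tuple[int, str]], budget: int) -> str:
    """sections are (priority, text) pairs; lower number = more important."""
    kept, used = [], 0
    for _, text in sorted(sections, key=lambda s: s[0]):
        cost = estimate_tokens(text)
        if used + cost <= budget:
            kept.append(text)
            used += cost
    return "\n\n".join(kept)

context = fit_to_budget(
    [(0, "System: payment service, Python/FastAPI."),
     (1, "Code: module map and ownership table..."),
     (2, "Historical: last 20 incident summaries...")],
    budget=20,
)
```

With a budget of 20 tokens, the highest-priority sections are kept and the historical section is dropped; production systems would summarize rather than drop, but the budget discipline is the same.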
Level 4
- The organization pushes context to agents automatically (BYOC: Bring Your Own Context)
- A knowledge graph (Graph Buddy, CodeTale, or equivalent) is integrated with the agent context pipeline
- Ticket-to-spec automation generates acceptance tests from requirements without manual writing
Evidence
- BYOC pipeline configuration showing automated context-push triggers
- Knowledge graph dashboard showing repository coverage percentage
- Sample ticket-to-spec outputs with auto-generated acceptance tests
Level 5
- Agents maintain persistent identity and memory across sessions (Beads/Git-backed)
- Production telemetry feeds back into agent context automatically (deploy, error, and performance data)
- Agents detect stale documentation and update it without human initiation
Evidence
- Agent memory store with session-spanning entries and timestamps
- Production telemetry-to-context pipeline configuration with update frequency
- Git history showing agent-authored documentation updates with passing CI
Code Review & Quality
Level 1
- All code is reviewed by a human before merge
- No automated review tooling beyond basic CI checks
Evidence
- PR approval records showing a human reviewer on every merged PR
- Average review turnaround time in PR analytics
Level 2
- An AI-assisted review tool (CodeRabbit, Qodo, or equivalent) is active on all repositories
- Linter rules are configured and run in CI on every PR
- PRs clearly indicate whether code is AI-generated or AI-assisted (labels, tags, or commit metadata)
Evidence
- AI review tool configuration in the CI pipeline
- Linter configuration file in the repository
- PR labels or commit metadata distinguishing AI-generated code
Level 3
- An AI review agent runs as a first-pass reviewer on every PR before human review
- Lint rules enforce architectural standards, not just style: the "Bug to Codify to Lint Rule" pipeline is active
- At least 3 architectural guardrail rules have been created from past bugs or incidents
Evidence
- CI configuration showing the AI review agent as a required check
- Lint rule change history showing rules created from incident post-mortems
- AI review agent output logs with severity categories
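The "Bug to Codify to Lint Rule" pipeline turns an incident into an executable guardrail. A minimal sketch, assuming a past outage caused by an HTTP call that hung with no timeout (the incident and rule are illustrative; a real setup would package this as a flake8, Ruff, or Semgrep rule):

```python
# Hypothetical guardrail codified from an incident: flag requests.get/post
# calls made without a timeout= argument, using a simple AST walk.
import ast

def missing_timeout_calls(source: str) -> list[int]:
    """Return line numbers of requests.get/post calls lacking timeout=."""
    flagged = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
            value = node.func.value
            if (isinstance(value, ast.Name) and value.id == "requests"
                    and node.func.attr in {"get", "post"}
                    and not any(kw.arg == "timeout" for kw in node.keywords)):
                flagged.append(node.lineno)
    return flagged

bad = "import requests\nr = requests.get(url)\n"
good = "import requests\nr = requests.get(url, timeout=5)\n"
```

Each rule like this encodes one post-mortem lesson, so the check fires in CI instead of relying on reviewers remembering the incident.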
Level 4
- Automated Green/Yellow/Red classification runs on every PR
- Green-classified PRs auto-merge without human review
- An auto-approve target of 60%+ Green PRs is tracked and reported
Evidence
- Dashboard showing Green/Yellow/Red distribution across PRs
- Auto-merge logs for Green PRs with zero post-merge reverts
- Monthly auto-approve rate report tracking the 60%+ Green target
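A Green/Yellow/Red classifier can begin as a handful of deterministic signals before any model is involved. The signals and thresholds below are illustrative assumptions, not a standard:

```python
# Hypothetical PR risk classifier: Green auto-merges, Yellow gets a fast
# human pass, Red requires full architectural review.
from dataclasses import dataclass

@dataclass
class PRSignals:
    ci_green: bool
    lines_changed: int
    touches_sensitive_paths: bool  # e.g. auth/, billing/, migrations/
    ai_review_findings: int        # blocking findings from first-pass AI review

def classify(pr: PRSignals) -> str:
    # Hard stops first: failing CI or sensitive paths always need a human.
    if not pr.ci_green or pr.touches_sensitive_paths:
        return "Red"
    # Large or flagged diffs get a lightweight human look.
    if pr.lines_changed > 300 or pr.ai_review_findings > 0:
        return "Yellow"
    return "Green"

print(classify(PRSignals(True, 40, False, 0)))  # → Green
```

Tracking the Green share of this classifier's output over time is what the 60%+ auto-approve target measures.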
Level 5
- The agent fleet self-reviews code (error-fix-converge loop) before submitting for merge
- Human review is limited to Red-classified PRs (architectural decisions only)
- Continuous auto-refactoring runs in the background without human initiation
Evidence
- Agent iteration logs showing error-fix-converge cycles before PR submission
- PR analytics showing human review only on Red-classified PRs
- Auto-refactoring PR history with associated quality metrics
Testing Strategy
Level 1
- A test suite exists, but coverage is below 40%
- Tests are written manually by developers
Evidence
- Coverage report showing sub-40% line coverage
- Test authorship in git history (manual, no agent attribution)
Level 2
- Agents generate unit tests; humans write acceptance tests
- A flaky-test quarantine process is active (flaky tests are isolated, not deleted)
- Test oracle stabilization is underway (deterministic expected values for AI-generated tests)
Evidence
- Test files with agent attribution alongside human-authored acceptance tests
- Quarantine list or label in the test framework configuration
- Flaky-test tracking dashboard or issue tracker labels
Level 3
- TORS (Test Oracle Reliability Score) is measured and exceeds 90%
- Acceptance tests are auto-generated from ticket requirements (Autonomous Requirements pipeline)
- Incremental test selection runs only the tests affected by changed code paths
Evidence
- TORS dashboard showing a 90%+ score with per-service breakdown
- Ticket-to-test pipeline configuration with sample outputs
- CI configuration showing incremental test selection (e.g., Bazel test targeting, Jest --changedSince)
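Incremental test selection can be wired directly into CI. A hypothetical GitHub Actions fragment using Jest's `--changedSince` flag, which runs only the tests related to files changed since the given branch (Bazel users would express the same idea with query-based test targeting):

```yaml
# Hypothetical CI fragment: run only tests affected by the diff against main.
jobs:
  incremental-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0   # full history so --changedSince can compute the diff
      - run: npx jest --changedSince=origin/main
```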
Level 4
- TORS exceeds 95%
- Agents iterate tests to green in an isolated sandbox CI without blocking the team CI queue
- Mutation testing validates that tests catch real defects (not just achieve coverage)
Evidence
- TORS dashboard showing 95%+ with per-service breakdown
- Sandbox CI logs showing agent iteration cycles separate from team CI
- Mutation testing reports showing kill rate and surviving mutants
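Mutation testing can be demonstrated end-to-end in miniature. The sketch below hand-rolls two mutants of a toy function and checks that the single invented test kills them; real suites would use tools such as mutmut, Stryker, or PIT:

```python
# Miniature mutation-testing sketch (illustrative only). Each mutant flips
# one operator in the source; a mutant is "killed" when a test fails against
# it. Surviving mutants indicate tests that only achieve coverage.
SOURCE = "def price_with_tax(price, rate):\n    return price + price * rate\n"

def run_tests(ns: dict) -> bool:
    """Return True if the (toy) test suite passes against namespace ns."""
    try:
        assert ns["price_with_tax"](100, 0.2) == 120
        return True
    except AssertionError:
        return False

def kill_rate() -> float:
    mutants = [SOURCE.replace("+", "-", 1),   # price - price * rate
               SOURCE.replace("*", "/", 1)]   # price + price / rate
    killed = 0
    for mutant in mutants:
        ns: dict = {}
        exec(compile(mutant, "<mutant>", "exec"), ns)
        if not run_tests(ns):  # a failing test kills the mutant
            killed += 1
    return killed / len(mutants)
```

Here both mutants change the computed price, so the one assertion kills both; the mutation report's kill rate is this ratio computed across thousands of generated mutants.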
Level 5
- The test suite is self-healing (an agent detects broken tests, diagnoses the root cause, and fixes them without human input)
- Production logs automatically generate regression tests for observed failures
- Agents detect edge cases, write tests, fix bugs, and ship: the full autonomous loop
Evidence
- Self-healing test commit history showing agent-diagnosed and agent-fixed test failures
- Production log-to-test pipeline configuration with sample generated tests
- End-to-end autonomous bug-fix PRs (edge case detected, test written, fix shipped)
Author Commentary
The April 2026 zeitgeist is sobriety. After six months of "AI makes everything faster," the data is in: AI-generated code carries 2.74x more security vulnerabilities, produces 30-41% more tech debt, and developers who feel 20% faster are in fact 19% slower. This doesn't mean AI coding is wrong — it means Level 1-2 AI coding without review infrastructure is dangerous. The organizations winning are the ones at Level 3 and above: lint-as-architecture, AI review agents as first pass, compliance gates in CI. The models are better than ever (Claude 4.6 Opus: 80.8% SWE-bench). The tooling is better than ever (Cursor 3, Claude Code Computer Use, MCP everywhere). The gap is governance. Start there.