Agent-generated unit tests + human acceptance tests
A hybrid testing strategy at L2 that uses AI to generate unit test scaffolding at scale while keeping business-behavior verification in human hands.
- Agents generate unit tests; humans write acceptance tests
- Flaky test quarantine process is active (flaky tests are isolated, not deleted)
- Test oracle stabilization is underway (deterministic expected values for AI-generated tests)
- Flaky test count is tracked and reported weekly
- Quarantined tests have a resolution SLA (e.g., fix or delete within 30 days)
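As an illustration of test oracle stabilization, the sketch below injects the clock into a function so that its expected value is the same on every run. The function formatReceiptStamp, the Clock type, and the order ID are hypothetical names chosen for the example; they do not come from any specific tool.

```typescript
// Hypothetical unit under test: the clock is injected instead of read from the system.
type Clock = () => Date;

function formatReceiptStamp(orderId: string, now: Clock): string {
  // toISOString() always renders in UTC, so the result does not depend on the host time zone.
  return `${orderId}@${now().toISOString()}`;
}

// Unstable oracle: with the real clock, the expected value changes on every run.
// const unstable = formatReceiptStamp("A-17", () => new Date());

// Stable oracle: a fixed clock yields one deterministic expected value.
const fixedClock: Clock = () => new Date("2024-01-15T10:30:00.000Z");
const stamp = formatReceiptStamp("A-17", fixedClock);

console.log(stamp); // A-17@2024-01-15T10:30:00.000Z
```

The same idea applies to other sources of nondeterminism, such as random numbers or network calls: pass them in, then pin them in the test.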
Evidence
- Test files with agent attribution alongside human-authored acceptance tests
- Quarantine list or label in test framework configuration
- Flaky test tracking dashboard or issue tracker labels
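A quarantine list with a resolution SLA can be quite small. The sketch below is one possible shape, assuming a 30-day SLA as in the example above; the QuarantinedTest record, its field names, and the sample test names are hypothetical.

```typescript
// Hypothetical quarantine record; field names are illustrative only.
interface QuarantinedTest {
  name: string;
  quarantinedOn: string; // ISO date, e.g. "2024-03-01"
}

const SLA_DAYS = 30; // assumption: the "fix or delete within 30 days" example
const MS_PER_DAY = 24 * 60 * 60 * 1000;

function daysInQuarantine(test: QuarantinedTest, today: Date): number {
  const elapsedMs = today.getTime() - new Date(test.quarantinedOn).getTime();
  return Math.floor(elapsedMs / MS_PER_DAY);
}

// Produces the two numbers a weekly report needs: total flaky count and SLA breaches.
function weeklyFlakyReport(tests: QuarantinedTest[], today: Date) {
  const overdue = tests
    .filter((t) => daysInQuarantine(t, today) > SLA_DAYS)
    .map((t) => t.name);
  return { flakyCount: tests.length, overdue };
}

const report = weeklyFlakyReport(
  [
    { name: "cart totals rounding", quarantinedOn: "2024-03-01" },
    { name: "session expiry", quarantinedOn: "2024-04-10" },
  ],
  new Date("2024-04-15"),
);

console.log(report); // flakyCount is 2; only "cart totals rounding" is overdue
```

Keeping the quarantine date next to the test name is what makes the SLA enforceable: a test with no recorded date cannot be shown to be overdue.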
What It Is
The hybrid testing strategy at Level 2 (Guided) makes a deliberate division of labor: AI agents automatically generate unit tests for individual functions and components, while humans write acceptance tests that verify the system meets its business requirements. This isn't a compromise - it's a recognition that AI and humans are good at different things in the testing domain.
Unit tests are well-suited to AI generation. They test discrete, well-defined units of code - a function, a class, a module - against specific inputs and outputs. The structure is formulaic: setup, execute, assert. Given a clear function signature and type information, an AI agent can generate comprehensive unit tests covering happy paths, boundary values, null inputs, and error conditions faster and more exhaustively than a human typically would by hand.
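The kind of unit test an agent might produce looks roughly like the sketch below. The function parseQuantity and the helper expectThrows are hypothetical, and the checks are written as plain booleans so the example runs without a test framework; in a real project each entry would be a case in the team's test runner.

```typescript
// Hypothetical unit under test: parses a quantity between 1 and 99.
function parseQuantity(input: string | null): number {
  if (input === null || input.trim() === "") {
    throw new RangeError("quantity is required");
  }
  const value = Number(input);
  if (!Number.isInteger(value) || value < 1 || value > 99) {
    throw new RangeError(`quantity out of range: ${input}`);
  }
  return value;
}

// Minimal stand-in for a test runner's "expect to throw" assertion.
function expectThrows(fn: () => unknown): boolean {
  try {
    fn();
    return false;
  } catch {
    return true;
  }
}

// Each entry follows setup, execute, assert for one category of input.
const results: Array<[string, boolean]> = [
  ["happy path", parseQuantity("5") === 5],
  ["lower boundary", parseQuantity("1") === 1],
  ["upper boundary", parseQuantity("99") === 99],
  ["below range", expectThrows(() => parseQuantity("0"))],
  ["above range", expectThrows(() => parseQuantity("100"))],
  ["null input", expectThrows(() => parseQuantity(null))],
  ["non-numeric input", expectThrows(() => parseQuantity("abc"))],
];

const allPassed = results.every(([, ok]) => ok);
console.log(allPassed); // true
```

Every case here can be derived from the signature and the range check alone, which is why this category of test lends itself to generation.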
Acceptance tests are different. They verify that the system does what the product intended - and that intent lives in a ticket, a user story, a product specification, or the minds of the people who wrote the requirements. The AI doesn't have access to that information at L1-L2 (it learns to read requirements at L3). A human writing an acceptance test must understand what the feature is supposed to do and encode that understanding as an assertion. This is the one part of the testing workflow that cannot be delegated to an agent without first solving the requirements-comprehension problem.
The hybrid model at L2 captures most of the efficiency gains of AI test generation while preserving the correctness guarantees that only human-authored acceptance tests can provide. It also addresses the circular testing problem, in which tests generated from the code merely confirm what the code already does: unit tests may reflect what the code does, but acceptance tests verify what the code should do.
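The circular testing problem is easiest to see with a small defect. In the sketch below, the requirement text, the discountedTotalCents function, and the tier names are all hypothetical. A check derived from the code passes, while a check derived from the requirement exposes the bug.

```typescript
// Hypothetical requirement, taken from a ticket rather than from the code:
// "Gold customers get 20% off orders of 100.00 or more."
type Tier = "standard" | "gold";

// Implementation with a subtle defect: it uses > where the requirement implies >=.
// Amounts are in integer cents to avoid floating-point rounding.
function discountedTotalCents(totalCents: number, tier: Tier): number {
  const qualifies = tier === "gold" && totalCents > 10000;
  return qualifies ? (totalCents * 80) / 100 : totalCents;
}

// Derived from the code: it mirrors current behavior, so it passes and hides the defect.
const codeDerivedCheck = discountedTotalCents(10000, "gold") === 10000;

// Derived from the requirement: it encodes intended behavior, so it fails and exposes the defect.
const requirementCheck = discountedTotalCents(10000, "gold") === 8000;

console.log(codeDerivedCheck, requirementCheck); // true false
```

Both checks exercise the same line of code and would report the same coverage. Only the second one knows what the answer was supposed to be.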
Why It Matters
The hybrid model matters because it operationalizes the division between two types of correctness, mechanical and behavioral, and that division brings three practical benefits:
- Mechanical correctness - Does the function handle null inputs? Does it throw the right exception? Does the loop terminate? AI is excellent at generating tests for these questions.
- Behavioral correctness - Does the feature do what the customer expected? Does the discount apply to the right tier? Is the permission model correct? An agent without access to the requirements cannot answer these questions; a human who knows the intent can.
- Coverage velocity - AI-generated unit tests can, for example, bring coverage on a single service from 35% to 75% within a sprint. Humans alone rarely move that fast. The hybrid model makes the coverage climb tractable.
- Risk-proportionate testing - High-risk business logic gets human-authored acceptance tests. Low-risk utility code gets AI-generated unit tests. Testing effort is proportionate to consequence, not distributed evenly.
- Scalable as code grows - As AI agents write more code at L3+, the unit test generation can scale automatically alongside code generation. The hybrid model is designed to scale with AI-assisted development.
Establish a naming convention to distinguish AI-generated unit tests from human-authored acceptance tests. A simple approach: *.unit.test.ts for AI-generated, *.acceptance.test.ts for human-authored. This makes the distinction visible in the codebase and enables tracking coverage by test type.
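With that convention in place, tracking by test type reduces to matching file suffixes. The sketch below assumes the two suffixes suggested above; the function names and sample paths are hypothetical. It counts test files rather than measuring coverage, which would need the team's coverage tool.

```typescript
type TestKind = "unit" | "acceptance" | "unclassified";

// Classifies a test file by the suffix convention described above.
function classifyTestFile(path: string): TestKind {
  if (path.endsWith(".unit.test.ts")) return "unit";
  if (path.endsWith(".acceptance.test.ts")) return "acceptance";
  return "unclassified";
}

// Tallies test files per kind so that unlabeled files stand out.
function countByKind(paths: string[]): Record<TestKind, number> {
  const counts: Record<TestKind, number> = { unit: 0, acceptance: 0, unclassified: 0 };
  for (const p of paths) {
    counts[classifyTestFile(p)] += 1;
  }
  return counts;
}

const counts = countByKind([
  "src/cart/totals.unit.test.ts",
  "src/cart/checkout.acceptance.test.ts",
  "src/cart/legacy.test.ts",
]);

console.log(counts); // { unit: 1, acceptance: 1, unclassified: 1 }
```

Reporting the unclassified count separately gives the team a simple signal for files that predate the convention or bypass it.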
How Different Roles See It
Bob wants to move the team from L1 to L2 testing practices and has been pitched the hybrid model. He's worried about the implementation cost: configuring AI tools, training the team on the new process, and maintaining the boundary between test types over time. He's not sure it's worth the setup cost given the team's existing backlog.
Sarah needs to demonstrate that the hybrid model provides better value than the L1 baseline. Coverage numbers are one metric, but she wants to show that the tests are actually catching bugs - not just providing coverage.
Victor has already adopted a version of the hybrid model informally: he writes unit tests quickly (sometimes using AI) and carefully authors scenario-based tests for complex business logic. He wants to formalize this into team-wide practice but isn't sure how to enforce the boundary without creating bureaucratic friction.