Tests written manually, coverage < 40%
The baseline testing state at L1 - manual test writing, chronic under-coverage, and the compounding debt that makes AI-generated code increasingly risky to ship.
- ·Test suite exists but coverage is below 40%
- ·Tests are written manually by developers
- ·Team is aware of the cost of flaky tests (Google has reported that roughly 16% of its tests show some level of flakiness)
- ·AI-generated tests are reviewed for circular testing (testing what code does, not what it should do)
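To make the circular-testing check concrete, here is a minimal Python sketch. The `apply_discount` function and its 10%-off rule are hypothetical, and the bug is deliberate. A circular test copies whatever the code currently returns into its assertion, so it passes even though the code is wrong. A specification test derives the expected value from the requirement and catches the bug.

```python
def apply_discount(price_cents: int) -> int:
    """Hypothetical requirement: orders of 10,000 cents or more get 10% off."""
    # Deliberate bug: uses > instead of >=, so exactly 10,000 gets no discount.
    if price_cents > 10_000:
        return price_cents * 90 // 100
    return price_cents


def circular_test() -> bool:
    # Circular: the expected value was obtained by running the code itself,
    # so the test encodes the bug instead of the requirement.
    return apply_discount(10_000) == 10_000


def specification_test() -> bool:
    # Specification-based: the expected value comes from the requirement
    # (10% off 10,000 is 9,000), independent of the implementation.
    return apply_discount(10_000) == 9_000


if __name__ == "__main__":
    print("circular test passes:", circular_test())            # True
    print("specification test passes:", specification_test())  # False
```

When reviewing AI-generated tests, the question to ask of each assertion is where its expected value came from: the requirement, or the current output of the code under test.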
Evidence
- ·Coverage report showing sub-40% line coverage
- ·Test authorship in git history (manual, no agent attribution)
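One way to gather the authorship evidence is to scan commit messages for attribution trailers. This sketch assumes that agent-generated commits carry a `Co-authored-by:` trailer naming the tool. Some tools follow that convention, but it is not guaranteed, and the `AGENT_MARKERS` list is hypothetical and should be adjusted to the tools your team actually uses.

```python
import re

# Hypothetical markers; adjust to the agents and trailers your team uses.
AGENT_MARKERS = ("copilot", "claude", "cursor", "devin")

# Matches "Co-authored-by: <name>" trailer lines anywhere in a message.
TRAILER = re.compile(r"^co-authored-by:\s*(.+)$", re.IGNORECASE | re.MULTILINE)


def is_agent_attributed(message: str) -> bool:
    """True if any Co-authored-by trailer names one of AGENT_MARKERS."""
    for coauthor in TRAILER.findall(message):
        if any(marker in coauthor.lower() for marker in AGENT_MARKERS):
            return True
    return False


def agent_share(log_output: str) -> float:
    """Fraction of NUL-separated commit messages that carry agent attribution."""
    messages = [m.strip() for m in log_output.split("\0") if m.strip()]
    if not messages:
        return 0.0
    agent = sum(is_agent_attributed(m) for m in messages)
    return agent / len(messages)
```

The input can come from `git log --format=%B%x00 -- tests/` (adjust the path to wherever your tests live), which prints each commit message followed by a NUL byte. At L1 you would expect the share to be at or near zero.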
What It Is
At Level 1 (Ad-hoc), all tests are written by hand, one at a time, by developers who are also responsible for the code they're testing. Coverage is typically below 40% - not because developers don't know better, but because writing tests is slow, repetitive, and routinely deprioritized when deadlines approach. The result is a codebase where more than half the logic runs in production with no automated verification at all.
This isn't a discipline problem. It's a structural one. Manual test writing requires a developer to hold two mental models simultaneously: what the code is supposed to do, and how to express that in test code. Under deadline pressure, the second model is the first casualty. Tests get written for the happy path, edge cases get a comment that says "TODO: test this," and that comment stays in the codebase for years.
The sub-40% threshold matters because below it the test suite tends to provide the illusion of coverage rather than actual safety. A 38% coverage number sounds like meaningful progress until you find that the covered 38% is mostly trivial utility functions, while the business-critical payment logic, state machines, and integration paths have nothing guarding them.
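A small worked example shows how an aggregate number can hide this. The module names, statement counts, and the `critical` flag below are hypothetical; the point is that 38% overall can coexist with 0% on the code that matters.

```python
from dataclasses import dataclass


@dataclass
class ModuleCoverage:
    name: str
    statements: int
    covered: int
    critical: bool  # hypothetical flag marking business-critical logic


def percent(covered: int, statements: int) -> float:
    # A module with no statements counts as fully covered.
    return 100.0 * covered / statements if statements else 100.0


def summarize(modules: list[ModuleCoverage]) -> tuple[float, float]:
    """Return (overall %, critical-only %) line coverage."""
    total = sum(m.statements for m in modules)
    hit = sum(m.covered for m in modules)
    crit = [m for m in modules if m.critical]
    crit_total = sum(m.statements for m in crit)
    crit_hit = sum(m.covered for m in crit)
    return percent(hit, total), percent(crit_hit, crit_total)


modules = [
    ModuleCoverage("utils", 380, 380, critical=False),
    ModuleCoverage("payments", 400, 0, critical=True),
    ModuleCoverage("order_state_machine", 220, 0, critical=True),
]

overall, critical = summarize(modules)
print(f"overall: {overall:.0f}%  critical: {critical:.0f}%")  # overall: 38%  critical: 0%
```

Reporting the critical-only figure next to the headline number is one simple way to keep the illusion from forming.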
At L1, this situation becomes a compounding problem as soon as AI agents enter the picture. AI-generated code is fast but needs verification. Without tests, you can't verify AI output at scale. The lack of coverage that was manageable when humans wrote code at human speed becomes a critical liability when an agent can produce 500 lines of unverified logic in minutes.
Why It Matters
Low test coverage at L1 is not just a quality issue - it's a rate limiter on every subsequent maturity level:
- AI verification gap - Without tests, you cannot safely accept AI-generated code. Every PR becomes a manual review exercise, erasing the speed gains from generation.
- Refactoring paralysis - Developers are afraid to touch legacy code because there's no safety net. Technical debt compounds and architecture degrades over time.
- Hidden regressions - Changes that break existing behavior go undetected until production. Bugs found in production are commonly estimated to cost 5-10x more to fix than bugs caught in CI.
- False velocity - Teams feel productive (tickets are closing) but quality is silently degrading. The debt surfaces as a production incident, not a planning item.
- Blocker for automation - Automated merge decisions, incremental test selection, and self-healing test suites (L4-L5) are impossible without a reliable, comprehensive test foundation.
The path out of L1 is not to write more manual tests - it's to introduce AI-assisted test generation (L2) while simultaneously tracking coverage as a first-class metric. The goal at L2 is not perfection but momentum: every PR either maintains coverage or increases it.
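The "maintain or increase" rule is often implemented as a coverage ratchet in CI. A minimal sketch, assuming the baseline percentage is stored somewhere the pipeline can read (for example, a committed file) and the current figure comes from the PR's coverage run. The function name and the small tolerance are hypothetical choices. coverage.py's own `--fail-under` option enforces a fixed floor rather than a baseline that moves up over time.

```python
def check_ratchet(
    baseline: float, current: float, tolerance: float = 0.1
) -> tuple[bool, float]:
    """Return (passed, new_baseline) for a coverage ratchet.

    The PR passes if coverage drops by no more than `tolerance` percentage
    points (to absorb rounding noise). On a pass, the baseline can only
    move up; on a failure, it stays where it was.
    """
    passed = current >= baseline - tolerance
    new_baseline = max(baseline, current) if passed else baseline
    return passed, new_baseline
```

In a pipeline, a failing result would exit non-zero to block the merge, and a passing result with a higher `new_baseline` would be written back so the next PR is measured against it.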
Start by measuring what you actually have. Run a coverage report against your main branch and find the five highest-risk files with the lowest coverage. Make those the first targets for AI-generated tests at L2. You don't need to boil the ocean - you need a beachhead.
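A sketch of that first pass, assuming a Python codebase measured with coverage.py: `coverage json` writes a report whose `files` map holds a per-file `summary` including `percent_covered`, `num_statements`, and a `missing_lines` count. "Risk" has to come from outside the coverage report; here it is a hypothetical hand-maintained weight per path prefix, and unlisted files get a default of 1. Files are ranked by risk weight times uncovered statements, so a large, risky, poorly covered file rises to the top.

```python
import json

# Hypothetical risk weights by path prefix; tune to your own domain.
RISK_WEIGHTS = {"app/payments/": 5.0, "app/billing/": 4.0, "app/auth/": 4.0}


def load_report(path: str = "coverage.json") -> dict:
    """Load a report produced by `coverage json`."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def risk_weight(path: str) -> float:
    for prefix, weight in RISK_WEIGHTS.items():
        if path.startswith(prefix):
            return weight
    return 1.0  # default weight for files not listed above


def top_targets(report: dict, n: int = 5) -> list[tuple[str, float]]:
    """Return up to n (path, percent_covered) pairs, highest risk x gap first."""
    scored = []
    for path, data in report["files"].items():
        summary = data["summary"]
        uncovered = summary["missing_lines"]  # count of unexecuted statements
        score = risk_weight(path) * uncovered
        scored.append((score, path, summary["percent_covered"]))
    scored.sort(reverse=True)
    # Fully covered files (score 0) are not useful targets.
    return [(path, pct) for score, path, pct in scored[:n] if score > 0]
```

Usage: `print(top_targets(load_report()))` after running `coverage json` on the main branch.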
How Different Roles See It
Bob's team has been shipping for two years without a systematic testing conversation. Coverage is estimated at around 35% - but it's never been formally measured. A recent production incident (a refactor broke an integration that had no tests) cost the team a full day of incident response and a difficult conversation with a customer.
Sarah is trying to make the case for expanding AI tooling to include test generation, but her stakeholders keep asking: "If developers aren't writing tests now, why will AI tools fix that? Won't they just generate bad tests?" She doesn't have a great answer yet.
Victor has 85% coverage on his services because he practices TDD religiously. He's frustrated watching the rest of the team ship undertested code that he eventually gets paged about at 2am. He has asked for tests in code review multiple times and been told "we'll add them later."