PEV loop: Plan → Execute → Verify

The PEV loop - Plan, Execute, Verify - is the fundamental operating model for working with AI agents at high maturity.

·Agentic Engineer role combines orchestration, supervision, and architecture responsibilities
·PEV (Plan, Execute, Verify) loop is the standard workflow for all engineering tasks
·Non-coder contributors can produce software changes via agent interfaces

·Agentic Engineer career ladder exists with defined progression criteria
·Non-coder contribution rate is tracked as an organizational capability metric

Evidence

·Agentic Engineer role description with orchestration and supervision responsibilities
·PEV loop documentation and adoption evidence in team workflows
·Non-coder contributor logs showing software changes via agent interfaces

What It Is

The PEV loop - Plan, Execute, Verify - is the fundamental operating model for working with AI agents at high maturity. It provides a structured three-phase framework that applies at every scale: a single developer delegating one task to one agent, a fleet manager coordinating five parallel agents, and an Agentic Engineer operating an organizational-scale orchestration system. The loop's power is in its universality: the same three questions apply at every level of complexity. Plan: what exactly needs to be done, and what does the agent need to know to do it? Execute: how does the work get done with appropriate supervision? Verify: how do we know the output meets the requirements?

Most of the work that determines quality happens in the Plan phase. A well-planned task has a clear scope (what's in and what's explicitly out), the context the agent needs (architecture, patterns, constraints), success criteria specified in terms the agent can verify (tests pass, specific behavior is present, specific file changes are made), and an explicit statement of what "done" looks like. The Plan phase is not a few seconds of writing a prompt - it is 10-20 minutes of structured thinking about the task, analogous to a tech lead doing careful task design before assigning work to a developer.
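
The elements of a well-planned task can be captured as a small record that reports which parts are still empty. The following is a minimal sketch, not a prescribed format: the `TaskSpec` class and all of its field names are hypothetical and chosen only to mirror the list above.

```python
from dataclasses import dataclass, field


@dataclass
class TaskSpec:
    """Hypothetical Plan-phase record; field names are illustrative."""
    description: str
    in_scope: list[str] = field(default_factory=list)
    out_of_scope: list[str] = field(default_factory=list)
    context_refs: list[str] = field(default_factory=list)      # architecture, patterns, constraints
    success_criteria: list[str] = field(default_factory=list)  # checks the agent can verify itself
    definition_of_done: str = ""

    def missing_fields(self) -> list[str]:
        """Return the names of plan elements that are still empty."""
        required = {
            "description": self.description,
            "in_scope": self.in_scope,
            "out_of_scope": self.out_of_scope,
            "context_refs": self.context_refs,
            "success_criteria": self.success_criteria,
            "definition_of_done": self.definition_of_done,
        }
        return [name for name, value in required.items() if not value]

    def is_ready(self) -> bool:
        """A spec is ready to hand to an agent when nothing is missing."""
        return not self.missing_fields()


# Example: a spec that names the task but skips most of the planning.
draft = TaskSpec(
    description="Add retry logic to the payment client",
    in_scope=["payment client retry logic"],
    success_criteria=["unit tests pass"],
)
print(draft.missing_fields())
```

A draft like the one above would be flagged before launch because it has no out-of-scope list, no context references, and no definition of done.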

The Execute phase is not passive waiting - it is structured supervision. The developer (or orchestration system) monitors agent progress at defined intervals, intervenes when agents encounter decision points that require human judgment, and maintains the broader context that the agent can't hold in its context window. The Execute phase at the organizational level involves routing tasks to appropriate agent types, managing dependencies between parallel workstreams, and handling the exceptions that automated systems escalate.
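
A supervision check-in can be reduced to a small decision rule. The sketch below is one possible shape, under stated assumptions: the `Checkpoint` fields, the three `Action` values, and the path-prefix scope check are all hypothetical and stand in for whatever signals a real agent platform exposes.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    CONTINUE = "continue"   # agent is on track
    REDIRECT = "redirect"   # agent drifted; restate scope or constraints
    ESCALATE = "escalate"   # a decision needs human judgment


@dataclass
class Checkpoint:
    """Hypothetical snapshot taken at a supervision interval."""
    minutes_elapsed: int
    files_touched: set[str]
    needs_decision: bool    # agent reported a decision point it cannot resolve


def review_checkpoint(cp: Checkpoint, allowed_paths: set[str], time_budget_min: int) -> Action:
    """Decide what the supervisor should do at this check-in."""
    if cp.needs_decision:
        return Action.ESCALATE
    # Any touched file outside the planned path prefixes counts as scope drift.
    out_of_scope = {
        f for f in cp.files_touched
        if not any(f.startswith(prefix) for prefix in allowed_paths)
    }
    if out_of_scope or cp.minutes_elapsed > time_budget_min:
        return Action.REDIRECT
    return Action.CONTINUE


cp = Checkpoint(minutes_elapsed=12, files_touched={"src/auth/login.py"}, needs_decision=False)
print(review_checkpoint(cp, allowed_paths={"src/payments/"}, time_budget_min=30))
```

The same rule could run inside an orchestration system, with escalations routed to a human queue instead of being returned to a single developer.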

The Verify phase closes the loop by validating that the Execute phase achieved what the Plan phase specified. This is more than running the tests, though that is part of it. At L5, Verify includes: automated quality checks calibrated to the specific task type, intent alignment validation (does the output match what was actually intended, not just what was formally specified?), side effect checking (what did the agent change beyond the explicit task scope?), and feedback capture (what in the Plan or Execute phase would have prevented any defects found in Verify?). Findings from Verify feed into the Plan phase of the next iteration, which makes the loop cyclic rather than linear.
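
The four Verify activities can be combined into a single report. This is a rough sketch under assumptions: `VerifyReport`, `verify`, and the feedback wording are hypothetical, the test and intent results are passed in as booleans rather than computed, and side effects are approximated as changed files that the plan did not list.

```python
from dataclasses import dataclass, field


@dataclass
class VerifyReport:
    """Hypothetical result of one Verify pass."""
    tests_passed: bool
    intent_confirmed: bool
    unexpected_files: list[str]
    plan_feedback: list[str] = field(default_factory=list)

    @property
    def accepted(self) -> bool:
        return self.tests_passed and self.intent_confirmed and not self.unexpected_files


def verify(tests_passed: bool, intent_confirmed: bool,
           changed_files: set[str], planned_files: set[str]) -> VerifyReport:
    """Run the checks and record what the Plan phase should change next time."""
    # Side effect check: anything changed that the plan did not name.
    unexpected = sorted(changed_files - planned_files)
    feedback = []
    if not tests_passed:
        feedback.append("Plan: restate success criteria as tests the agent can run")
    if not intent_confirmed:
        feedback.append("Plan: describe the intended behavior, not only the formal spec")
    if unexpected:
        feedback.append("Plan: list out-of-scope paths explicitly")
    return VerifyReport(tests_passed, intent_confirmed, unexpected, feedback)


report = verify(
    tests_passed=True,
    intent_confirmed=True,
    changed_files={"src/payments/client.py", "src/config/settings.py"},
    planned_files={"src/payments/client.py"},
)
print(report.accepted, report.unexpected_files, report.plan_feedback)
```

In this example the tests pass and the intent is confirmed, yet the output is not accepted because the agent touched a file outside the planned scope.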

Why It Matters

The PEV loop creates discipline and reproducibility in agentic workflows that ad-hoc agent use cannot provide:

·Makes quality predictable - teams with PEV discipline produce consistently high-quality AI outputs, while teams without it produce highly variable outputs that depend on individual developer experience. PEV is what converts AI potential into reliable engineering practice.
·Identifies where quality problems originate - the three-phase structure makes failures easy to diagnose: defects from poor planning (scope unclear, context missing), execution failures (insufficient supervision, wrong task routing), or verification gaps (tests didn't catch the real requirement, intent alignment wasn't checked). Each diagnosis points to a specific improvement.
·Enables organizational-scale automation - at L5, the PEV loop can be partially automated: plan templates for standard task types, execution with built-in supervision checkpoints, and verification using test suites and quality checks. The loop structure makes this automation possible because it defines clear phase boundaries.
·Creates a common language for AI-augmented work - when all developers use the same three-phase mental model, communication improves: "I'm in plan phase for this feature" means something specific, "this failed at verify" points to a specific phase, and "the plan was underspecified" is an actionable diagnosis.
·Scales from individual to organizational use - the same framework works for a single developer with one agent and for an Agentic Engineer designing a system that runs thousands of agent tasks per week. This universal applicability makes the loop worth internalizing deeply.

Tip

When an agent task fails, identify which phase the failure originated in. Most failures trace back to the Plan phase (underspecified context, unclear scope) rather than the Execute phase (agent behavior) or Verify phase (checking criteria). This means that improving Plan quality is the highest-leverage intervention for improving overall agent task success rates.
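
One way to test this claim against a team's own data is to tag each failed task with the phase where the failure originated and tally the tags. The sketch below assumes a hypothetical list of phase labels collected from post-task reviews; the `weakest_phase` name is illustrative.

```python
from collections import Counter


def weakest_phase(failure_origins: list[str]) -> tuple[str, float]:
    """Return the most common failure origin and its share of all failures."""
    if not failure_origins:
        raise ValueError("no failures recorded")
    counts = Counter(failure_origins)
    phase, count = counts.most_common(1)[0]
    return phase, count / len(failure_origins)


# Hypothetical labels from four failed tasks.
origins = ["plan", "plan", "verify", "execute"]
print(weakest_phase(origins))
```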

Getting Started

·Formalize your Plan phase - create a task specification template that forces systematic planning: task description, context references, out-of-scope items, success criteria, test cases (or test plan), and expected output format. Using this template consistently for 2-3 weeks builds the planning habit; after that, the questions become instinctive even when you're not using the template.
·Define your Execute phase supervision protocol - specify what supervision looks like for each task category in your team's work. What is the check-in frequency? What are the intervention criteria? What does "agent is on track" look like versus "agent needs redirection"? Write this down once as a team agreement, then apply it consistently.
·Build the Verify checklist - for each major task type in your team's work, define what Verify means: which automated tests must pass, what code review checks apply, what intent-alignment questions the reviewer should ask, and what side effects to check. The checklist is not the full Verify process - it's the minimum floor below which no output should be merged.
·Practice the loop on simple tasks first - before applying PEV to complex, high-stakes work, use it consistently on simple tasks where the failure cost is low. Building the habit on easy tasks means it is already in place when the stakes are high.
·Instrument the loop for organizational learning - track Plan-phase time, Execute-phase duration, and Verify-phase defect rate per task type. Over time, this data reveals which task types have the best-developed Plan templates (low Verify defects), which have execution problems (high supervision cost), and which have verification gaps (production defects not caught at Verify).
·Design automated PEV systems - at L5, the PEV loop is partially automated. Design your organizational agent system with the three phases as explicit system components: plan templates that generate task specifications from requirements, execution infrastructure with built-in supervision checkpoints, and automated verification suites that run after each agent task completes.
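
The instrumentation step above can start as a simple per-task-type summary. This is a minimal sketch assuming a hypothetical `TaskRecord` with one entry per completed agent task; the field names and the sample numbers are illustrative, not measured data.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class TaskRecord:
    """Hypothetical log entry for one completed agent task."""
    task_type: str
    plan_minutes: float
    execute_minutes: float
    verify_defects: int   # defects found during Verify


def summarize(records: list[TaskRecord]) -> dict[str, dict[str, float]]:
    """Aggregate Plan time, Execute time, and Verify defect rate per task type."""
    by_type: dict[str, list[TaskRecord]] = defaultdict(list)
    for record in records:
        by_type[record.task_type].append(record)
    return {
        task_type: {
            "tasks": len(group),
            "avg_plan_minutes": mean(r.plan_minutes for r in group),
            "avg_execute_minutes": mean(r.execute_minutes for r in group),
            "defects_per_task": sum(r.verify_defects for r in group) / len(group),
        }
        for task_type, group in by_type.items()
    }


# Made-up sample records for illustration only.
log = [
    TaskRecord("refactor", plan_minutes=5, execute_minutes=20, verify_defects=0),
    TaskRecord("refactor", plan_minutes=10, execute_minutes=40, verify_defects=1),
    TaskRecord("feature", plan_minutes=20, execute_minutes=90, verify_defects=3),
]
for task_type, stats in summarize(log).items():
    print(task_type, stats)
```

A task type with a low average Plan time and a high defect rate per task is a candidate for a better Plan template.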

Common Pitfalls

Skipping the Plan phase under time pressure. The Plan phase is the first casualty when teams are under sprint pressure. "I don't have time to write a detailed spec, I'll just ask the agent" is the setup for a Verify failure that costs more time than the planning would have. Protect Plan time as a non-negotiable part of the task, not as optional setup overhead.

Treating Verify as "did the tests pass?" Running the automated test suite is necessary but not sufficient for Verify at L5. Intent alignment validation - does the output actually achieve what was intended, or does it achieve a plausible misinterpretation? - is the most commonly skipped Verify step and the source of the most expensive production defects. Verify must include a human (or sophisticated automated) judgment call about intent, not just a pass/fail against formal criteria.

Not capturing Verify feedback for Plan improvement. The PEV loop is only cyclic if Verify findings actually improve future Plan quality. When defects are found at Verify, the diagnosis should answer: "What should have been in the Plan to prevent this?" The answer should then update the Plan template for that task type. Without this feedback step, the loop stays open rather than closed, and the same planning deficiencies repeat indefinitely.

Applying the loop uniformly regardless of task risk. A simple refactoring task with high test coverage requires a lighter PEV loop than a new feature with business logic in an under-tested area. The PEV loop should scale with task risk: lighter planning, lighter supervision, and lighter verification for low-risk tasks; heavier investment in each phase for high-risk tasks. Uniform application of heavy PEV process to low-risk tasks creates overhead without proportional benefit.
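
Scaling the loop with risk can be made explicit as a lookup from risk tier to process weight. The sketch below is only an illustration: the `risk_tier` rule, the 0.5 coverage threshold, and every number in `PROFILES` are assumptions to be replaced with a team's own values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PevProfile:
    """Hypothetical process weight for one risk tier."""
    plan_minutes: int      # time budgeted for the Plan phase
    checkin_minutes: int   # supervision interval during Execute
    reviewers: int         # humans who review the output at Verify


# Illustrative values only; heavier investment in each phase as risk rises.
PROFILES = {
    "low": PevProfile(plan_minutes=5, checkin_minutes=60, reviewers=1),
    "medium": PevProfile(plan_minutes=10, checkin_minutes=30, reviewers=1),
    "high": PevProfile(plan_minutes=20, checkin_minutes=15, reviewers=2),
}


def risk_tier(test_coverage: float, touches_business_logic: bool) -> str:
    """Classify a task from two signals named in the text: coverage and business logic."""
    under_tested = test_coverage < 0.5   # assumed threshold
    if touches_business_logic and under_tested:
        return "high"
    if touches_business_logic or under_tested:
        return "medium"
    return "low"


# A refactoring in a well-tested area versus a new feature in an under-tested one.
print(PROFILES[risk_tier(test_coverage=0.9, touches_business_logic=False)])
print(PROFILES[risk_tier(test_coverage=0.2, touches_business_logic=True)])
```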

Over-formalizing the loop at the individual level. The PEV loop is a mental model, not a form to fill out. For an experienced L5 practitioner, the three phases should be internalized enough that they run automatically - no template required. Over-formalizing the loop with mandatory documentation for every agent task creates bureaucratic overhead that crowds out the thinking that actually matters. Formalize the system-level PEV (the organizational orchestration architecture) and internalize the individual-level PEV (the developer's mental model for working with agents).

How Different Roles See It

Bob, Head of Engineering

Bob's team has high AI adoption and generally good agent usage, but the variance in output quality is still significant. Some developers produce excellent agent outputs consistently; others have boom-and-bust cycles: great results on some tasks, expensive messes on others. Bob suspects the inconsistency comes from inconsistent planning and verification practices.

What Bob should do: Bob should introduce PEV as a team standard and design a lightweight implementation that doesn't create overhead for experienced fleet managers while providing scaffolding for those who need it. He should create a minimal task specification template (10 fields, 5 minutes to complete), a supervision protocol document (three categories of agent tasks with corresponding check-in frequencies), and a verification checklist (per task type). He should then run a two-sprint experiment where all developers use the templates and observe whether variance decreases. The experiment is not about bureaucracy - it's about figuring out where the informal PEV loop is already working (experienced developers) and where it isn't (less experienced developers). The template data will show which phase is the weakest link for each developer, enabling targeted improvement.

Sarah, Productivity Lead

Sarah is designing the L4 to L5 transition curriculum and needs to explain what changes when developers move from fleet management to agentic engineering. PEV is the answer - L4 developers use agent fleets with individual-level PEV loops; L5 Agentic Engineers design and operate organizational-scale PEV systems. But she needs to make this distinction concrete for people who haven't experienced both levels.

What Sarah should do: Sarah should design the L4-to-L5 curriculum around the PEV loop's scaling. The L4 module covers PEV as an individual practice: personal task specification discipline, personal supervision protocols, personal verification habits. The L5 module covers PEV as a system design problem: how do you design organizational infrastructure that automates the Plan phase (using templates and context delivery), structures the Execute phase (orchestration systems with built-in supervision), and systematizes the Verify phase (automated quality checks, observability, feedback loops)? The progression from personal practice to system design is the concrete description of what distinguishes L4 from L5.

Victor, Staff Engineer - AI Champion

Victor has been using PEV implicitly for months - he just hasn't named it. When he reflects on his most effective agent workflows, they all have the same structure: careful task specification before launch, structured check-ins during execution, and a checklist-driven review of completed outputs. He hasn't articulated this to his colleagues, who don't have the same structure and produce less consistent results.

What Victor should do: Victor should name and share the loop. He should write a brief guide - one page - that describes his PEV approach in concrete terms: here's the 10-question task specification checklist I use before launching an agent, here's my supervision protocol (15-minute cycles, these intervention criteria), here's the verification checklist I use when reviewing agent outputs. He should share this as his personal workflow, not as a team mandate, and invite colleagues to try it. The naming is important: once people have a name for the framework, they can discuss it, adapt it, and build on it. Victor's informal documentation of his own practice is the seed of the team's shared agentic engineering methodology.
