One-shot unattended agents (Stripe Minions model)
How to launch AI agents that run to completion autonomously - writing code, running tests, fixing errors, and producing a PR without human supervision.
- Unattended agents (Stripe Minions model, Cursor Automations) execute tasks without developer presence
- Agents are invocable from at least two channels (Slack, CLI, Web, PagerDuty)
- Each developer runs 3-5 parallel agent sessions concurrently
- Agent task completion rate without human intervention exceeds 60%
- Agent invocation produces a PR within a defined SLA (e.g., under 30 minutes for standard tasks)
Evidence
- Agent invocation logs from multiple channels with timestamps
- Dashboard showing parallel agent session counts per developer
- PR history showing agent-authored PRs merged without synchronous developer oversight
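The three criteria above (multi-channel invocation, completion rate above 60%, time-to-PR SLA) can be computed directly from invocation logs. A minimal sketch, assuming a hypothetical log schema of (task id, channel, start time, PR-opened time, needed-human flag); the field names and records are illustrative, not from any real system:

```python
from datetime import datetime, timedelta

# Hypothetical log records: (task_id, channel, started, pr_opened, needed_human)
LOGS = [
    ("T1", "slack", datetime(2026, 1, 5, 9, 0),  datetime(2026, 1, 5, 9, 22), False),
    ("T2", "cli",   datetime(2026, 1, 5, 9, 5),  datetime(2026, 1, 5, 9, 48), True),
    ("T3", "web",   datetime(2026, 1, 5, 10, 0), datetime(2026, 1, 5, 10, 19), False),
]

SLA = timedelta(minutes=30)

def completion_rate(logs):
    """Share of tasks finished without human intervention (target: > 60%)."""
    return sum(1 for r in logs if not r[4]) / len(logs)

def within_sla(logs):
    """Share of invocations that produced a PR inside the SLA."""
    return sum(1 for r in logs if r[3] - r[2] <= SLA) / len(logs)

def channels(logs):
    """Distinct invocation channels seen in the logs (target: at least two)."""
    return {r[1] for r in logs}

assert completion_rate(LOGS) > 0.6
assert len(channels(LOGS)) >= 2
```

In practice these would run against the real invocation log store and feed the per-developer dashboard mentioned above.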
What It Is
One-shot unattended agents are AI agents given a task and left to run until they complete it - no human supervision, no confirmation prompts, no mid-task steering. The agent writes code, runs the test suite, reads the output, fixes failures, iterates, and when it reaches a passing state, opens a pull request. The developer reviews the PR; they don't watch the work.
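The run-to-completion loop described above can be sketched in a few lines. This is a sketch under stated assumptions, not a definitive implementation: `run_checks`, `agent_fix`, and `open_pr` are hypothetical hooks onto whatever test runner, agent API, and PR tooling you use.

```python
import subprocess

MAX_ITERATIONS = 10  # give up definitively rather than loop forever

def pytest_runner():
    """Example check: run the project's test suite and capture its output."""
    proc = subprocess.run(["python", "-m", "pytest", "-q"],
                          capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout + proc.stderr

def run_to_completion(task, run_checks, agent_fix, open_pr):
    """One-shot loop: no confirmation prompts, no mid-task steering.

    run_checks() -> (passed, output); agent_fix(task, output) and
    open_pr(task) are hypothetical stand-ins for your agent and PR tooling.
    """
    for _ in range(MAX_ITERATIONS):
        passed, output = run_checks()
        if passed:
            return open_pr(task)           # tests green: hand off a PR
        agent_fix(task, output)            # feed failure output back to the agent
    return "failed: iteration limit hit"   # definitive failure, report back
```

The key design choice is the terminal condition: the agent either reaches a green test run and opens a PR, or hits the iteration limit and reports failure; at no point does it block on a human.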
This pattern is named after Stripe's "Minions" system, described publicly in 2024-2025: a framework for dispatching AI agents to complete tasks in isolated sandboxes. Each "minion" gets a task description, a clean environment, and execution authority. When it succeeds (tests green, PR opened) or fails definitively (stuck in a loop, cannot fix the errors), it reports back. Stripe uses this pattern at scale for tasks like dependency upgrades, API compatibility fixes, and security patches.

While Stripe Minions remains the canonical example, the pattern has since been commercialized. Cursor Automations (2026) provides always-on agents triggered from external systems - Slack messages, Linear tickets, GitHub issues, PagerDuty alerts - that spin up, complete a task, and produce a PR without developer initiation. Claude Code subagents with named @ mentions enable composing multi-agent workflows where a parent agent delegates subtasks to specialized child agents, each running in its own context. The one-shot unattended pattern is no longer experimental infrastructure - it's becoming a standard product feature.
At L4 (Optimized), unattended agents are the default mode for well-defined tasks. The shift from L3 is the removal of the human loop: at L3, a developer runs Claude Code in YOLO mode and monitors the terminal; at L4, the agent runs in a sandbox and the developer checks their email for a PR notification. The developer's attention is decoupled from the agent's execution.
The sandbox is essential. Unattended agents must run in isolated environments - ephemeral VMs, Docker containers, or git worktrees - where their actions cannot affect production systems, other developers' work, or infrastructure outside the task scope. The sandbox provides the blast radius limit that makes autonomous execution safe.
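One way to make the blast radius limit concrete is to construct the agent's launch command so that isolation is the default. A minimal sketch using an ephemeral Docker container; the image name and the `agent run` CLI are hypothetical placeholders, and the flags should be tightened or relaxed per task:

```python
import shlex

def sandbox_command(image, repo_dir, task):
    """Build a `docker run` invocation that confines an unattended agent
    to an ephemeral container with the repo checkout as its only mount."""
    return [
        "docker", "run",
        "--rm",                     # ephemeral: container deleted on exit
        "--network", "none",        # no outbound access by default; relax
                                    # only as far as the task needs (e.g.
                                    # to reach the PR API)
        "-v", f"{repo_dir}:/work",  # blast radius limited to this checkout
        "-w", "/work",
        image,
        "agent", "run", task,       # hypothetical unattended-agent CLI
    ]

cmd = sandbox_command("agent-sandbox:latest", "/tmp/repo-T42", "upgrade lodash")
print(shlex.join(cmd))
```

Git worktrees achieve the same scoping for lighter-weight setups: each agent gets its own working directory and branch, so parallel sessions cannot clobber each other's state.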
Why It Matters
Unattended agents fundamentally change the economics of software development:
- Decouples developer attention from task execution - a developer can have 5 agents running while they focus on architecture, review, or strategy; execution scales independently of human time
- Enables overnight work - agents launched at end of day complete tasks while the developer sleeps; the next morning's inbox contains PRs, not partially finished work
- Makes parallelism natural - one developer can launch multiple agents on different tasks without context switching overhead; each task progresses independently
- Shifts human role to review - developers become reviewers and directors rather than implementers; this is a fundamentally different and more leveraged use of senior engineer time
- Creates reproducible task execution - the same task specification produces the same kind of PR; this consistency enables measurement and optimization
The Stripe Minions model also demonstrates something important about L4 readiness: it requires L2-L3 investment to be safe. Unattended agents without a mature CLAUDE.md produce autonomous bad decisions at scale. The context engineering work of L2-L3 is what makes L4 viable rather than merely risky.
The test for whether a task is suitable for unattended execution is: can you write acceptance criteria that an automated test suite can verify? If yes, the agent can work to those criteria autonomously. If not, the task requires human judgment that an agent cannot substitute for.
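That suitability test can be made mechanical: encode each task with the command that verifies its acceptance criteria, and refuse to dispatch tasks that have none. A minimal sketch with hypothetical task records (field names and task descriptions are illustrative):

```python
def is_unattended_suitable(task):
    """A task qualifies for one-shot execution only if its acceptance
    criteria are expressed as a command an automated check can run."""
    return bool(task.get("acceptance_cmd"))

TASKS = [
    {"id": "T1", "desc": "upgrade lodash to 4.x",
     "acceptance_cmd": "npm test"},
    {"id": "T2", "desc": "make the dashboard feel snappier",
     "acceptance_cmd": None},  # a taste call: needs human judgment
]

dispatchable = [t["id"] for t in TASKS if is_unattended_suitable(t)]
print(dispatchable)  # only tasks with machine-checkable criteria qualify
```

Tasks that fail the gate stay in the human queue; the gate itself becomes a measurable filter on what fraction of the backlog is agent-ready.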
Getting Started
6 steps to get from here to the next level
Common Pitfalls
Mistakes teams actually make at this stage - and how to avoid them
How Different Roles See It
Bob has been hearing about "autonomous agents" for months and is excited about the concept but nervous about the execution. His team is at L3 - they use CLI agents well, have mature CLAUDE.md files, and good test coverage. He wants to pilot unattended agents but doesn't know where to start safely.
What Bob should do - role-specific action plan
Sarah can see that unattended agents have significant ROI potential - if an agent can complete a task while a developer is doing something else, that's pure throughput multiplication. But she needs to measure this to justify the infrastructure investment (sandboxes, orchestration tooling).
What Sarah should do - role-specific action plan
Victor has already run unattended agents manually (launching Claude Code in YOLO mode and walking away). He's seen it work well for test generation and dependency upgrades. He wants to build the automation layer that makes this scalable and repeatable.
What Victor should do - role-specific action plan
Further Reading
6 resources worth reading - hand-picked, not scraped
From the Field
Recent releases, projects, and discussions relevant to this maturity level.