L3 (Systematic): Coding Agent Usage

CLI agents (Claude Code, Codex) as primary

How shifting from IDE plugins to CLI-based agents makes AI a programmable, scriptable part of your development workflow rather than a typing assistant.

  • CLI agents (Claude Code, Codex) are the primary coding interface for 50%+ of feature work
  • Per-team or per-repo rules files exist and are maintained with code review
  • Coding conventions are written as explicit, agent-parseable rules (not implicit tribal knowledge)
  • Agent usage is tracked per developer and per repository
  • Agent instruction files follow a standardized template across the organization

Evidence

  • CLI agent session logs or telemetry showing primary usage
  • Rules files in repository with commit history showing regular updates
  • Coding conventions document cross-referenced from agent instruction files

What It Is

CLI agents - Claude Code, OpenAI Codex CLI, Aider, and similar tools - run in your terminal rather than inside an IDE plugin. This architectural difference is more significant than it appears. A CLI agent is a programmable tool: it can be invoked by scripts, chained into workflows, run in CI/CD pipelines, triggered by git hooks, and executed in remote environments without a graphical IDE. An IDE plugin is a convenience feature; a CLI agent is infrastructure.
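To make "agent as infrastructure" concrete, here is a minimal sketch of a script-callable review step. It assumes an agent CLI that accepts a prompt argument and reads context from stdin in non-interactive mode (Claude Code's `claude -p` is one such invocation); the `AGENT_CMD` variable is a hypothetical knob for this sketch, used to swap in Aider, another agent, or a stub for testing.

```shell
# A reviewable-anywhere agent step: callable from scripts, hooks, or CI.
agent_review() {
  # $1: path to a diff (or any file) to review.
  # AGENT_CMD is a hypothetical override; defaults to claude in print mode.
  ${AGENT_CMD:-claude -p} "Review the following diff for obvious bugs:" < "$1"
}

# Example wiring into a git hook (.git/hooks/pre-push):
#   git diff origin/main...HEAD > /tmp/outgoing.diff
#   agent_review /tmp/outgoing.diff
```

The same function body works unchanged in a Makefile target, a CI step, or a cron job — that portability is the architectural difference the paragraph above describes.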

At L3 (Systematic), CLI agents become primary - not supplementary. The developer's main AI interaction is no longer the chat sidebar or inline suggestions, but a terminal session where they describe tasks, the agent executes them, and the developer reviews results. The IDE still exists (with Copilot still running for inline assistance), but the high-leverage work happens in the CLI.

Claude Code is the canonical example: run claude in your project root, give it a task, and it uses tools to read files, make edits, run tests, and iterate until the task is complete. As of March 2026, Claude Code also supports Computer Use (interacting with GUIs, browsers, and desktop applications) and Auto Mode (dynamically choosing between tool use strategies), extending CLI agents beyond pure code tasks into full-environment automation. OpenAI's original Codex CLI is no longer actively developed - OpenAI shifted investment to Codex integrated within ChatGPT, which operates as a cloud-hosted agent rather than a local CLI tool. Aider supports multiple backends and is especially strong for pair-programming style interaction. Gemini Code Assist Agent Mode, now GA on IntelliJ (and other JetBrains IDEs), blurs the CLI/IDE boundary by offering terminal-grade agentic capabilities from within the IDE. All of these tools share the same core architecture: a language model with tool access, running autonomously.

The "primary" designation at L3 reflects a workflow inversion. At L1-L2, developers write code and occasionally use AI to help. At L3, developers describe what they want and the agent writes code, with the developer reviewing and steering. The human role shifts from implementer to orchestrator.

Why It Matters

The move to CLI-first agents is the inflection point where AI assistance becomes systematically integrated rather than ad-hoc:

  • Scriptability - CLI agents can be invoked from Makefiles, shell scripts, CI/CD pipelines, and GitHub Actions; IDE plugins cannot
  • Automation foundation - the same CLI command you run manually can become an automated step in your workflow; this is the path to L4 unattended agents
  • Environment independence - CLI agents run anywhere a terminal runs: local machines, CI runners, remote servers, Docker containers
  • Composability - CLI agents can be chained with other CLI tools; claude "generate tests" | grep TODO is a legitimate workflow
  • Separation of concerns - your IDE focuses on editing; your terminal focuses on agentic tasks; concerns don't compete for the same interface
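As one illustration of the scriptability point, a CI step can invoke the agent headlessly and act on its output. This is a hedged sketch, not a prescribed pipeline: it assumes a prompt-argument CLI such as `claude -p`, and `AGENT_CMD` is a hypothetical override used here so the step can be tested with a stub.

```shell
# Sketch of a CI shell step: capture an agent review, surface a warning line.
agent_ci_review() {
  # $1: output file for the agent's review (defaults to agent-review.txt).
  ${AGENT_CMD:-claude -p} "Summarize this branch's changes and flag risky edits." \
    > "${1:-agent-review.txt}"
  if grep -qi "risk" "${1:-agent-review.txt}"; then
    echo "agent flagged risky edits"
  fi
}
```

Because it is plain shell, the identical function runs in a GitHub Actions `run:` block, a GitLab job, or a local pre-merge script — no plugin host required.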

CLI agents also enable a crucial L3 practice: systematic measurement. When agent invocations are CLI commands, they can be logged, timed, and analyzed. You can measure how long tasks take, how often agents need corrections, and which task types produce the best results. This measurement is what makes L3 systematic rather than just guided.
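The measurement idea can be as simple as a wrapper that times and logs every invocation. This is a sketch of the practice, not a feature of any particular tool: it assumes a prompt-argument CLI like `claude -p`, and `AGENT_CMD` / `AGENT_LOG` are hypothetical knobs introduced for illustration.

```shell
# Wrap each agent invocation so every task is timed and recorded as CSV.
agent_timed() {
  task="$1"
  start=$(date +%s)
  ${AGENT_CMD:-claude -p} "$task"
  status=$?
  end=$(date +%s)
  # Columns: UTC timestamp, seconds elapsed, exit status, task description.
  printf '%s,%s,%s,"%s"\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
    "$((end - start))" "$status" "$task" \
    >> "${AGENT_LOG:-$HOME/.agent-usage.csv}"
  return "$status"
}
```

A month of this log answers exactly the questions named above: task duration, correction frequency by task type, and per-developer usage — the raw material for the L3 measurement practice.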

Tip

Create shell aliases for your most common agent tasks. alias write-tests='claude "write unit tests for the file I just modified, following patterns in existing tests"' turns a multi-step interaction into a single command. These aliases are also the seeds of your L4 automation scripts.
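When the task needs an argument, a shell function composes better than an alias. A hedged variant of the tip above, with a hypothetical function name and prompt wording (adapt both to your conventions; `AGENT_CMD` is an illustration-only override):

```shell
# Function form of the alias tip: takes the target file as an argument.
write_tests() {
  ${AGENT_CMD:-claude -p} "Write unit tests for $1, following patterns in the existing tests."
}

# Usage:  write_tests src/parser.py
```

Functions like this are one step away from the unattended scripts described at L4: the invocation is already parameterized and machine-callable.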

Getting Started

6 steps to get from here to the next level

Common Pitfalls

Mistakes teams actually make at this stage - and how to avoid them

How Different Roles See It

Bob - Head of Engineering

Bob's team is using Copilot well (L2), but adoption of Claude Code has been slower. Developers say the CLI interface feels unfamiliar and they prefer the IDE plugins they're used to. Bob isn't sure whether to push adoption or let it happen organically.

What Bob should do - role-specific action plan

Sarah - Productivity Lead

Sarah can now see a cleaner measurement story at L3. CLI agents produce logs, and logs produce metrics. She wants to establish the measurement infrastructure before the team scales their agent usage.

What Sarah should do - role-specific action plan

Victor - Staff Engineer, AI Champion

Victor has been using Claude Code as his primary development tool for months. He's written shell scripts that invoke it automatically on common task patterns and has integrated it into his personal Makefile. His iteration speed on new features is 3-4x what it was before.

What Victor should do - role-specific action plan