Agent instruction files in repo (60k+ repos on GitHub)
Agent instruction files - CLAUDE.md, .cursorrules, copilot-instructions.md - have become a standard software project artifact, with 60,000+ repositories on GitHub already containing them.
- CLAUDE.md or equivalent exists with project description, tech stack, and top conventions
- Written coding conventions document exists and is referenced from agent instruction files
- Agent instruction files are committed to the repository (not local-only)
- CLAUDE.md includes explicit prohibitions (banned libraries, anti-patterns)
- Agent instruction files are reviewed as part of the standard PR process
Evidence
- CLAUDE.md, .cursorrules, or .github/copilot-instructions.md in repository root
- Coding conventions document accessible from agent instruction files
- Commit history showing agent instruction file updates
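The first evidence item can be checked mechanically. The following is a minimal sketch, assuming a local checkout; the function name `find_instruction_files` and the `CANDIDATE_PATHS` list are illustrative, and the check only confirms that the files exist on disk, not that they are committed or reviewed.

```python
from pathlib import Path

# Candidate locations named in this guide. AGENTS.md is included because it is
# mentioned as the OpenAI Codex equivalent; adjust the list for your own tools.
CANDIDATE_PATHS = [
    "CLAUDE.md",
    ".cursorrules",
    ".github/copilot-instructions.md",
    "AGENTS.md",
]


def find_instruction_files(repo_root: str) -> list[str]:
    """Return the candidate instruction files that exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in CANDIDATE_PATHS if (root / rel).is_file()]
```

Calling `find_instruction_files(".")` from a repository root returns the relative paths that are present, in the order listed in `CANDIDATE_PATHS`. An empty list means none of the candidate files were found.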
What It Is
Agent instruction files are project-specific configuration documents that tell AI agents how to behave in a given codebase. They go by different names depending on the tool: CLAUDE.md for Claude Code, .cursorrules for Cursor, .github/copilot-instructions.md for GitHub Copilot, AGENTS.md for OpenAI Codex. But they share the same purpose: providing AI agents with the context they need to make useful, convention-respecting suggestions.
As of 2025, these files have proliferated dramatically. Over 60,000 repositories on GitHub contain CLAUDE.md or .cursorrules files - a number that grew by an order of magnitude in 18 months as AI coding tools moved from early adoption to mainstream use. The open-source community has effectively standardized on the concept: when you create a project that uses AI tools, you add an instruction file. It's become as natural as adding a .gitignore or a README.md.
The ecosystem is currently fragmented. Each tool has its own file format and discovery mechanism. Cursor reads .cursorrules from the repository root and project-level .cursor/rules/ directories. Claude Code reads CLAUDE.md files at the root and in subdirectories, and merges them hierarchically. GitHub Copilot reads .github/copilot-instructions.md. Projects serious about AI tooling often maintain multiple files or a single canonical file that they reference from the tool-specific locations.
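To make the idea of hierarchical CLAUDE.md files concrete, here is a rough sketch that walks from the repository root down to a working directory and collects every CLAUDE.md along the way. This is an illustration of the concept only, not Claude Code's actual loading logic; the function name `collect_claude_files` and the general-to-specific ordering are assumptions.

```python
from pathlib import Path


def collect_claude_files(repo_root: str, working_dir: str) -> list[Path]:
    """Collect CLAUDE.md files from repo_root down to working_dir.

    Results are ordered from most general (root) to most specific.
    Raises ValueError if working_dir is not inside repo_root.
    """
    root = Path(repo_root).resolve()
    target = Path(working_dir).resolve()
    parts = target.relative_to(root).parts

    # Build the chain of directories: root, root/a, root/a/b, ...
    directories = [root] + [root.joinpath(*parts[: i + 1]) for i in range(len(parts))]

    found = []
    for directory in directories:
        candidate = directory / "CLAUDE.md"
        if candidate.is_file():
            found.append(candidate)
    return found
```

With a root CLAUDE.md and a second one in a hypothetical `services/api/` directory, calling the function with a working directory of `services/api/src` would return both files, root first. A team that keeps one canonical file could use the same walk to spot subdirectory files that have drifted from it.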
What these files contain ranges from minimal (tech stack + run commands) to comprehensive (architectural patterns, forbidden practices, per-module conventions, test requirements). The most effective instruction files are neither too short (missing critical context) nor too long (agents struggle to attend to instructions buried in thousands of lines of text).
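A simple length check can flag files at either extreme. The sketch below counts non-blank lines and compares them against two thresholds; the values 10 and 500 are placeholder assumptions, not figures from this guide or from any tool's documentation, so tune them to what works in your own repositories.

```python
def assess_length(text: str, min_lines: int = 10, max_lines: int = 500) -> str:
    """Classify an instruction file as 'too short', 'too long', or 'ok'.

    Only non-blank lines are counted. The default thresholds are
    illustrative assumptions and should be adjusted per team.
    """
    non_blank = [line for line in text.splitlines() if line.strip()]
    if len(non_blank) < min_lines:
        return "too short"
    if len(non_blank) > max_lines:
        return "too long"
    return "ok"
```

Line count is only a proxy: a 40-line file can still miss critical context, and a long file can be well organized. Treat the result as a prompt for human review rather than a pass/fail gate.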
Why It Matters
The proliferation of agent instruction files reflects a broader shift in how software projects are organized. Documentation used to be for humans; agent instruction files are documentation for AI. As AI agents become standard collaborators, projects without instruction files are at a structural disadvantage - their agents make more mistakes, require more correction, and produce less consistent code.
- 60k+ repos demonstrate that this is now standard practice, not an experimental edge case
- Open-source examples provide templates - you can study how successful projects structure their instruction files before writing your own
- Tool ecosystem is converging - despite different file names, the structure and content of effective instruction files are becoming standardized
- First-mover effect in your organization - teams with good instruction files pull ahead of teams without them; the gap compounds over time as agents generate more code
- Cross-tool compatibility matters - as developers switch tools or use multiple AI tools, a well-structured instruction file that can be referenced from multiple locations reduces maintenance burden
The 60k+ number also serves as a practical benchmark. If you're evaluating whether to invest in writing a CLAUDE.md file, the answer from the open-source community is clear: organizations that take AI-assisted development seriously have already made this investment.
Before writing your instruction file from scratch, search GitHub for CLAUDE.md or .cursorrules files in repositories using your tech stack. The open-source community has done substantial experimentation with what works - use their examples as a starting point.
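One way to run that search is to build a GitHub code search URL that combines a file name with a keyword from your stack. This is a small sketch assuming GitHub's web code search and its `path:` qualifier; the function name `code_search_url` and the example keyword are illustrative, and GitHub may require you to be signed in to view code search results.

```python
from urllib.parse import urlencode


def code_search_url(filename: str, keyword: str) -> str:
    """Build a GitHub code search URL for instruction files mentioning a keyword.

    Assumes the `path:` qualifier of GitHub code search; the keyword is quoted
    so that multi-word terms are matched as a phrase.
    """
    query = f'path:{filename} "{keyword}"'
    return "https://github.com/search?" + urlencode({"q": query, "type": "code"})


if __name__ == "__main__":
    # Example: CLAUDE.md files that mention a (hypothetical) stack keyword.
    print(code_search_url("CLAUDE.md", "Next.js"))
```

Searching on a keyword inside the file, rather than on repository language, tends to be more useful here because the instruction file itself is Markdown or plain text regardless of the project's stack.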
How Different Roles See It
Bob read about the 60k+ stat in a newsletter and asked his leads how many of their repositories have a CLAUDE.md file. The answer was two out of thirty-seven. Bob knows his team is behind the curve, but a one-time mandate to "add CLAUDE.md files to all repos" isn't the right approach - it'll produce low-quality files that don't help anyone.
Sarah wants to benchmark her organization's AI context engineering maturity against industry practice. She's been using the "60k+ repos" data point to argue that agent instruction files are table stakes, not a nice-to-have, but she needs more than a number to make the case for investment.
Victor is curious about how other teams structure their instruction files. He's written a CLAUDE.md for his main repository but isn't sure if he's capturing the right things, or if there are structural patterns he's missing. He wants to learn from the broader ecosystem.