Platform Engineer (AI tooling)

The Platform Engineer specializing in AI tooling is the person who builds the infrastructure that makes AI agents effective at scale.

·Platform Engineer role with AI tooling responsibility exists on the platform team
·Context Engineer is a fully dedicated role (not part-time, not combined with other duties)
·Team's primary activity has shifted from writing code to evaluating and reviewing AI-generated code

·Role definitions are updated to reflect AI-augmented responsibilities
·Hiring criteria include AI tool proficiency

Evidence

·Platform Engineer job description including AI tooling responsibilities
·Context Engineer role as a dedicated position (headcount or full-time allocation)
·Time tracking showing majority of developer time on review/evaluation vs. writing

What It Is

The Platform Engineer specializing in AI tooling is the person who builds the infrastructure that makes AI agents effective at scale. Where the context engineer makes agents smart (by providing codebase knowledge), the AI tooling platform engineer makes agents safe, consistent, and operationally reliable (by building the systems they run in). This includes: MCP server infrastructure that gives agents access to internal tools and APIs, isolated sandbox environments where agents can execute code safely, shared authentication patterns for agent tool use, observability systems that make agent behavior auditable, and the internal developer platform integrations that connect agents to existing engineering workflows.

The role emerges at L3 because the infrastructure needs of a mature AI-augmented engineering organization outgrow what individual champions or context engineers can provide. At L2, each team has an ad-hoc collection of MCP configurations, local sandbox setups, and individually-maintained agent configurations. This works for a team of 8 but doesn't scale to a department of 80 or an organization of 800. The L3 platform engineer creates the shared layer: one MCP server implementation that all teams use, one sandbox environment standard that all agents run in, one monitoring system that gives security and engineering leadership visibility into what agents are doing.

The technical scope of the AI platform engineer is broad. It overlaps with DevEx (the agent experience parallels developer experience), DevSecOps (agent tool access needs to be secured and audited), and infrastructure engineering (sandbox environments, compute provisioning, network access controls). The defining characteristic of the AI tooling platform specialization is that the "developer" they are building for is an AI agent, not a human. This difference matters: agents need different things from infrastructure than humans do. They need deterministic tool interfaces, reliable context delivery, isolated execution environments, and programmatic interfaces - not UIs.
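As an illustration of what a "deterministic tool interface" might look like in practice, here is a minimal sketch in Python. It is not tied to any agent framework: `ToolSpec`, `call_tool`, and the `get_owner` tool are hypothetical names invented for this example. The idea it shows is that every call validates its arguments and returns a dictionary with an `ok` flag, so an agent never has to interpret a stack trace or a UI.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class ToolSpec:
    """Hypothetical description of one tool an agent may call."""
    name: str
    required: dict[str, type]      # argument name -> expected type
    handler: Callable[..., Any]


def call_tool(spec: ToolSpec, args: dict[str, Any]) -> dict[str, Any]:
    """Validate arguments, run the handler, and always return a dict with an 'ok' flag."""
    missing = sorted(set(spec.required) - set(args))
    unknown = sorted(set(args) - set(spec.required))
    if missing or unknown:
        return {"ok": False, "error": "bad_arguments",
                "missing": missing, "unknown": unknown}
    for key, expected in spec.required.items():
        if not isinstance(args[key], expected):
            return {"ok": False, "error": "bad_type", "argument": key}
    try:
        return {"ok": True, "result": spec.handler(**args)}
    except Exception as exc:  # report failures as data instead of raising
        return {"ok": False, "error": "handler_failed", "detail": str(exc)}


# Hypothetical tool: look up the owning team for a service.
OWNERS = {"billing-api": "payments", "search": "discovery"}
get_owner = ToolSpec("get_owner", {"service": str},
                     lambda service: OWNERS[service])

print(call_tool(get_owner, {"service": "billing-api"}))
print(call_tool(get_owner, {"svc": "billing-api"}))
```

The second call fails with a machine-readable `bad_arguments` error naming the missing and unknown keys, which is the kind of feedback an agent can act on without human help.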

At L4 and L5, the platform engineer's role evolves toward orchestration infrastructure: the systems that manage fleets of agents, route tasks to appropriate agent types, collect outcomes, and feed results back into the system. But at L3, the core contribution is solving the "why does AI work on some teams but not others" infrastructure problem by building shared foundations that make the answer consistent.

Why It Matters

The AI tooling platform engineer role creates leverage that no other role provides at this stage:

·Solves the inconsistency problem at scale - when each team builds its own AI infrastructure, quality varies widely; the platform engineer's shared infrastructure means every team gets the quality of the best individual team's setup
·Unblocks context engineers - context engineers want to connect agents to internal tools, databases, and APIs via MCP; without platform support, they have to build their own MCP servers; with platform support, they configure existing shared MCP infrastructure
·Enables security and compliance - agents accessing production databases, internal APIs, and sensitive code repositories need audit trails, access controls, and isolation; these are infrastructure concerns that the platform engineer addresses before they become incidents
·Creates the foundation for L4 agent fleets - running 3-5 parallel agents per developer requires stable sandbox environments, reliable worktree management, and consistent agent tooling across the team; the platform engineer builds this so developers don't have to
·Makes AI adoption measurable - the observability systems the platform engineer builds provide the data that shows whether AI tooling is working: agent task success rates, error categorization, time-to-completion, and context utilization patterns
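To make the measurability point concrete, the following Python sketch computes three of the metrics named above (task success rate, error categorization, and time-to-completion) from a handful of task records. The record layout and field names are assumptions made for illustration; a real observability pipeline would define its own schema.

```python
from collections import Counter
from statistics import median

# Hypothetical task records, as an observability pipeline might emit them.
tasks = [
    {"team": "payments", "ok": True,  "seconds": 120, "error": None},
    {"team": "payments", "ok": False, "seconds": 300, "error": "tool_timeout"},
    {"team": "search",   "ok": True,  "seconds": 90,  "error": None},
    {"team": "search",   "ok": False, "seconds": 45,  "error": "missing_context"},
    {"team": "search",   "ok": True,  "seconds": 150, "error": None},
]


def summarize(records):
    """Return success rate, median duration of successful tasks, and error counts."""
    done = [r for r in records if r["ok"]]
    return {
        "success_rate": len(done) / len(records) if records else 0.0,
        "median_seconds": median(r["seconds"] for r in done) if done else None,
        "errors": dict(Counter(r["error"] for r in records if not r["ok"])),
    }


summary = summarize(tasks)
print(summary)
```

Grouping the same records by `team` before calling `summarize` would show where the shared infrastructure is working and where it is not.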

Tip

The first MCP server the AI platform engineer should build is almost always an internal documentation MCP server. Every team has internal wikis, runbooks, and architecture documents that agents need but can't access. An MCP server that provides read access to Confluence, Notion, or the internal wiki is the highest-leverage first infrastructure investment.
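The core of such a server can be very small. The sketch below uses only the Python standard library and deliberately leaves out the MCP layer itself: it assumes the wiki has been exported to a directory of Markdown files, and `DocsIndex`, `search`, and `read` are hypothetical names for the two read-only operations that would then be exposed to agents as tools through whichever MCP SDK the team chooses.

```python
from pathlib import Path
import tempfile


class DocsIndex:
    """Read-only view over a directory of exported docs (hypothetical layout)."""

    def __init__(self, root: Path):
        self.root = root.resolve()

    def search(self, query: str, limit: int = 5) -> list[str]:
        """Return relative paths of documents whose text contains the query."""
        q = query.lower()
        hits = []
        for path in sorted(self.root.rglob("*.md")):
            if q in path.read_text(encoding="utf-8").lower():
                hits.append(path.relative_to(self.root).as_posix())
        return hits[:limit]

    def read(self, relative_path: str) -> str:
        """Return one document, refusing paths that escape the docs root."""
        target = (self.root / relative_path).resolve()
        if not target.is_relative_to(self.root):
            raise PermissionError("path is outside the documentation root")
        return target.read_text(encoding="utf-8")


# Demo with a throwaway directory standing in for a wiki export.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "runbooks").mkdir()
    (root / "runbooks" / "deploy.md").write_text(
        "Rollback: run the deploy job with the previous tag.", encoding="utf-8")
    index = DocsIndex(root)
    found = index.search("rollback")
    text = index.read("runbooks/deploy.md")

print(found)
print(text)
```

There is no write path at all, and `read` rejects anything outside the docs root, which keeps the first server easy to reason about in a security review.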

Getting Started

·Audit the current agent infrastructure fragmentation - survey all teams to document their current agent configurations: what MCP servers are they running? What sandbox environments? What authentication approaches? This audit reveals both the fragmentation problem and the best existing implementations to standardize on.
·Standardize the sandbox environment first - isolated execution environments (git worktrees, Docker containers, or ephemeral VMs) are the safety layer for agent code execution; before standardizing anything else, define the standard sandbox configuration that all agents will run in; this is the foundation everything else builds on.
·Build the internal documentation MCP - implement a read-only MCP server that gives agents access to the organization's internal documentation, architecture records, and runbooks; this single investment benefits every team that deploys it and is the highest-confidence first MCP project.
·Create agent authentication patterns - define how agents authenticate to internal services: service accounts, OAuth flows, token management; agents need reliable, auditable access to internal APIs; the platform engineer designs the patterns that both agents and the security team can rely on.
·Implement basic agent observability - deploy logging that captures agent task inputs, tool calls, and outputs in a structured format; at minimum, this enables debugging of agent failures; over time, it enables the analytics that drive AI tooling investment decisions.
·Create a platform adoption playbook - document how new teams onboard to the shared AI infrastructure; what MCP servers are available, how to configure them, what the sandbox environment provides, and how to get help when something doesn't work; this playbook is the onboarding for teams moving from L2 to L3.
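For the sandbox step, one way to express "the standard sandbox configuration" is as a single policy object that every team's agent launcher turns into the same container invocation. The Python sketch below assumes Docker as the isolation mechanism; `SandboxPolicy`, the image name, and the resource limits are placeholders chosen for illustration, not recommended values. It only builds the argument list and does not start a container.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SandboxPolicy:
    """Hypothetical organization-wide defaults for agent sandboxes."""
    image: str = "agent-sandbox:latest"   # placeholder image name
    memory: str = "2g"
    cpus: str = "2"
    pids_limit: int = 256
    network: str = "none"                 # no network unless a team opts in


def docker_run_args(policy: SandboxPolicy, worktree: str,
                    command: list[str]) -> list[str]:
    """Build a `docker run` argument list for one agent task."""
    return [
        "docker", "run", "--rm",
        "--network", policy.network,
        "--memory", policy.memory,
        "--cpus", policy.cpus,
        "--pids-limit", str(policy.pids_limit),
        "--cap-drop", "ALL",
        "--security-opt", "no-new-privileges",
        "--read-only",                    # root filesystem is immutable
        "--tmpfs", "/tmp",                # scratch space that vanishes on exit
        "-v", f"{worktree}:/workspace",   # the worktree is the only persistent writable path
        "-w", "/workspace",
        policy.image, *command,
    ]


args = docker_run_args(SandboxPolicy(), "/srv/worktrees/task-123", ["pytest", "-q"])
print(" ".join(args))
```

Pairing each task with its own git worktree (created with `git worktree add`) and mounting only that directory keeps parallel agents from touching each other's files.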

Common Pitfalls

Building AI infrastructure without security involvement. Agents that access internal APIs, databases, and code repositories are a security surface that needs to be addressed from day one. The platform engineer who builds AI infrastructure without security review will create systems that either get shut down after a security review or that have unaudited access to sensitive resources. Include security in the infrastructure design from the beginning.
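One pattern that tends to survive a security review is giving each agent a short-lived credential that names the agent and lists exactly which scopes it may use. The Python sketch below illustrates the idea with an HMAC-signed token built from the standard library. It is a teaching example under stated assumptions, not a replacement for the organization's identity provider: the token format, the scope names, and the hard-coded `SECRET` are all hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

SECRET = b"demo-only-secret"  # assumption: real keys come from a secrets manager


def issue_token(agent_id: str, scopes: list[str], ttl_seconds: int = 900,
                now: Optional[float] = None) -> str:
    """Sign a short-lived token naming the agent and the scopes it may use."""
    issued = time.time() if now is None else now
    payload = json.dumps(
        {"sub": agent_id, "scopes": scopes, "exp": issued + ttl_seconds}).encode()
    signature = hmac.new(SECRET, payload, hashlib.sha256).digest()
    return (base64.urlsafe_b64encode(payload).decode() + "."
            + base64.urlsafe_b64encode(signature).decode())


def authorize(token: str, scope: str, now: Optional[float] = None) -> Optional[str]:
    """Return the agent id if the token is valid, unexpired, and covers the scope."""
    try:
        body, sig = token.split(".")
        payload = base64.urlsafe_b64decode(body)
        expected = hmac.new(SECRET, payload, hashlib.sha256).digest()
        if not hmac.compare_digest(expected, base64.urlsafe_b64decode(sig)):
            return None
        claims = json.loads(payload)
    except ValueError:  # malformed token, bad base64, or bad JSON
        return None
    current = time.time() if now is None else now
    if current >= claims["exp"] or scope not in claims["scopes"]:
        return None
    return claims["sub"]


token = issue_token("agent-7", ["docs:read"], now=1000.0)
print(authorize(token, "docs:read", now=1100.0))   # allowed scope, not expired
print(authorize(token, "db:write", now=1100.0))    # scope was never granted
print(authorize(token, "docs:read", now=2000.0))   # past the 900-second lifetime
```

Because `authorize` returns the agent id on success, the calling service can write it into its audit log alongside the scope that was exercised.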

Over-engineering the first MCP servers. The first MCP implementations should be simple, read-only integrations with high-value information sources. Over-engineering them with complex authentication, caching layers, and dynamic content generation creates maintenance burden and deployment delays. Start with the minimum useful MCP - read access to internal documentation - and add sophistication based on demonstrated need.

Creating platform lock-in that individual teams can't escape. Shared infrastructure is valuable, but individual teams sometimes have legitimate reasons to deviate from the standard. The platform engineer should build infrastructure that teams can adopt gradually, rather than mandating that all teams use the shared platform immediately. Voluntary adoption with visible quality advantages is more sustainable than mandated adoption with resentment.

Neglecting the developer experience of using the platform. The platform engineer's "customer" is other developers. If the shared agent infrastructure is complex to configure, opaque to debug, or inconsistent in behavior, developers will work around it rather than with it. Invest as much effort in the adoption experience as in the infrastructure itself.

Missing the observability opportunity. The platform engineer's observability systems are the organization's primary source of data on AI adoption effectiveness. If these systems are built as debugging tools rather than analytics platforms, the organization misses the opportunity to measure AI ROI, identify the highest-value use cases, and guide future investment. Design observability with both debugging and analytics in mind from the start.
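A small design choice at logging time goes a long way toward serving both audiences. The Python sketch below emits one JSON line per agent tool call and separates stable, low-cardinality fields (what analytics would group by) from a free-form `detail` payload (what someone debugging a single failure would read). The function name and every field name here are assumptions for illustration.

```python
import json
import time
import uuid
from typing import Any, Optional


def tool_call_event(task_id: str, team: str, tool: str, ok: bool,
                    duration_ms: int, error: Optional[str] = None,
                    detail: Optional[dict[str, Any]] = None) -> str:
    """Serialize one agent tool call as a single JSON line."""
    event = {
        "event_id": str(uuid.uuid4()),
        "ts": time.time(),
        # Stable, low-cardinality fields: these are what analytics group by.
        "task_id": task_id, "team": team, "tool": tool,
        "ok": ok, "duration_ms": duration_ms, "error": error,
        # Free-form payload: useful when debugging one specific failure.
        "detail": detail or {},
    }
    return json.dumps(event, sort_keys=True)


line = tool_call_event("task-123", "payments", "docs.search", False, 840,
                       error="tool_timeout", detail={"query": "rollback"})
print(line)
```

Keeping `error` to a short, fixed vocabulary (rather than raw exception text) is what makes error categorization possible later without re-parsing logs.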

How Different Roles See It

Bob, Head of Engineering

Bob's organization is at L3 with 40 developers across five teams. Each team has set up its own AI tooling - different MCP configurations, different sandbox approaches, different authentication patterns. One team's setup is excellent and three teams' setups are mediocre. Bob wants to systematize the good setup across all teams but doesn't have the right person for the job.

What Bob should do: Bob should hire or designate a Platform Engineer with explicit AI tooling responsibility. The best candidate is the developer who built the excellent AI infrastructure on the high-performing team - they have proven they can do the work, they understand the problems, and they have credibility with the other team champions. Bob should give this person a three-month mandate: standardize the sandbox environment, build a shared MCP server for internal documentation, and document the patterns for teams to adopt. The mandate should include explicit cross-team access: this person needs to be able to deploy infrastructure, configure security policies, and work with each team's champion.

Sarah, Productivity Lead

Sarah needs to write the job description for the first formal AI Platform Engineer hire. She's not sure whether to position it as a DevEx role, a DevSecOps role, or something new. The confusion is creating delays in the hiring process.

What Sarah should do: Sarah should frame the role around the infrastructure problems it solves, not the organizational category it fits in. The job description should list the specific deliverables: MCP server infrastructure, sandbox environments, agent observability, authentication patterns for agent tool access. The required skills are: infrastructure engineering (containers, orchestration, security), developer tooling experience (understands the developer workflow being enhanced), and AI system knowledge (understands how agents use tools and context). This is a legitimate specialization that sits at the intersection of DevEx and DevSecOps. Sarah should price it accordingly - this is a senior engineering role, not a junior support position.

Victor, Staff Engineer - AI Champion

Victor has become the de facto platform engineer for his team's AI infrastructure, but it wasn't his original focus. He built their MCP setup, their sandbox configuration, and their agent observability - all in addition to his context engineering work. He's now being asked by other teams to help them set up similar infrastructure, which is taking more and more of his time.

What Victor should do: Victor should make the case for a dedicated Platform Engineer hire by documenting his time allocation. If he's spending 40% of his time on infrastructure work that applies to multiple teams, the organization should hire someone to do that work as their primary job - which frees Victor to focus on the context engineering and architectural work that is his primary value. Victor should also write up the infrastructure as documentation rather than implementing it for each team: an architecture document, a setup guide, and a configuration reference. This positions him as the architect who defined the standard rather than the implementer who has to do it for everyone.
