MCP security best practices for Kubernetes teams

Every MCP server you deploy is an access point into your internal systems. Leave one ungoverned and you have a blast radius with no defined boundary. The good news for platform and security teams already running Kubernetes is that you don’t have to start from scratch. The primitives for isolation, identity, and observability already exist in your cluster. You just need to apply them to MCP.

Here’s what you’ll learn in this post:

  • How to contain the blast radius of a compromised MCP server using Kubernetes-native isolation
  • Why credential-based authentication is the wrong model for MCP, and how workload identity fixes it
  • How a single enforcement point at the gateway gives you access control, rate limiting, and audit logging without per-server configuration
  • What full observability looks like for MCP tool calls, and how to get it with OpenTelemetry
  • How a governed registry stops MCP sprawl before it starts

Isolate every MCP server

Run each MCP server in its own container. This is the minimum unit of isolation that makes every other security control meaningful.

When a server is compromised, the attacker’s immediate goal is lateral movement: reach adjacent services, escalate privileges, exfiltrate data. Container isolation, enforced with Kubernetes network policy and namespace boundaries, breaks that path. A compromised MCP server cannot reach adjacent servers without explicit policy that you authored and can audit.

The mechanics are straightforward. Assign each MCP server to its own namespace. Write NetworkPolicy resources that default to deny-all ingress and egress, then open only the ports and destinations each server legitimately needs. NetworkPolicy takes effect only when the cluster's network plugin enforces it, so confirm that first. Set resources.limits on CPU and memory so that a runaway process cannot become a noisy neighbor or a stepping stone for a resource exhaustion attack.

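apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: mcp-server-isolation
  namespace: mcp-payments
spec:
  # An empty podSelector applies the policy to every pod in the namespace.
  podSelector: {}
  # Both types are listed; with no ingress rules, all inbound traffic is denied.
  policyTypes:
    - Ingress
    - Egress
  egress:
    # Allow outbound traffic only to the gateway namespace on port 8443.
    - to:
        - namespaceSelector:
            matchLabels:
              # Label set automatically on every namespace (Kubernetes 1.22+).
              kubernetes.io/metadata.name: mcp-gateway
      ports:
        - protocol: TCP
          port: 8443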
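The deny-all baseline described above can be generated per namespace before any allow rules are layered on. Here is a minimal Python sketch, assuming one namespace per MCP server. The namespace names and the `default_deny_policy` helper are hypothetical. The output is JSON, which `kubectl apply -f` accepts alongside YAML.

```python
import json


def default_deny_policy(namespace: str) -> dict:
    """Build a NetworkPolicy that selects every pod in the namespace and
    allows no ingress or egress (no rules are listed for either type)."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": namespace},
        "spec": {
            "podSelector": {},  # empty selector matches all pods
            "policyTypes": ["Ingress", "Egress"],
        },
    }


if __name__ == "__main__":
    # Hypothetical namespaces, one per MCP server.
    for ns in ["mcp-payments", "mcp-search"]:
        print(json.dumps(default_deny_policy(ns), indent=2))
```

NetworkPolicy rules are additive, so an allow policy can sit next to this baseline in the same namespace. Keep in mind that deny-all egress also blocks DNS lookups until you add a rule for them.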

Enforce identity, not just credentials

Hardcoded API keys tell you that something accessed a system. They do not tell you who authorized that access, which agent invoked the call, or whether the action was within policy. That gap is an identity problem.

The right model for MCP is the workload identity model that many Kubernetes teams already run: SPIFFE/SPIRE. Every MCP server gets a cryptographically verifiable identity (a SPIFFE Verifiable Identity Document, or SVID) that is automatically rotated, bound to the workload, and not stored as a long-lived secret that can leak.

On the user side, connect your existing identity provider (Okta, Microsoft Entra ID, or any OAuth 2.0-compliant IdP) and use token exchange to pass a verified user identity through to every tool call. The Stacklok gateway handles token passthrough, so you do not need to retrofit each MCP server with its own authentication layer.

As a result, every tool call carries two verified identities: (1) the workload that made the call and (2) the human who authorized it. Your audit log moves from “API key X was used at 14:32” to “user alice@corp.com, authenticated via Entra ID, invoked the query_database tool on the payments MCP server via agent session 7f3a.”

That is the difference between a credential trail and an identity trail.
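To make the two-identity record concrete, here is a minimal Python sketch of a structured audit event. The `AuditEvent` type, its field names, and the SPIFFE ID shown are illustrative assumptions, not a Stacklok log format.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditEvent:
    workload_id: str    # SPIFFE ID of the calling workload (hypothetical value below)
    user: str           # human identity asserted by the IdP
    idp: str            # identity provider that authenticated the user
    agent_session: str  # agent session that issued the call
    server: str         # MCP server that received the call
    tool: str           # tool invoked on that server
    timestamp: str      # ISO 8601, UTC


def to_log_line(event: AuditEvent) -> str:
    """Serialize the event as a single JSON object, suitable for one log line."""
    return json.dumps(asdict(event), sort_keys=True)


event = AuditEvent(
    workload_id="spiffe://corp.example/ns/mcp-payments/sa/payments-mcp",
    user="alice@corp.com",
    idp="entra-id",
    agent_session="7f3a",
    server="payments",
    tool="query_database",
    timestamp=datetime(2026, 6, 2, 14, 32, tzinfo=timezone.utc).isoformat(),
)
print(to_log_line(event))
```

Keeping the workload and the user in separate fields means either one can be queried on its own during an investigation.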

Enforce policy at the gateway

Without a single intercept point, you have no governance. Each MCP server becomes its own policy island, enforced differently (or not at all), audited separately, and invisible to the rest of your security tooling.

A gateway gives you one place to enforce access control, rate limiting, and audit logging across every MCP server in your environment. Stacklok’s gateway uses Cedar and Open Policy Agent (OPA), with OPA policies written in Rego, which means your access rules are declarative, version-controlled, and auditable rather than buried in per-server configuration files.

Tool-level access control is worth calling out specifically. MCP exposes capabilities at the tool level, not just the server level. A policy that says “alice can access the database MCP server” is underspecified. A policy that says “alice can call read_schema but not execute_query” is actionable, and our gateway enforces policy at that granularity.

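package mcp.authz

# Needed for the `if` and `in` keywords on OPA releases before 1.0.
import rego.v1

# Deny unless one of the rules below matches.
default allow := false

# Analysts may call read-only tools.
allow if {
    input.user.role == "analyst"
    input.tool.name in {"read_schema", "list_tables"}
}

# Engineers may also run queries, unless the call is flagged destructive.
allow if {
    input.user.role == "engineer"
    input.tool.name in {"read_schema", "list_tables", "execute_query"}
    not input.tool.args.destructive
}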
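A caller can ask a running OPA instance for a decision on this policy through OPA's REST Data API, where the package `mcp.authz` and the rule `allow` map to the path `/v1/data/mcp/authz/allow`. The Python sketch below assumes OPA is listening on `localhost:8181` with the policy loaded. The helper names are hypothetical, and this is not a description of how the Stacklok gateway talks to OPA internally.

```python
import json
import urllib.request

OPA_URL = "http://localhost:8181/v1/data/mcp/authz/allow"  # assumed address


def build_input(role: str, tool: str, args: dict) -> dict:
    """Shape the request the way the policy reads it: input.user.role,
    input.tool.name, and input.tool.args."""
    return {"input": {"user": {"role": role}, "tool": {"name": tool, "args": args}}}


def is_allowed(role: str, tool: str, args: dict, url: str = OPA_URL) -> bool:
    """POST the input document to OPA and return the boolean decision.
    A response without a "result" key is treated as a deny."""
    body = json.dumps(build_input(role, tool, args)).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(req, timeout=2) as resp:
        return json.load(resp).get("result", False) is True


# Example call (requires a running OPA instance):
#   is_allowed("analyst", "execute_query", {})
```

Failing closed on a missing result keeps a misconfigured policy path from silently granting access.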

Rate limiting at the gateway also gives you a lever for cost control. LLM-driven agents can generate tool call volume that is hard to predict. Setting per-user or per-agent limits at the gateway keeps runaway agents from becoming runaway invoices.
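One common way to implement per-user or per-agent limits is a token bucket, sketched below in Python. This illustrates the general technique rather than the Stacklok gateway's implementation. The class names, capacity, and refill rate are assumptions chosen for the example.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Allow bursts up to `capacity` calls, refilled at `rate` calls per second."""

    def __init__(self, capacity: float, rate: float, now: float) -> None:
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity
        self.updated = now

    def allow(self, now: float) -> bool:
        # Refill for the time elapsed since the last call, capped at capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class PerCallerLimiter:
    """Keep one bucket per caller key (a user or an agent session ID)."""

    def __init__(self, capacity: float, rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, caller: str) -> bool:
        now = self.clock()
        bucket = self.buckets.get(caller)
        if bucket is None:
            bucket = self.buckets[caller] = TokenBucket(self.capacity, self.rate, now)
        return bucket.allow(now)


limiter = PerCallerLimiter(capacity=5, rate=1.0)
print(limiter.allow("agent-session-7f3a"))
```

Keying the buckets by caller means one runaway agent exhausts only its own budget, not everyone else's.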

Make every tool call observable

A complete audit trail for an MCP tool call records which user made it, in which agent session, against which tool, with what inputs and outputs, at what time, with what latency, and with what result.

Structured OpenTelemetry instrumentation on every MCP call gives you this trace without custom logging in each server. Route the telemetry to whatever observability platform your team already uses.

Two things to get right when implementing this:

First, include the agent session ID in every span. Without it, you can see that a tool was called but you cannot reconstruct the chain of reasoning that led to the call. With it, you can trace from an anomalous action back to the agent prompt that triggered it.

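Second, log inputs and outputs at the tool level, not just at the HTTP level. MCP requests are JSON-RPC messages, so an HTTP access log shows only that a request reached the server's endpoint. The tool name and its arguments live in the message body, and those are what an investigator needs.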
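The two points above can be sketched as a wrapper that collects span-style attributes for each tool call. This Python example uses only the standard library. The attribute keys (`mcp.session.id`, `mcp.tool.name`, and so on) and the `call_with_telemetry` helper are hypothetical, not an established OpenTelemetry convention. Structured inputs and outputs are JSON-encoded because OpenTelemetry attribute values must be primitives or arrays of primitives.

```python
import json
import time
from typing import Any, Callable, Dict


def call_with_telemetry(session_id: str, user: str, tool: str,
                        args: Dict[str, Any],
                        handler: Callable[..., Any]) -> Dict[str, Any]:
    """Invoke a tool handler and return attributes describing the call.
    Errors are recorded in the attributes instead of being re-raised, to keep
    the sketch short; a production wrapper would likely re-raise."""
    attrs: Dict[str, Any] = {
        "mcp.session.id": session_id,  # ties the call back to the agent session
        "mcp.user": user,
        "mcp.tool.name": tool,
        "mcp.tool.input": json.dumps(args, sort_keys=True),
    }
    start = time.perf_counter()
    try:
        result = handler(**args)
        attrs["mcp.tool.output"] = json.dumps(result, default=str)
        attrs["mcp.tool.status"] = "ok"
    except Exception as exc:
        attrs["mcp.tool.status"] = "error"
        attrs["mcp.tool.error"] = type(exc).__name__
    attrs["mcp.tool.latency_ms"] = (time.perf_counter() - start) * 1000.0
    return attrs


def list_tables(schema: str) -> list:
    # Stand-in tool implementation for the example.
    return [f"{schema}.orders", f"{schema}.refunds"]


attrs = call_with_telemetry("7f3a", "alice@corp.com", "list_tables",
                            {"schema": "payments"}, list_tables)
print(attrs["mcp.tool.name"], attrs["mcp.tool.status"])
```

With the OpenTelemetry Python API, a dictionary like this could be passed as the `attributes` argument when starting a span. Tool inputs can contain sensitive values, so decide what to redact before exporting them.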

Use a registry to stop sprawl

MCP server sprawl follows the same pattern as container sprawl, npm package sprawl, and every other “easy to install, hard to govern” technology. Someone in a developer channel posts a link to a useful MCP server. Twelve engineers install it by end of day. None of them went through security review. Six months later, three of those servers are running outdated images with known CVEs, and nobody knows who owns them.

A governed registry makes the approved path the default path. Security review happens before deployment, not after an incident. Teams browse a curated catalog of vetted servers, deploy from there, and stay on an update cadence that the platform team controls.

The Stacklok Registry Server implements the official MCP Registry API specification. You curate the catalog; your teams discover and deploy from it. Servers that are not in the registry are not the approved path, and with network policy in place, they cannot reach the gateway anyway.

Where to start

If you are already running Kubernetes, you have the foundation: namespaces, network policy, RBAC, and a workload identity system. Stacklok connects those primitives to MCP and adds the gateway, registry, and policy enforcement layer that turns them into a coherent security posture.

You do not need a full platform procurement to get started. ToolHive, our open source project, runs today and gives you container isolation, a governed registry, and the gateway out of the box. Bring your own cluster; ToolHive handles the rest.

Want to see what Stacklok can do for your organization? Book a demo or get started right away with ToolHive, our open source project. Join the conversation and engage directly with our team on Discord.

June 02, 2026

How-To

Scott Buchanan

CMO

Scott Buchanan is the Chief Marketing Officer at Stacklok. Scott leads the company's first-party research efforts that define benchmarks for AI agent and MCP adoption. He's also an example of how a non-developer can lean into MCP and agentic workflows to increase productivity.
