Why Kubernetes Is the Right Platform for Running MCP Servers in Production

Ten years ago, containers solved the application packaging problem. Every application could be bundled with its dependencies and run consistently anywhere. That was genuinely useful. But packaging is not operations. The hard problems of isolation, networking, identity, ingress, and observability remained unsolved until Kubernetes gave platform teams one place to address all of them at once.

Model Context Protocol (MCP) servers face the same problems today. Most teams cannot tell you which user authorized which tool call, whether a compromised MCP server can reach adjacent systems, or what exactly happened during a failed multi-step workflow. The protocol solved the integration problem. The production problems remain.

The same Kubernetes building blocks of namespace isolation, NetworkPolicy, RBAC, OIDC federation, OpenTelemetry, Helm, and GitOps apply directly to MCP server deployments. Platform engineering teams that are already operating Kubernetes do not need a new control plane. They need to bring MCP servers under the one they already have.

This post explains why Kubernetes is the best substrate for production MCP deployments, what each layer of the Kubernetes stack contributes, and what it looks like in practice with ToolHive’s Kubernetes Operator.


The Five Production Problems MCP Shares with Containers

When containers first became ubiquitous, the problems that followed were predictable: How do you isolate workloads from each other? How do you control what they can reach on the network? How do you know which workload is acting as which identity? How do you route traffic to them? How do you know what they did?

Kubernetes answered all five. The same five questions apply to MCP servers, and the answers are the same.

Isolation: An MCP server with access to a production database should not share a runtime surface with an MCP server that handles user-facing document processing. Container escapes, dependency vulnerabilities, and misbehaving tools can propagate laterally if servers share a process or host without namespace boundaries. Kubernetes namespaces and pod-level isolation provide the correct runtime boundary.

Networking: Which MCP server should be able to reach which backend system? Without network policy, a compromised MCP server is an open path to every system the cluster can reach. Kubernetes NetworkPolicy enforces allowlisted egress at the cluster level, not by developer convention.

Identity: Which agent is acting on behalf of which user, with what permissions? MCP servers that authenticate with shared service account tokens cannot answer this question. Kubernetes service accounts, combined with OIDC federation to enterprise identity providers, enable per-workload identity that is verifiable, rotatable, and auditable.

Ingress and gateway: How does an agent connect to an MCP server? How is policy enforced at the edge before a request reaches the server? Kubernetes-native gateway infrastructure (whether an ingress controller, a service mesh, or a purpose-built MCP gateway) provides the centralized enforcement point that production workloads require.

Observability: When an agent makes a decision you did not expect, what trace exists? Per-server logs in isolation are not an answer. OpenTelemetry, already deployed for most Kubernetes workloads, provides the distributed tracing infrastructure that makes MCP tool invocations visible as part of end-to-end application traces.


Isolation: Namespaces, Pods, and Container-Level Runtime Boundaries

The MCP specification does not enforce isolation between servers. A naive deployment means that a supply chain compromise in one server’s dependency, or a successful prompt injection that achieves code execution, can reach every other server’s credentials and network access.

Kubernetes provides three layers of isolation that apply directly to MCP servers.

Namespace isolation partitions MCP servers by team, environment, or trust level. A development team’s MCP servers do not share a namespace with production servers. Namespace-scoped RBAC means that a service account authorized to manage MCP servers in namespace/team-a cannot modify resources in namespace/team-b. This is not a security boundary on its own — containers in different namespaces can still communicate across the cluster — but it is the correct organizational boundary for policy, RBAC, and resource quota enforcement.
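That namespace-scoped RBAC boundary can be expressed directly in manifests. The sketch below is illustrative (the namespace team-a and service account team-a-deployer are placeholder names); it grants a deployer service account full control over MCPServer resources in its own namespace and nothing anywhere else:

```yaml
# Namespace-scoped RBAC: team-a's deployer can manage MCPServer
# resources only inside the team-a namespace. Names are illustrative.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: mcpserver-manager
  namespace: team-a
rules:
  - apiGroups: ["toolhive.stacklok.dev"]
    resources: ["mcpservers"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: mcpserver-manager-binding
  namespace: team-a
subjects:
  - kind: ServiceAccount
    name: team-a-deployer
    namespace: team-a
roleRef:
  kind: Role
  name: mcpserver-manager
  apiGroup: rbac.authorization.k8s.io
```

Because the Role is namespace-scoped rather than a ClusterRole, the same binding applied in team-a grants nothing in team-b.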

Pod-level isolation gives each MCP server its own network identity, its own set of environment variables, and its own lifecycle. A pod running one MCP server does not share a process namespace with a pod running a different server. A crash, a dependency failure, or an exploited vulnerability in one pod does not directly affect the runtime of another.

Container security contexts enforce the correct execution posture. For MCP servers, this means:

securityContext:
  runAsNonRoot: true
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true
  capabilities:
    drop:
      - ALL
  seccompProfile:
    type: RuntimeDefault

These are not optional hardening measures. An MCP server that runs as root with unrestricted capabilities and a writable root filesystem provides essentially no runtime containment if its code or dependencies are compromised. Running as non-root with all capabilities dropped and a read-only filesystem limits what an attacker can do even after a successful exploit.

The Stacklok Kubernetes Operator, part of our ToolHive open source project, enforces this security context automatically for every MCPServer resource it manages.


Networking: NetworkPolicy as the Egress Allowlist for MCP Servers

The “NeighborJack” vulnerability class (MCP servers bound to 0.0.0.0 that accept unauthenticated connections from any device on the same network) is a symptom of a missing network control layer. But the more consequential risk is not inbound exposure; it is unrestricted outbound access.

An MCP server that can reach any system the cluster can reach is a significant blast radius if compromised. A filesystem MCP server has no legitimate reason to make outbound HTTPS calls to arbitrary internet endpoints. A database query MCP server should reach the specific database host on the specific port it requires.

Kubernetes NetworkPolicy enforces these constraints declaratively. Each MCPServer can be paired with a NetworkPolicy that allowlists exactly the egress destinations its tools require:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: mcp-analytics-server-egress
  namespace: team-data
spec:
  podSelector:
    matchLabels:
      toolhive: "true"
      app: mcp-analytics-server
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: toolhive-system
  egress:
    - ports:
        - protocol: TCP
          port: 5432
      to:
        - ipBlock:
            cidr: 10.0.2.10/32   # Analytics DB only

This policy permits inbound connections only from the ToolHive proxy namespace and permits outbound connections only to a specific database host on the Postgres port. The analytics MCP server cannot reach the internet, cannot reach adjacent internal services, and cannot be reached directly by anything other than the ToolHive gateway. Any code execution within the server’s container operates within this network boundary.

A service mesh adds a further layer. With Istio in ambient mode, for example, every pod in a namespace automatically receives a SPIFFE workload identity and mutual TLS enforcement without container restarts or sidecar injection. A policy that states “only the orchestrator’s SPIFFE identity may call this MCP server” is enforced cryptographically by the mesh’s data plane.
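As a sketch of what that mesh policy looks like with Istio, the AuthorizationPolicy below allows only a hypothetical orchestrator service account (namespace agents, name orchestrator, both placeholder names) to call the analytics server. Istio matches the principals field against the caller's SPIFFE identity presented in its mTLS certificate:

```yaml
# Illustrative Istio policy: only the orchestrator's workload identity
# may reach the analytics MCP server. Names are placeholders.
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: mcp-analytics-server-callers
  namespace: team-data
spec:
  selector:
    matchLabels:
      app: mcp-analytics-server
  action: ALLOW
  rules:
    - from:
        - source:
            # Istio principal form of spiffe://cluster.local/ns/agents/sa/orchestrator
            principals:
              - cluster.local/ns/agents/sa/orchestrator
```

Any caller that cannot present the orchestrator's identity is rejected at the transport layer, regardless of its network position.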


Identity: OIDC Federation, Service Accounts, and Per-Request Attribution

The dominant identity failure in MCP deployments is the shared service account token. A single credential is embedded in the MCP server’s environment at startup, used for all requests regardless of which user or agent initiated them, and rotated on a schedule that has nothing to do with the actual risk. Every tool invocation is attributed to the same identity. The audit log says “the analytics server called the database.” It does not say which user’s agent did, which workflow triggered it, or whether the invocation was within the scope of what that user was authorized to request.

Kubernetes provides the infrastructure to solve this properly.

Kubernetes service accounts give each MCP server its own identity within the cluster. The Kubernetes Operator automatically provisions a dedicated ServiceAccount for each MCPServer resource with minimal, namespace-scoped permissions and no credentials shared between servers. This is not just organizational hygiene; it is the prerequisite for meaningful audit attribution. When a specific MCP server’s service account appears in a log, you know exactly which server generated the entry.


OIDC federation to enterprise identity providers extends this to human and agent identity. Kubernetes projects service account tokens as OIDC JWTs, which can federate with Okta, Entra ID, Google, and other enterprise identity providers. This is the mechanism that enables per-request identity: when a user authenticates through the enterprise IdP, and the resulting token flows through the MCP platform to the tool invocation, the tool invocation carries the verified identity of the originating user, not a shared service account.
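Kubernetes exposes this mechanism through projected service account tokens. The sketch below is a minimal pod fragment assuming a hypothetical IdP audience URL and container image; it mounts a short-lived JWT whose aud claim targets the external identity provider rather than the Kubernetes API server:

```yaml
# Illustrative pod spec fragment: a projected, audience-scoped,
# auto-rotated service account token. Audience and image are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: mcp-analytics-server
  namespace: team-data
spec:
  serviceAccountName: mcp-analytics-server
  containers:
    - name: server
      image: ghcr.io/example/mcp-analytics:latest
      volumeMounts:
        - name: idp-token
          mountPath: /var/run/secrets/tokens
          readOnly: true
  volumes:
    - name: idp-token
      projected:
        sources:
          - serviceAccountToken:
              audience: https://idp.example.com   # the federated IdP, not the API server
              expirationSeconds: 3600             # kubelet rotates before expiry
              path: idp-token
```

The kubelet refreshes the token automatically, so the server never holds a long-lived credential.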

ToolHive’s embedded authorization server implements this in-process within the ToolHive proxy. Users authenticate through their enterprise IdP, ToolHive issues tokens scoped to the specific MCP servers and tools the user is authorized to access, and every tool invocation carries a verified, attributable identity. The authorization endpoint runs inside your cluster. No external vendor endpoint is in the critical path.

GKE Workload Identity, AWS IRSA, and Azure Workload Identity extend this pattern to cloud API access. An MCP server that needs to call AWS APIs does not store an AWS_ACCESS_KEY_ID in an environment variable. It authenticates as a Kubernetes service account federated to an IAM role through the cluster’s OIDC provider. The credential is short-lived, automatically rotated, and scoped to exactly the permissions the server requires.
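With AWS IRSA, for example, the federation is a single annotation on the server's ServiceAccount; the account name, namespace, and role ARN below are illustrative:

```yaml
# Illustrative IRSA binding: the service account federates to an IAM
# role through the cluster's OIDC provider. No static AWS keys anywhere.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: mcp-analytics-server
  namespace: team-data
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/mcp-analytics-readonly
```

GKE Workload Identity and Azure Workload Identity follow the same pattern with their own annotations; in each case the cloud SDK inside the pod picks up short-lived credentials automatically.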

For organizations that want the strongest possible workload identity (cryptographic identity verifiable across cluster boundaries and cloud environments), SPIFFE/SPIRE provides this natively in Kubernetes. Every workload receives a short-lived X.509 certificate embedding a SPIFFE ID (e.g., spiffe://your-org/ns/team-data/sa/mcp-analytics-server). Certificates are rotated automatically. Policy is expressed in terms of SPIFFE identities, not IP addresses or network locations. A tool invocation from a compromised MCP server carries that server’s SPIFFE identity, and a policy that says “only the orchestrator may call the analytics server” rejects it at the transport layer before application code ever runs.


GitOps and the Operator Pattern: MCP Servers as Declarative Infrastructure

The operational model that made Kubernetes tractable for large organizations was the Operator pattern combined with GitOps workflows. Instead of imperative scripts that provision infrastructure through sequences of commands, infrastructure is declared as Kubernetes resources. Manifests are checked into version control, applied through CI/CD pipelines, and reconciled by controllers that keep the actual state aligned with the desired state.

MCP servers belong in this model. An MCP server that was installed manually, with credentials configured by hand and no record of what version is running, is not production infrastructure. It is technical debt that will fail in a way that cannot be diagnosed.

The Kubernetes Operator implements the Operator pattern for MCP servers. Each server is declared as an MCPServer Custom Resource:

apiVersion: toolhive.stacklok.dev/v1alpha1
kind: MCPServer
metadata:
  name: github-mcp
  namespace: team-platform
spec:
  image: ghcr.io/stackloklabs/github-mcp/server:v0.4.1
  transport: streamable-http
  mcpPort: 8080
  proxyPort: 8080
  resources:
    limits:
      cpu: "200m"
      memory: "256Mi"
    requests:
      cpu: "100m"
      memory: "128Mi"

When this manifest is applied through a GitOps pipeline, a CI/CD system, or kubectl apply, the Operator automatically creates the corresponding pod, service, ServiceAccount, Role, and RoleBinding. The security context is enforced without any per-server configuration. The server is registered with the ToolHive Registry Server, making it discoverable through the Portal. When the manifest is deleted, the Operator cleans up all associated resources.

The practical consequences of this model are significant for platform engineering teams:

Auditability: Every MCP server in production has a corresponding manifest in version control. The manifest records what image version is running, what resources are allocated, what namespace it lives in, and when it was last modified. A security incident that requires identifying all MCP servers with access to a specific backend is a kubectl get mcpservers -A and a grep, not an archaeology project.

Consistency: Because the Operator enforces security context, RBAC, and resource limits for every server it manages, there is no “we forgot to drop capabilities on that one”. Controls are applied uniformly by the platform, not by individual server developers.

Upgrades and rollbacks: Updating a server image version is a one-line change to a manifest and a kubectl apply. Rolling back is git revert and a push. The Kubernetes control plane handles pod replacement, readiness checking, and traffic cutover. The same upgrade and rollback mechanics that platform teams use for application workloads apply without modification to MCP servers.


Observability: OpenTelemetry, Prometheus, and Trace Continuity Across Agent Hops

The observability problem for MCP servers is that without a standardized telemetry format, the data is not correlated. Per-server logs tell you that a tool was called. They do not tell you which user’s workflow called it, which agent in a multi-step workflow triggered the invocation, or how the invocation contributed to an outcome that was reported to the user twenty seconds later.

Kubernetes-native observability infrastructure solves this when the MCP platform emits telemetry in compatible formats.

OpenTelemetry is the standard for distributed tracing, metrics, and logs across cloud-native workloads. The OTel MCP semantic conventions (merged into the official OTel specification in January 2026) define standard attribute names for MCP tool invocations: mcp.server.name, mcp.tool.name, mcp.method, and related fields. An MCP platform that emits telemetry using these attribute names produces spans that correlate automatically with the application traces already in your observability stack.

ToolHive’s telemetry aligns with the OTel MCP semantic conventions as of March 2026. When an agent invokes a tool through ToolHive, the resulting span uses the same attribute names as every other OTel-instrumented component in the cluster. A Grafana dashboard that shows request latency across microservices can include MCP tool invocation latency in the same view. A Datadog trace for a user-facing request can include the MCP tool calls that contributed to the response. No custom integration layer required.

Prometheus metrics from ToolHive expose per-server tool invocation counts, latency histograms, error rates, and circuit breaker state. These integrate with existing Prometheus deployments without modification; the same alerting rules and dashboards used for application workloads apply directly to MCP server behavior.
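A minimal alerting sketch, assuming a Prometheus Operator installation: the rule below fires when a server's tool error rate stays above 5% for ten minutes. The metric names are illustrative placeholders, not ToolHive's actual exported names; substitute the counters your deployment exposes.

```yaml
# Illustrative PrometheusRule; metric names are placeholders.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: mcp-server-alerts
  namespace: toolhive-system
spec:
  groups:
    - name: mcp-servers
      rules:
        - alert: MCPToolErrorRateHigh
          expr: |
            sum(rate(mcp_tool_invocation_errors_total[5m])) by (server)
              / sum(rate(mcp_tool_invocations_total[5m])) by (server) > 0.05
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "MCP server {{ $labels.server }} tool error rate above 5%"
```

The same rule structure used for application services applies unchanged; only the metric names are MCP-specific.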

Audit continuity is the specific requirement that generic observability infrastructure does not address by itself. An audit trail for MCP tool invocations must record the authenticated identity of the invoking principal, not just the server identity. Kubernetes RBAC and the ToolHive embedded authorization server together provide this: the authenticated user identity flows from the enterprise IdP through ToolHive’s token validation into the tool invocation span, producing an audit record that answers “who, on behalf of what workflow, invoked which tool, with what parameters, and what was the result.”


Portability and the Vendor Lock-in Argument

There is a structural argument for Kubernetes as the MCP control plane that goes beyond the individual capabilities it provides.

When Kubernetes was donated to the CNCF in 2015, the intent was vendor neutrality at the orchestration layer. The outcome validated the argument decisively: every major cloud provider built a managed Kubernetes service. Docker deprecated its own competing orchestrator in favor of Kubernetes. The open, neutral option became the only option because no organization building on a proprietary orchestration layer wanted to find out what happened when that orchestrator’s roadmap diverged from their needs.

The same structural argument applies to the MCP control plane now. Major model providers are shipping full-stack agent platforms that bundle hosting, orchestration, tool integration, identity, and policy into a single offering. Getting started is fast. But when your isolation, identity, and observability layers are all owned by your model provider, switching models or frameworks means rearchitecting everything underneath. The integration layer should not be captive to any single provider above or below it.

A Kubernetes-native MCP control plane is portable by definition. The MCPServer CRDs deploy to any Kubernetes cluster (EKS, GKE, AKS, on-premises, air-gapped, and so on). The observability integrates with whichever OTel backend you use. The identity federates with whichever IdP you operate. The networking policies apply regardless of which cloud the cluster runs on. When a new model, framework, or agent runtime becomes the better choice for a specific use case, the MCP infrastructure underneath does not need to change.


What ToolHive Adds on Top of Kubernetes

Stacklok’s Kubernetes Operator, which is also part of our open source project, ToolHive, does not replace any of the Kubernetes capabilities described in this post. It assembles them into a coherent MCP-specific control plane.

The Operator manages the full lifecycle of MCPServer resources, provisioning pods, services, service accounts, and RBAC resources automatically, enforcing the security context uniformly, and cleaning up when servers are removed. Platform teams define servers in manifests; the Operator handles the implementation details.

The Registry Server implements the official MCP Registry API and provides a curated catalog of approved servers. Administrators control which servers are available in the organization. Cluster-wide namespace scanning (introduced in the February 2026 release) enables multi-tenant deployments where the Registry Server watches MCPServer resources across multiple namespaces from a single deployment.

The vMCP gateway provides the policy enforcement layer, including tool scoping per agent role, the embedded authorization server for per-request identity, circuit breakers for backend resilience, and the MCP Optimizer for on-demand tool discovery (60–85% per-request token reduction as of March 2026). vMCP is itself a Kubernetes workload, declared as a VirtualMCPServer CRD and deployed through the same GitOps pipeline as every other resource.

The Portal provides the browser-based interface through which developers discover approved servers, request access, and connect their agents without needing to understand the underlying Kubernetes infrastructure.

None of these components require modifying your existing Kubernetes infrastructure. They deploy into a dedicated namespace alongside your existing workloads, integrate with your existing observability stack through standard OTel and Prometheus interfaces, and federate with your existing identity provider through standard OIDC.


Want to run MCP on your existing Kubernetes control plane? Here are answers to some questions that commonly come up:

Docker provides container isolation for a single server on a single host. It does not provide NetworkPolicy for egress allowlisting, OIDC federation for per-request identity, Operator-managed lifecycle with automatic RBAC provisioning, cluster-wide registry and portal, or the GitOps deployment model that makes a fleet of servers auditable and consistently configured. Docker is the right starting point for local development. It is not a substitute for the production control plane that Kubernetes provides.

Kubernetes projects service account tokens as OIDC JWTs, which can be validated by enterprise identity providers such as Okta, Entra ID, and Google. Stacklok’s open source project, ToolHive, includes an embedded authorization server that uses this mechanism to handle the full OAuth flow in-process: users authenticate through their enterprise IdP, ToolHive validates the token and issues a scoped MCP access token, and every tool invocation carries the verified identity of the originating principal.

The Registry Server’s cluster-wide namespace scanning feature (introduced in the February 2026 release) enables a single Registry Server deployment to watch MCPServer resources across multiple Kubernetes namespaces. Teams deploy their MCP servers into their own namespaces. The Registry Server aggregates them into a unified catalog. Administrators control which servers are visible in the Portal through curation policies. The THV_REGISTRY_WATCH_NAMESPACE environment variable configures which namespaces are monitored. See docs.stacklok.com for the full configuration reference.
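As a rough sketch of that configuration, the fragment below shows how the environment variable would be set on the Registry Server's container; the value format shown (a comma-separated namespace list) and the surrounding container name are assumptions, so check docs.stacklok.com for the authoritative syntax:

```yaml
# Illustrative container env fragment for the Registry Server
# Deployment; value format is an assumption, verify against the docs.
env:
  - name: THV_REGISTRY_WATCH_NAMESPACE
    value: "team-data,team-platform"
```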

March 25, 2026
