
Why Enterprises Move Beyond LiteLLM: The Case for an Enterprise MCP Platform in 2026

LiteLLM gives engineering teams a unified OpenAI-compatible proxy in front of multiple LLM providers, with basic key management and spend tracking. For teams standing up their first LLM infrastructure, that is often enough. The challenge comes when an enterprise needs per-team cost attribution enforced at the infrastructure level, when a CISO asks how credentials are stored, when a platform team needs to govern dozens of AI agents across Kubernetes namespaces, or when an audit requires a complete record of every model call. Those are the inflection points where LiteLLM’s architecture runs out of runway.

This post examines the specific capabilities enterprises require beyond what LiteLLM provides, which platforms address those requirements, and how to evaluate fit based on your actual infrastructure constraints.

What LiteLLM Does Well (and Where Its Limits Appear)

LiteLLM is an open-source LLM proxy that normalizes calls across providers — OpenAI, Anthropic, Azure OpenAI, Bedrock, Cohere, and others — behind a single OpenAI-compatible API. Its core strengths are developer experience and provider portability.

Where LiteLLM’s architecture creates friction for enterprises:

Credential storage model: API keys for upstream providers are stored in a database or passed through environment variables. There is no per-request identity model or integration with enterprise IdP systems (Okta, Entra ID) at the request authorization level.
Cost attribution granularity: Spend tracking works at the virtual key level. Enforcing hard budget limits per team or per agent with automatic cutoff requires custom instrumentation on top of the base system.
Agent and tool governance: LiteLLM routes LLM calls. It has no native concept of MCP servers, tool registries, or governing which tools an agent is permitted to invoke. As enterprises shift from prompt-based workflows to agent-based workflows using the Model Context Protocol, an LLM proxy addresses only part of the infrastructure stack.
Container isolation: LiteLLM runs as a single proxy process. It does not provide workload isolation between agent sessions, teams, or tool servers. In multi-tenant environments, this creates a shared blast radius for any credential or session issue.
Kubernetes operational model: LiteLLM can run on Kubernetes, but it does not ship as a Kubernetes Operator with CRDs. GitOps deployment, declarative configuration, and platform-team-friendly lifecycle management require manual effort to build.

These are not bugs in LiteLLM. They reflect the scope it was designed to cover. The question is whether that scope still matches your requirements.

The Inflection Points: When Enterprises Outgrow an LLM Proxy

Inflection Point 1: The CISO Asks How Credentials Are Managed

LiteLLM’s virtual key system creates API keys that proxy to upstream provider keys. Those upstream keys must be stored somewhere, typically a database or secrets manager that LiteLLM accesses at runtime. When a security team asks “which credentials touched which model call, and can you prove no lateral movement occurred between sessions,” a virtual key model does not produce that audit record.

Enterprise-grade platforms address this with per-request identity. The request carries an identity token (via OIDC or OAuth 2.0) issued by an enterprise IdP. The platform validates that token on every call, logs the resolved identity alongside the request, and never stores upstream credentials in a location the calling agent can access. The credential lifecycle is managed by the platform’s authorization server, not by the proxy’s database.

Inflection Point 2: Finance Asks for Team-Level LLM Spend

“We spent $180,000 on OpenAI last quarter” is not useful to a CFO allocating infrastructure costs to business units. Enterprises need to know what the data platform team spent versus the customer success team’s AI features versus the internal developer tooling. They also need budget enforcement: when the customer success team hits its monthly token budget, their requests should fail with a clear error, not silently accrue overages.

LiteLLM provides spend tracking per virtual key. Mapping those keys to organizational units, enforcing limits with automatic cutoff, and producing per-team cost reports requires building and maintaining tooling on top of LiteLLM’s database. In practice, most platform teams reach this point and realize they are building a budget enforcement service.
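To make that concrete, here is roughly what the smallest version of such a service looks like. It is a sketch with hypothetical names (`TeamBudgets`, `BudgetExceeded`) and an in-memory counter; a production version would also need persistence, monthly resets, and concurrency control, which is where the maintenance cost comes from.

```python
from collections import defaultdict


class BudgetExceeded(Exception):
    """Raised when a request would push a team past its token budget."""


class TeamBudgets:
    def __init__(self, limits: dict[str, int], key_to_team: dict[str, str]):
        self.limits = limits            # monthly token budget per team
        self.key_to_team = key_to_team  # virtual key -> organizational unit
        self.used = defaultdict(int)    # tokens consumed so far, per team

    def charge(self, virtual_key: str, tokens: int) -> int:
        """Record usage for one request, or refuse it with a clear error."""
        team = self.key_to_team[virtual_key]
        if self.used[team] + tokens > self.limits[team]:
            raise BudgetExceeded(
                f"{team} would exceed its budget of {self.limits[team]} tokens"
            )
        self.used[team] += tokens
        return self.limits[team] - self.used[team]  # remaining budget

    def report(self) -> dict[str, int]:
        """Per-team usage, the roll-up finance actually asks for."""
        return dict(self.used)
```

Two virtual keys that map to the same team draw down the same budget, so the cutoff applies to the organizational unit rather than to whichever key happened to make the call.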

Inflection Point 3: Agents Proliferate Across Teams

The shift from “call the LLM with a prompt” to “run an agent that invokes tools via MCP” is the architectural transition where an LLM proxy becomes insufficient on its own. An MCP platform governs not just what model the agent calls, but which tools it can access, which data sources those tools can reach, and whether each request is authorized under a defined policy.

As of mid-2026, the Model Context Protocol has become the standard mechanism for connecting LLM agents to external tools and data. A gateway that only proxies LLM API calls addresses one layer of the agent stack. The tool invocation layer requires a separate governance surface, and that is where the design space diverges significantly from LiteLLM’s scope.
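A tool-level policy check is the simplest way to picture that governance surface. The sketch below assumes a default-deny allow-list keyed by agent identity, with tool names written as `server/tool`; the agent names, tool names, and the `is_allowed` helper are all hypothetical and do not reflect any particular platform's policy format.

```python
from fnmatch import fnmatchcase

# Hypothetical policy: agent identity -> patterns of tools it may invoke.
# Anything not listed is denied.
POLICY = {
    "support-agent": ["crm/read_*", "tickets/*"],
    "data-agent": ["warehouse/query"],
}


def is_allowed(agent: str, tool: str) -> bool:
    """Default-deny check: the tool must match one of the agent's patterns."""
    patterns = POLICY.get(agent, [])
    return any(fnmatchcase(tool, pattern) for pattern in patterns)
```

An LLM proxy never sees this decision, because it happens at the tool invocation and not at the model call.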

Inflection Point 4: Platform Engineering Takes Ownership

When a single team’s experiment becomes company-wide infrastructure, the platform engineering team inherits it. Their requirements differ from the original builders’: GitOps compatibility, Kubernetes Operators with declarative CRDs, Helm charts that fit existing CI/CD pipelines, Prometheus metrics that feed existing dashboards, and a support model that does not require reading a GitHub issue backlog. LiteLLM can run on Kubernetes; operating it at enterprise scale with platform-team tooling requires significant custom work.

How Stacklok Addresses These Inflection Points

Stacklok includes an enterprise MCP platform built on ToolHive, the most widely adopted open-source MCP platform (Apache 2.0 licensed). Stacklok is not a drop-in replacement for LiteLLM in the sense of a one-to-one feature swap. Stacklok governs the agent and tool layer, which is adjacent to and extends beyond LLM routing. For enterprises running agent-based workflows on Kubernetes that need to govern MCP tool access with enterprise identity, Stacklok addresses requirements that LiteLLM was never designed to meet.

Per-Request Identity Without Stored Credentials

Stacklok’s embedded authorization server supports OIDC and OAuth 2.0 SSO, with native integrations for Okta, Microsoft Entra ID, and Google Workspace. Every request carries a resolved identity. Credentials for upstream resources (databases, APIs, internal services) are never stored on the agent or in a proxy database accessible to the calling session. The platform manages credential issuance per-request, scoped to the identity and policy of the requester.

This architecture directly addresses the question a CISO asks: “What identity authorized this tool call, and where is the proof?” The audit log answers that question for every MCP server invocation.

Container Isolation by Default

Stacklok runs every MCP server in an isolated container with minimal permissions. Network access and filesystem permissions are configurable via JSON profiles. In a multi-tenant Kubernetes environment, this means the data team’s MCP servers share no process space, filesystem, or network scope with the customer success team’s servers. A credential issue or runaway process in one tenant does not propagate.

LiteLLM runs as a single proxy process. Stacklok’s container-per-server model provides a fundamentally different blast radius boundary.
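As a rough illustration of what a per-server permission profile expresses, the sketch below evaluates filesystem and network requests against a JSON document. The field names (`read`, `write`, `network.allow_hosts`, `network.allow_ports`) are assumptions made for this example and are not taken from Stacklok's profile schema. In a real deployment the container runtime enforces these boundaries; this code only shows the decision logic.

```python
import json
from pathlib import PurePosixPath

# Hypothetical profile: field names are illustrative, not Stacklok's schema.
PROFILE = json.loads("""
{
  "read": ["/data/reports"],
  "write": [],
  "network": {"allow_hosts": ["api.internal.example"], "allow_ports": [443]}
}
""")


def _within(path: str, roots: list[str]) -> bool:
    """True if path equals, or sits beneath, one of the allowed roots."""
    candidate = PurePosixPath(path)
    if ".." in candidate.parts:
        return False  # refuse traversal segments rather than resolve them
    lineage = (candidate, *candidate.parents)
    return any(PurePosixPath(root) in lineage for root in roots)


def can_read(profile: dict, path: str) -> bool:
    return _within(path, profile.get("read", []))


def can_write(profile: dict, path: str) -> bool:
    return _within(path, profile.get("write", []))


def can_connect(profile: dict, host: str, port: int) -> bool:
    """Outbound traffic is denied unless both host and port are listed."""
    network = profile.get("network", {})
    return host in network.get("allow_hosts", []) and port in network.get("allow_ports", [])
```

Because each server gets its own profile, one tenant's allow-list says nothing about what another tenant's server can reach.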

Token Budget Enforcement at the Platform Layer

Stacklok’s vMCP component (the virtualized MCP gateway) includes an MCP Optimizer that reduces token usage by 60 to 85 percent per request via on-demand tool discovery using hybrid semantic and keyword search. Rather than passing the full tool manifest to the model on every call, the optimizer surfaces only the tools relevant to the current request.
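The general idea of hybrid tool discovery can be shown with a toy ranker. This is not the MCP Optimizer's algorithm: the tool catalog is made up, the keyword score is simple token overlap, and character-trigram cosine similarity stands in for the embedding-based semantic score a real system would use.

```python
import math
import re
from collections import Counter

# Hypothetical tool catalog: name -> description.
TOOLS = {
    "create_ticket": "Open a new support ticket for a customer issue",
    "query_warehouse": "Run a SQL query against the analytics warehouse",
    "send_invoice": "Email an invoice to a billing contact",
    "search_tickets": "Search existing support tickets by keyword",
}


def _tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def _trigrams(text: str) -> Counter:
    """Character trigram counts, a crude stand-in for an embedding."""
    normalized = re.sub(r"[^a-z0-9]+", " ", text.lower())
    return Counter(normalized[i:i + 3] for i in range(len(normalized) - 2))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def discover(query: str, top_k: int = 2, keyword_weight: float = 0.5) -> list[str]:
    """Rank tools by a weighted mix of keyword overlap and similarity."""
    q_tokens, q_grams = _tokens(query), _trigrams(query)
    scored = []
    for name, description in TOOLS.items():
        text = f"{name} {description}"
        keyword = len(q_tokens & _tokens(text)) / len(q_tokens) if q_tokens else 0.0
        semantic = _cosine(q_grams, _trigrams(text))
        scored.append((keyword_weight * keyword + (1 - keyword_weight) * semantic, name))
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]
```

Only the `top_k` tool definitions would be sent to the model, which is where the token savings come from: the rest of the manifest never enters the context window.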

Token budget enforcement is applied at the platform layer, before requests reach the model. Policy can be configured per team, per agent identity, or per MCP server. Enforcement does not require custom application code or post-hoc spend reconciliation.

Kubernetes-Native Architecture with GitOps Support

Stacklok ships as a Kubernetes Operator with CRDs for the Runtime, Registry Server, Gateway (vMCP), and Portal components. Platform teams deploy and manage Stacklok through the same GitOps pipelines they use for other infrastructure. Helm charts, declarative configuration, and Kubernetes-native lifecycle management are first-class capabilities, not afterthoughts.

As of May 2026, a Fortune 500 financial services enterprise has deployed Stacklok on Kubernetes, integrated it with its existing New Relic observability stack via OpenTelemetry, enforced data retention policies at the platform layer, and built a curated internal MCP server registry for its developer teams. That deployment runs without Stacklok SaaS dependencies.

OpenTelemetry Observability Out of the Box

Stacklok emits OpenTelemetry traces and Prometheus metrics aligned with the OTel MCP semantic conventions. Integration with Grafana, Datadog, Honeycomb, Splunk, and New Relic requires no custom instrumentation. Platform teams get the same observability model for MCP tool calls that they already use for application services.

Open-Source Auditability

ToolHive, Stacklok’s open-source foundation, is Apache 2.0 licensed and fully auditable on GitHub. Enterprises that require open-source review before production deployment can evaluate the full codebase. Stacklok’s provenance attestation and server signing capabilities allow platform teams to verify the integrity of every MCP server in their registry. Stacklok was founded by Craig McLuckie, co-founder of Kubernetes, and the supply chain security architecture reflects that lineage.
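The simplest form of registry integrity checking is comparing an artifact against a pinned digest, sketched below. This is a deliberate simplification: real provenance attestation relies on cryptographic signatures over build metadata, not just a hash comparison, and the registry layout and function names here are hypothetical.

```python
import hashlib
import hmac


def digest(artifact: bytes) -> str:
    """Content digest in the common 'sha256:<hex>' form."""
    return "sha256:" + hashlib.sha256(artifact).hexdigest()


def verify(name: str, artifact: bytes, registry: dict[str, str]) -> bool:
    """Accept an artifact only if it matches the digest pinned in the registry."""
    expected = registry.get(name)
    if expected is None:
        return False  # unknown servers are rejected, not trusted by default
    return hmac.compare_digest(expected, digest(artifact))
```

A registry that pins digests this way turns "is this the server we reviewed?" into a yes-or-no check at deploy time.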

Which Platform Should You Choose?

Your developers need multi-provider LLM routing with minimal infrastructure overhead. LiteLLM is the right tool. It stands up quickly, supports every major provider, and has the largest community for this specific use case. Add Stacklok when your agent workloads require governed MCP tool access.

Your platform team has taken ownership of AI infrastructure and needs a Kubernetes-native governance layer for MCP tool access. Stacklok is built for this. The Kubernetes Operator, GitOps-compatible CRDs, and OpenTelemetry integration align with how platform teams already operate infrastructure. For this use case, LiteLLM’s manual Kubernetes setup creates ongoing operational debt.

Your CISO needs a complete audit trail of every agent action, including tool invocations, with per-request identity resolution. Stacklok’s authorization server and per-request OIDC identity model address this directly. LiteLLM’s virtual key system provides spend tracking but not resolved-identity audit logs for tool calls.

Your organization prohibits SaaS data processing and requires a fully self-hosted, open-source-auditable deployment. Stacklok’s ToolHive is Apache 2.0 licensed and designed for private cloud. MintMCP and Portkey are SaaS-only and do not support this requirement.

Your engineering teams are running Claude Code or Cursor at scale and need centralized policy governance. Stacklok provides centralized policy enforcement for coding agents across Kubernetes namespaces. No equivalent governance layer exists in LiteLLM for MCP tool access by coding agents.

Frequently Asked Questions

Q: Is Stacklok a replacement for LiteLLM?

Stacklok and LiteLLM address adjacent but distinct layers of AI infrastructure. LiteLLM routes LLM API calls across providers. Stacklok governs MCP tool access, agent identity, and container-isolated tool server execution on Kubernetes. Enterprises with mature agent workloads typically need both layers addressed, though not necessarily by the same tool.

Q: Can Stacklok run in a private cloud without any SaaS dependency?

Yes. Stacklok is designed for self-hosted, private cloud deployment. The Kubernetes Operator runs all components — Runtime, Registry Server, vMCP Gateway, and Portal — within your own infrastructure. There is no required call-home to Stacklok SaaS. ToolHive, the open-source foundation, is Apache 2.0 licensed and fully auditable at the source level before deployment.

Q: What credentials does Stacklok store, and where?

Stacklok’s embedded authorization server issues per-request credentials scoped to the resolved identity of the requesting agent. Upstream resource credentials (API keys, database tokens) are not stored in a proxy database accessible to calling agents. The authorization model is per-request OIDC/OAuth 2.0, integrated with enterprise IdP systems including Okta, Microsoft Entra ID, and Google Workspace.

Q: How does Stacklok handle token cost attribution across teams?

Stacklok enforces token budget policies at the platform layer through vMCP, before requests reach the model. The MCP Optimizer reduces token usage by 60 to 85 percent per request via on-demand tool discovery, and budget limits are configurable per team or agent identity. Cost attribution is tied to the resolved identity on every request, producing team-level spend records without requiring custom application instrumentation.

May 28, 2026
