MCP access governance starts with RBAC

Hand-written Cedar policies got you authorization with your first few MCP servers. They will not get you to your next twenty. Here’s how Stacklok Enterprise brings role-based access control to ToolHive, and what it looks like in practice.

ToolHive is Stacklok’s open-source platform for running MCP servers on Kubernetes. Out of the box, it ships with a Cedar-based authorization engine that lets your team write fine-grained policies for each MCP server: who can call which tools, who can read which prompts, who can list which resources. That model works well when you have one or two servers. This post is about what happens next, and how Stacklok Enterprise makes ToolHive’s authorization scale to a real enterprise fleet by adding role-based access control on top of the same Cedar engine.

Here’s what you’ll learn in this post:

Why per-server permissions quietly become the bottleneck for every enterprise MCP rollout
Why Stacklok Enterprise defines its own roles instead of leaning on the ones in your IdP
How Stacklok Enterprise brings role-based access control to ToolHive, with the manifests your platform team will actually write
What a real access decision looks like end-to-end, from IdP login to MCP request

Most platform leaders believe MCP authorization is a configuration detail they can sort out later. They’re usually wrong, and by the time they realize it, they’re hand-editing policy files at 11 p.m. before an audit.

It starts innocently. One MCP server for one team. Then the data science team wants their own. Then platform engineering stands up a few. Eventually marketing brings on a vendor-supplied server, and you’re up to twenty. Six months in, you have three hundred employees who could use those servers, a security team asking who has access to what, and a single Cedar policy file per server that someone wrote by hand and nobody quite trusts.

The technology is new. The problem isn’t. Every previous wave of infrastructure went through this exact moment. The teams that got ahead of it ran for a decade with clean access governance. The teams that didn’t spent years untangling permissions they could no longer explain.

The bottleneck nobody sees coming

Authorization for MCP servers today is per-server and hand-written. That’s fine when you have one or two of them. It is not fine when you have twenty, and it’s not fine when the people asking “who can access what?” are auditors, security leaders, or your CISO.

Onboarding slows down first. A new team wants a new server, but someone has to write the policy, and that someone is usually a senior engineer who is not in a hurry to context-switch. Tickets pile up and adoption stalls. The AI tools that were supposed to make your company faster end up gated behind a config bottleneck.

Audits get painful next. When your security team asks who can call destructive tools on the GitHub MCP server, the answer lives across a dozen hand-written policy files that nobody can collate in an afternoon. The compliance review slips, and the engineers piecing together the answer aren’t the ones who wrote the original policies.

The expensive mistakes come last. Hand-written policies drift. Two teams write subtly different rules for similar servers. An overly permissive policy gets copy-pasted, and six months later the wrong group has access to a tool that can move money or delete production data. The cost of that one incident dwarfs the cost of solving the problem properly upfront.

The fix is the same one that worked for the cloud

Every previous platform shift solved this with role-based access control. You don’t manage cloud permissions by writing IAM JSON for every user. You define roles, bind them to groups, and let the system enforce the rest. Kubernetes works the same way. Every identity provider in your stack (Okta, Entra, Keycloak, Auth0) is already organized around groups your IT team maintains.

MCP authorization should work the same way. That’s what Stacklok Enterprise delivers for ToolHive.

Why roles live in Stacklok, not your IdP

A fair question at this point: your IdP already has groups, and many of them already have roles. Why introduce a second role concept in Stacklok Enterprise instead of just using what’s in Okta or Entra?

The short answer is downscoping. IdP groups model your organization. They describe who works where, like the platform team or the data science team. They don’t, and shouldn’t, describe what specific tools on a specific MCP server a person is allowed to call. The moment you try to encode that level of detail in your IdP, you end up creating a new group every time you add a server or carve out a tool, and your identity team becomes the bottleneck for every MCP access change.

Keeping roles in Stacklok Enterprise separates the two concerns. Your IdP stays clean and describes people. Stacklok Enterprise describes what people are allowed to do with MCP, at whatever level of granularity the product requires. The same engineering group can be a writer on a development server and a reader on a production one without anyone touching the IdP.

That separation also makes downscoping cheap. You can pre-build a broad role like writer, then narrow what it means on individual servers with an authorization policy, restrict it to specific tools, or block dangerous actions outright. None of that lands as a change request in your identity team’s queue. It lands as a small YAML edit your platform team owns end-to-end.

What this looks like in practice

Stacklok Enterprise provides three primitives, each answering a different question. Together they replace dozens of hand-written policy files with a small set of declarative manifests your platform team can review, diff, and ship through GitOps.

What can someone do? Define a role.

A role is a named bundle of MCP capabilities. Two roles ship pre-built and cover the common cases: a safe reader that grants browsing and read-only tool calls, and a full-access writer.

Custom roles use the same shape:

  apiVersion: platform.enterprise.stacklok.com/v1alpha1
  kind: ClusterPlatformRole
  metadata:
    name: security-auditor
  spec:
    description: "..."
    productActions:
      - apiGroup: toolhive.enterprise.stacklok.com
        actions: [call_tool, list_tools]

This means a security auditor can call and list tools but cannot reach the prompt library or pull resources. The action list is the whole permission surface.
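
To make the "whole permission surface" idea concrete, here is a minimal Python sketch that models a role as a plain set of actions. The `PlatformRole` class and the `get_prompt` and `read_resource` action names are assumptions made for illustration; only `call_tool` and `list_tools` come from the manifest above, and real enforcement happens in the Cedar engine, not in code like this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlatformRole:
    """Hypothetical in-memory stand-in for a ClusterPlatformRole manifest."""
    name: str
    actions: frozenset

    def permits(self, action: str) -> bool:
        # Anything that is not listed is simply not granted.
        return action in self.actions


security_auditor = PlatformRole(
    name="security-auditor",
    actions=frozenset({"call_tool", "list_tools"}),
)

# "get_prompt" and "read_resource" are assumed action names, used only
# to show that unlisted capabilities fall outside the role.
for action in ("call_tool", "list_tools", "get_prompt", "read_resource"):
    print(action, security_auditor.permits(action))
```

The design point is that there is no separate deny list inside a role: the absence of an action is what keeps the auditor away from prompts and resources.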

Who gets that role? Bind it to an IdP group.

A role binding connects an identity-provider group to a role. The simplest case is one group, one role:

  apiVersion: platform.enterprise.stacklok.com/v1alpha1
  kind: ClusterPlatformRoleBinding
  spec:
    bindings:
      - roleRef:
          kind: ClusterPlatformRole
          name: writer
        from:
          - groups: [platform-eng]

Anyone in platform-eng gets writer. When someone joins or leaves the group in your IdP, their MCP access changes automatically without any per-user maintenance.

Compound conditions are first-class, so policies like “engineering team-leads only” are expressible without escape hatches:

  - platformRole: writer
    from:
      - groups: [engineering]
        roles: [team-lead]

By default, unmapped groups grant nothing. A new IdP group appearing tomorrow confers zero access until someone explicitly binds it. Permissions are opt-in.
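
The binding rules above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Stacklok's controller: the `Source`, `Binding`, and `resolve_roles` names are hypothetical, and the matching semantics (any listed group, combined with any listed IdP role when roles are present) are a guess at how a compound `from` entry behaves. The `contractors` group is made up to show the default.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    """One entry under `from`: groups, optionally combined with IdP roles."""
    groups: frozenset
    roles: frozenset = frozenset()

    def matches(self, user_groups: set, user_roles: set) -> bool:
        # Assumption: any listed group is enough, and when IdP roles are
        # listed the user must also hold at least one of them.
        has_group = bool(self.groups & user_groups)
        has_role = not self.roles or bool(self.roles & user_roles)
        return has_group and has_role


@dataclass(frozen=True)
class Binding:
    platform_role: str
    sources: tuple


def resolve_roles(bindings, user_groups, user_roles=()):
    """Return the platform roles granted by the bindings; empty by default."""
    groups, roles = set(user_groups), set(user_roles)
    return {
        b.platform_role
        for b in bindings
        if any(s.matches(groups, roles) for s in b.sources)
    }


bindings = [
    Binding("writer", (Source(frozenset({"engineering"}), frozenset({"team-lead"})),)),
]

print(resolve_roles(bindings, ["engineering"], ["team-lead"]))  # team lead
print(resolve_roles(bindings, ["engineering"]))                 # not a lead
print(resolve_roles(bindings, ["contractors"]))                 # unmapped group
```

Because the result starts empty and only grows when a binding matches, an unmapped group grants nothing without any special-case code.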

Where, and with what limits? Add an authorization policy.

Roles are organization-wide. When you need to narrow a role on a specific server, or block a specific tool outright, that’s the third layer:

  apiVersion: toolhive.stacklok.dev/v1alpha1
  kind: ToolhiveAuthorizationPolicy
  metadata:
    name: github-restricted
  spec:
    targetRef:
      name: github
    bindings:
      - platformRole: writer
        ruleRestrictions:
          - tools: [create_pr, list_issues]

On the GitHub server, writer is restricted to two tools. The role doesn’t change; only its scope on this server narrows.
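
One way to picture the narrowing is as a per-server allow-list that applies only where a policy exists. The sketch below assumes that reading; the `tool_allowed` helper, the `merge_pr` tool, and the `jira` server are hypothetical names added for the example.

```python
# Restrictions keyed by (server, role). A missing key means the role is
# not narrowed on that server. Hypothetical structure for illustration.
RESTRICTIONS = {
    ("github", "writer"): {"create_pr", "list_issues"},
}


def tool_allowed(server: str, role: str, tool: str) -> bool:
    """Check whether a role's tool call survives per-server narrowing."""
    narrowed = RESTRICTIONS.get((server, role))
    if narrowed is None:
        return True  # no policy targets this server and role
    return tool in narrowed


print(tool_allowed("github", "writer", "create_pr"))  # inside the allow-list
print(tool_allowed("github", "writer", "merge_pr"))   # narrowed away
print(tool_allowed("jira", "writer", "merge_pr"))     # other servers untouched
```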

The same CRD is where your security team plants safety rails. Deny rules override everything, including writer roles and admin grants:

  spec:
    deny:
      - actions: [call_tool]
        tools: [delete_repo, force_push, transfer_repo]

Nothing can override a deny rule. That gives your security team a firm place to stand.
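
Deny-overrides precedence is easy to state in code: check the deny rules first and stop at the first match, before any grant is consulted. The following is a minimal sketch of that ordering, not the Cedar evaluation itself; the `decide` function and the dictionary shape of the rules are assumptions.

```python
DENY_RULES = [
    {"actions": {"call_tool"}, "tools": {"delete_repo", "force_push", "transfer_repo"}},
]


def decide(action: str, tool: str, granted_actions: set) -> str:
    """Return "allow" or "deny", evaluating deny rules before any grant."""
    for rule in DENY_RULES:
        if action in rule["actions"] and tool in rule["tools"]:
            return "deny"  # a matching deny ends evaluation immediately
    return "allow" if action in granted_actions else "deny"


writer_actions = {"call_tool", "list_tools"}  # assumed contents of the role

print(decide("call_tool", "create_pr", writer_actions))
print(decide("call_tool", "delete_repo", writer_actions))
```

Because the deny check returns before grants are looked at, adding a broader role later cannot reopen a blocked tool.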

What a real request looks like

When Alice from platform engineering signs in and asks an MCP server to do something:

Her token arrives carrying her IdP groups: [platform-eng, engineering].
Stacklok’s controller has already compiled the role bindings. platform-eng resolves to writer.
The Cedar authorization engine, the same one ToolHive uses today, evaluates the request: it checks whether writer permits the action, whether any authorization policy narrows it, and whether a deny rule blocks it.
The engine returns allow or deny, and the decision is auditable.
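
The steps above can be sketched as a single function. This is plain Python standing in for the Cedar evaluation, under several assumptions: the table names, the `Decision` type, and the contents of the `writer` role are hypothetical, and the data simply mirrors the manifests shown earlier in this post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str  # kept alongside the outcome so it can be logged for audit


# Hypothetical compiled state, mirroring the manifests in this post.
GROUP_TO_ROLE = {"platform-eng": "writer"}
ROLE_ACTIONS = {"writer": {"call_tool", "list_tools"}}  # assumed action set
RESTRICTIONS = {("github", "writer"): {"create_pr", "list_issues"}}
DENIED_TOOLS = {"github": {"delete_repo", "force_push", "transfer_repo"}}


def authorize(groups, server, action, tool=None) -> Decision:
    # Deny rules are checked first because nothing can override them.
    if action == "call_tool" and tool in DENIED_TOOLS.get(server, set()):
        return Decision(False, "deny rule")
    # Token groups resolve to platform roles; unmapped groups add nothing.
    roles = {GROUP_TO_ROLE[g] for g in groups if g in GROUP_TO_ROLE}
    for role in sorted(roles):
        if action not in ROLE_ACTIONS.get(role, set()):
            continue
        narrowed = RESTRICTIONS.get((server, role))
        if action == "call_tool" and narrowed is not None and tool not in narrowed:
            continue  # the role exists but is narrowed on this server
        return Decision(True, f"role {role}")
    return Decision(False, "no role grants this request")


alice = ["platform-eng", "engineering"]
print(authorize(alice, "github", "call_tool", "create_pr"))
print(authorize(alice, "github", "call_tool", "delete_repo"))
print(authorize(["engineering"], "github", "call_tool", "create_pr"))
```

Returning a reason with every decision is what makes the audit question cheap to answer: each outcome traces back to a deny rule, a role, or the absence of a binding.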

Alice never sees Cedar, and your platform team never wrote any. Everything load-bearing fits in a few short manifests, and when an auditor asks who can do what, the answer is one query away.

Make the safe path the easy path

The pattern that wins is the one that makes the secure choice the easy one. Hand-written per-server policies make safety hard. Role-based access tied to your IdP makes it the default.

The organizations that put access governance under their MCP rollout now will be the ones scaling AI agents safely a year from now. The ones that don’t will spend that year auditing their environment after the fact, by which point unwinding mistakes is much harder than preventing them.

If your MCP fleet is growing faster than your access controls, that’s the moment to talk to us.

Want to see what Stacklok can do for your organization? Book a demo or get started right away with ToolHive, our open source project. Join the conversation and engage directly with our team on Discord.

May 20, 2026

Last modified on May 19, 2026
