# MCP Optimizer is now built into the Stacklok platform
When we shipped the Stacklok MCP Optimizer as an experimental feature in our open source ToolHive desktop app, we wanted to answer one question: does cutting tool metadata from the context window actually make a meaningful difference? The answer came back clearly. Teams running the Optimizer saw 60–85% token reductions per request and better model performance. The experiment is over. It’s time to move the Optimizer somewhere it can do more.
Here’s what you’ll learn in this post:
- Why token waste from large MCP tool catalogs is a real cost problem
- How the Optimizer reduces tokens by 60–85% per request
- Why Optimizer is now built into the capability that combines multiple MCP servers
- How to enable team-wide optimization with two configuration fields
## What the Optimizer proved
The problem the Optimizer targets is subtle but expensive. When your AI assistant connects to multiple MCP servers, every prompt carries the full metadata for every available tool, regardless of whether those tools are relevant to the task. A simple “list the latest issues from my GitHub repo” doesn’t need Grafana’s dashboard tools or Notion’s page search. But without the Optimizer, the model receives all of them anyway.
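To make the cost mechanism concrete, here's a toy back-of-envelope calculation. The 114-tool catalog size matches the benchmark setup in this post; the per-tool token figure is purely an illustrative assumption, since real tool schemas vary widely in size.

```python
# Back-of-envelope estimate of per-request token overhead from MCP tool
# metadata. TOKENS_PER_TOOL is an assumed average, not a measured value.

TOKENS_PER_TOOL = 400   # assumption: average size of one tool's schema
FULL_CATALOG = 114      # tool count from the benchmarks in this post
OPTIMIZED = 3           # tools surfaced for a trivial prompt like "Hello"

full_cost = TOKENS_PER_TOOL * FULL_CATALOG    # every tool, every request
optimized_cost = TOKENS_PER_TOOL * OPTIMIZED  # only relevant tools
reduction = 1 - optimized_cost / full_cost

print(f"full catalog: ~{full_cost:,} tokens")
print(f"optimized:    ~{optimized_cost:,} tokens")
print(f"reduction:    {reduction:.0%}")
```

Even with a conservative per-tool estimate, the fixed overhead lands in the tens of thousands of tokens per request, which is why the savings compound across a team.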
The numbers from our benchmarks make the savings concrete:
| Prompt | Without Optimizer | With Optimizer | Reduction |
| --- | --- | --- | --- |
| Hello | 46.8k tokens, 114 tools | 11.2k tokens, 3 tools | 76% |
| List latest 10 issues from a repo | 102k tokens, 114 tools | 32.4k tokens, 11 tools | 68% |
| Summarize meeting notes | 240.6k tokens, 114 tools | 86.8k tokens, 11 tools | 64% |
| Search Grafana dashboards | 93.6k tokens, 114 tools | 13.7k tokens, 11 tools | 85% |
For the full methodology and per-prompt breakdown, see *Cut token waste from your AI workflow with the ToolHive MCP Optimizer*.
The Optimizer’s two primitives — find_tool for hybrid semantic and keyword search, and call_tool for routing — replace the full tool catalog with a short, relevant list. The model sees fewer options, picks more accurately, and hallucinates tool calls less often.
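Since find_tool and call_tool are exposed as ordinary MCP tools, a client invokes them through a standard `tools/call` request. Here's a sketch of what a find_tool invocation might look like; the argument names `query` and `limit` are illustrative assumptions, not the documented schema:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "find_tool",
    "arguments": {
      "query": "list open issues in a GitHub repository",
      "limit": 5
    }
  }
}
```

The response would carry a short ranked list of matching tools, which the model can then invoke through call_tool instead of choosing from the full catalog.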
The insight that shapes where it goes next: the Optimizer’s impact scales directly with the number of MCP servers behind it. The more servers, the larger the catalog it collapses, and the more tokens it saves. We built the Virtual MCP Server (vMCP) to combine multiple MCP servers, and that makes it an ideal home for the Optimizer.
## What’s changing
Starting on April 22, 2026, the Optimizer will no longer be available as an experimental feature in the ToolHive Desktop UI. Going forward, the Optimizer will be built into vMCP.

If you’re currently using the Optimizer in the UI, the CLI path remains available while you evaluate alternatives. See the Optimizer CLI guide for setup instructions.
## Why vMCP is the right home
vMCP is Stacklok’s unified gateway for multi-server MCP deployments. It aggregates multiple MCP servers behind a single endpoint, handles authentication and authorization centrally, and resolves tool name conflicts automatically when two servers expose a tool with the same name. Today, vMCP is intended to run in your Kubernetes environment, with local support coming in the future.
The Optimizer is most valuable precisely where vMCP already operates — across many MCP servers, serving many clients, at team scale.
Running the Optimizer inside vMCP means:
- No per-person setup. Users point their MCP client at the vMCP endpoint and benefit immediately; the Optimizer is transparent to them and requires no client-side configuration.
- No configuration drift. One shared embedding server, one set of search parameters, consistent behavior for everyone. No more wondering why one team member’s token bill is 3x everyone else’s.
- Better results as you add servers. Every new MCP server added to the gateway gets indexed automatically. The Optimizer’s coverage grows with your toolset without any additional work.
- Optimization plus everything else vMCP provides. Centralized auth, scoped tool access per team or role, GitOps-friendly configuration, and audit logging, all in the same deployment.
The configuration reflects how straightforward this is in practice. If you’re already running a VirtualMCPServer, enabling optimization requires two additions: an EmbeddingServer resource and a single reference field.
```yaml
apiVersion: toolhive.stacklok.dev/v1alpha1
kind: EmbeddingServer
metadata:
  name: optimizer-embedding
spec: {}
```

```yaml
# In your VirtualMCPServer spec
embeddingServerRef:
  name: optimizer-embedding
```

See the Optimizer configuration reference and the vMCP quickstart for the full setup guide.
The operator handles the rest: indexing tools across all connected MCP servers, auto-populating search defaults, and wiring the embedding server URL. One shared instance serves every vMCP in the namespace.
This isn’t a migration that requires rethinking your setup. If you’re already using vMCP, you’re two fields away from team-wide token optimization.
## Where to go from here
- Currently using the Optimizer in the UI: The CLI guide documents the equivalent setup while you evaluate vMCP.
- New to vMCP: Start with the vMCP quickstart and the Optimizer configuration reference.
- Want the full benchmark details: The original Optimizer post walks through the methodology and per-prompt numbers, and our head-to-head comparison with Anthropic’s Tool Search Tool shows how the Optimizer stacks up against other optimization techniques.
Want to see what Stacklok can do for your organization? Book a demo or get started right away with ToolHive, our open source project. Join the conversation and engage directly with our team on Discord.
April 06, 2026