# MCP Optimizer is now built into the Stacklok platform
When we shipped the Stacklok MCP Optimizer as an experimental feature in our open source ToolHive desktop app, we wanted to answer one question: does cutting tool metadata from the context window actually make a meaningful difference? The answer came back clearly. Teams running the Optimizer saw 60–85% token reductions per request and better model performance. The experiment is over. It’s time to move the Optimizer somewhere it can do more.
Here’s what you’ll learn in this post:
- Why token waste from large MCP tool catalogs is a real cost problem
- How the Optimizer reduces tokens by 60–85% per request
- Why Optimizer is now built into the capability that combines multiple MCP servers
- How to enable team-wide optimization with two configuration fields
## What the Optimizer proved
The problem the Optimizer targets is subtle but expensive. When your AI assistant connects to multiple MCP servers, every prompt carries the full metadata for every available tool, regardless of whether those tools are relevant to the task. A simple “list the latest issues from my GitHub repo” doesn’t need Grafana’s dashboard tools or Notion’s page search. But without the Optimizer, the model receives all of them anyway.
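To make the cost mechanism concrete, here's a toy back-of-envelope calculation. The 114-tool catalog size matches the benchmark setup in this post; the per-tool token figure is purely an illustrative assumption, since real tool schemas vary widely in size.

```python
# Back-of-envelope estimate of per-request token overhead from MCP tool
# metadata. TOKENS_PER_TOOL is an assumed average, not a measured value.

TOKENS_PER_TOOL = 400   # assumption: average size of one tool's schema
FULL_CATALOG = 114      # tool count from the benchmarks in this post
OPTIMIZED = 3           # tools surfaced for a trivial prompt like "Hello"

full_cost = TOKENS_PER_TOOL * FULL_CATALOG    # every tool, every request
optimized_cost = TOKENS_PER_TOOL * OPTIMIZED  # only relevant tools
reduction = 1 - optimized_cost / full_cost

print(f"full catalog: ~{full_cost:,} tokens")
print(f"optimized:    ~{optimized_cost:,} tokens")
print(f"reduction:    {reduction:.0%}")
```

Even with a conservative per-tool estimate, the fixed overhead lands in the tens of thousands of tokens per request, which is why the savings compound across a team.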
The numbers from our benchmarks make the savings concrete:
| Prompt | Without Optimizer | With Optimizer | Reduction |
| --- | --- | --- | --- |
| Hello | 46.8k tokens, 114 tools | 11.2k tokens, 3 tools | 76% |
| List latest 10 issues from a repo | 102k tokens, 114 tools | 32.4k tokens, 11 tools | 68% |
| Summarize meeting notes | 240.6k tokens, 114 tools | 86.8k tokens, 11 tools | 64% |
| Search Grafana dashboards | 93.6k tokens, 114 tools | 13.7k tokens, 11 tools | 85% |
For the full methodology and per-prompt breakdown, see *Cut token waste from your AI workflow with the ToolHive MCP Optimizer*.
The Optimizer’s two primitives — find_tool for hybrid semantic and keyword search, and call_tool for routing — replace the full tool catalog with a short, relevant list. The model sees fewer options, picks more accurately, and hallucinates tool calls less often.
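Since find_tool and call_tool are exposed as ordinary MCP tools, a client invokes them through a standard `tools/call` request. Here's a sketch of what a find_tool invocation might look like; the argument names `query` and `limit` are illustrative assumptions, not the documented schema:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "find_tool",
    "arguments": {
      "query": "list open issues in a GitHub repository",
      "limit": 5
    }
  }
}
```

The response would carry a short ranked list of matching tools, which the model can then invoke through call_tool instead of choosing from the full catalog.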
The insight that shapes where it goes next: the Optimizer’s impact scales directly with the number of MCP servers behind it. The more servers, the larger the catalog it collapses, and the more tokens it saves. We built the Virtual MCP Server (vMCP) to combine multiple MCP servers, and that makes it an ideal home for the Optimizer.
## What’s changing
Starting on April 22, 2026, the Optimizer will no longer be available as an experimental feature in the ToolHive Desktop UI. Going forward, the Optimizer will be built into vMCP.

If you’re currently using the Optimizer in the UI, the CLI path remains available while you evaluate alternatives. See the Optimizer CLI guide for setup instructions.
## Why vMCP is the right home
vMCP is Stacklok’s unified gateway for multi-server MCP deployments. It aggregates multiple MCP servers behind a single endpoint, handles authentication and authorization centrally, and resolves tool name conflicts automatically when two servers expose a tool with the same name. Today, vMCP is intended to run in your Kubernetes environment, with local support coming in the future.
The Optimizer is most valuable precisely where vMCP already operates — across many MCP servers, serving many clients, at team scale.
Running the Optimizer inside vMCP means:
- No per-person setup. Users point their MCP client at the vMCP endpoint and benefit immediately; the Optimizer is transparent to them and requires no client-side configuration.
- No configuration drift. One shared embedding server, one set of search parameters, consistent behavior for everyone. No more wondering why one team member’s token bill is 3x everyone else’s.
- Better results as you add servers. Every new MCP server added to the gateway gets indexed automatically. The Optimizer’s coverage grows with your toolset without any additional work.
- Optimization plus everything else vMCP provides. Centralized auth, scoped tool access per team or role, GitOps-friendly configuration, and audit logging, all in the same deployment.
The configuration reflects how straightforward this is in practice. If you’re already running a VirtualMCPServer, enabling optimization requires two additions: an EmbeddingServer resource and a single reference field.
```yaml
apiVersion: toolhive.stacklok.dev/v1alpha1
kind: EmbeddingServer
metadata:
  name: optimizer-embedding
spec: {}
```

```yaml
# In your VirtualMCPServer spec
embeddingServerRef:
  name: optimizer-embedding
```

See the Optimizer configuration reference and the vMCP quickstart for the full setup guide.
The operator handles the rest: indexing tools across all connected MCP servers, auto-populating search defaults, and wiring the embedding server URL. One shared instance serves every vMCP in the namespace.
This isn’t a migration that requires rethinking your setup. If you’re already using vMCP, you’re two fields away from team-wide token optimization.
## Where to go from here
- Currently using the Optimizer in the UI: The CLI guide documents the equivalent setup while you evaluate vMCP.
- New to vMCP: Start with the vMCP quickstart and the Optimizer configuration reference.
- Want the full benchmark details: The original Optimizer post walks through the methodology and per-prompt numbers, and our head-to-head comparison with Anthropic’s Tool Search Tool shows how the Optimizer stacks up against other optimization techniques.
Want to see what Stacklok can do for your organization? Book a demo or get started right away with ToolHive, our open source project. Join the conversation and engage directly with our team on Discord.
April 06, 2026