May 2026
Search and Execute: Semantic Tool Discovery
As agent tool catalogs grow, loading every available action into context becomes impractical. Search and Execute solves this with two meta-tools: tool_search and tool_execute. Instead of loading hundreds of definitions upfront, the agent searches for what it needs and calls it.
This release upgrades the search layer from lexical matching to a fine-tuned semantic model, building on Agent Tools Discovery. In practice: a query like “enroll new hire in training” now correctly finds workday_create_learning_enrollment with no keyword overlap. The model was trained on SaaS actions specifically, so it handles the vocabulary gap between how people describe tasks and how APIs name them.
On accuracy, the model reaches 92.8% Hit@1 on scoped search and ranks #1 on ToolRet-full (a public benchmark of 44,453 tools), ahead of Qwen3-Embedding-8B and NV-Embed-v1, models up to 73x larger. In global search across all connectors, with no connector hint, it reaches 57.3% Hit@1 versus 44.0% for Anthropic’s native tool use.
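The search-then-execute pattern described above can be sketched with a tiny in-memory catalog. This is an illustration only, not the StackOne implementation: the catalog entries, handler functions, and the word-overlap ranker are all hypothetical stand-ins, and only the names tool_search and tool_execute come from the release.

```python
from typing import Any, Callable

# Hypothetical in-memory catalog: tool name -> (description, handler).
CATALOG: dict[str, tuple[str, Callable[..., Any]]] = {
    "create_time_off_request": (
        "request leave or vacation days for an employee",
        lambda employee_id, days: {
            "employee_id": employee_id,
            "days": days,
            "status": "requested",
        },
    ),
    "list_employees": (
        "list all employees in the company directory",
        lambda: ["ada", "grace"],
    ),
}


def tool_search(query: str, top_k: int = 5) -> list[str]:
    """Rank tools by word overlap with the query (a stand-in for the real ranker)."""
    words = set(query.lower().split())
    scored = [
        (len(words & set(description.split())), name)
        for name, (description, _) in CATALOG.items()
    ]
    # Keep only tools that share at least one word, best match first.
    ranked = sorted((item for item in scored if item[0] > 0), reverse=True)
    return [name for _, name in ranked[:top_k]]


def tool_execute(name: str, **arguments: Any) -> Any:
    """Look up a tool by name and call its handler with the given arguments."""
    _, handler = CATALOG[name]
    return handler(**arguments)


# The agent only ever sees two tools; everything else is discovered on demand.
matches = tool_search("request vacation days")
result = tool_execute(matches[0], employee_id="ada", days=3)
```

The point of the design is that the agent's context holds two definitions regardless of catalog size; the catalog itself stays behind the search call.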
What’s New
- Fine-tuned semantic model: BGE-base-en-v1.5 (109M params), trained on SaaS actions with cross-connector hard negatives and LLM-generated query paraphrases. Replaces the previous lexical approach and generalises to connectors it hasn’t seen in training.
- Three search modes: semantic (API-backed, highest accuracy), local (in-process BM25+TF-IDF, no network calls), and auto (semantic with local fallback). Available across the TypeScript and Python SDKs, the live MCP, and the StackOne playground.
- Available via MCP: Add ?tool-mode=search_execute to the MCP URL. Works with any MCP client without custom headers. Omit the parameter and the server returns the full tool list as before.
- tool_search and tool_execute: The two meta-tools, renamed from meta_search_tools and meta_execute_tool. tool_search accepts query, topK, and minScore; tool_execute resolves account IDs from connector type automatically and supports multi-execute arrays in a single call.
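To make the auto mode and the topK and minScore parameters concrete, here is a minimal sketch of "semantic with local fallback". It is an assumption-laden illustration: the release does not say which failures trigger the fallback, so this version falls back on connection and timeout errors only, and both backends and all tool names are hypothetical.

```python
from typing import Callable

# A search backend maps a query to (tool_name, score) pairs.
SearchFn = Callable[[str], list[tuple[str, float]]]


def filter_results(
    results: list[tuple[str, float]], top_k: int = 5, min_score: float = 0.0
) -> list[tuple[str, float]]:
    """Drop hits scoring below min_score, then keep the top_k best."""
    kept = [hit for hit in results if hit[1] >= min_score]
    kept.sort(key=lambda hit: hit[1], reverse=True)
    return kept[:top_k]


def auto_search(
    query: str,
    semantic: SearchFn,
    local: SearchFn,
    top_k: int = 5,
    min_score: float = 0.0,
) -> list[tuple[str, float]]:
    """Try the semantic backend first; fall back to local search if it is unreachable."""
    try:
        results = semantic(query)
    except (ConnectionError, TimeoutError):
        # Assumed fallback condition: the network-backed search failed.
        results = local(query)
    return filter_results(results, top_k, min_score)


# Hypothetical backends for illustration only.
def offline_semantic(query: str) -> list[tuple[str, float]]:
    raise ConnectionError("semantic API unreachable")


def local_lexical(query: str) -> list[tuple[str, float]]:
    return [
        ("hris_list_employees", 0.2),
        ("hris_create_time_off", 0.8),
        ("ats_list_jobs", 0.05),
    ]


hits = auto_search(
    "manage employee time off", offline_semantic, local_lexical, top_k=2, min_score=0.1
)
```

Here the semantic call fails, the local backend answers, and the score threshold removes the weakest hit before the top two are returned.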
Setup
Search and Execute is available via MCP using tool-mode=search_execute, or through the TypeScript and Python SDKs.
MCP
```
https://api.stackone.com/mcp?tool-mode=search_execute&x-account-id=your_account_id
```
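If the URL is assembled in code, a small helper keeps the query string correct. This sketch uses only the Python standard library; the helper name build_mcp_url and its search_execute flag are hypothetical, while the base URL and both parameter names are taken from the URL above.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.stackone.com/mcp"


def build_mcp_url(account_id: str, search_execute: bool = True) -> str:
    """Build the MCP URL, optionally enabling Search and Execute mode."""
    params: dict[str, str] = {}
    if search_execute:
        # Omitting tool-mode makes the server return the full tool list.
        params["tool-mode"] = "search_execute"
    params["x-account-id"] = account_id
    return f"{BASE_URL}?{urlencode(params)}"


search_url = build_mcp_url("your_account_id")
full_list_url = build_mcp_url("your_account_id", search_execute=False)
```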
TypeScript
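```shell
npm install @stackone/ai
```

```typescript
import { StackOneToolSet } from "@stackone/ai";

// Create a toolset, mirroring the Python example
const toolset = new StackOneToolSet();

// Search for tools
const tools = await toolset.searchTools("manage employee time off", {
  topK: 5,
  search: "auto",
});
const openAITools = tools.toOpenAI();

// Or use meta-tools in an agent loop
const { tool_search, tool_execute } = await toolset.getTools();
```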
Python
```shell
pip install stackone-ai
```

```python
from stackone_ai import StackOneToolSet

toolset = StackOneToolSet()

# Search for tools
tools = toolset.search_tools("manage employee time off", top_k=5)
openai_tools = tools.to_openai()

# Or use meta-tools in an agent loop
tools = toolset.openai(mode="search_and_execute")
```
Benchmark
We benchmarked the new fine-tuned model (v2) against the previous synonym-enrichment approach (v1), BM25, and Anthropic’s native tool use. We also ran it against ToolRet-full, a public benchmark across 44,453 tools.
Scoped search: 998 tools, 1,843 queries (connector known)
| Model | Hit@1 | Hit@10 | MRR |
|---|---|---|---|
| v2 (fine-tuned BGE-base) | 92.8% | 100% | 0.957 |
| v1 (synonym enrichment) | 59.0% | 96.8% | 0.712 |
| BM25 | 36.5% | 91.6% | 0.537 |
Global search: 998 tools, 1,843 queries (no connector hint)
| Model | Hit@1 | Hit@10 | MRR |
|---|---|---|---|
| v2 (fine-tuned BGE-base) | 57.3% | 82.3% | 0.661 |
| Anthropic Tool Use (Claude Sonnet) | 44.0% | 69.3% | 0.517 |
| v1 (synonym enrichment) | 27.6% | 58.7% | 0.377 |
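For readers less familiar with the metrics in the two tables above, the following sketch shows how Hit@k and MRR are conventionally computed. It assumes one correct tool per query, which is how these metrics are usually defined; the toy rankings are made up and are not benchmark data.

```python
def hit_at_k(ranked_lists: list[list[str]], expected: list[str], k: int) -> float:
    """Fraction of queries whose correct tool appears in the top k results."""
    hits = sum(
        1 for ranked, target in zip(ranked_lists, expected) if target in ranked[:k]
    )
    return hits / len(expected)


def mean_reciprocal_rank(ranked_lists: list[list[str]], expected: list[str]) -> float:
    """Average of 1/rank of the correct tool; a missing tool contributes 0."""
    total = 0.0
    for ranked, target in zip(ranked_lists, expected):
        if target in ranked:
            total += 1.0 / (ranked.index(target) + 1)
    return total / len(expected)


# Toy rankings for four queries whose correct tool is always "a".
ranked_lists = [
    ["a", "b", "c"],       # correct tool at rank 1
    ["b", "a", "c"],       # correct tool at rank 2
    ["c", "b", "d"],       # correct tool missing
    ["b", "c", "d", "a"],  # correct tool at rank 4
]
expected = ["a", "a", "a", "a"]

hit1 = hit_at_k(ranked_lists, expected, k=1)      # 1 of 4 queries
hit3 = hit_at_k(ranked_lists, expected, k=3)      # 2 of 4 queries
mrr = mean_reciprocal_rank(ranked_lists, expected)  # (1 + 1/2 + 0 + 1/4) / 4
```

Hit@1 rewards only a correct first result, while MRR gives partial credit for a correct tool that lands lower in the list.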
ToolRet-full: 44,453 tools, 7,961 queries
| Model | nDCG@10 | Params |
|---|---|---|
| v2 (fine-tuned BGE-base) | 0.544 | 109M |
| Qwen3-Embedding-8B | 0.462 | 8,000M |
| NV-Embed-v1 | 0.427 | 7,000M |
| GritLM-7B | 0.411 | 7,000M |
| v1 (synonym enrichment) | ~0.26 | 22M |
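The ToolRet-full table reports nDCG@10, which discounts relevant results by their rank. A minimal sketch follows, assuming binary relevance (a tool is either relevant to a query or not); whether the benchmark uses graded relevance is not stated here, and the example ranking is made up. The last line reproduces the size ratio behind the "up to 73x larger" comparison using the parameter counts from the table.

```python
import math


def ndcg_at_k(ranked: list[str], relevant: set[str], k: int = 10) -> float:
    """nDCG@k with binary relevance: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, name in enumerate(ranked[:k], start=1)
        if name in relevant
    )
    # The ideal ranking places every relevant tool at the top.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


perfect = ndcg_at_k(["a", "x", "y"], {"a"})  # relevant tool at rank 1
second = ndcg_at_k(["x", "a", "y"], {"a"})   # relevant tool at rank 2
missed = ndcg_at_k(["x", "y", "z"], {"a"})   # relevant tool not retrieved

# Parameter counts from the table, in millions: Qwen3-Embedding-8B vs v2.
size_ratio = 8_000 / 109
```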
For the methodology behind the v2 model, read How a 109M Embedding Model Beats 8B on Tool Retrieval.
Resources
- Search and Execute docs
- Tool Discovery
- Questions? Reach out at ai@stackone.com