-
Notifications
You must be signed in to change notification settings - Fork 2.6k
Description
Is your feature request related to a problem? Please describe.
When building agents with a non-trivial number of tools (dozens → hundreds/thousands), passing the full tool schema set to the LLM on every request becomes impractical: context gets wasted, latency/cost increase, and tool selection quality can degrade because the model has too much irrelevant tool context for given intent. Users need a scalable way for agents to discover the right tools without bloating every LLM call. Tool Search Tool should be LLM provider agnostic.
Describe the solution you'd like
Add first-class Tool Search Tool support in Haystack: a standardized mechanism for tool discovery that lets an agent retrieve a small set of relevant tool candidates (top-K) from a large tool library at runtime, and then proceed with tool use based on those results.
The feature should support at least:
- BM25 / lexical search over tool metadata (name, description, tags, parameters).
- Embedding-based / semantic search over tool descriptions (and optionally richer metadata).
- A consistent interface/contract so agents can use the same feature regardless of the underlying LLM provider.
Nice-to-have (not required for initial release):
- Hybrid retrieval (BM25 + embeddings) and configurable ranking/fusion.
- Extensibility hooks to plug in custom indices / retrieval backends.
Describe alternatives you've considered
- Passing all tool schemas on every request (doesn’t scale well).
- Provider-specific features for tool discovery (limits portability and makes cross-provider behavior inconsistent).
- Keyword-only heuristics or hard-coded matching (poor recall on paraphrases and ambiguous phrasing).
Additional context
This request is inspired by the general “tool discovery / tool search” direction popularized by Anthropic and others, where large tool catalogs are searched at runtime and only a small candidate set is considered for a given user query.
References: