v0.11.0
🧪 New Experiments
Query Expander component
We are introducing a component that generates a list of semantically similar queries to improve retrieval recall in RAG systems.
from haystack.components.generators.chat.openai import OpenAIChatGenerator
from haystack_experimental.components.query import QueryExpander
expander = QueryExpander(
    chat_generator=OpenAIChatGenerator(model="gpt-4.1-mini"),
    n_expansions=3
)
result = expander.run(query="green energy sources")
print(result["queries"])
# Output: ['alternative query 1', 'alternative query 2', 'alternative query 3', 'green energy sources']
# Note: Up to 3 additional queries + 1 original query (if include_original_query=True)
# To control total number of queries:
expander = QueryExpander(n_expansions=2, include_original_query=True) # Up to 3 total
# or
expander = QueryExpander(n_expansions=3, include_original_query=False) # Up to 3 total

- feat: add QueryExpander component by @mpangrazzi in #331
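As a rough illustration of why extra queries can raise recall, the sketch below runs several queries through a retriever and merges the hits without duplicates. It does not use Haystack: `retrieve_with_expansion`, `keyword_retrieve`, and the toy `CORPUS` are hypothetical stand-ins, and the expanded queries are written by hand rather than generated by an LLM.

```python
from typing import Callable, Dict, List


def retrieve_with_expansion(
    queries: List[str],
    retrieve: Callable[[str], List[str]],
) -> List[str]:
    """Run every query through `retrieve` and merge hits, keeping first-seen order."""
    seen = set()
    merged: List[str] = []
    for query in queries:
        for doc_id in retrieve(query):
            if doc_id not in seen:
                seen.add(doc_id)
                merged.append(doc_id)
    return merged


# Toy corpus (hypothetical), standing in for a real document store.
CORPUS: Dict[str, str] = {
    "doc1": "solar panels convert sunlight into electricity",
    "doc2": "wind turbines are a renewable power source",
    "doc3": "green energy sources include hydro power",
}


def keyword_retrieve(query: str) -> List[str]:
    """Return ids of documents that share at least one word with the query."""
    terms = set(query.lower().split())
    return [doc_id for doc_id, text in CORPUS.items() if terms & set(text.split())]


# The original query alone matches a single document.
original_only = retrieve_with_expansion(["green energy sources"], keyword_retrieve)

# Hand-written expansions plus the original query reach the other documents too.
expanded = retrieve_with_expansion(
    ["solar electricity", "renewable wind power", "green energy sources"],
    keyword_retrieve,
)

print(original_only)
print(expanded)
```

In a real pipeline the list passed as `queries` would come from `result["queries"]`, and `retrieve` would wrap whichever retriever the pipeline already uses.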
🔀 New Document Routers
We're introducing two new Routers: DocumentTypeRouter and DocumentLengthRouter.
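To give a feel for what routing by length means, here is a minimal plain-Python sketch. It is not the `DocumentLengthRouter` implementation or its API: the `Doc` class, the `route_by_length` function, the `threshold` parameter, and the output keys are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Doc:
    """Hypothetical stand-in for a document with optional text content."""
    content: Optional[str]


def route_by_length(docs: List[Doc], threshold: int) -> Dict[str, List[Doc]]:
    """Split documents into 'short' and 'long' buckets by content length.

    Documents with no content are treated as length 0 and land in 'short'.
    """
    routed: Dict[str, List[Doc]] = {"short": [], "long": []}
    for doc in docs:
        length = len(doc.content) if doc.content else 0
        key = "short" if length <= threshold else "long"
        routed[key].append(doc)
    return routed


docs = [Doc("Hi"), Doc("A much longer piece of text"), Doc(None)]
routed = route_by_length(docs, threshold=10)
print(len(routed["short"]), len(routed["long"]))
```

A split like this lets a pipeline send documents with little or no text down a different branch than documents that already carry enough text.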
🖼️ New Multimodal Features
We introduced several new multimodal features, mostly focused on indexing and retrieval.
A notebook will be published soon to show practical usage examples.
- multimodal support in `AmazonBedrockChatGenerator`
- new image Converters
- `SentenceTransformersDocumentImageEmbedder`: a component to compute embeddings for image-based documents
- `LLMDocumentContentExtractor`: a component to extract textual content from image-based documents using a vision-enabled LLM
Related PRs
- refactor: adopt pypdfium2 for PDF to image conversion by @anakin87 in #308
- feat: multimodal support in `AmazonBedrockChatGenerator` by @anakin87 in #307
- test: Fix mypy typing by @sjrl in #309
- feat: Add `DocumentToImageContent` component to help enable RAG with image Documents by @sjrl in #311
- chore: fix format for `DocumentToImageContent` by @anakin87 in #318
- chore: ignore type errors in Bedrock monkey patches by @anakin87 in #322
- feat: add `SentenceTransformersDocumentImageEmbedder` by @anakin87 in #319
- feat: Add `DocumentTypeRouter` by @sjrl in #321
- refactor: refactor multimodal components and utility functions by @anakin87 in #324
- fix: Fix storage of file path in ImageContent by @sjrl in #325
- refactor: Refactor converters to follow embedders directory structure by @sjrl in #333
- feat: Add `normalize_embeddings` to `SentenceTransformersDocumentImageEmbedder` to match signature of other embedders by @sjrl in #335
- feat: add `DocumentLengthRouter` component by @anakin87 in #334
- feat: Add ImageFileToDocument converter by @sjrl in #336
- feat: Add `LLMDocumentContentExtractor` to enable Vision-based LLMs to describe/convert an image into text by @sjrl in #338
- docs: add usage examples to docstrings of multimodal components by @anakin87 in #340
Other Updates
- refactor: synchronising/merging all pipeline related code with haystack main repository by @davidsbatista in #312
- chore: align Haystack experimental Hatch scripts by @anakin87 in #315
- chore: align experimental type checking with Haystack by @anakin87 in #320
- refactor: Refactor experimental Pipeline to use inheritance by @sjrl in #323
- fix: refactor code and update `init_params` in `debug_state` by @Amnah199 in #317
- chore: fix `ruff` linting error by @Amnah199 in #329
- fix: Fix logger message for pipeline breakpoints by @sjrl in #327
- fix: Fix validate_input becoming public method by @sjrl in #337
- Refactor serialization of breakpoints by @Amnah199 in #332
New Contributors
- @mpangrazzi made their first contribution in #331
Full Changelog: v0.10.0...v0.11