# v0.13.0

## 🧪 New Experiments

### Semantic Chunking based on Sentence Embeddings
We added a new EmbeddingBasedDocumentSplitter component that splits longer texts into chunks of semantically related sentences. The resulting Documents are more semantically coherent. The component is initialized with a Document Embedder. PR #353
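Conceptually, embedding-based splitting embeds consecutive sentences, measures the cosine distance between neighbors, and cuts wherever the distance exceeds a percentile threshold. The snippet below is a simplified, dependency-free illustration of that principle with hand-made toy vectors; it is not the component's actual implementation, which delegates embedding to the configured embedder.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

def percentile(values, q):
    """Return the value at the q-th percentile (nearest-rank, simplified)."""
    ordered = sorted(values)
    return ordered[min(int(q * (len(ordered) - 1)), len(ordered) - 1)]

def split_by_embedding(sentences, embeddings, q=0.95):
    """Cut between sentences whose embedding distance exceeds the q-th percentile."""
    distances = [cosine_distance(embeddings[i], embeddings[i + 1])
                 for i in range(len(embeddings) - 1)]
    threshold = percentile(distances, q)
    chunks, current = [], [sentences[0]]
    for i, d in enumerate(distances):
        if d > threshold:            # large jump in embedding space: topic shift
            chunks.append(" ".join(current))
            current = []
        current.append(sentences[i + 1])
    chunks.append(" ".join(current))
    return chunks

# Toy embeddings: the last two "sentences" point in a clearly different direction.
sents = ["s1", "s2", "s3", "different topic", "same different topic"]
vecs = [[1, 0], [0.9, 0.1], [1, 0.05], [0, 1], [0.2, 0.8]]
print(split_by_embedding(sents, vecs, q=0.95))
# → ['s1 s2 s3', 'different topic same different topic']
```

The actual component additionally groups sentences (`sentences_per_group`) and enforces `min_length`/`max_length` on the resulting chunks.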
```python
from haystack import Document
from haystack.components.embedders import SentenceTransformersDocumentEmbedder
from haystack_experimental.components.preprocessors import EmbeddingBasedDocumentSplitter

doc = Document(
    content="This is a first sentence. This is a second sentence. This is a third sentence. "
    "Completely different topic. The same completely different topic."
)
embedder = SentenceTransformersDocumentEmbedder()
splitter = EmbeddingBasedDocumentSplitter(
    document_embedder=embedder,
    sentences_per_group=2,
    percentile=0.95,
    min_length=50,
    max_length=1000,
)
splitter.warm_up()
result = splitter.run(documents=[doc])
```

### Hallucination Risk Assessment for LLM Answers
The OpenAIChatGenerator can now estimate the risk of hallucination in generated answers. You can configure a risk threshold, and the OpenAIChatGenerator will refuse to return an answer if the estimated hallucination risk exceeds that threshold. Refer to the research paper and repository for the technical details of the risk calculation. PR #359
👉 Try out the component here!
```python
from haystack.dataclasses import ChatMessage
from haystack_experimental.components.generators.chat.openai import HallucinationScoreConfig, OpenAIChatGenerator

llm = OpenAIChatGenerator(model="gpt-4o")

rag_result = llm.run(
    messages=[
        ChatMessage.from_user(
            text="Task: Answer strictly based on the evidence provided below.\n"
            "Question: Who won the Nobel Prize in Physics in 2019?\n"
            "Evidence:\n"
            "- Nobel Prize press release (2019): James Peebles (1/2); Michel Mayor & Didier Queloz (1/2).\n"
            "Constraints: If evidence is insufficient or conflicting, refuse."
        )
    ],
    hallucination_score_config=HallucinationScoreConfig(skeleton_policy="evidence_erase"),
)

print(f"Decision: {rag_result['replies'][0].meta['hallucination_decision']}")
print(f"Risk bound: {rag_result['replies'][0].meta['hallucination_risk']:.3f}")
print(f"Rationale: {rag_result['replies'][0].meta['hallucination_rationale']}")
print(f"Answer:\n{rag_result['replies'][0].text}")
```

### Multi-Query Retrieval for Query Expansion
Two newly introduced components, MultiQueryKeywordRetriever and MultiQueryEmbeddingRetriever, enable concurrent processing of multiple queries. They work best in combination with the QueryExpander component, which, given a single user query, generates multiple query variants. You can learn more about query expansion in this Jupyter notebook. PR #358
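The pattern behind these components can be sketched without Haystack: expand one query into several variants, run one retrieval per variant concurrently, then merge the hits and drop duplicates. The snippet below is a hypothetical, self-contained illustration of that flow (the toy index, `keyword_retrieve`, and `multi_query_retrieve` are all made up for this sketch and are not the components' actual API).

```python
from concurrent.futures import ThreadPoolExecutor

# Tiny in-memory "index": doc id -> text.
INDEX = {
    1: "query expansion improves recall in sparse retrieval",
    2: "embedding models map text to dense vectors",
    3: "multi query retrieval merges results from several query variants",
}

def keyword_retrieve(query: str, top_k: int = 3) -> list[int]:
    """Naive keyword retriever: rank documents by the number of shared terms."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(text.lower().split())), doc_id)
              for doc_id, text in INDEX.items()]
    return [doc_id for score, doc_id in sorted(scored, reverse=True) if score > 0][:top_k]

def multi_query_retrieve(queries: list[str], top_k: int = 3) -> list[int]:
    """Run one retrieval per query variant concurrently, then merge and deduplicate."""
    with ThreadPoolExecutor() as pool:
        per_query = pool.map(keyword_retrieve, queries)
    merged: list[int] = []
    for hits in per_query:
        for doc_id in hits:
            if doc_id not in merged:   # keep first occurrence, drop duplicates
                merged.append(doc_id)
    return merged[:top_k]

# A query expander would produce variants like these from one user query:
variants = ["query expansion", "expanding queries for retrieval", "multi query retrieval"]
print(multi_query_retrieve(variants))
# → [1, 3]
```

Running the variants concurrently keeps latency close to that of a single retrieval, which is the main point of the multi-query components.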
## ✅ Adopted Experiments

**Full Changelog**: v0.12.0...v0.13.0