
Releases: deepset-ai/haystack

v1.17.0-rc1

23 May 16:14
b8911df


Pre-release

v1.16.1

28 Apr 14:41


What's Changed

Full Changelog: v1.16.0...v1.16.1

v1.16.0

27 Apr 18:04


⭐️ Highlights

Using GPT-4 through PromptNode and Agent

Haystack now supports GPT-4 through PromptNode and Agent. This means you can use the latest advancements in large language models to make your NLP applications more accurate and efficient.

To get started, create a PromptModel for GPT-4 and plug it into your PromptNode. Just like with ChatGPT, you can use GPT-4 in a chat scenario and ask follow-up questions, as shown in this example:

from haystack.nodes import PromptModel, PromptNode

prompt_model = PromptModel("gpt-4", api_key=api_key)
prompt_node = PromptNode(prompt_model)

# OpenAI-style chat messages: the earlier turns give the model the context
# it needs to answer the follow-up question
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who won the world series in 2020?"},
    {"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."},
    {"role": "user", "content": "Where was it played?"},
]
result = prompt_node(messages)

More flexible routing of Documents with RouteDocuments

This release includes an enhancement to the RouteDocuments node, which makes Document routing even more flexible.

The RouteDocuments node now not only returns Documents matched by the split_by or metadata_values parameter, but also creates an extra route for unmatched Documents. This means that you won't accidentally filter out any Documents due to missing metadata fields. Additionally, the update adds support for using List[List[str]] as input type to metadata_values, so multiple metadata values can be grouped into a single output.
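To make the routing semantics concrete, here is a minimal pure-Python sketch of the behavior described above. The helper `route_by_metadata` is hypothetical, not the RouteDocuments API: each entry of `metadata_values` defines one output route, a list entry groups several values into a single route, and documents whose metadata matches nothing (or is missing) land in a final unmatched route.

```python
def route_by_metadata(documents, field, metadata_values):
    """Sketch of RouteDocuments-style routing (hypothetical helper, not Haystack's API)."""
    routes = [[] for _ in metadata_values] + [[]]  # last route collects unmatched docs
    for doc in documents:
        value = doc.get("meta", {}).get(field)
        for i, allowed in enumerate(metadata_values):
            # a list entry (List[str]) groups several values into one route
            group = allowed if isinstance(allowed, list) else [allowed]
            if value in group:
                routes[i].append(doc)
                break
        else:
            routes[-1].append(doc)  # missing or unmatched metadata is not silently dropped
    return routes

docs = [
    {"content": "a", "meta": {"lang": "en"}},
    {"content": "b", "meta": {"lang": "de"}},
    {"content": "c", "meta": {}},  # missing field -> goes to the extra unmatched route
]
routes = route_by_metadata(docs, "lang", ["en", ["de", "fr"]])
```

Here `"en"` gets its own route, `["de", "fr"]` share one route, and the document without a `lang` field ends up in the extra route instead of being filtered out.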

Deprecating RAGenerator and Seq2SeqGenerator

RAGenerator and Seq2SeqGenerator are deprecated and will be removed in version 1.18. We advise using the more powerful PromptNode instead, which can use RAG and Seq2Seq models as well. The following example shows how to use PromptNode as a replacement for Seq2SeqGenerator:

from haystack.nodes import PromptNode, PromptTemplate
from haystack.schema import Document

p = PromptNode("vblagoje/bart_lfqa")

# Start by defining a question/query
query = "Why does water heated to room temperature feel colder than the air around it?"

# Given the question above, suppose the documents below were found in some document store
documents = [
    "when the skin is completely wet. The body continuously loses water by...",
    "at greater pressures. There is an ambiguity, however, as to the meaning of the terms 'heating' and 'cooling'...",
    "are not in a relation of thermal equilibrium, heat will flow from the hotter to the colder, by whatever pathway...",
    "air condition and moving along a line of constant enthalpy toward a state of higher humidity. A simple example ...",
    "Thermal contact conductance. In physics, thermal contact conductance is the study of heat conduction between solid ...",
]

# Instead of manually concatenating the question and documents into BART input, e.g.
# conditioned_doc = "<P> " + " <P> ".join(documents)
# query_and_docs = "question: {} context: {}".format(query, conditioned_doc)
# use a PromptTemplate:
pt = PromptTemplate("lfqa", "question: {query} context: {join(documents, delimiter='<P>')}")

res = p.prompt(prompt_template=pt, query=query, documents=[Document(d) for d in documents])

⚠️ Breaking Changes

Refactoring of our dependency management

We added the following optional dependency extras to Haystack: stats, metrics, preprocessing, file-conversion, and elasticsearch. To keep using the components below, install farm-haystack with the corresponding extra:

Component                             Installation extra
PreProcessor                          farm-haystack[preprocessing]
DocxToTextConverter                   farm-haystack[file-conversion]
TikaConverter                         farm-haystack[file-conversion]
LangdetectDocumentLanguageClassifier  farm-haystack[file-conversion]
ElasticsearchDocumentStore            farm-haystack[elasticsearch]

Dropping support for Python 3.7

Since Python 3.7 will reach end of life in June 2023, we will no longer support it as of Haystack version 1.16.

Smaller Breaking Changes

  • Using TableCell instead of Span to indicate the coordinates of a table cell (#4616)
  • Default save_dir for FARMReader's train method changed to f"./saved_models/{self.inferencer.model.language_model.name}" (#4553)
  • Using PreProcessor with split_respect_sentence_boundary set to True might return a different set of Documents than in v1.15 (#4470)

What's Changed

Breaking Changes

  • feat: Deduplicate duplicate Answers resulting from overlapping Documents in FARMReader by @bogdankostic in #4470
  • feat: Change default save_dir for FARMReader.train by @GitIgnoreMaybe in #4553
  • feat!: drop Python3.7 support by @ZanSara in #4421
  • refactor!: extract evaluation and statistical dependencies by @ZanSara in #4457
  • refactor!: extract preprocessing and file conversion deps by @ZanSara in #4605
  • feat: Implementation of Table Cell Proposal by @sjrl in #4616

DocumentStores

  • fix: Check for date fields in weaviate meta update by @joekitsmith in #4371
  • chore: skip Milvus tests by @ZanSara in #4654
  • docs: Add deprecation information to doc string of MilvusDocumentStore by @bogdankostic in #4658
  • Ignore cross-reference properties when loading documents by @masci in #4664
  • fix: PineconeDocumentStore error when delete_documents right after initialization by @Namoush in #4609
  • fix: remove warnings from the more recent Elasticsearch client by @masci in #4602
  • fix: Fixing the Weaviate BM25 query builder bug by @zoltan-fedor in #4703


v1.15.1

12 Apr 10:49
48b4b99


What's Changed

  • fix: provide a fallback for PyMuPDF by @masci in #4564
  • refactor: Adjust WhisperTranscriber to pipeline run methods by @vblagoje in #4510

Full Changelog: v1.15.0...v1.15.1

v1.15.1-rc1

31 Mar 13:31
8f70519


Pre-release

v1.15.0

30 Mar 09:02
1ed4caf


⭐ Highlights

Build Agents Yourself with Open Source

Exciting news! Say hello to LLM-based Agents, the new decision makers for your NLP applications! These agents have the power to answer complex questions by creating a dynamic action plan and using a variety of Tools in a loop. Picture this: your Agent decides to tackle a multi-hop question by retrieving pieces of information through a web search engine again and again. That's just one of the many feats these Agents can accomplish. Excited about the recent ChatGPT plugins? Agents allow you to build similar experiences in an open source way: your own environment, full control and transparency.
But how do you get started? First, wrap your Haystack Pipeline in a Tool and give your Agent a description of what that Tool can do. Then, initialize your Agent with a list of Tools and a PromptNode that decides when to use each Tool.

from haystack.agents import Agent, Tool

# web_retriever and web_qa_pn are a retriever and a PromptNode set up beforehand
web_qa_tool = Tool(
    name="Search",
    pipeline_or_node=WebQAPipeline(retriever=web_retriever, prompt_node=web_qa_pn),
    description="useful for when you need to Google questions.",
    output_variable="results",
)

agent = Agent(
    prompt_node=agent_pn,
    prompt_template=prompt_template,
    tools=[web_qa_tool],
    final_answer_pattern=r"Final Answer\s*:\s*(.*)",
)
agent.run(query="<Your question here!>")

Check out the full example, a stand-alone WebQAPipeline, our new tutorials and the documentation!

Flexible PromptTemplates

Get ready to take your Pipelines to the next level with the revamped PromptNode. Now you have more flexibility when it comes to shaping the PromptNode outputs and inputs to work seamlessly with other nodes. But wait, there's more! You can now apply functions right within prompt_text. Want to concatenate the content of input documents? No problem! It's all possible with the PromptNode. And that's not all! The output_parser converts output into Haystack Document, Answer, or Label formats. Check out the AnswerParser in action, fully loaded and ready to use:

PromptTemplate(
    name="question-answering",
    prompt_text="Given the context please answer the question.\n"
                "Context: {join(documents)}\n"
                "Question: {query}\n"
                "Answer: ",
    output_parser=AnswerParser(),
)

More details here.

Using ChatGPT through PromptModel

A few lines of code are all you need to start chatting with ChatGPT through Haystack! The simple message format distinguishes instructions, user questions, and assistant responses. And with the chat functionality you can ask follow-up questions as in this example:

from haystack.nodes import PromptModel, PromptNode

prompt_model = PromptModel("gpt-3.5-turbo", api_key=api_key)
prompt_node = PromptNode(prompt_model)
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who won the world series in 2020?"},
    {"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."},
    {"role": "user", "content": "Where was it played?"},
]
result = prompt_node(messages)

Haystack Extras

We now have another repository, haystack-extras, with additional Haystack components such as the audio nodes AnswerToSpeech and DocumentToSpeech. For example, these two can be installed via:

pip install farm-haystack-text2speech

What's Changed

Breaking Changes

  • feat!: Increase Crawler standardization regarding Pipelines by @danielbichuetti in #4122
  • feat: Enable PDFToTextConverter multiprocessing, increase general performance and simplify installation by @danielbichuetti in #4226
  • build: Use uvicorn instead of gunicorn as server in REST API's Dockerfile by @bogdankostic in #4304
  • chore!: remove deprecated OpenDistroElasticsearchDocumentStore by @masci in #4361
  • refactor: Remove AnswerToSpeech and DocumentToSpeech nodes by @silvanocerza in #4391
  • fix: Fix debug on PromptNode by @recrudesce in #4483
  • feat: PromptTemplate extensions by @tstadel in #4378

Pipeline

  • feat: Add JsonConverter node by @bglearning in #4130
  • fix: Shaper store all outputs from function by @sjrl in #4223
  • refactor: Isolate PDF OCR converter from PDF text converter by @danielbichuetti in #4193
  • fix: add option to not override results by Shaper by @tstadel in #4231
  • feat: reduce and focus telemetry by @ZanSara in #4087
  • refactor: Remove deprecated nodes EvalDocuments and EvalAnswers by @anakin87 in #4194
  • refact: mark unit tests under the test/nodes/** path by @masci in #4235
  • fix: FARMReader produces Answers with negative start and end position by @julian-risch in #4248
  • test: replace ElasticsearchDS with InMemoryDS when it makes sense; support scale_score in InMemoryDS by @anakin87 in #4283
  • test: mock all Translator tests and move one to e2e by @ZanSara in #4290
  • fix: Prevent going past token limit in OpenAI calls in PromptNode by @sjrl in #4179
  • feat: Add Azure OpenAI embeddings support by @danielbichuetti in #4332
  • test: move tests on standard pipelines in e2e/ by @ZanSara in #4309
  • fix: EvalResult load migration by @tstadel in #4289
  • feat: Report execution time for pipeline components in _debug by @zoltan-fedor in #4197
  • refactor: Use TableQuestionAnsweringPipeline from transformers by @sjrl in #4303
  • fix: hf-tiny-roberta model loading from disk and mypy errors by @mayankjobanputra in #4363
  • docs: TransformersImageToText- inform about supported models, better exception handling by @anakin87 in #4310
  • fix: check that answer is not None before accessing it in table.py by @culms in #4376
  • feat: add automatic OCR detection mechanism and improve performance by @danielbichuetti in #4329
  • Add Whisper node by @vblagoje in #4335
  • tests: Mark Crawler tests correctly by @silvanocerza in #4435
  • test: Skip flaky test_multimodal_retriever_query by @silvanocerza in #4444
  • fix: issue evaluation check for content type by @ju-gu in #4181
  • feat: break retry loop for 401 unauthorized errors in promptnode by @FHardow in #4389
  • refactor: Remove retry_with_exponential_backoff in favor of tenacity by @silvanocerza in #4460
  • refactor: Remove ElasticsearchRetriever and ElasticsearchFilterOnlyRetriever by @silvanocerza in #4499
  • refactor: Deprecate BaseKnowledgeGraph, GraphDBKnowledgeGraph, InMemoryKnowledgeGraph and Text2SparqlRetriever by @silvanocerza in #4500
  • refactor: remove telemetry v1 by @ZanSara in #4496
  • feat: expose prompts to Answer and EvaluationResult by @tstadel in #4341
  • feat: Add agent tools by @vblagoje in #4437
  • refactor: reduce telemetry events count by @ZanSara in #4501

DocumentStores

  • fix: OpenSearchDocumentStore.delete_index doesn't raise by @tstadel in #4295
  • fix: increase MetaDocumentORM value length in SQLDocumentStore by @anakin87 in #4333
  • fix: when using IVF* indexing, ensure the index is trained first by @kaixuanliu in #4311
  • refactor: Mark MilvusDocumentStore as deprecated by @silvanocerza in #4498


v1.15.0-rc2

28 Mar 08:52
22942ca


Pre-release

v1.15.0-rc1

28 Mar 06:59


Pre-release

v1.14.0

28 Feb 13:59


⭐ Highlights

PromptNode enhancements

PromptNode just rolled out prompt logging (pipeline debug), run_batch, and model_kwargs support. More updates to PromptNode and PromptTemplates coming soon!
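As an illustration of the run_batch and model_kwargs ideas, here is a minimal pure-Python sketch (a hypothetical `MiniPromptNode`, not Haystack's implementation): run() handles one query, run_batch() maps the same logic over a list of queries, and extra keyword arguments are passed through to the underlying model call.

```python
class MiniPromptNode:
    """Hypothetical sketch of the run/run_batch pattern; not the Haystack API."""

    def __init__(self, model_fn, **model_kwargs):
        self.model_fn = model_fn          # callable standing in for the LLM
        self.model_kwargs = model_kwargs  # extra generation parameters, passed through

    def run(self, query):
        # single-query path: forward the query plus any model kwargs
        return self.model_fn(query, **self.model_kwargs)

    def run_batch(self, queries):
        # batch path: apply the single-query logic to every query in the list
        return [self.run(q) for q in queries]

# a toy "model" that just echoes its input; temperature is accepted but unused
node = MiniPromptNode(lambda q, **kwargs: f"echo: {q}", temperature=0.7)
```

In Haystack itself, run_batch is available on PromptNode and model_kwargs is a PromptNode initialization option; the class above only mirrors the shape of that interface.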

Shaper

We're introducing the Shaper, a helper node for PromptNode. Shaper renames or transforms values in a Pipeline's invocation context so that one node's outputs match the inputs the next node expects, which lets PromptNode integrate seamlessly with the rest of Haystack. Shaper's scope is not limited to PromptNode, though; you can also use it independently wherever a Pipeline's data needs reshaping.
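The essence of the Shaper can be sketched in a few lines of plain Python. The `MiniShaper` class and `join` function below are hypothetical stand-ins, not Haystack's API: a shaper reads named inputs from an invocation context, applies a function, and stores the result under a new name for downstream nodes.

```python
def join(documents, delimiter=" "):
    """Concatenate a list of document strings into one string."""
    return delimiter.join(documents)

class MiniShaper:
    """Hypothetical sketch of the Shaper idea; not Haystack's implementation."""

    def __init__(self, func, inputs, output):
        self.func = func      # transformation to apply
        self.inputs = inputs  # names of variables to read from the context
        self.output = output  # name under which to store the result

    def run(self, context):
        args = [context[name] for name in self.inputs]
        context = dict(context)  # don't mutate the caller's context
        context[self.output] = self.func(*args)
        return context

shaper = MiniShaper(join, inputs=["documents"], output="joined")
ctx = shaper.run({"documents": ["first doc", "second doc"]})
```

After the run, `ctx` still holds the original documents plus a `joined` string ready to be dropped into a prompt.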

IVF and Product Quantization support for OpenSearchDocumentStore

We've added support for IVF and IVF with Product Quantization to OpenSearchDocumentStore. You can train the IVF index either by calling the train_index method (as in FAISSDocumentStore) or by setting ivf_train_size when initializing OpenSearchDocumentStore.
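For intuition about why IVF needs training, here is a toy pure-Python sketch of an inverted file index (illustrative only; OpenSearch and FAISS implement this natively and far more efficiently). Training learns coarse centroids; each vector is then bucketed under its nearest centroid, and a query scans only the `nprobe` closest buckets instead of the whole collection.

```python
def dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

class MiniIVF:
    """Toy IVF index: coarse centroids partition vectors into buckets."""

    def __init__(self, centroids):
        self.centroids = centroids  # in real IVF these come from training (e.g. k-means)
        self.buckets = {i: [] for i in range(len(centroids))}

    def add(self, vec):
        # assign the vector to its nearest coarse centroid's bucket
        nearest = min(range(len(self.centroids)), key=lambda i: dist(vec, self.centroids[i]))
        self.buckets[nearest].append(vec)

    def search(self, query, nprobe=1):
        # rank centroids by distance to the query, then scan only the top nprobe buckets
        order = sorted(range(len(self.centroids)), key=lambda i: dist(query, self.centroids[i]))
        candidates = [v for i in order[:nprobe] for v in self.buckets[i]]
        return min(candidates, key=lambda v: dist(query, v))

index = MiniIVF(centroids=[(0.0, 0.0), (10.0, 10.0)])
for v in [(0.5, 0.2), (9.8, 10.1), (0.1, 0.9)]:
    index.add(v)
```

Product Quantization goes one step further by compressing the vectors inside each bucket, trading a little accuracy for a much smaller memory footprint.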

What's Changed

Breaking Changes

  • refactor: Updated rest_api schema for tables to be consistent with Document.to_dict by @sjrl in #3872
  • feat: Support multiple document_ids in Answer object (for generative QA) by @tstadel in #4062
  • feat: Update OpenAIAnswerGenerator defaults and with learnings from PromptNode by @sjrl in #4038
  • build: cache nltk models into the docker image by @mayankjobanputra in #4118
  • feat: Add IVF and Product Quantization support for OpenSearchDocumentStore by @bogdankostic in #3850

DocumentStores

  • refactor: use weaviate client to build BM25 query by @hsm207 in #3939
  • fix: fixed InMemoryDocumentStore.get_embedding_count to return correct number by @sjrl in #3980
  • fix: Add inner query for mysql compatibility by @julian-risch in #4068
  • feat: add support for custom headers by @hsm207 in #4040
  • feat: Add BM25 support for tables in InMemoryDocumentStore by @bogdankostic in #4090
  • refactor: InMemoryDocumentStore - manage documents without embedding & fix mypy errors by @anakin87 in #4113
  • refactor: complete the document stores test refactoring by @masci in #4125
  • feat: include testing facilities into haystack package by @masci in #4182

Documentation

  • Align with the docs install guide + correct lg by @agnieszka-m in #3950
  • docs: Update Crawler docstring for correct usage in Google colab by @silvanocerza in #3979
  • Docs: Update docstrings by @agnieszka-m in #4119
  • docs: Update Annotation Tool README.md by @bogdankostic in #4123
  • feat: Add model_kwargs option to PromptNode by @sjrl in #4151
  • fix: Remove logging statement of setting ID manually in Document by @bogdankostic in #4129
  • chore: Fixing PromptNode .prompt() docstring to include the PromptTemplate object as an option by @TuanaCelik in #4135
  • chore: de-couple the telemetry events for each tutorial from the dataset on AWS that is used by @TuanaCelik in #4155
  • feat: Implement run_batch for PromptNode by @sjrl in #4072


v1.14.0rc2

22 Feb 17:20


Pre-release

What's Changed

  • fix: add option to not override results by Shaper #4231
  • fix: Shaper store all outputs from function #4223
  • fix: allowing file-upload api to write files to disk #4221
  • fix: Fix bug in prompt template check of OpenAIAnswerGenerator #4220
  • feat: add top_k to PromptNode #4159
  • feat: Add JsonConverter node #4130