23 May 16:14

masci

b8911df

v1.17.0-rc1 Pre-release

28 Apr 14:41

bogdankostic

v1.16.1

0b5d680

v1.16.1

What's Changed

fix: update ImportError for 'metrics' dependency by @bilgeyucel in #4778

Full Changelog: v1.16.0...v1.16.1

Contributors

bilgeyucel

27 Apr 18:04

bogdankostic

v1.16.0

d72cf07

v1.16.0

⭐️ Highlights

Using GPT-4 through `PromptNode` and `Agent`

Haystack now supports GPT-4 through PromptNode and Agent. This means you can use OpenAI's most advanced large language model in your NLP applications through the same PromptNode and Agent interfaces you already use.

To get started, create a PromptModel for GPT-4 and plug it into your PromptNode. Just like with ChatGPT, you can use GPT-4 in a chat scenario and ask follow-up questions, as shown in this example:

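from haystack.nodes import PromptModel, PromptNode

api_key = "YOUR_OPENAI_API_KEY"  # replace with your OpenAI API key
# Wrap GPT-4 in a PromptModel and plug it into a PromptNode
prompt_model = PromptModel("gpt-4", api_key=api_key)
prompt_node = PromptNode(prompt_model)
# Chat history: a system instruction, a first exchange, and a follow-up question
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who won the world series in 2020?"},
    {"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."},
    {"role": "user", "content": "Where was it played?"},
]
result = prompt_node(messages)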
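The chat format above is a plain list of role/content dictionaries. As a minimal sketch, assuming you want to keep a conversation going turn by turn, the hypothetical helper `add_turn` below (not part of Haystack) appends validated messages so the growing list can be passed to `prompt_node(...)` again. It only accepts the three roles used in the example.

```python
from typing import Dict, List

# Roles used in the chat example above
ALLOWED_ROLES = {"system", "user", "assistant"}

Message = Dict[str, str]


def add_turn(messages: List[Message], role: str, content: str) -> List[Message]:
    """Return a new message list with one more turn appended (hypothetical helper)."""
    if role not in ALLOWED_ROLES:
        raise ValueError(f"Unknown role {role!r}; expected one of {sorted(ALLOWED_ROLES)}")
    if not content.strip():
        raise ValueError("Message content must not be empty")
    # Build a new list instead of mutating, so earlier histories stay reusable
    return messages + [{"role": role, "content": content}]


history: List[Message] = [{"role": "system", "content": "You are a helpful assistant."}]
history = add_turn(history, "user", "Who won the world series in 2020?")
history = add_turn(history, "assistant", "The Los Angeles Dodgers won the World Series in 2020.")
history = add_turn(history, "user", "Where was it played?")
# `history` now has the same content as `messages` in the example above
```

Returning a new list rather than appending in place keeps earlier histories intact, which is handy if you want to branch a conversation from an earlier turn.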

Deprecating `RAGenerator` and `Seq2SeqGenerator`

RAGenerator and Seq2SeqGenerator are deprecated and will be removed in version 1.18. We advise using the more powerful PromptNode instead, which can use RAG and Seq2Seq models as well. The following example shows how to use PromptNode as a replacement for Seq2SeqGenerator:

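from haystack.nodes import PromptNode, PromptTemplate
from haystack.schema import Document

# Load the BART LFQA model through PromptNode instead of Seq2SeqGenerator
p = PromptNode("vblagoje/bart_lfqa")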

# Start by defining a question/query
query = "Why does water heated to room temperature feel colder than the air around it?"

# Given the question above, suppose the documents below were found in some document store
documents = [
    "when the skin is completely wet. The body continuously loses water by...",
    "at greater pressures. There is an ambiguity, however, as to the meaning of the terms 'heating' and 'cooling'...",
    "are not in a relation of thermal equilibrium, heat will flow from the hotter to the colder, by whatever pathway...",
    "air condition and moving along a line of constant enthalpy toward a state of higher humidity. A simple example ...",
    "Thermal contact conductance. In physics, thermal contact conductance is the study of heat conduction between solid ...",
]


# Manually concatenate the question and support documents into BART input
# conditioned_doc = "<P> " + " <P> ".join([d for d in documents])
# query_and_docs = "question: {} context: {}".format(query, conditioned_doc)

# Or use the PromptTemplate as shown here
pt = PromptTemplate("lfqa", "question: {query} context: {join(documents, delimiter='<P>')}")

res = p.prompt(prompt_template=pt, query=query, documents=[Document(d) for d in documents])
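To see what the model actually receives, here is a plain-Python sketch of the two input styles in the example: the manual concatenation from the commented-out lines, and what the `lfqa` template should render to, assuming `join(documents, delimiter='<P>')` simply places the delimiter between document contents. The two are not byte-identical: the manual version adds a leading `<P>` and surrounding spaces. Both function names are hypothetical.

```python
from typing import List


def manual_bart_input(query: str, documents: List[str]) -> str:
    # Mirrors the commented-out manual concatenation in the example
    conditioned_doc = "<P> " + " <P> ".join(documents)
    return "question: {} context: {}".format(query, conditioned_doc)


def template_bart_input(query: str, documents: List[str], delimiter: str = "<P>") -> str:
    # Assumption: join() only inserts the delimiter between document contents
    return f"question: {query} context: {delimiter.join(documents)}"


docs = ["first passage", "second passage"]
print(manual_bart_input("Why?", docs))
# question: Why? context: <P> first passage <P> second passage
print(template_bart_input("Why?", docs))
# question: Why? context: first passage<P>second passage
```

If the exact separator matters for your model, compare both renderings before switching from the manual input to the template.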

⚠️ Breaking Changes

Refactoring of our dependency management

We added the following extras as optional dependencies for Haystack: stats, metrics, preprocessing, file-conversion, and elasticsearch. To keep using certain components, you need to install farm-haystack with these new extras:

| Component | Installation extra |
| --- | --- |
| `PreProcessor` | `farm-haystack[preprocessing]` |
| `DocxToTextConverter` | `farm-haystack[file-conversion]` |
| `TikaConverter` | `farm-haystack[file-conversion]` |
| `LangdetectDocumentLanguageClassifier` | `farm-haystack[file-conversion]` |
| `ElasticsearchDocumentStore` | `farm-haystack[elasticsearch]` |
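If a project uses several of these components, the extras can be combined into one install. A minimal sketch, assuming the component-to-extra mapping in the table is all you need: the hypothetical helper `install_command` collects the required extras and builds a single `pip install` command.

```python
from typing import Iterable

# Component-to-extra mapping, copied from the table above
EXTRA_FOR_COMPONENT = {
    "PreProcessor": "preprocessing",
    "DocxToTextConverter": "file-conversion",
    "TikaConverter": "file-conversion",
    "LangdetectDocumentLanguageClassifier": "file-conversion",
    "ElasticsearchDocumentStore": "elasticsearch",
}


def install_command(components: Iterable[str]) -> str:
    """Build one pip command covering every extra the given components need."""
    # Components not listed in the table are assumed to need no extra
    extras = sorted({EXTRA_FOR_COMPONENT[c] for c in components if c in EXTRA_FOR_COMPONENT})
    if not extras:
        return "pip install farm-haystack"
    # Quotes keep shells such as zsh from expanding the square brackets
    return f"pip install 'farm-haystack[{','.join(extras)}]'"


print(install_command(["PreProcessor", "ElasticsearchDocumentStore", "TikaConverter"]))
# pip install 'farm-haystack[elasticsearch,file-conversion,preprocessing]'
```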

Dropping support for Python 3.7

Since Python 3.7 will reach end of life in June 2023, we will no longer support it as of Haystack version 1.16.
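Projects that upgrade to Haystack 1.16 may want to fail early on an unsupported interpreter. A minimal sketch, assuming Python 3.8 is the new minimum (the note above only states that 3.7 is dropped); `is_supported` is a hypothetical helper:

```python
import sys
from typing import Tuple

# Assumption: Python 3.8 is the oldest version supported once 3.7 is dropped
MIN_PYTHON: Tuple[int, int] = (3, 8)


def is_supported(version_info: Tuple[int, ...] = tuple(sys.version_info[:2])) -> bool:
    """Return True if the given (major, minor[, ...]) version meets the minimum."""
    return tuple(version_info[:2]) >= MIN_PYTHON


if not is_supported():
    raise RuntimeError(
        f"Haystack 1.16+ needs Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]} or newer"
    )
```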

Smaller Breaking Changes

Using TableCell instead of Span to indicate the coordinates of a table cell (#4616)
Default save_dir for FARMReader's train method changed to f"./saved_models/{self.inferencer.model.language_model.name}" (#4553)
Using PreProcessor with split_respect_sentence_boundary set to True might return a different set of Documents than in v1.15 (#4470)

What's Changed

Breaking Changes

feat: Deduplicate duplicate Answers resulting from overlapping Documents in FARMReader by @bogdankostic in #4470
feat: Change default save_dir for FARMReader.train by @GitIgnoreMaybe in #4553
feat!: drop Python3.7 support by @ZanSara in #4421
refactor!: extract evaluation and statistical dependencies by @ZanSara in #4457
refactor!: extract preprocessing and file conversion deps by @ZanSara in #4605
feat: Implementation of Table Cell Proposal by @sjrl in #4616

Pipeline

fix: Fix pipeline config and agent tools hashing for telemetry by @silvanocerza in #4508
refactor: Adjust WhisperTranscriber to pipeline run methods by @vblagoje in #4510
Adding filtering support for Weaviate when used for BM25 querying by @zoltan-fedor in #4385
test: Remove duplicate whisper test by @julian-risch in #4567
fix: provide a fallback for PyMuPDF by @masci in #4564
Docs: Shaper API update by @agnieszka-m in #4542
Docs: Update Whisper API. by @agnieszka-m in #4539
refactor: remove variadic parameters in WebSearch initialization; make new nodes directly importable by @anakin87 in #4581
test: Add pytest fixture to block requests in unit tests by @silvanocerza in #4433
test: Rework conftest by @silvanocerza in #4614
feat: arbitrary crawler_depth for Crawler class by @benheckmann in #4623
fix: ParsrConverter list element added by @Namoush in #4562
fix: make langdetect truly optional by @ZanSara in #4686
feat: More flexible routing for RouteDocuments node by @sjrl in #4690
docs: Adapt Shaper docstrings regarding dropping metadata by @bogdankostic in #4655

DocumentStores

fix: Check for date fields in weaviate meta update by @joekitsmith in #4371
chore: skip Milvus tests by @ZanSara in #4654
docs: Add deprecation information to doc string of MilvusDocumentStore by @bogdankostic in #4658
Ignore cross-reference properties when loading documents by @masci in #4664
fix: PineconeDocumentStore error when delete_documents right after initialization by @Namoush in #4609
fix: remove warnings from the more recent Elasticsearch client by @masci in #4602
fix: Fixing the Weaviate BM25 query builder bug by @zoltan-fedor in #4703

Documentation

Docs: Update Seq2SeqGen models and docstrings lg by @agnieszka-m in #4595
feat: Load documents from remote - helper function by @TuanaCelik in #4545
refactor: Remove unnecessary literal_eval when parsing env var by @silvanocerza in #4570
Docs: Fix QuestionGenerator and Summarizer docstrings by @agnieszka-m in #4594
refactor: Rework prompt tests by @silvanocerza in #4600
feat: Add util method to make HTTP requests with configurable retry by @silvanocerza in #4627
refactor: Rework invocation layers by @silvanocerza in #4615
refactor: Add 503 as status code that triggers retry in request_with_retry by @silvanocerza in #4640
feat: initial implementation of MemoryDocumentStore for new Pipelines by @ZanSara in #4447
docs: Add PDFToTextOCRConverter to API Docs by @bogdankostic in #4656
Docs: Add max length unit to PromptNode API docs by @agnieszka-m in #4601
fix: Add model_max_length model_kwargs parameter to HF PromptNode by @vblagoje in #4651
feat: Add chatgpt streaming by @vblagoje in #4659
feat: Add Hugging Face inferencing PromptNode layer by @vblagoje in #4641
refactor: node->component by @ZanSara in #4687
feat: Add AzureChatGPT Capability using new InvocationLayer style by @recrudesce in #4675
...

Contributors

masci, vblagoje, and 18 other contributors

12 Apr 10:49

julian-risch

v1.15.1

48b4b99

v1.15.1

What's Changed

fix: provide a fallback for PyMuPDF by @masci in #4564
refactor: Adjust WhisperTranscriber to pipeline run methods by @vblagoje in #4510

Full Changelog: v1.15.0...v1.15.1

Contributors

masci and vblagoje

31 Mar 13:31

julian-risch

v1.15.1-rc1

8f70519

v1.15.1-rc1 Pre-release

30 Mar 09:02

julian-risch

v1.15.0

1ed4caf

v1.15.0

⭐ Highlights

Build Agents Yourself with Open Source

Exciting news! Say hello to LLM-based Agents, the new decision makers for your NLP applications! These Agents can answer complex questions by creating a dynamic action plan and using a variety of Tools in a loop. Picture this: your Agent tackles a multi-hop question by retrieving pieces of information through a web search engine again and again. That's just one of the many things these Agents can do. Excited about the recent ChatGPT plugins? Agents let you build similar experiences the open-source way: in your own environment, with full control and transparency.

But how do you get started? First, wrap your Haystack Pipeline in a Tool and give your Agent a description of what that Tool can do. Then, initialize your Agent with a list of Tools and a PromptNode that decides when to use each Tool.

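from haystack.agents import Agent, Tool
from haystack.pipelines import WebQAPipeline

# Assumed to be set up beforehand: web_retriever (a WebRetriever), web_qa_pn (a PromptNode that
# answers from web results), agent_pn (the PromptNode driving the Agent), and prompt_template
web_qa_tool = Tool(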
    name="Search",
    pipeline_or_node=WebQAPipeline(retriever=web_retriever, prompt_node=web_qa_pn),
    description="useful for when you need to Google questions.",
    output_variable="results",
)

agent = Agent(
    prompt_node=agent_pn,
    prompt_template=prompt_template,
    tools=[web_qa_tool],
    final_answer_pattern=r"Final Answer\s*:\s*(.*)",
)
agent.run(query="<Your question here!>")

Check out the full example, a stand-alone WebQAPipeline, our new tutorials and the documentation!
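To illustrate the idea behind the loop, here is a deliberately simplified, framework-free sketch; it is not Haystack's Agent implementation. A scripted stand-in for the LLM emits ReAct-style text, the hypothetical `run_agent` function calls the named tool with the given input, feeds the observation back into the transcript, and stops once the output matches the same `final_answer_pattern` used above. The `Tool:` / `Tool Input:` line format and `TOOL_PATTERN` are assumptions made for this illustration.

```python
import re
from typing import Callable, Dict

FINAL_ANSWER_PATTERN = r"Final Answer\s*:\s*(.*)"  # same pattern as in the Agent example
TOOL_PATTERN = r"Tool:\s*(\w+)\s*Tool Input:\s*(.*)"  # hypothetical step format


def run_agent(
    llm: Callable[[str], str],
    tools: Dict[str, Callable[[str], str]],
    query: str,
    max_iterations: int = 5,
) -> str:
    transcript = f"Question: {query}\n"
    for _ in range(max_iterations):
        step = llm(transcript)
        final = re.search(FINAL_ANSWER_PATTERN, step)
        if final:
            return final.group(1).strip()
        action = re.search(TOOL_PATTERN, step)
        if not action:
            raise ValueError(f"Could not parse agent step: {step!r}")
        name, tool_input = action.group(1), action.group(2).strip()
        # Run the chosen tool and append its result so the next step can use it
        observation = tools[name](tool_input)
        transcript += f"{step}\nObservation: {observation}\n"
    raise RuntimeError("No final answer within max_iterations")


# Scripted LLM replies stand in for a real PromptNode
scripted = iter([
    "Thought: I should look this up.\nTool: Search\nTool Input: 2020 World Series venue",
    "Thought: I know the answer now.\nFinal Answer: Globe Life Field in Arlington, Texas",
])


def fake_llm(prompt: str) -> str:
    return next(scripted)


def fake_search(question: str) -> str:
    return "The 2020 World Series was played at Globe Life Field."


answer = run_agent(fake_llm, {"Search": fake_search}, "Where was the 2020 World Series played?")
print(answer)
```

The `max_iterations` bound keeps a confused model from looping forever, which is why agent frameworks typically expose a similar limit.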

Flexible PromptTemplates

Get ready to take your Pipelines to the next level with the revamped PromptNode. Now you have more flexibility when it comes to shaping the PromptNode outputs and inputs to work seamlessly with other nodes. But wait, there's more! You can now apply functions right within prompt_text. Want to concatenate the content of input documents? No problem! It's all possible with the PromptNode. And that's not all! The output_parser converts output into Haystack Document, Answer, or Label formats. Check out the AnswerParser in action, fully loaded and ready to use:

from haystack.nodes import AnswerParser, PromptTemplate

PromptTemplate(
    name="question-answering",
    prompt_text="Given the context please answer the question.\n"
                "Context: {join(documents)}\n"
                "Question: {query}\n"
                "Answer: ",
    output_parser=AnswerParser(),
)

More details here.
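Conceptually, the template above does two things: it renders `prompt_text` with the joined document contents and the query, and it wraps the raw model output in a structured answer. The sketch below reproduces both steps in plain Python so you can inspect the exact prompt string. `render_qa_prompt`, `SimpleAnswer`, and `parse_answer` are hypothetical stand-ins, not Haystack's `PromptTemplate` or `AnswerParser`, and the sketch assumes `join(documents)` separates contents with a single space.

```python
from dataclasses import dataclass
from typing import List

PROMPT_TEXT = (
    "Given the context please answer the question.\n"
    "Context: {context}\n"
    "Question: {query}\n"
    "Answer: "
)


def render_qa_prompt(documents: List[str], query: str, delimiter: str = " ") -> str:
    # Stand-in for {join(documents)}: assumed to join contents with a single space
    return PROMPT_TEXT.format(context=delimiter.join(documents), query=query)


@dataclass
class SimpleAnswer:
    """Hypothetical minimal answer type; Haystack's Answer carries more fields."""

    answer: str
    type: str = "generative"


def parse_answer(raw_output: str) -> SimpleAnswer:
    # Strip whitespace the model may emit after "Answer: "
    return SimpleAnswer(answer=raw_output.strip())


prompt = render_qa_prompt(["Berlin is the capital of Germany."], "What is the capital of Germany?")
print(prompt)
print(parse_answer("  Berlin\n"))
```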

Using ChatGPT through PromptModel

A few lines of code are all you need to start chatting with ChatGPT through Haystack! The simple message format distinguishes instructions, user questions, and assistant responses. And with the chat functionality you can ask follow-up questions as in this example:

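from haystack.nodes import PromptModel, PromptNode

api_key = "YOUR_OPENAI_API_KEY"  # replace with your OpenAI API key
# Wrap ChatGPT (gpt-3.5-turbo) in a PromptModel and plug it into a PromptNode
prompt_model = PromptModel("gpt-3.5-turbo", api_key=api_key)
prompt_node = PromptNode(prompt_model)
# Chat history: a system instruction, a first exchange, and a follow-up question
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who won the world series in 2020?"},
    {"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."},
    {"role": "user", "content": "Where was it played?"},
]
result = prompt_node(messages)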

Haystack Extras

We now have a separate repository, haystack-extras, with additional Haystack components such as the audio nodes AnswerToSpeech and DocumentToSpeech. These two, for example, can be installed with:

pip install farm-haystack-text2speech

What's Changed

Breaking Changes

feat!: Increase Crawler standardization regarding Pipelines by @danielbichuetti in #4122
feat: Enable PDFToTextConverter multiprocessing, increase general performance and simplify installation by @danielbichuetti in #4226
build: Use uvicorn instead of gunicorn as server in REST API's Dockerfile by @bogdankostic in #4304
chore!: remove deprecated OpenDistroElasticsearchDocumentStore by @masci in #4361
refactor: Remove AnswerToSpeech and DocumentToSpeech nodes by @silvanocerza in #4391
fix: Fix debug on PromptNode by @recrudesce in #4483
feat: PromptTemplate extensions by @tstadel in #4378

Pipeline

feat: Add JsonConverter node by @bglearning in #4130
fix: Shaper store all outputs from function by @sjrl in #4223
refactor: Isolate PDF OCR converter from PDF text converter by @danielbichuetti in #4193
fix: add option to not override results by Shaper by @tstadel in #4231
feat: reduce and focus telemetry by @ZanSara in #4087
refactor: Remove deprecated nodes EvalDocuments and EvalAnswers by @anakin87 in #4194
refact: mark unit tests under the test/nodes/** path by @masci in #4235
fix: FARMReader produces Answers with negative start and end position by @julian-risch in #4248
test: replace ElasticsearchDS with InMemoryDS when it makes sense; support scale_score in InMemoryDS by @anakin87 in #4283
test: mock all Translator tests and move one to e2e by @ZanSara in #4290
fix: Prevent going past token limit in OpenAI calls in PromptNode by @sjrl in #4179
feat: Add Azure OpenAI embeddings support by @danielbichuetti in #4332
test: move tests on standard pipelines in e2e/ by @ZanSara in #4309
fix: EvalResult load migration by @tstadel in #4289
feat: Report execution time for pipeline components in _debug by @zoltan-fedor in #4197
refactor: Use TableQuestionAnsweringPipeline from transformers by @sjrl in #4303
fix: hf-tiny-roberta model loading from disk and mypy errors by @mayankjobanputra in #4363
docs: TransformersImageToText- inform about supported models, better exception handling by @anakin87 in #4310
fix: check that answer is not None before accessing it in table.py by @culms in #4376
feat: add automatic OCR detection mechanism and improve performance by @danielbichuetti in #4329
Add Whisper node by @vblagoje in #4335
tests: Mark Crawler tests correctly by @silvanocerza in #4435
test: Skip flaky test_multimodal_retriever_query by @silvanocerza in #4444
fix: issue evaluation check for content type by @ju-gu in #4181
feat: break retry loop for 401 unauthorized errors in promptnode by @FHardow in #4389
refactor: Remove retry_with_exponential_backoff in favor of tenacity by @silvanocerza in #4460
refactor: Remove ElasticsearchRetriever and ElasticsearchFilterOnlyRetriever by @silvanocerza in #4499
refactor: Deprecate BaseKnowledgeGraph, GraphDBKnowledgeGraph, InMemoryKnowledgeGraph and Text2SparqlRetriever by @silvanocerza in #4500
refactor: remove telemetry v1 by @ZanSara in #4496
feat: expose prompts to Answer and EvaluationResult by @tstadel in #4341
feat: Add agent tools by @vblagoje in #4437
refactor: reduce telemetry events count by @ZanSara in #4501

DocumentStores

fix: OpenSearchDocumentStore.delete_index doesn't raise by @tstadel in #4295
fix: increase MetaDocumentORM value length in SQLDocumentStore by @anakin87 in #4333
fix: when using IVF* indexing, ensure the index is trained first by @kaixuanliu in #4311
refactor: Mark MilvusDocumentStore as deprecated by @silvanocerza in #4498

Documentation

feat: add top_k to PromptNode by @tstadel in #4159
feat: Add Agent by @julian-risch in #4148
ci: Automate OpenAPI specs upload to Readme.io by @silvanocerza in #4228
ci: Refactor docs config and generation by @silvanocerza in #4280
feat: Add Azure as OpenAI endpoint by @vblagoje in #4170
refactor: Allow flexible document id generation by @danielbichuetti in https://github.com/deepset-a...

Contributors

masci, vblagoje, and 25 other contributors

28 Mar 08:52

julian-risch

v1.15.0-rc2

22942ca

v1.15.0-rc2 Pre-release

28 Mar 06:59

silvanocerza

v1.15.0-rc1

4dc5abf

v1.15.0-rc1 Pre-release

28 Feb 13:59

vblagoje

v1.14.0

9b380cf

v1.14.0

⭐ Highlights

PromptNode enhancements

PromptNode just rolled out prompt logging (pipeline debug), run_batch, and model_kwargs support. More updates to PromptNode and PromptTemplates coming soon!

Shaper

We're introducing the Shaper, PromptNode's helper. Shaper unlocks the full potential of PromptNode and ensures its seamless integration with Haystack. But Shaper's scope and functionality are not limited to PromptNode; you can also use it independently, opening up a whole new world of possibilities.

IVF and Product Quantization support for OpenSearchDocumentStore

We've added support for IVF and IVF with Product Quantization to OpenSearchDocumentStore. You can train the IVF index either by calling the train_index method (as in FAISSDocumentStore) or by setting ivf_train_size when initializing OpenSearchDocumentStore, taking your search to the next level.
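Training an IVF index means learning a set of centroids from a sample of vectors; each stored vector is then filed into the inverted list of its nearest centroid, and a query only scans the lists closest to it. The pure-Python sketch below shows that idea with a tiny k-means on 2-D points. It is a conceptual illustration only, not the OpenSearch or FAISS implementation: `nlist` and `nprobe` are named after the usual FAISS parameters, the function names are hypothetical, and product quantization (compressing the stored vectors) is omitted. In Haystack you would instead call `train_index` or set `ivf_train_size`, as described above.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]


def nearest(point: Sequence[float], centroids: List[Vector]) -> int:
    # Index of the centroid with the smallest Euclidean distance
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def train_ivf(sample: List[Vector], nlist: int, iterations: int = 10, seed: int = 0) -> List[Vector]:
    """Learn nlist centroids with plain k-means (the 'training' step)."""
    rng = random.Random(seed)
    centroids = rng.sample(sample, nlist)
    for _ in range(iterations):
        clusters: Dict[int, List[Vector]] = {i: [] for i in range(nlist)}
        for v in sample:
            clusters[nearest(v, centroids)].append(v)
        # Move each centroid to the mean of its members; keep it if the cluster is empty
        centroids = [
            tuple(sum(dim) / len(members) for dim in zip(*members)) if members else centroids[i]
            for i, members in clusters.items()
        ]
    return centroids


def build_lists(vectors: List[Vector], centroids: List[Vector]) -> Dict[int, List[Vector]]:
    # File every vector under its nearest centroid (the inverted lists)
    lists: Dict[int, List[Vector]] = {i: [] for i in range(len(centroids))}
    for v in vectors:
        lists[nearest(v, centroids)].append(v)
    return lists


def search(query: Vector, centroids: List[Vector], lists: Dict[int, List[Vector]],
           nprobe: int = 1, top_k: int = 1) -> List[Vector]:
    # Only scan the nprobe inverted lists whose centroids are closest to the query
    probe = sorted(range(len(centroids)), key=lambda i: math.dist(query, centroids[i]))[:nprobe]
    candidates = [v for i in probe for v in lists[i]]
    return sorted(candidates, key=lambda v: math.dist(query, v))[:top_k]


points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
centroids = train_ivf(points, nlist=2)
lists = build_lists(points, centroids)
print(search((0.05, 0.05), centroids, lists, nprobe=1, top_k=1))
# [(0.0, 0.0)]
```

Raising `nprobe` scans more lists, trading speed for recall; with `nprobe` equal to `nlist` the search degenerates into an exhaustive scan.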

What's Changed

Breaking Changes

refactor: Updated rest_api schema for tables to be consistent with Document.to_dict by @sjrl in #3872
feat: Support multiple document_ids in Answer object (for generative QA) by @tstadel in #4062
feat: Update OpenAIAnswerGenerator defaults and with learnings from PromptNode by @sjrl in #4038
build: cache nltk models into the docker image by @mayankjobanputra in #4118
feat: Add IVF and Product Quantization support for OpenSearchDocumentStore by @bogdankostic in #3850

Pipeline

feat: add frontmatter to meta in MarkdownConverter by @TuanaCelik in #3953
fix: removing code block in MarkdownConverter by @TuanaCelik in #3960
feat: Add page range support to PDF converters. by @danielbichuetti in #3965
fix: Update telemetry to not serialize Pipeline if disabled. by @sjrl in #4000
feat: add Shaper by @ZanSara in #3880
fix: Event sending for RayPipeline crashing Haystack by @zoltan-fedor in #3971
fix: document retrieval metrics for non-document_id document_relevance_criteria by @tstadel in #3885
fix: make the crawler more robust on Windows by @anakin87 in #4049
fix: use correct count of outgoing edges in RayPipeline by @zoltan-fedor in #4066
feat: Allow all training options for training a SentenceTransformers EmbeddingRetriever by @sjrl in #4026
refactor: replace mutable default arguments by @julian-risch in #4070
feat: Support multiple RayPipelines by @zoltan-fedor in #4078
Remove double batching in retrieve_batch by @sjrl in #4014
style: Update black by @silvanocerza in #4101
fix: Fix TableTextRetriever for input consisting of tables only by @jackapbutler in #4048
fix: Deduplicate same Documents in isolated evaluation of Reader by @bogdankostic in #4114
Docs: Fix code block formatting by @agnieszka-m in #4162
refactor: Remove the pin from the espnet module and fix the audio node tests. by @danielbichuetti in #4128
fix: change tiktoken fallback mechanism to support Windows amd64 by @danielbichuetti in #4175
feat: Add OpenAIError to retry mechanism by @sjrl in #4178

DocumentStores

refactor: use weaviate client to build BM25 query by @hsm207 in #3939
fix: fixed InMemoryDocumentStore.get_embedding_count to return correct number by @sjrl in #3980
fix: Add inner query for mysql compatibility by @julian-risch in #4068
feat: add support for custom headers by @hsm207 in #4040
feat: Add BM25 support for tables in InMemoryDocumentStore by @bogdankostic in #4090
refactor: InMemoryDocumentStore - manage documents without embedding & fix mypy errors by @anakin87 in #4113
refactor: complete the document stores test refactoring by @masci in #4125
feat: include testing facilities into haystack package by @masci in #4182

Documentation

Align with the docs install guide + correct lg by @agnieszka-m in #3950
docs: Update Crawler docstring for correct usage in Google colab by @silvanocerza in #3979
Docs: Update docstrings by @agnieszka-m in #4119
docs: Update Annotation Tool README.md by @bogdankostic in #4123
feat: Add model_kwargs option to PromptNode by @sjrl in #4151
fix: Remove logging statement of setting ID manually in Document by @bogdankostic in #4129
chore: Fixing PromptNode .prompt() docstring to include the PromptTemplate object as an option by @TuanaCelik in #4135
chore: de-couple the telemetry events for each tutorial from the dataset on AWS that is used by @TuanaCelik in #4155
feat: Implement run_batch for PromptNode by @sjrl in #4072

Other Changes

fix: add option to not override results by Shaper #4231
fix: Shaper store all outputs from function #4223
fix: allowing file-upload api to write files to disk #4221
fix: Fix bug in prompt template check of OpenAIAnswerGenerator #4220
feat: add top_k to PromptNode #4159
feat: Add JsonConverter node #4130
feat: adding secure loading of models by default for haystack by @mayankjobanputra in #3901
fix: add tiktoken fallback mechanism. by @danielbichuetti in #3929
fix: change model in distillation test by @ZanSara in #3944
feat: Expose output_variable in PromptNode result, adjust unit tests by @vblagoje in #3892
fix: Fix type in FARMReader's save_to_remote by @bogdankostic in #3952
refactor: Remove PromptNode hash and equality functions by @vblagoje in #3923
ci: Remove mypy deps install step in python_cache action by @silvanocerza in #3956
fix: overwrite params with environment variables even if there are no params in the pipeline definition; make mypy ignore REST API tests by @anakin87 in #3930
Docs: Update ImageToText docstrings by @agnieszka-m in #3963
Docs: Add TransformersImageToText API doc by @agnieszka-m in #3966
ci: Add Docker images testing by @silvanocerza in #3943
feat: Allow users to set a timeout for remote APIs by @danielbichuetti in #3949
ci: Fix docker image testing on release by @silvanocerza in #3976
Fix: Fix quotation marks by @agnieszka-m in #3973
fix: PromptNode doesn't have run_batch support (yet) by @vblagoje in #3972
chore: increased timeout for loading pipelines through API by @mayankjobanputra in #3977
Missing import for TransformersImageToText by @ZanSara in #3984
test: CI on py3.8 by @ZanSara in #3926
Simplifies and fix docker images tests on release by @silvanocerza in #3982
feat: Add use_prefiltering parameter to DeepsetCloudDocumentStore by @bogdankostic in #3969
ci: Delete Docker images after testing to prevent workflow failure by @silvanocerza in #4004
fix: Add a verbose option to PromptNode to let users understand the prompts being used #2 by @zoltan-fedor in #3898
fix: prevent posthog from sending errors to stderr by @julian-risch in #4008
fix: extend schema for prompt node results by @tstadel in #3891
proposal: TableCell by @sjrl in #3875
refactor: In PromptNode reuse tokenizer instead of loading new one for stop words by @sjrl in #4016
ci: Automate release on PyPi by @silvanocerza in https://github.co...

Contributors

masci, vblagoje, and 16 other contributors

22 Feb 17:20

vblagoje

v1.14.0rc2

4504d73

v1.14.0rc2 Pre-release

What's Changed

fix: add option to not override results by Shaper #4231
fix: Shaper store all outputs from function #4223
fix: allowing file-upload api to write files to disk #4221
fix: Fix bug in prompt template check of OpenAIAnswerGenerator #4220
feat: add top_k to PromptNode #4159
feat: Add JsonConverter node #4130
