Fix bedrock integration and remove incorrect part regarding ELSER (#249)

yansavitski · web-flow · commit bb87e8f11a78 · 2024-05-29T14:31:14.000+02:00
diff --git a/notebooks/integrations/amazon-bedrock/langchain-qa-example.ipynb b/notebooks/integrations/amazon-bedrock/langchain-qa-example.ipynb
@@ -9,10 +9,7 @@
     "# Use Amazon Bedrock\n",
     "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/elastic/elasticsearch-labs/blob/main/notebooks/integrations/amazon-bedrock/langchain-qa-example.ipynb)\n",
     "\n",
-    "This workbook demonstrates how to work with Langchain [Amazon Bedrock](https://aws.amazon.com/bedrock/). Amazon Bedrock is a managed service that makes foundation models from leading AI startup and Amazon's own Titan models available through APIs.\n",
-    "\n",
-    "\n",
-    "\n"
+    "This workbook demonstrates how to use LangChain with [Amazon Bedrock](https://aws.amazon.com/bedrock/). Amazon Bedrock is a managed service that makes foundation models from leading AI startups, as well as Amazon's own Titan models, available through APIs."
    ]
   },
   {
@@ -33,13 +30,13 @@
    "outputs": [],
    "source": [
     "# install packages\n",
-    "!python3 -m pip install -qU langchain langchain-elasticsearch boto3\n",
+    "!python3 -m pip install -qU langchain langchain-elasticsearch langchain_community boto3 tiktoken\n",
     "\n",
     "# import modules\n",
     "from getpass import getpass\n",
     "from urllib.request import urlopen\n",
     "from langchain_elasticsearch import ElasticsearchStore\n",
-    "from langchain.embeddings.bedrock import BedrockEmbeddings\n",
+    "from langchain_community.embeddings.bedrock import BedrockEmbeddings\n",
     "from langchain.llms import Bedrock\n",
     "from langchain.chains import RetrievalQA\n",
     "import boto3\n",
@@ -102,9 +99,7 @@
     "We'll use the **Cloud ID** to identify our deployment, because we are using Elastic Cloud deployment. To find the Cloud ID for your deployment, go to https://cloud.elastic.co/deployments and select your deployment.\n",
     "\n",
     "\n",
-    "We will use [ElasticsearchStore](https://api.python.langchain.com/en/latest/vectorstores/langchain.vectorstores.elasticsearch.ElasticsearchStore.html) to connect to our elastic cloud deployment. This would help create and index data easily. In the ElasticsearchStore instance, will set embedding to [BedrockEmbeddings](https://api.python.langchain.com/en/latest/embeddings/langchain.embeddings.bedrock.BedrockEmbeddings.html) to embed the texts and elasticsearch index name that will be used in this example. In the instance, we will set `strategy` to [ElasticsearchStore.SparseVectorRetrievalStrategy()](https://api.python.langchain.com/en/latest/vectorstores/langchain.vectorstores.elasticsearch.SparseRetrievalStrategy.html#langchain.vectorstores.elasticsearch.SparseRetrievalStrategy) as we use this strategy to split documents.\n",
-    "\n",
-    "As we're using [ELSER](https://www.elastic.co/guide/en/machine-learning/current/ml-nlp-elser.html) we use [SparseVectorRetrievalStrategy](https://python.langchain.com/docs/integrations/vectorstores/elasticsearch#sparsevectorretrievalstrategy-elser) strategy. This strategy uses Elasticsearch's sparse vector retrieval to retrieve the top-k results. There is more other [strategies](https://python.langchain.com/docs/integrations/vectorstores/elasticsearch#approxretrievalstrategy) in langchain that might be used base on your needs."
+    "We will use [ElasticsearchStore](https://api.python.langchain.com/en/latest/vectorstores/langchain.vectorstores.elasticsearch.ElasticsearchStore.html) to connect to our Elastic Cloud deployment, which makes it easy to create an index and load data into it. In the ElasticsearchStore instance, we will set `embedding` to [BedrockEmbeddings](https://api.python.langchain.com/en/latest/embeddings/langchain.embeddings.bedrock.BedrockEmbeddings.html) to embed the texts, and `index_name` to the Elasticsearch index used in this example."
    ]
   },
   {
@@ -121,14 +116,13 @@
     "# https://www.elastic.co/search-labs/tutorials/install-elasticsearch/elastic-cloud#creating-an-api-key\n",
     "ELASTIC_API_KEY = getpass(\"Elastic Api Key: \")\n",
     "\n",
-    "embeddings = BedrockEmbeddings(client=bedrock_client)\n",
+    "bedrock_embedding = BedrockEmbeddings(client=bedrock_client)\n",
     "\n",
     "vector_store = ElasticsearchStore(\n",
     "    es_cloud_id=ELASTIC_CLOUD_ID,\n",
     "    es_api_key=ELASTIC_API_KEY,\n",
     "    index_name=\"workplace_index\",\n",
-    "    embedding=embeddings,\n",
-    "    strategy=ElasticsearchStore.SparseVectorRetrievalStrategy(),\n",
+    "    embedding=bedrock_embedding,\n",
     ")"
    ]
   },
@@ -210,11 +204,7 @@
    "source": [
     "## Index data into elasticsearch\n",
     "\n",
-    "Next, we will index data to elasticsearch using [ElasticsearchStore.from_documents](https://api.python.langchain.com/en/latest/vectorstores/langchain.vectorstores.elasticsearch.ElasticsearchStore.html#langchain.vectorstores.elasticsearch.ElasticsearchStore.from_documents). We will use Cloud ID,  Password and Index name values set in the `Create cloud deployment` step.\n",
-    "\n",
-    "In the instance, we will set `strategy` to [ElasticsearchStore.SparseVectorRetrievalStrategy()](https://api.python.langchain.com/en/latest/vectorstores/langchain.vectorstores.elasticsearch.SparseRetrievalStrategy.html#langchain.vectorstores.elasticsearch.SparseRetrievalStrategy)\n",
-    "\n",
-    "Note: Before we begin indexing, ensure you have [downloaded and deployed ELSER model](https://www.elastic.co/guide/en/machine-learning/current/ml-nlp-elser.html#download-deploy-elser) in your deployment and is running in ml node.\n"
+    "Next, we will index data into Elasticsearch using [ElasticsearchStore.from_documents](https://api.python.langchain.com/en/latest/vectorstores/langchain.vectorstores.elasticsearch.ElasticsearchStore.html#langchain.vectorstores.elasticsearch.ElasticsearchStore.from_documents). We will use the Cloud ID, API key and index name values set in the `Create cloud deployment` step."
    ]
   },
   {
@@ -230,7 +220,7 @@
     "    es_cloud_id=ELASTIC_CLOUD_ID,\n",
     "    es_api_key=ELASTIC_API_KEY,\n",
     "    index_name=\"workplace_index\",\n",
-    "    strategy=ElasticsearchStore.SparseVectorRetrievalStrategy(),\n",
+    "    embedding=bedrock_embedding,\n",
     ")"
    ]
   },
@@ -265,7 +255,7 @@
    },
    "source": [
     "## Asking a question\n",
-    "Now that we have the passages stored in Elasticsearch and llm is initialized, we can now ask a question to get the relevant passages.\n"
+    "Now that we have the passages stored in Elasticsearch and the LLM is initialized, we can ask a question to get the relevant passages."
    ]
   },
   {
@@ -333,8 +323,7 @@
       "\n",
       "Employees working from home are responsible for creating a comfortable and safe workspace that is conducive to productivity. This includes ensuring that their home office is ergonomically designed, well-lit, and free from distractions.\n",
       "Communication\n",
-      "-------\n",
-      "\n"
+      "-------"
      ]
     }
    ],
@@ -351,7 +340,7 @@
     "    \"How does compensation work?\",\n",
     "]\n",
     "question = questions[1]\n",
-    "print(f\"Question: {question}\\n\")\n",
+    "print(f\"Question: {question}\")\n",
     "\n",
     "ans = qa({\"query\": question})\n",
     "\n",
@@ -361,7 +350,7 @@
     "for doc in ans[\"source_documents\"]:\n",
     "    print(\"Name: \" + doc.metadata[\"name\"])\n",
     "    print(\"Content: \" + doc.page_content)\n",
-    "    print(\"-------\\n\")"
+    "    print(\"-------\")"
    ]
   }
  ],