PR for merging content on Notebook for RAG with Llama 3 open-source and Elastic #260 (#266)

rishikeshradhakrishnan · web-flow · commit 4c66913f81ce · 2024-06-20T22:50:49.000+05:30
* Readme Updated

* Cleared Cell Outputs

* Added notebooks to skip list

* Updated as per PR comments.

---------

Co-authored-by: rishikeshradhakrishnan &lt;rishikesh.radhakrishnan@elastic.co&gt;
diff --git a/bin/find-notebooks-to-test.sh b/bin/find-notebooks-to-test.sh
@@ -23,6 +23,8 @@ EXEMPT_NOTEBOOKS=(
     "notebooks/integrations/openai/openai-KNN-RAG.ipynb"
     "notebooks/integrations/openai/function-calling.ipynb"
     "notebooks/integrations/gemma/rag-gemma-huggingface-elastic.ipynb"
+    "notebooks/integrations/llama3/rag-elastic-llama3-elser.ipynb"
+    "notebooks/integrations/llama3/rag-elastic-llama3.ipynb"
     "notebooks/integrations/azure-openai/vector-search-azure-openai-elastic.ipynb"
     "notebooks/enterprise-search/app-search-engine-exporter.ipynb"
 )
diff --git a/notebooks/integrations/llama3/README.md b/notebooks/integrations/llama3/README.md
@@ -0,0 +1,11 @@
+# RAG-Elastic-Llama3
+
+Building RAG with Llama 3 open-source and Elastic. 
+
+The notebooks in this repo are companion to the blog content [Rag with Llama3 and Elastic](https://www.elastic.co/search-labs/blog/rag-with-llama3-elastic)
+
+Two examples of RAG using Llama3 as an LLM. Llama3 will be hosted using Ollama locally.
+ 
+1. Using Llama3 as an LLM and use it to generate embeddings.
+2. Using Llama3 as an LLM and use Elastic ELSER for embeddings and semantic search.
+
diff --git a/notebooks/integrations/llama3/rag-elastic-llama3-elser.ipynb b/notebooks/integrations/llama3/rag-elastic-llama3-elser.ipynb
@@ -0,0 +1,304 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "430bd0198f228af0",
+   "metadata": {},
+   "source": [
+    "# RAG with Elastic ELSER and Llama3 using Langchain\n",
+    "\n",
+    "This interactive notebook uses `Langchain` to process fictional workplace documents and uses `ELSER v2` running in `Elasticsearch` to transform these documents into embeddings and store them into `Elasticsearch`. We then ask a question, retrieve the relevant documents from `Elasticsearch` and use `Llama3` running locally using `Ollama` to provide a response. \n",
+    "\n",
+    "**_Note_** : _`Llama3` is expected to be running using `Ollama` on the same machine where you will be running this notebook._"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "fc4d3a839afa5bf1",
+   "metadata": {},
+   "source": [
+    "## Requirements\n",
+    "\n",
+    "For this example, you will need:\n",
+    "\n",
+    "- An Elastic deployment\n",
+    "  - We'll be using [Elastic Cloud](https://www.elastic.co/guide/en/cloud/current/ec-getting-started.html) for this example (available with a [free trial](https://cloud.elastic.co/registration?utm_source=github&utm_content=elasticsearch-labs-notebook))\n",
+    "  - For LLM we will be using [Ollama](https://ollama.com/) and [Llama3](https://ollama.com/library/llama3) configured locally.  \n",
+    "\n",
+    "### Use Elastic Cloud\n",
+    "\n",
+    "If you don't have an Elastic Cloud deployment, follow these steps to create one.\n",
+    "\n",
+    "1. Go to [Elastic cloud Registration](https://cloud.elastic.co/registration?utm_source=github&utm_content=elasticsearch-labs-notebook) and sign up for a free trial\n",
+    "2. Select **Create Deployment** and follow the instructions"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7497ed11046b9a21",
+   "metadata": {},
+   "source": [
+    "## Install required dependencies\n",
+    "First we install the packages we need for this example."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "a0c049f18215f065",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!pip install langchain langchain-elasticsearch langchain-community tiktoken"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e97921e3cf79cc71",
+   "metadata": {},
+   "source": [
+    "## Import packages\n",
+    "Next we import the required packages as required. The imports are placed in the cells as required."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "96d6fab134fd493d",
+   "metadata": {},
+   "source": [
+    "## Prompt user to provide Cloud ID and API Key\n",
+    "We now prompt the user to provide us Cloud ID and API Key using `getpass`. We get these details from the deployment."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "5be75ecc855e59b9",
+   "metadata": {
+    "ExecuteTime": {
+     "end_time": "2024-06-04T11:36:07.879351Z",
+     "start_time": "2024-06-04T11:35:57.006555Z"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "from getpass import getpass\n",
+    "\n",
+    "# https://www.elastic.co/search-labs/tutorials/install-elasticsearch/elastic-cloud#finding-your-cloud-id\n",
+    "ELASTIC_CLOUD_ID = getpass(\"Elastic Cloud ID: \")\n",
+    "\n",
+    "# https://www.elastic.co/search-labs/tutorials/install-elasticsearch/elastic-cloud#creating-an-api-key\n",
+    "ELASTIC_API_KEY = getpass(\"Elastic Api Key: \")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "84f107432173d8df",
+   "metadata": {},
+   "source": [
+    "## Prepare documents for chunking and ingestion\n",
+    "We now prepare the data to be ingested into `Elasticsearch`. We use `LangChain`'s `RecursiveCharacterTextSplitter` and split the documents' text at 512 characters with an overlap of 256 characters. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "4f949859a20b22d6",
+   "metadata": {
+    "ExecuteTime": {
+     "end_time": "2024-06-04T11:36:11.046835Z",
+     "start_time": "2024-06-04T11:36:10.641480Z"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "from urllib.request import urlopen\n",
+    "import json\n",
+    "from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
+    "\n",
+    "\n",
+    "url = \"https://raw.githubusercontent.com/elastic/elasticsearch-labs/main/datasets/workplace-documents.json\"\n",
+    "\n",
+    "response = urlopen(url)\n",
+    "\n",
+    "workplace_docs = json.loads(response.read())\n",
+    "metadata = []\n",
+    "content = []\n",
+    "for doc in workplace_docs:\n",
+    "    content.append(doc[\"content\"])\n",
+    "    metadata.append(\n",
+    "        {\n",
+    "            \"name\": doc[\"name\"],\n",
+    "            \"summary\": doc[\"summary\"],\n",
+    "            \"rolePermissions\": doc[\"rolePermissions\"],\n",
+    "        }\n",
+    "    )\n",
+    "\n",
+    "text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(\n",
+    "    chunk_size=512, chunk_overlap=256\n",
+    ")\n",
+    "docs = text_splitter.create_documents(content, metadatas=metadata)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a52ec8a0a663f581",
+   "metadata": {},
+   "source": [
+    "## Define Elasticsearch Vector Store\n",
+    "We define `ElasticsearchStore` as the vector store with [SparseVectorStrategy](https://python.langchain.com/v0.2/docs/integrations/vectorstores/elasticsearch/#sparsevectorstrategy-elser).`SparseVectorStrategy` converts each document into tokens and would be stored in vector field with datatype `rank_features`.\n",
+    "We will be using text embedding from [ELSER v2](https://www.elastic.co/guide/en/machine-learning/current/ml-nlp-elser.html#elser-v2) model `.elser_model_2_linux-x86_64`\n",
+    "\n",
+    "Note: Before we begin indexing, ensure you have [downloaded and deployed ELSER v2 model](https://www.elastic.co/guide/en/machine-learning/current/ml-nlp-elser.html#download-deploy-elser) in your deployment and is running in ml node. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "fc52a85c69f6e9fb",
+   "metadata": {
+    "ExecuteTime": {
+     "end_time": "2024-06-04T11:36:33.046663Z",
+     "start_time": "2024-06-04T11:36:32.381802Z"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "from langchain_elasticsearch import ElasticsearchStore\n",
+    "from langchain_elasticsearch import SparseVectorStrategy\n",
+    "\n",
+    "es_vector_store = ElasticsearchStore(\n",
+    "    es_cloud_id=ELASTIC_CLOUD_ID,\n",
+    "    es_api_key=ELASTIC_API_KEY,\n",
+    "    index_name=\"workplace_index_elser\",\n",
+    "    strategy=SparseVectorStrategy(model_id=\".elser_model_2_linux-x86_64\"),\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7f38b91b2a358994",
+   "metadata": {},
+   "source": [
+    "## Add docs processed above. \n",
+    "The document has already been chunked. We do not use any specific embedding function here, since the tokens are inferred at index time and at query time within Elasticsearch. \n",
+    "This requires that the `ELSER v2` model to be loaded and running in Elasticsearch."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "5ee38329856ea47c",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "es_vector_store.add_documents(documents=docs)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8a5a628abfb6a5a",
+   "metadata": {},
+   "source": [
+    "## LLM Configuration\n",
+    "This connects to your local LLM. Please refer to https://ollama.com/library/llama3 for details on steps to run Llama3 locally. \n",
+    "\n",
+    "_If you have sufficient resources (atleast >64 GB Ram and GPU available) then you could try the 70B parameter version of Llama3_\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "23e35e152e9a2714",
+   "metadata": {
+    "ExecuteTime": {
+     "end_time": "2024-06-04T11:36:40.306996Z",
+     "start_time": "2024-06-04T11:36:40.302460Z"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "from langchain_community.llms import Ollama\n",
+    "\n",
+    "llm = Ollama(model=\"llama3\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "dbca801f2aae187",
+   "metadata": {},
+   "source": [
+    "## Semantic Search using Elasticsearch ELSER v2 and Llama3\n",
+    "\n",
+    "We will perform a semantic search on query with `ELSER v2` as the model. The contextually relevant answer is then composed into a template along with the users original query. \n",
+    "\n",
+    "We then user `Llama3` to answer your questions with contextually relevant data fetched earlier from Elasticsearch using the retriever.   "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "d348766295f19e35",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from langchain.prompts import ChatPromptTemplate\n",
+    "from langchain.schema.output_parser import StrOutputParser\n",
+    "from langchain.schema.runnable import RunnablePassthrough\n",
+    "\n",
+    "\n",
+    "def format_docs(docs):\n",
+    "    return \"\\n\\n\".join(doc.page_content for doc in docs)\n",
+    "\n",
+    "\n",
+    "retriever = es_vector_store.as_retriever()\n",
+    "template = \"\"\"Answer the question based only on the following context:\\n\n",
+    "\n",
+    "                {context}\n",
+    "                \n",
+    "                Question: {question}\n",
+    "               \"\"\"\n",
+    "prompt = ChatPromptTemplate.from_template(template)\n",
+    "chain = (\n",
+    "    {\"context\": retriever | format_docs, \"question\": RunnablePassthrough()}\n",
+    "    | prompt\n",
+    "    | llm\n",
+    "    | StrOutputParser()\n",
+    ")\n",
+    "\n",
+    "a = chain.invoke(\"What are the organizations sales goals?\")\n",
+    "\n",
+    "print(a)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "1e855e64da97b052",
+   "metadata": {},
+   "source": [
+    "_You could now try experimenting with other questions._\n"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.12"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/notebooks/integrations/llama3/rag-elastic-llama3.ipynb b/notebooks/integrations/llama3/rag-elastic-llama3.ipynb

Original file line number	Diff line number	Diff line change
`@@ -23,6 +23,8 @@ EXEMPT_NOTEBOOKS=(`
`23`	`23`	`"notebooks/integrations/openai/openai-KNN-RAG.ipynb"`
`24`	`24`	`"notebooks/integrations/openai/function-calling.ipynb"`
`25`	`25`	`"notebooks/integrations/gemma/rag-gemma-huggingface-elastic.ipynb"`
	`26`	`+ "notebooks/integrations/llama3/rag-elastic-llama3-elser.ipynb"`
	`27`	`+ "notebooks/integrations/llama3/rag-elastic-llama3.ipynb"`
`26`	`28`	`"notebooks/integrations/azure-openai/vector-search-azure-openai-elastic.ipynb"`
`27`	`29`	`"notebooks/enterprise-search/app-search-engine-exporter.ipynb"`
`28`	`30`	`)`