| 1 | +{ |
| 2 | + "cells": [ |
| 3 | + { |
| 4 | + "cell_type": "markdown", |
| 5 | + "id": "c52e30d1-cb29-4e70-af4a-9c953fcb0f2e", |
| 6 | + "metadata": {}, |
| 7 | + "source": [ |
| 8 | + "# Quickstart: Vector search using Azure OpenAI Embeddings and Elasticsearch\n", |
| 9 | + "\n", |
| 10 | + "This tutorial demonstrates how to use the [Azure OpenAI API](https://azure.microsoft.com/en-in/products/ai-services/openai-service) to create [embeddings](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/embeddings?tabs=console) and store them in Elasticsearch. We then use Elasticsearch's k-nearest neighbor (kNN) vector search to find similar documents." |
| 11 | + ] |
| 12 | + }, |
| 13 | + { |
| 14 | + "cell_type": "markdown", |
| 15 | + "id": "88303061-f357-43d8-8b63-c4f79e9a1746", |
| 16 | + "metadata": {}, |
| 17 | + "source": [ |
| 18 | + "## Setup\n", |
| 19 | + "\n", |
| 20 | + "* Elastic credentials - Create a [Cloud deployment](https://www.elastic.co/search-labs/tutorials/install-elasticsearch/elastic-cloud) to get the Elastic credentials (`ELASTIC_CLOUD_ID`, `ELASTIC_API_KEY`).\n", |
| 21 | + "\n", |
| 22 | + "* `AZURE_OPENAI_API_KEY` - To use the Azure OpenAI API, you need an API key. Follow [these steps](https://learn.microsoft.com/en-us/azure/ai-services/openai/quickstart?tabs=command-line%2Cpython-new&pivots=programming-language-python#retrieve-key-and-endpoint) to retrieve your key and endpoint.\n", |
| 23 | + "* `AZURE_OPENAI_ENDPOINT` - Endpoint for your Azure OpenAI Resource.\n", |
| 24 | + "* `AZURE_DEPLOYMENT_ID` - The deployment name you chose when you deployed the model.\n", |
| 25 | + "* `AZURE_OPENAI_API_VERSION` - The API version to use for this operation. This follows the YYYY-MM-DD format." |
| 26 | + ] |
| 27 | + }, |
| 28 | + { |
| 29 | + "cell_type": "markdown", |
| 30 | + "id": "76ca723c-6148-4682-a5ae-486e73cb2b94", |
| 31 | + "metadata": {}, |
| 32 | + "source": [ |
| 33 | + "## Install packages" |
| 34 | + ] |
| 35 | + }, |
| 36 | + { |
| 37 | + "cell_type": "code", |
| 38 | + "execution_count": null, |
| 39 | + "id": "ef1f1e52-f892-489f-8947-3e4698f5f5c3", |
| 40 | + "metadata": {}, |
| 41 | + "outputs": [], |
| 42 | + "source": [ |
| 43 | + "%pip install -q -U openai elasticsearch" |
| 44 | + ] |
| 45 | + }, |
| 46 | + { |
| 47 | + "cell_type": "markdown", |
| 48 | + "id": "3d86d3fa-4ca0-41b6-a4bc-81bacf26bf02", |
| 49 | + "metadata": {}, |
| 50 | + "source": [ |
| 51 | + "## Import packages and credentials" |
| 52 | + ] |
| 53 | + }, |
| 54 | + { |
| 55 | + "cell_type": "code", |
| 56 | + "execution_count": null, |
| 57 | + "id": "bb62d8fb-6c34-44fd-bc94-18b644422ee8", |
| 58 | + "metadata": {}, |
| 59 | + "outputs": [], |
| 60 | + "source": [ |
| 61 | + "from openai import AzureOpenAI\n", |
| 62 | + "from elasticsearch import Elasticsearch, helpers\n", |
| 63 | + "from getpass import getpass\n", |
| 64 | + "import os\n", |
| 65 | + "\n", |
| 66 | + "ELASTIC_API_KEY = getpass(\"Elastic API Key :\")\n", |
| 67 | + "ELASTIC_CLOUD_ID = getpass(\"Elastic Cloud ID :\")\n", |
| 68 | + "\n", |
| 69 | + "AZURE_OPENAI_API_KEY = getpass(\"Azure OpenAI API Key :\")\n", |
| 70 | + "AZURE_OPENAI_ENDPOINT = input(\"Azure OpenAI Endpoint :\")\n", |
| 71 | + "AZURE_DEPLOYMENT_ID = input(\"Azure Deployment ID :\")\n", |
| 72 | + "AZURE_OPENAI_API_VERSION = input(\"Azure OpenAI API Version :\")\n", |
| 73 | + "\n", |
| 74 | + "elastic_index_name = \"azure-openai-vector-search-demo\"" |
| 75 | + ] |
| 76 | + }, |
| 77 | + { |
| 78 | + "cell_type": "markdown", |
| 79 | + "id": "8b22dc16-c0a0-48f0-979d-5d21c17bd264", |
| 80 | + "metadata": {}, |
| 81 | + "source": [ |
| 82 | + "## Embedding generation\n", |
| 83 | + "\n" |
| 84 | + ] |
| 85 | + }, |
| 86 | + { |
| 87 | + "cell_type": "code", |
| 88 | + "execution_count": null, |
| 89 | + "id": "ca56532d-7c82-4e2b-aecf-2173520d3696", |
| 90 | + "metadata": {}, |
| 91 | + "outputs": [], |
| 92 | + "source": [ |
| 93 | + "# Create the client once and reuse it for every embedding request.\n", |
| 94 | + "client = AzureOpenAI(\n", |
| 95 | + "    api_key=AZURE_OPENAI_API_KEY,\n", |
| 96 | + "    api_version=AZURE_OPENAI_API_VERSION,\n", |
| 97 | + "    azure_endpoint=AZURE_OPENAI_ENDPOINT,\n", |
| 98 | + ")\n", |
| 99 | + "\n", |
| 100 | + "\n", |
| 101 | + "def generate_embeddings(text):\n", |
| 102 | + "    # Embed a single string and return its vector (a list of floats).\n", |
| 103 | + "    response = client.embeddings.create(input=text, model=AZURE_DEPLOYMENT_ID)\n", |
| 104 | + "\n", |
| 105 | + "    return response.data[0].embedding\n", |
| 106 | + "\n", |
| 107 | + "\n", |
| 108 | + "sample_text = \"India generally experiences a hot summer from March to June, with temperatures often exceeding 40°C in central and northern regions. Monsoon season, from June to September, brings heavy rainfall, especially in the western coast and northeastern areas. Post-monsoon months, October and November, mark a transition with decreasing rainfall. Winter, from December to February, varies in temperature across the country, with colder conditions in the north and milder weather in the south. India's diverse climate is influenced by its geographical features, resulting in regional\"\n", |
| 109 | + "embeddings = generate_embeddings(sample_text)" |
| 110 | + ] |
| 111 | + }, |
| 112 | + { |
| 113 | + "cell_type": "markdown", |
| 114 | + "id": "6239eda7-3bed-43dd-a6a8-a8369b907d5c", |
| 115 | + "metadata": {}, |
| 116 | + "source": [ |
| 117 | + "## Connecting to Elasticsearch" |
| 118 | + ] |
| 119 | + }, |
| 120 | + { |
| 121 | + "cell_type": "code", |
| 122 | + "execution_count": null, |
| 123 | + "id": "7cbade18-3049-46f1-8d3e-5b22d4aade5b", |
| 124 | + "metadata": {}, |
| 125 | + "outputs": [], |
| 126 | + "source": [ |
| 127 | + "es = Elasticsearch(cloud_id=ELASTIC_CLOUD_ID, api_key=ELASTIC_API_KEY)" |
| 128 | + ] |
| 129 | + }, |
| 130 | + { |
| 131 | + "cell_type": "markdown", |
| 132 | + "id": "20d070c8-9e19-48a3-bc3b-5f22067eb63f", |
| 133 | + "metadata": {}, |
| 134 | + "source": [ |
| 135 | + "## Indexing a document with Elasticsearch" |
| 136 | + ] |
| 137 | + }, |
| 138 | + { |
| 139 | + "cell_type": "code", |
| 140 | + "execution_count": null, |
| 141 | + "id": "e02ca81e-7caa-4505-95c6-3c6be7843c8f", |
| 142 | + "metadata": {}, |
| 143 | + "outputs": [], |
| 144 | + "source": [ |
| 145 | + "doc = {\"text\": sample_text, \"text_embedding\": embeddings}\n", |
| 146 | + "\n", |
| 147 | + "resp = es.index(index=elastic_index_name, document=doc)\n", |
| 148 | + "\n", |
| 149 | + "print(resp)" |
| 150 | + ] |
| 151 | + }, |
| 152 | + { |
| 153 | + "cell_type": "markdown", |
| 154 | + "id": "afa0d371-afbf-4f98-9cd1-ee457839f323", |
| 155 | + "metadata": {}, |
| 156 | + "source": [ |
| 157 | + "## Searching for documents with Elasticsearch" |
| 158 | + ] |
| 159 | + }, |
| 160 | + { |
| 161 | + "cell_type": "code", |
| 162 | + "execution_count": 7, |
| 163 | + "id": "d71eeacc-d0c8-4035-b052-a1c03300aec0", |
| 164 | + "metadata": {}, |
| 165 | + "outputs": [ |
| 166 | + { |
| 167 | + "name": "stdout", |
| 168 | + "output_type": "stream", |
| 169 | + "text": [ |
| 170 | + "\n", |
| 171 | + "\n", |
| 172 | + "ID: SxtQyY4BMvvuJ06pSACG\n", |
| 173 | + "\n", |
| 174 | + "Text: India generally experiences a hot summer from March to June, with temperatures often exceeding 40°C in central and northern regions. Monsoon season, from June to September, brings heavy rainfall, especially in the western coast and northeastern areas. Post-monsoon months, October and November, mark a transition with decreasing rainfall. Winter, from December to February, varies in temperature across the country, with colder conditions in the north and milder weather in the south. India's diverse climate is influenced by its geographical features, resulting in regional\n" |
| 175 | + ] |
| 176 | + } |
| 177 | + ], |
| 178 | + "source": [ |
| 179 | + "q = \"How's the weather in India?\"\n", |
| 180 | + "\n", |
| 181 | + "embeddings = generate_embeddings(q)\n", |
| 182 | + "\n", |
| 183 | + "resp = es.search(\n", |
| 184 | + "    index=elastic_index_name,\n", |
| 185 | + "    knn={\n", |
| 186 | + "        \"field\": \"text_embedding\",\n", |
| 187 | + "        \"query_vector\": embeddings,\n", |
| 188 | + "        \"k\": 10,\n", |
| 189 | + "        \"num_candidates\": 100,\n", |
| 190 | + "    },\n", |
| 191 | + ")\n", |
| 192 | + "\n", |
| 193 | + "\n", |
| 194 | + "for result in resp[\"hits\"][\"hits\"]:\n", |
| 195 | + "    pretty_output = f\"\\n\\nID: {result['_id']}\\n\\nText: {result['_source']['text']}\"\n", |
| 196 | + "    print(pretty_output)" |
| 197 | + ] |
| 198 | + } |
| 199 | + ], |
| 200 | + "metadata": { |
| 201 | + "kernelspec": { |
| 202 | + "display_name": "Python 3 (ipykernel)", |
| 203 | + "language": "python", |
| 204 | + "name": "python3" |
| 205 | + }, |
| 206 | + "language_info": { |
| 207 | + "codemirror_mode": { |
| 208 | + "name": "ipython", |
| 209 | + "version": 3 |
| 210 | + }, |
| 211 | + "file_extension": ".py", |
| 212 | + "mimetype": "text/x-python", |
| 213 | + "name": "python", |
| 214 | + "nbconvert_exporter": "python", |
| 215 | + "pygments_lexer": "ipython3", |
| 216 | + "version": "3.11.4" |
| 217 | + } |
| 218 | + }, |
| 219 | + "nbformat": 4, |
| 220 | + "nbformat_minor": 5 |
| 221 | +} |
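
The notebook indexes its document without creating the index first, so the type of `text_embedding` is left to Elasticsearch's dynamic mapping. Depending on the Elasticsearch version, that may or may not produce a `dense_vector` field. Below is a minimal sketch of declaring the mapping explicitly and building the same kNN clause the search cell uses. The helper names `build_index_mappings`, `build_knn_query`, and `ensure_index` are hypothetical and not part of the notebook. The `cosine` similarity is an assumed default, not something the notebook specifies. The vector dimension is passed in rather than hard-coded because the notebook does not say which embedding model is deployed.

```python
"""Sketch: explicit dense_vector mapping and kNN clause for the notebook's index.

All helper names here are illustrative assumptions, not part of the notebook.
"""


def build_index_mappings(dims, similarity="cosine"):
    """Return mappings for the notebook's two fields, `text` and `text_embedding`."""
    if dims <= 0:
        raise ValueError("dims must be a positive integer")
    return {
        "properties": {
            "text": {"type": "text"},
            "text_embedding": {
                "type": "dense_vector",
                "dims": dims,  # must equal the length of each embedding vector
                "index": True,  # index the vectors so kNN search can use them
                "similarity": similarity,  # assumed default; adjust as needed
            },
        }
    }


def build_knn_query(query_vector, k=10, num_candidates=100):
    """Return the `knn` clause used by the notebook's search cell."""
    if k > num_candidates:
        # Elasticsearch rejects requests where num_candidates is less than k.
        raise ValueError("num_candidates must be greater than or equal to k")
    return {
        "field": "text_embedding",
        "query_vector": list(query_vector),
        "k": k,
        "num_candidates": num_candidates,
    }


def ensure_index(es, index_name, dims):
    """Create the index with an explicit mapping if it does not exist yet.

    `es` is expected to be an `elasticsearch.Elasticsearch` client.
    Returns True if the index was created, False if it already existed.
    """
    if es.indices.exists(index=index_name):
        return False
    es.indices.create(index=index_name, mappings=build_index_mappings(dims))
    return True
```

In the notebook this could be wired in as `ensure_index(es, elastic_index_name, len(embeddings))` before the `es.index(...)` call, and `es.search(index=elastic_index_name, knn=build_knn_query(embeddings))` in the search cell. Taking `dims` from `len(embeddings)` keeps the mapping in step with whichever embedding deployment is configured.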