|
1 | 1 | # OpenAI embeddings example application
|
2 | 2 |
|
3 |
| -## Overview |
| 3 | +This is a small example Node.js/Express application that demonstrates how to |
| 4 | +integrate Elastic and OpenAI. |
4 | 5 |
|
5 |
| -Small example Node.js/Express.js application to demonstrate how to integrate Elastic and OpenAI. |
| 6 | +The application has two components: |
| 7 | +* [generate](generate_embeddings.js) |
| 8 | + * Generates embeddings for [sample_data](sample_data/medicare.json) into |
| 9 | + Elasticsearch. |
| 10 | +* [app](search_app.js) |
| 11 | + * Runs the web service which hosts the [web frontend](views) and the |
| 12 | + search API. |
| 13 | +* Both scripts use the [Elasticsearch](https://github.com/elastic/elasticsearch-js) and [OpenAI](https://github.com/openai/openai-node) JavaScript clients. |
6 | 14 |
|
7 |
| -This folder includes two files: |
| 15 | + |
8 | 16 |
|
9 |
| -- `generate_embeddings.js`: Processes a JSON file, generates text embeddings for each document in the file using OpenAI's API, and then stores the documents and their corresponding embeddings in an Elasticsearch index. |
10 |
| -- `search_app.js`: A tiny Express.js web app that renders a search bar, generates embeddings for search queries, and performs semantic search using Elasticsearch's [kNN search](https://www.elastic.co/guide/en/elasticsearch/reference/current/knn-search.html). It retrieves the search results and returns a list of hits, ranked by relevance. |
| 17 | +## Download the Project |
11 | 18 |
|
12 |
| -Both scripts use the [Elasticsearch](https://github.com/elastic/elasticsearch-js) and [OpenAI](https://github.com/openai/openai-node) JavaScript clients. |
| 19 | +Download the project from GitHub and extract the `openai-embeddings` folder. |
| 20 | + |
| 21 | +```bash |
| 22 | +curl https://codeload.github.com/elastic/elasticsearch-labs/tar.gz/main | \ |
| 23 | +tar -xz --strip=2 elasticsearch-labs-main/example-apps/openai-embeddings |
| 24 | +``` |
13 | 25 |
|
14 |
| -## Requirements |
| 26 | +## Make your .env file |
15 | 27 |
|
16 |
| -- Node.js 16+ |
| 28 | +Copy [env.example](env.example) to `.env` and fill in the values noted inside. |
17 | 29 |
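
Before running anything, it can help to confirm that the variables from `.env` are actually set. The following is a minimal sketch, assuming the values have already been loaded into `process.env`. The helper name `missingEnvVars` and the two names in `REQUIRED` are illustrative assumptions; the authoritative list of variables is the one in `env.example`.

```javascript
// Sketch: report which required environment variables are missing or empty.
// The names in REQUIRED are assumptions for illustration only; use the
// variables actually listed in env.example.
const REQUIRED = ["OPENAI_API_KEY", "EMBEDDINGS_MODEL"];

// Hypothetical helper: returns the names that are undefined or blank in `env`.
function missingEnvVars(required, env = process.env) {
  return required.filter((name) => {
    const value = env[name];
    return value === undefined || String(value).trim() === "";
  });
}

const missing = missingEnvVars(REQUIRED);
if (missing.length > 0) {
  console.warn(`Missing environment variables: ${missing.join(", ")}`);
}
```

Returning the list of missing names, rather than throwing on the first one, lets you report every gap in a single pass.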
|
18 |
| -## Setup |
| 30 | +## Installing and connecting to Elasticsearch |
19 | 31 |
|
20 |
| -This section will walk you through the steps for setting up and using the application from scratch. |
21 |
| -(Skip the first steps if you already have an Elastic deployment and OpenAI account/API key.) |
| 32 | +There are a number of ways to install Elasticsearch. Cloud is best for most |
| 33 | +use cases. See the [Install Elasticsearch](https://www.elastic.co/search-labs/tutorials/install-elasticsearch) tutorial for more information. |
22 | 34 |
|
23 |
| -### 1. Download the Project |
| 35 | +Once you have decided on your approach, edit your `.env` file accordingly. |
24 | 36 |
|
25 |
| -Download the project from Github and extract the `openai-embeddings` folder. |
| 37 | +### Running your own Elastic Stack with Docker |
| 38 | + |
| 39 | +If you'd like to start Elastic locally, you can use the provided |
| 40 | +[docker-compose-elastic.yml](docker-compose-elastic.yml) file. This starts |
| 41 | +Elasticsearch, Kibana, and APM Server and only requires Docker installed. |
| 42 | + |
| 43 | +Use Docker Compose to run the Elastic Stack in the background: |
26 | 44 |
|
27 | 45 | ```bash
|
28 |
| -curl https://codeload.github.com/elastic/elasticsearch-labs/tar.gz/main | \ |
29 |
| -tar -xz --strip=2 elasticsearch-labs-main/example-apps/openai-embeddings |
| 46 | +docker compose -f docker-compose-elastic.yml up --force-recreate -d |
30 | 47 | ```
|
31 | 48 |
|
32 |
| -### 2. Create OpenAI account and API key |
| 49 | +Then, you can view Kibana at http://localhost:5601/app/home#/ |
33 | 50 |
|
34 |
| -- Go to https://platform.openai.com/ and sign up |
35 |
| -- Generate an API key and make note of it |
| 51 | +If asked for a username and password, use username: elastic and password: elastic. |
36 | 52 |
|
37 |
| - |
| 53 | +Clean up when finished, like this: |
38 | 54 |
|
39 |
| -### 3. Create Elastic Cloud account and credentials |
| 55 | +```bash |
| 56 | +docker compose -f docker-compose-elastic.yml down |
| 57 | +``` |
40 | 58 |
|
41 |
| -- [Sign up](https://cloud.elastic.co/registration?onboarding_token=vectorsearch&utm_source=github&utm_content=elasticsearch-labs-samples) for a Elastic cloud account |
42 |
| -- Make note of the master username/password shown to you during creation of the deployment |
43 |
| -- Make note of the Elastic Cloud ID after the deployment |
| 59 | +## Running the App |
44 | 60 |
|
45 |
| - |
| 61 | +There are two ways to run the app: via Docker or locally. Docker is the easier |
| 62 | +option, while running locally is better if you are making changes to the application. |
46 | 63 |
|
47 |
| - |
| 64 | +### Run with Docker |
48 | 65 |
|
49 |
| -### 4. Install Node dependencies |
| 66 | +Docker Compose is the easiest way, as a single step will: |
| 67 | +* generate embeddings and store them in Elasticsearch |
| 68 | +* run the app, which listens on http://localhost:3000 |
50 | 69 |
|
51 |
| -```sh |
52 |
| -npm install |
| 70 | +**Double-check you have a `.env` file with all your variables set first!** |
| 71 | + |
| 72 | +```bash |
| 73 | +docker compose up --build --force-recreate |
53 | 74 | ```
|
54 | 75 |
|
55 |
| -### 5. Set environment variables |
| 76 | +Clean up when finished, like this: |
56 | 77 |
|
57 |
| -```sh |
58 |
| -export ELASTIC_CLOUD_ID=<your Elastic cloud ID> |
59 |
| -export ELASTIC_USERNAME=<your Elastic username> |
60 |
| -export ELASTIC_PASSWORD=<your Elastic password> |
61 |
| -export OPENAI_API_KEY=<your OpenAI API key> |
| 78 | +```bash |
| 79 | +docker compose down |
62 | 80 | ```
|
63 | 81 |
|
64 |
| -### 6. Generate embeddings and index documents |
| 82 | +### Run locally |
65 | 83 |
|
66 |
| -```sh |
67 |
| -npm run generate |
| 84 | +First, set up a Node.js environment for the example like this: |
68 | 85 |
|
69 |
| -Connecting to Elastic Cloud: my-openai-integration-test:dXMt(...) |
70 |
| -(node:95956) ExperimentalWarning: stream/web is an experimental feature. This feature could change at any time |
71 |
| -(Use `node --trace-warnings ...` to show where the warning was created) |
72 |
| -Reading from file sample_data/medicare.json |
73 |
| -Processing 12 documents... |
74 |
| -Processing batch of 10 documents... |
75 |
| -Calling OpenAI API for 10 embeddings with model text-embedding-ada-002 |
76 |
| -Indexing 10 documents to index openai-integration... |
77 |
| -Processing batch of 2 documents... |
78 |
| -Calling OpenAI API for 2 embeddings with model text-embedding-ada-002 |
79 |
| -Indexing 2 documents to index openai-integration... |
80 |
| -Processing complete |
| 86 | +```bash |
| 87 | +nvm use --lts # or similar, to set up Node.js v20 or later |
| 88 | +npm install |
81 | 89 | ```
|
82 | 90 |
|
83 |
| -_**Note**: the example application uses the `text-embedding-ada-002` OpenAI model for generating the embeddings, which provides a 1536-dimensional vector output. See [this section](#using-a-different-openai-model) if you want to use a different model._ |
84 |
| - |
85 |
| -### 7. Launch web app |
| 91 | +**Double-check you have a `.env` file with all your variables set first!** |
86 | 92 |
|
87 |
| -```sh |
88 |
| -npm run app |
| 93 | +#### Run the generate command |
89 | 94 |
|
90 |
| -Connecting to Elastic Cloud: my-openai-integration-test:dXMt(...) |
91 |
| -(node:96017) ExperimentalWarning: stream/web is an experimental feature. This feature could change at any time |
92 |
| -(Use `node --trace-warnings ...` to show where the warning was created) |
93 |
| -Express app listening on port 3000 |
| 95 | +First, ingest the data into Elasticsearch: |
| 96 | +```bash |
| 97 | +npm run generate |
94 | 98 | ```
|
95 | 99 |
|
96 |
| -### 8. Run semantic search in the web app |
97 |
| - |
98 |
| -- Open http://localhost:3000 in your browser |
99 |
| -- Enter a search query and press Search |
| 100 | +#### Run the app |
100 | 101 |
|
101 |
| - |
| 102 | +Now, run the app, which listens on http://localhost:3000 |
| 103 | +```bash |
| 104 | +npm run app |
| 105 | +``` |
102 | 106 |
|
103 |
| -## Customize configuration |
| 107 | +## Advanced |
104 | 108 |
|
105 |
| -Here are some tips for modifying the code for your use case. For example, you might want to use your own sample data. |
| 109 | +Here are some tips for modifying the code for your use case. For example, you |
| 110 | +might want to use your own sample data. |
106 | 111 |
|
107 | 112 | ### Using a different source file or document mapping
|
108 | 113 |
|
109 | 114 | - Ensure your file contains the documents in JSON format
|
110 |
| -- Modify the document mappings and fields in the `.js` files and in `views/search.hbs` |
111 |
| -- Modify the initialization of `FILE` in `utils.js` |
| 115 | +- Modify the document mappings and fields in the `.js` files and in [views/search.hbs](views/search.hbs) |
| 116 | +- Modify the initialization of `FILE` in [utils.js](utils.js) |
112 | 117 |
|
113 | 118 | ### Using a different OpenAI model
|
114 | 119 |
|
115 |
| -- Modify the initialization of `MODEL` in `utils.js` |
116 |
| -- Ensure that `embedding.dims` in your index mapping is the same number as the dimensions of the model's output |
| 120 | +- Modify `EMBEDDINGS_MODEL` in `.env` |
| 121 | +- Ensure that `embedding.dims` in your index mapping is the same number as the dimensions of the model's output. |
117 | 122 |
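
A mismatch between the model's output size and `embedding.dims` is easy to miss until indexing fails. The sketch below shows one way to check it early. The helper name `checkEmbeddingDims` is hypothetical and not part of the application. The 1536 figure is the output size of `text-embedding-ada-002`; other models may differ.

```javascript
// Sketch: fail fast when an embedding's length differs from the `dims`
// value configured in the index mapping. Hypothetical helper, not part of
// the example application.
function checkEmbeddingDims(embedding, mappingDims) {
  if (!Array.isArray(embedding)) {
    throw new TypeError("embedding must be an array of numbers");
  }
  if (embedding.length !== mappingDims) {
    throw new Error(
      `Embedding has ${embedding.length} dimensions, ` +
        `but the index mapping expects ${mappingDims}`
    );
  }
  return true;
}

// Placeholder vector standing in for a real model response.
// text-embedding-ada-002 produces 1536-dimensional vectors.
const placeholderEmbedding = new Array(1536).fill(0);
checkEmbeddingDims(placeholderEmbedding, 1536);
```

Running a check like this on the first embedding returned for a new model surfaces the problem before any documents are sent to Elasticsearch.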
|
118 | 123 | ### Using a different Elastic index
|
119 | 124 |
|
120 |
| -- Modify the initialization of `INDEX` in `utils.js` |
121 |
| - |
122 |
| -### Using a different method for authenticating with Elastic |
123 |
| - |
124 |
| -- Modify the initialization of `elasticsearchClient` in `utils.js` |
125 |
| -- Refer to [this document](https://www.elastic.co/guide/en/elasticsearch/client/javascript-api/current/client-connecting.html#authentication) about authentication schemes |
| 125 | +- Modify the initialization of `INDEX` in [utils.js](utils.js) |
126 | 126 |
|
127 |
| -### Running on self-managed Elastic cluster |
| 127 | +### Using a different method to connect to Elastic |
128 | 128 |
|
129 |
| -- Modify the initialization of `elasticsearchClient` in `utils.js` |
130 |
| -- Refer to [this document](https://www.elastic.co/guide/en/elasticsearch/client/javascript-api/current/client-connecting.html#connect-self-managed-new) about connecting to a self-managed cluster |
| 129 | +- Modify the initialization of `elasticsearchClient` in [utils.js](utils.js) |
| 130 | +- Refer to [this document](https://www.elastic.co/guide/en/elasticsearch/client/javascript-api/current/client-connecting.html) |
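
As a rough illustration of the kind of change involved, the sketch below builds an options object for the Elasticsearch JavaScript client from environment variables. It is not the code in `utils.js`. The function name `buildClientOptions` and the variable names `ELASTICSEARCH_URL` and `ELASTICSEARCH_API_KEY` are assumptions made for this example; check `env.example` for the names the application actually reads. The option keys `cloud.id`, `node`, and `auth` are the ones the client's connection documentation describes.

```javascript
// Sketch: choose connection and auth options based on which environment
// variables are present. Variable names are assumptions for illustration.
function buildClientOptions(env = process.env) {
  const options = {};

  // Elastic Cloud deployments connect by Cloud ID; others by node URL.
  if (env.ELASTIC_CLOUD_ID) {
    options.cloud = { id: env.ELASTIC_CLOUD_ID };
  } else if (env.ELASTICSEARCH_URL) {
    options.node = env.ELASTICSEARCH_URL;
  } else {
    throw new Error("Set ELASTIC_CLOUD_ID or ELASTICSEARCH_URL");
  }

  // Prefer an API key when one is provided, else fall back to basic auth.
  if (env.ELASTICSEARCH_API_KEY) {
    options.auth = { apiKey: env.ELASTICSEARCH_API_KEY };
  } else if (env.ELASTIC_USERNAME && env.ELASTIC_PASSWORD) {
    options.auth = {
      username: env.ELASTIC_USERNAME,
      password: env.ELASTIC_PASSWORD,
    };
  }

  return options;
}

// Usage (requires the @elastic/elasticsearch package):
//   const { Client } = require("@elastic/elasticsearch");
//   const elasticsearchClient = new Client(buildClientOptions());
```

Keeping the option-building logic in a plain function makes it possible to test the connection choices without contacting a cluster.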