Commit c8f8723

openai-embeddings: matches style and flow with chatbot-rag-app (#373)
Signed-off-by: Adrian Cole <[email protected]>
1 parent 9dc1075 commit c8f8723

16 files changed: +268 −1015 lines

example-apps/openai-embeddings/.npmrc

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+package-lock=false
Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+FROM node:22-alpine
+
+ENV NODE_ENV production
+WORKDIR /usr/src/app
+COPY . .
+RUN --mount=type=cache,target=/root/.npm \
+    npm install --omit=dev
+USER node
+EXPOSE 3000
+
+ENTRYPOINT ["npm", "run"]
+CMD ["app"]
Lines changed: 78 additions & 78 deletions
@@ -1,130 +1,130 @@
 # OpenAI embeddings example application
 
-## Overview
+This is a small example Node.js/Express application that demonstrates how to
+integrate Elastic and OpenAI.
 
-Small example Node.js/Express.js application to demonstrate how to integrate Elastic and OpenAI.
+The application has two components:
+* [generate](generate_embeddings.js)
+  * Generates embeddings for [sample_data](sample_data/medicare.json) into
+    Elasticsearch.
+* [app](search_app.js)
+  * Runs the web service which hosts the [web frontend](views) and the
+    search API.
+* Both scripts use the [Elasticsearch](https://github.com/elastic/elasticsearch-js) and [OpenAI](https://github.com/openai/openai-node) JavaScript clients.
 
-This folder includes two files:
+![Screenshot of the sample app](./app-demo.png)
 
-- `generate_embeddings.js`: Processes a JSON file, generates text embeddings for each document in the file using OpenAI's API, and then stores the documents and their corresponding embeddings in an Elasticsearch index.
-- `search_app.js`: A tiny Express.js web app that renders a search bar, generates embeddings for search queries, and performs semantic search using Elasticsearch's [kNN search](https://www.elastic.co/guide/en/elasticsearch/reference/current/knn-search.html). It retrieves the search results and returns a list of hits, ranked by relevance.
+## Download the Project
 
-Both scripts use the [Elasticsearch](https://github.com/elastic/elasticsearch-js) and [OpenAI](https://github.com/openai/openai-node) JavaScript clients.
+Download the project from Github and extract the `openai-embeddings` folder.
+
+```bash
+curl https://codeload.github.com/elastic/elasticsearch-labs/tar.gz/main | \
+    tar -xz --strip=2 elasticsearch-labs-main/example-apps/openai-embeddings
+```
 
-## Requirements
+## Make your .env file
 
-- Node.js 16+
+Copy [env.example](env.example) to `.env` and fill in the values noted inside.
 
-## Setup
+## Installing and connecting to Elasticsearch
 
-This section will walk you through the steps for setting up and using the application from scratch.
-(Skip the first steps if you already have an Elastic deployment and OpenAI account/API key.)
+There are a number of ways to install Elasticsearch. Cloud is best for most
+use-cases. Visit the [Install Elasticsearch](https://www.elastic.co/search-labs/tutorials/install-elasticsearch) page for more information.
 
-### 1. Download the Project
+Once you have decided on your approach, edit your `.env` file accordingly.
 
-Download the project from Github and extract the `openai-embeddings` folder.
+### Running your own Elastic Stack with Docker
+
+If you'd like to start Elastic locally, you can use the provided
+[docker-compose-elastic.yml](docker-compose-elastic.yml) file. This starts
+Elasticsearch, Kibana, and APM Server, and only requires Docker to be installed.
+
+Use docker compose to run the Elastic stack in the background:
 
 ```bash
-curl https://codeload.github.com/elastic/elasticsearch-labs/tar.gz/main | \
-    tar -xz --strip=2 elasticsearch-labs-main/example-apps/openai-embeddings
+docker compose -f docker-compose-elastic.yml up --force-recreate -d
 ```
 
-### 2. Create OpenAI account and API key
+Then, you can view Kibana at http://localhost:5601/app/home#/
 
-- Go to https://platform.openai.com/ and sign up
-- Generate an API key and make note of it
+If asked for a username and password, use username: elastic and password: elastic.
 
-![OpenAI API key](images/openai_api_key.png)
+Clean up when finished, like this:
 
-### 3. Create Elastic Cloud account and credentials
+```bash
+docker compose -f docker-compose-elastic.yml down
+```
 
-- [Sign up](https://cloud.elastic.co/registration?onboarding_token=vectorsearch&utm_source=github&utm_content=elasticsearch-labs-samples) for a Elastic cloud account
-- Make note of the master username/password shown to you during creation of the deployment
-- Make note of the Elastic Cloud ID after the deployment
+## Running the App
 
-![Elastic Cloud credentials](images/elastic_credentials.png)
+There are two ways to run the app: with Docker or locally. Docker is advised
+for ease, while running locally is advised if you are making changes to the
+application.
 
-![Elastic Cloud ID](images/elastic_cloud_id.png)
+### Run with Docker
 
-### 4. Install Node dependencies
+Docker compose is the easiest way, as one command will:
+* generate embeddings and store them into Elasticsearch
+* run the app, which listens on http://localhost:3000
 
-```sh
-npm install
+**Double-check you have a `.env` file with all your variables set first!**
+
+```bash
+docker compose up --build --force-recreate
 ```
 
-### 5. Set environment variables
+Clean up when finished, like this:
 
-```sh
-export ELASTIC_CLOUD_ID=<your Elastic cloud ID>
-export ELASTIC_USERNAME=<your Elastic username>
-export ELASTIC_PASSWORD=<your Elastic password>
-export OPENAI_API_KEY=<your OpenAI API key>
+```bash
+docker compose down
 ```
 
-### 6. Generate embeddings and index documents
+### Run locally
 
-```sh
-npm run generate
+First, set up a Node.js environment for the example like this:
 
-Connecting to Elastic Cloud: my-openai-integration-test:dXMt(...)
-(node:95956) ExperimentalWarning: stream/web is an experimental feature. This feature could change at any time
-(Use `node --trace-warnings ...` to show where the warning was created)
-Reading from file sample_data/medicare.json
-Processing 12 documents...
-Processing batch of 10 documents...
-Calling OpenAI API for 10 embeddings with model text-embedding-ada-002
-Indexing 10 documents to index openai-integration...
-Processing batch of 2 documents...
-Calling OpenAI API for 2 embeddings with model text-embedding-ada-002
-Indexing 2 documents to index openai-integration...
-Processing complete
+```bash
+nvm use --lts # or similar, to set up Node.js v20 or later
+npm install
 ```
 
-_**Note**: the example application uses the `text-embedding-ada-002` OpenAI model for generating the embeddings, which provides a 1536-dimensional vector output. See [this section](#using-a-different-openai-model) if you want to use a different model._
-
-### 7. Launch web app
+**Double-check you have a `.env` file with all your variables set first!**
 
-```sh
-npm run app
+#### Run the generate command
 
-Connecting to Elastic Cloud: my-openai-integration-test:dXMt(...)
-(node:96017) ExperimentalWarning: stream/web is an experimental feature. This feature could change at any time
-(Use `node --trace-warnings ...` to show where the warning was created)
-Express app listening on port 3000
+First, ingest the data into Elasticsearch:
+```bash
+npm run generate
 ```
 
-### 8. Run semantic search in the web app
-
-- Open http://localhost:3000 in your browser
-- Enter a search query and press Search
+#### Run the app
 
-![Search example](images/search.png)
+Now, run the app, which listens on http://localhost:3000
+```bash
+npm run app
+```
 
-## Customize configuration
+## Advanced
 
-Here are some tips for modifying the code for your use case. For example, you might want to use your own sample data.
+Here are some tips for modifying the code for your use case. For example, you
+might want to use your own sample data.
 
 ### Using a different source file or document mapping
 
 - Ensure your file contains the documents in JSON format
-- Modify the document mappings and fields in the `.js` files and in `views/search.hbs`
-- Modify the initialization of `FILE` in `utils.js`
+- Modify the document mappings and fields in the `.js` files and in [views/search.hbs](views/search.hbs)
+- Modify the initialization of `FILE` in [utils.js](utils.js)
 
 ### Using a different OpenAI model
 
-- Modify the initialization of `MODEL` in `utils.js`
-- Ensure that `embedding.dims` in your index mapping is the same number as the dimensions of the model's output
+- Modify `EMBEDDINGS_MODEL` in `.env`
+- Ensure that `embedding.dims` in your index mapping matches the dimensions of the model's output.
 
 ### Using a different Elastic index
 
-- Modify the initialization of `INDEX` in `utils.js`
-
-### Using a different method for authenticating with Elastic
-
-- Modify the initialization of `elasticsearchClient` in `utils.js`
-- Refer to [this document](https://www.elastic.co/guide/en/elasticsearch/client/javascript-api/current/client-connecting.html#authentication) about authentication schemes
+- Modify the initialization of `INDEX` in [utils.js](utils.js)
 
-### Running on self-managed Elastic cluster
+### Using a different method to connect to Elastic
 
-- Modify the initialization of `elasticsearchClient` in `utils.js`
-- Refer to [this document](https://www.elastic.co/guide/en/elasticsearch/client/javascript-api/current/client-connecting.html#connect-self-managed-new) about connecting to a self-managed cluster
+- Modify the initialization of `elasticsearchClient` in [utils.js](utils.js)
+- Refer to [this document](https://www.elastic.co/guide/en/elasticsearch/client/javascript-api/current/client-connecting.html)
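The README above describes `app` performing semantic search with Elasticsearch's kNN search over query embeddings. As a rough illustration only, a kNN request body for this app might be shaped like the sketch below; the `embedding` field name and the `k`/`num_candidates` values are assumptions, while the `openai-integration` index name comes from the generate log output that this commit removes from the README.

```javascript
// Hypothetical sketch of a kNN search request body; not the app's actual code.
function buildKnnQuery(queryEmbedding, { index = "openai-integration", k = 10 } = {}) {
  return {
    index,
    knn: {
      field: "embedding",           // dense_vector field holding document embeddings
      query_vector: queryEmbedding, // embedding generated for the user's query text
      k,                            // number of nearest neighbors to return
      num_candidates: 10 * k,       // candidates considered per shard before ranking
    },
  };
}

// Example with a tiny fake 3-dimensional vector; real text-embedding-ada-002
// vectors have 1536 dimensions.
const body = buildKnnQuery([0.1, 0.2, 0.3], { k: 5 });
console.log(body.knn.k, body.knn.num_candidates); // 5 50
```

A body like this would be passed to the Elasticsearch JavaScript client's search API, with `embedding.dims` in the index mapping matching the model's output dimensions as the README notes.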
Lines changed: 91 additions & 0 deletions
@@ -0,0 +1,91 @@
+name: elastic-stack
+
+services:
+  elasticsearch:
+    image: docker.elastic.co/elasticsearch/elasticsearch:8.17.0
+    container_name: elasticsearch
+    ports:
+      - 9200:9200
+    environment:
+      - node.name=elasticsearch
+      - cluster.name=docker-cluster
+      - discovery.type=single-node
+      - ELASTIC_PASSWORD=elastic
+      - bootstrap.memory_lock=true
+      - xpack.security.enabled=true
+      - xpack.security.http.ssl.enabled=false
+      - xpack.security.transport.ssl.enabled=false
+      - xpack.license.self_generated.type=trial
+      - ES_JAVA_OPTS=-Xmx8g
+    ulimits:
+      memlock:
+        soft: -1
+        hard: -1
+    healthcheck:
+      test: ["CMD-SHELL", "curl -s http://localhost:9200/_cluster/health?wait_for_status=yellow&timeout=500ms"]
+      retries: 300
+      interval: 1s
+
+  elasticsearch_settings:
+    depends_on:
+      elasticsearch:
+        condition: service_healthy
+    image: docker.elastic.co/elasticsearch/elasticsearch:8.17.0
+    container_name: elasticsearch_settings
+    restart: 'no'
+    command: >
+      bash -c '
+        # gen-ai assistants in kibana save state in a way that requires security to be enabled, so we need to create
+        # a kibana system user before starting it.
+        echo "Setup the kibana_system password";
+        until curl -s -u "elastic:elastic" -X POST http://elasticsearch:9200/_security/user/kibana_system/_password -d "{\"password\":\"elastic\"}" -H "Content-Type: application/json" | grep -q "^{}"; do sleep 5; done;
+      '
+
+  kibana:
+    image: docker.elastic.co/kibana/kibana:8.17.0
+    container_name: kibana
+    depends_on:
+      elasticsearch_settings:
+        condition: service_completed_successfully
+    ports:
+      - 5601:5601
+    environment:
+      - SERVERNAME=kibana
+      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
+      - ELASTICSEARCH_USERNAME=kibana_system
+      - ELASTICSEARCH_PASSWORD=elastic
+      # Non-default settings from here:
+      # https://github.com/elastic/apm-server/blob/main/testing/docker/kibana/kibana.yml
+      - MONITORING_UI_CONTAINER_ELASTICSEARCH_ENABLED=true
+      - XPACK_SECURITY_ENCRYPTIONKEY=fhjskloppd678ehkdfdlliverpoolfcr
+      - XPACK_ENCRYPTEDSAVEDOBJECTS_ENCRYPTIONKEY=fhjskloppd678ehkdfdlliverpoolfcr
+      - SERVER_PUBLICBASEURL=http://127.0.0.1:5601
+    healthcheck:
+      test: ["CMD-SHELL", "curl -s http://localhost:5601/api/status | grep -q 'All services are available'"]
+      retries: 300
+      interval: 1s
+
+  apm-server:
+    image: docker.elastic.co/apm/apm-server:8.17.0
+    container_name: apm-server
+    depends_on:
+      elasticsearch:
+        condition: service_healthy
+    command: >
+      apm-server
+        -E apm-server.kibana.enabled=true
+        -E apm-server.kibana.host=http://kibana:5601
+        -E apm-server.kibana.username=elastic
+        -E apm-server.kibana.password=elastic
+        -E output.elasticsearch.hosts=["http://elasticsearch:9200"]
+        -E output.elasticsearch.username=elastic
+        -E output.elasticsearch.password=elastic
+    cap_add: ["CHOWN", "DAC_OVERRIDE", "SETGID", "SETUID"]
+    cap_drop: ["ALL"]
+    ports:
+      - 8200:8200
+    healthcheck:
+      test: ["CMD-SHELL", "bash -c 'echo -n > /dev/tcp/127.0.0.1/8200'"]
+      retries: 300
+      interval: 1s
+
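The healthchecks in this compose file all follow the same pattern: probe the service every second, up to 300 times, until it reports healthy. That retry loop can be sketched in plain JavaScript; the probe function is injected here so the sketch needs no network, and the function name is illustrative rather than anything in the project.

```javascript
// Illustration of the retry loop encoded by the compose healthchecks above
// (retries: 300, interval: 1s): keep probing until the service is healthy.
async function waitUntilHealthy(probe, { retries = 300, intervalMs = 1000 } = {}) {
  for (let attempt = 1; attempt <= retries; attempt++) {
    if (await probe()) return attempt; // healthy: report how many probes it took
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`service not healthy after ${retries} probes`);
}
```

With a real cluster, the probe would be the same curl check as the compose file: a request to `/_cluster/health?wait_for_status=yellow`.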
Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
+name: chatbot-rag-app
+
+services:
+  generate:
+    build:
+      context: .
+    container_name: generate
+    restart: 'no'
+    environment:
+      # host.docker.internal means connect to the host machine, e.g. your laptop
+      ELASTICSEARCH_URL: "http://host.docker.internal:9200"
+    env_file:
+      - .env
+    command: generate
+    extra_hosts:
+      - "host.docker.internal:host-gateway"
+
+  app:
+    depends_on:
+      generate:
+        condition: service_completed_successfully
+    container_name: api-frontend
+    build:
+      context: .
+    environment:
+      # host.docker.internal means connect to the host machine, e.g. your laptop
+      ELASTICSEARCH_URL: "http://host.docker.internal:9200"
+    env_file:
+      - .env
+    ports:
+      - "3000:3000"
+    extra_hosts:
+      - "host.docker.internal:host-gateway"
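Note how this compose file layers configuration: `env_file` loads `.env`, and the `environment` key then overrides `ELASTICSEARCH_URL` so containers reach Elasticsearch on the host via `host.docker.internal` instead of the `localhost` default. A hedged sketch of how the app could resolve that precedence (the helper name is hypothetical, not the project's actual code):

```javascript
// Hypothetical helper: an environment variable (e.g. the compose override)
// wins over the localhost default from env.example.
function resolveElasticsearchUrl(env = process.env) {
  return env.ELASTICSEARCH_URL || "http://localhost:9200";
}

console.log(resolveElasticsearchUrl({ ELASTICSEARCH_URL: "http://host.docker.internal:9200" }));
// http://host.docker.internal:9200
console.log(resolveElasticsearchUrl({}));
// http://localhost:9200
```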
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
+# Make a copy of this file with the name .env and assign values to variables
+
+# How you connect to Elasticsearch: change details to your instance
+ELASTICSEARCH_URL=http://localhost:9200
+ELASTICSEARCH_USER=elastic
+ELASTICSEARCH_PASSWORD=elastic
+# ELASTICSEARCH_API_KEY=
+
+# Update this with your real OpenAI API key
+OPENAI_API_KEY=
+# EMBEDDINGS_MODEL=text-embedding-ada-002
+
+# Uncomment to use Ollama instead of OpenAI
+# OPENAI_BASE_URL=http://localhost:11434/v1
+# OPENAI_API_KEY=unused
+# EMBEDDINGS_MODEL=all-minilm:33m
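A dotenv-style file like the one above follows a simple format: one `KEY=VALUE` per line, with `#` lines and blanks ignored. The minimal parser below illustrates that format only; real loaders such as the dotenv package handle many more edge cases (quoting, multi-line values, export prefixes).

```javascript
// Minimal sketch of dotenv-style parsing; illustrative, not the loader the
// project actually uses.
function parseDotenv(text) {
  const vars = {};
  for (const line of text.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed || trimmed.startsWith("#")) continue; // skip comments/blanks
    const eq = trimmed.indexOf("=");
    if (eq === -1) continue;                           // not a KEY=VALUE line
    vars[trimmed.slice(0, eq).trim()] = trimmed.slice(eq + 1).trim();
  }
  return vars;
}

const sample = [
  "# How you connect to Elasticsearch",
  "ELASTICSEARCH_URL=http://localhost:9200",
  "ELASTICSEARCH_USER=elastic",
  "# ELASTICSEARCH_API_KEY=",
].join("\n");

console.log(parseDotenv(sample).ELASTICSEARCH_URL); // http://localhost:9200
```

Note that commented-out keys such as `# ELASTICSEARCH_API_KEY=` simply never appear in the result, which is why uncommenting is all it takes to switch connection methods.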
