Adjustments #17

Open
wants to merge 121 commits into
base: main
Commits
85395f7
Updates to VectorDB creation job, RAG LLM service for Chat-in-a-box w…
selvik Oct 16, 2024
c7774c8
Merge pull request #1 from elotl/selvik/ragfortext
selvik Oct 18, 2024
e4e8e7a
Add link to user instructions
selvik Oct 24, 2024
0e51ec6
Merge pull request #2 from elotl/selvik/adddocs
selvik Oct 24, 2024
931e7b0
Make some embedding and LLM model parameters configurable via env vars
selvik Oct 21, 2024
2de3fae
Merge pull request #3 from elotl/selvik/chunksize
selvik Nov 4, 2024
32a6682
Add a simple chat UI
selvik Nov 4, 2024
b4916c3
Re-add accidentally deleted str_to_int method
selvik Nov 5, 2024
1153777
Merge pull request #5 from elotl/selvik/fixserverag
selvik Nov 5, 2024
888ef31
Merge pull request #4 from elotl/simplechat
selvik Nov 5, 2024
438cfd9
Add dockerfile
murbans1 Nov 5, 2024
abebdde
Add make push image script
murbans1 Nov 5, 2024
0e0e131
Add deployment yaml
murbans1 Nov 5, 2024
d122c59
Fix image name
murbans1 Nov 5, 2024
655c25e
Fix app name
murbans1 Nov 5, 2024
832456a
Fix - make sure venv python is used
murbans1 Nov 5, 2024
9b9946a
Fix try without venv
murbans1 Nov 5, 2024
dead856
Fix case
murbans1 Nov 5, 2024
8f5eca9
Fix - remove probes
murbans1 Nov 5, 2024
cddc554
Fix svc dns address
murbans1 Nov 6, 2024
0a7ef63
Convert Google docs for Installation to Markdown in the repo
selvik Nov 7, 2024
038b566
Merge pull request #7 from elotl/markdown-docs
selvik Nov 8, 2024
8e60a71
Add Luna labels for the Vector store creation and RAG llm service
selvik Nov 13, 2024
2d48c2d
Merge pull request #8 from elotl/selvik/addlunalabels
selvik Nov 14, 2024
ec44d23
Merge pull request #6 from elotl/simple-chat-deployment
selvik Nov 14, 2024
79d9658
RAG Query Py Instead of curl
githubjbw Nov 14, 2024
e2345e9
Add script for preparing jira tickets csv to embedding json
murbans1 Nov 15, 2024
b5d7d47
Fix lint
murbans1 Nov 15, 2024
5cb7775
Add new type for uploading jsonl files
murbans1 Nov 15, 2024
654a1ce
Fix lint
murbans1 Nov 15, 2024
39d3312
Add mode for querying jira tickets data
murbans1 Nov 15, 2024
f36816c
Enable passing jira env variable
murbans1 Nov 15, 2024
0002319
Fix lint
murbans1 Nov 15, 2024
09d2fe8
Fix
murbans1 Nov 18, 2024
bf498a3
Fix reading env variable
murbans1 Nov 19, 2024
b256a85
Add ticket links for jira
murbans1 Nov 19, 2024
6927827
Pass jira url and adjust rag service port
murbans1 Nov 19, 2024
fbbd02a
Add support for jira mode
murbans1 Nov 19, 2024
dab4d01
Pass jira url and adjust rag service port
murbans1 Nov 19, 2024
09d008d
Add new image
murbans1 Nov 19, 2024
a4413b0
Remove debug prints
murbans1 Nov 19, 2024
ec01b63
Add in comments version for AWS and GKE
murbans1 Nov 26, 2024
759fbff
Extract code to be able to run a local setup
murbans1 Nov 26, 2024
a7589e5
Fix: WARNING: You must pass the application as an import string to e…
murbans1 Nov 26, 2024
66bd06d
Add info how to run ui
murbans1 Nov 26, 2024
cb8104b
Extract common code to be able to start changing
murbans1 Nov 26, 2024
d9d824e
Extract common code to be able to start changing
murbans1 Nov 26, 2024
528fee5
Save each ticket to separate file and add jira url
murbans1 Nov 26, 2024
15a3fa3
Use jira urls from input files
murbans1 Nov 26, 2024
8b84051
Fix readme
murbans1 Nov 26, 2024
1c59d5c
Add rating slider and logging ratings
murbans1 Nov 27, 2024
f8d2953
Add doc chunking if text too long
murbans1 Nov 27, 2024
50e9a72
Enhance readme
murbans1 Nov 27, 2024
be4338d
Use regexp good for aws and gcp
murbans1 Nov 27, 2024
afc50d9
Update Readme
murbans1 Nov 27, 2024
404749f
Evaluation script for answer comparison during RAG pipeline tuning
selvik Nov 27, 2024
f12c9a2
Merge pull request #11 from elotl/selvik/addeval
selvik Nov 28, 2024
6db73c7
Note down extra steps for processing csv
murbans1 Nov 29, 2024
f463ab1
Use embedding chunk size and overlap passed through env variables
murbans1 Nov 29, 2024
0c907f7
Use defaults
murbans1 Nov 29, 2024
fd08cb3
Rename mode to more generic
murbans1 Nov 29, 2024
8ecbe7b
Fix text
murbans1 Nov 29, 2024
8de7794
Rename env var name
murbans1 Nov 29, 2024
8798412
Merge pull request #9 from elotl/add_jira_tickets_preprocessing
murbans1 Nov 29, 2024
a13be47
Update instructions for QA-in-a-box to include EKS Bottlerocket w/sna…
anneholler Dec 4, 2024
77741f4
Merge pull request #12 from elotl/anne-update-docs
amholler Dec 4, 2024
6d5b1b4
Update the install.md for updated end point and new python query scri…
githubjbw Dec 9, 2024
f198b47
Merge pull request #10 from elotl/origin/justin/rag-query-py-instead-…
githubjbw Dec 9, 2024
ff2d8ec
Update doc to include EKS bottlerocket image w/vllm preinstalled and …
anneholler Jan 1, 2025
08bb49c
Merge pull request #16 from elotl/anne-update
amholler Jan 2, 2025
a04fd16
Update doc to describe using hf-transfer to improve EKS and GKE image…
anneholler Jan 4, 2025
46bcdb0
update
anneholler Jan 5, 2025
d55f7fe
Merge pull request #18 from elotl/anne-hf-transfer
amholler Jan 5, 2025
9f210cd
Add links to example tuned EKS and GKE RayService files
anneholler Jan 6, 2025
1d703cf
Merge pull request #19 from elotl/anne-add-example
amholler Jan 6, 2025
ead82ad
Fix/add missing files to docker (#20)
murbans1 Jan 7, 2025
4fbbb13
Data prep scripts for Zendesk ticket data exported as JSON (#23)
selvik Jan 8, 2025
480e819
Create and publish images workflow (#21)
murbans1 Jan 8, 2025
7f2d524
Debug release image workflow (#25)
murbans1 Jan 8, 2025
9d49537
Fix debug release image workflow (#26)
murbans1 Jan 8, 2025
ef087e1
Fix create and publish images workflow (#27)
murbans1 Jan 8, 2025
45eba92
Fix create and publish images workflow (#28)
murbans1 Jan 8, 2025
44d864d
Fix create and publish images workflow (#29)
murbans1 Jan 8, 2025
4693bbd
Fix create and publish images workflow (#30)
murbans1 Jan 8, 2025
2940ccc
Fix create and publish images workflow (#31)
murbans1 Jan 8, 2025
a76b454
Update install doc to include AKS node startup improvements (#32)
amholler Jan 8, 2025
31c6465
Fix create and publish images workflow (#33)
murbans1 Jan 8, 2025
a27bc1e
Fix create and publish images workflow (#34)
murbans1 Jan 8, 2025
a212653
Workflow guards and cleanup (#36)
murbans1 Jan 8, 2025
bcddc06
Fix typo in instance type Regex + Docker image version updates (#22)
selvik Jan 8, 2025
2951d01
Use different workflow guard (#37)
murbans1 Jan 9, 2025
25dd946
Update top-level README with links to Installation document sections …
selvik Jan 9, 2025
c09ed34
Updates to system prompt to be factual (#40)
selvik Jan 10, 2025
f6088d1
Fix create vector DB to use pagination to download all files (#39)
selvik Jan 10, 2025
36188fa
Update block_device_mapping.json link to raw file to make it easier f…
selvik Jan 10, 2025
60d4157
Minor doc and requirements.txt updates (#44)
selvik Jan 10, 2025
0f94668
Improve logging (#42)
murbans1 Jan 11, 2025
d79c624
Auth proxy (#41)
murbans1 Jan 11, 2025
e2a441e
Fixes to Chat Auth to correctly route to Chat UI (#45)
selvik Jan 11, 2025
472c6af
Fix Zendesk data prep to include nested fields (#46)
selvik Jan 15, 2025
29136b3
Update data processing, forced remove of context from answers and Lun…
selvik Jan 17, 2025
5d1a0df
Add system prompt to local setup (#51)
murbans1 Jan 17, 2025
11ae653
Handle "context" and "question" hallucinations being appended to the …
selvik Jan 17, 2025
c5f56e3
Update chat and createvdb images to have consistent versions (#58)
selvik Jan 17, 2025
625f409
Add end-user docs for Question-Answer ChatBot (#59)
selvik Jan 22, 2025
30cdc96
Improved handling of ChatML tokens, im_end and im_start in LLM genera…
murbans1 Feb 3, 2025
90907d5
Update Chat UI to display history of question and answers (#63)
murbans1 Feb 3, 2025
83f5de3
Update README with Infra stack and RAG graphics (#61)
selvik Feb 3, 2025
71ca784
Move logic from db creation to data preparation scripts (#56)
murbans1 Feb 4, 2025
3ea1d28
Update image versions (#67)
murbans1 Feb 10, 2025
1b1db8e
Make response rating work with history (#66)
murbans1 Feb 10, 2025
8fb88fd
Put metadata to each chunk (#68)
murbans1 Feb 12, 2025
a1b7db9
Text to SQL querying for structured data (#69)
selvik Mar 6, 2025
257b493
Use weaviate hybrid search v2 (#70)
murbans1 Mar 6, 2025
d9af3e3
Add a question router to choose between SQL and vector (hybrid) searc…
selvik Mar 7, 2025
3701773
Docs for preparatory steps for enabling Text-to-SQL search (#72)
selvik Mar 7, 2025
43c8c64
Fix to retain RAG only Question-Answer Chatbot (#73)
selvik Mar 7, 2025
a9ce7f6
Minor updates to SQL and Hybrid search (#74)
selvik Mar 10, 2025
eef31f2
Add missing weaviate module to requirements.txt (#77)
selvik Mar 11, 2025
b07fdeb
Make chatbox window bigger
murbans1 Mar 11, 2025
ce6baf1
Add more trim after labels
murbans1 Mar 11, 2025
22 changes: 22 additions & 0 deletions .github/actions/setup-docker/action.yaml
@@ -0,0 +1,22 @@
name: "Set Up Docker"
description: "Set up Docker"

inputs:
  docker-username:
    description: "DockerHub username"
    required: true
  docker-password:
    description: "DockerHub password"
    required: true

runs:
  using: "composite"
  steps:
    - name: Set up Docker Buildx
      uses: docker/setup-buildx-action@v3

    - name: Log into DockerHub
      uses: docker/login-action@v3
      with:
        username: ${{ inputs.docker-username }}
        password: ${{ inputs.docker-password }}
77 changes: 77 additions & 0 deletions .github/workflows/release.yml
@@ -0,0 +1,77 @@
name: Build and Release Images

on:
  push:
    tags:
      - v*
  workflow_dispatch:
    inputs:
      tags:
        description: 'Tags'
env:
  AWS_REGION: "us-east-1"

permissions:
  contents: read
  pull-requests: read
  repository-projects: read

jobs:
  release-images:
    runs-on: ubuntu-latest
    steps:
      - name: Check permissions using GitHub CLI
        env:
          GH_TOKEN: ${{ github.token }}
        run: |
          permission=$(gh api repos/${{ github.repository }}/collaborators/${{ github.actor }}/permission --jq '.permission')
          if [ "$permission" = "admin" ]; then
            echo "Has admin access"
            # Your workflow steps here
          else
            echo "Permission denied"
            exit 1
          fi

      - name: Remove software and language runtimes we're not using
        run: |
          sudo rm -rf /usr/share/swift
          sudo rm -rf /usr/share/dotnet
          sudo rm -rf /usr/local/share/powershell
          sudo rm -rf /usr/local/share/chromium
          sudo rm -rf /usr/local/lib/android
          sudo rm -rf /usr/local/lib/node_modules
          sudo rm -rf /usr/local/julia*
          sudo rm -rf /opt/google/chrome
          df . -h

      - name: Check out repository
        uses: actions/checkout@v2
        with:
          fetch-depth: '0'

      - name: Fetch all tags
        run: git fetch origin +refs/tags/*:refs/tags/*

      - name: Set up Docker
        uses: ./.github/actions/setup-docker
        with:
          docker-username: ${{ secrets.DOCKER_USERNAME }}
          docker-password: ${{ secrets.DOCKER_PASSWORD }}

      - name: Set tag
        run: |
          TAG=$(git describe --tags --match "v*" --abbrev=0)
          echo "TAG=$TAG" >> $GITHUB_ENV

      - name: Build and push image - createvectordb
        run: |
          ./dockers/llm.vdb.service/makeDocker.sh elotl/createvectordb ${{ env.TAG }}

      - name: Build and push image - llm-chat
        run: |
          ./dockers/llm.chatui.service/makeDocker.sh elotl/llm-chat ${{ env.TAG }}

      - name: Build and push image - serveragllm
        run: |
          ./dockers/llm.rag.service/makeDocker.sh elotl/serveragllm ${{ env.TAG }}
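
The first step of the workflow above gates the release on the actor's repository permission. As a rough illustration of that decision only (not part of the PR), the sketch below applies the same rule to a JSON body shaped like the response of GitHub's collaborator-permission endpoint. The function name `is_release_allowed` and the sample payloads are hypothetical, and the set of permission values is an assumption based on the endpoint's documented legacy levels.

```python
import json

# Assumed legacy permission levels reported in the `permission` field.
KNOWN_PERMISSIONS = {"admin", "write", "read", "none"}


def is_release_allowed(api_response_body: str, required: str = "admin") -> bool:
    """Mirror the workflow's shell check: allow only an exact match on `required`."""
    payload = json.loads(api_response_body)
    permission = payload.get("permission")
    if permission not in KNOWN_PERMISSIONS:
        # Unknown or missing values are treated as a denial.
        return False
    return permission == required


if __name__ == "__main__":
    # Hypothetical response bodies, trimmed to the one field the workflow reads.
    for body in ('{"permission": "admin"}', '{"permission": "write"}'):
        verdict = "Has admin access" if is_release_allowed(body) else "Permission denied"
        print(body, "->", verdict)
```

Like the shell `if`, this denies every level other than `admin`, so a collaborator with write access cannot trigger a release through `workflow_dispatch`.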
42 changes: 40 additions & 2 deletions README.md
@@ -1,2 +1,40 @@
# k8s-rag-llm
Deployment of RAG + LLM model serving on multiple K8s cloud clusters
# Question-Answer Chatbot with Self-hosted LLMs & RAG

- Set up the complete infrastructure stack for a Question-Answer chatbot for your private data in just a few minutes!
- Your stack will be powered by Self-hosted Open-Source Large Language Models and Retrieval Augmented Generation running on Kubernetes Cloud clusters.

## Overview

The Question-Answer Chatbot is powered by these technologies:

1. Open-Source [Large Language Models](https://en.wikipedia.org/wiki/Large_language_model)
2. [Retrieval Augmented Generation (RAG)](https://en.wikipedia.org/wiki/Retrieval-augmented_generation)
3. [Vector Stores](https://en.wikipedia.org/wiki/Vector_database)
4. [Ray AI/ML compute framework](https://www.ray.io/)
5. [Elotl Luna](https://www.elotl.co/luna.html)

<img src="./diagrams/elotl_genai_infrastack.png" alt="elotl_genai_infrastack" width="400"/>

## Retrieval Augmented Generation

The graphic below shows how RAG is used to determine an answer to the end-user's question about a specific knowledge base.

<center>
<img src="./diagrams/elotl_genai_stack_enduser.png" alt="elotl_genai_stack_enduser" width="600"/>
</center>

## Installation

* [Cluster Setup Summary](docs/install.md#cluster-setup-summary)
* [Install Infrastructure Tools](docs/install.md#install-infrastructure-tools)
* [Install Model Serve Stack](docs/install.md#install-model-serve-stack)
* [Model Serving](docs/install.md#model-serve)
* [Retrieval Augmented Generation using FAISS](docs/install.md#retrieval-augmented-generation-rag-using-faiss)
* [Creation of the Vector Store](docs/install.md#creation-of-the-vector-store)
* [Install the RAG & LLM querying service](docs/install.md#setup-rag--llm-service)
* [Send a question to your LLM with RAG](docs/install.md#query-the-llm-with-rag)
* [Query your LLM with RAG using a Chat UI](docs/install.md#query-the-llm-with-rag-using-a-chat-ui)
* [Uninstall](docs/install.md#uninstall)

The complete installation document is available [here](docs/install.md).

93 changes: 93 additions & 0 deletions demo/llm.chatui.service/auth-proxy.yml
@@ -0,0 +1,93 @@
# nginx-auth-proxy-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-auth-proxy-config
data:
  nginx.conf: |
    events {
      worker_connections 1024;
    }
    http {
      server {
        listen 80;

        location / {
          auth_basic "Restricted Access";
          auth_basic_user_file /etc/nginx/auth/.htpasswd;

          proxy_pass http://simple-chat-service.default.svc.cluster.local:7860; # Points to our simple chat service
          proxy_set_header Host $host;
          proxy_set_header X-Real-IP $remote_addr;
          proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
          proxy_set_header X-Forwarded-Proto $scheme;
        }
      }
    }

---
# auth-secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: auth-proxy-credentials
type: Opaque
data:
  # Generated using: htpasswd -c .htpasswd username
  # Then base64 encode the file content
  # htpasswd -c .htpasswd your_chosen_username
  # cat .htpasswd | base64
  # myuser:elotl

  .htpasswd: ZWxvdGw6JGFwcjEkRmtKeUFMWjMkYjd5WXdBdmhHbmtTSjN2QTdCOXlGMAo=

---
# auth-proxy-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: auth-proxy
spec:
  replicas: 2 # For high availability
  selector:
    matchLabels:
      app: auth-proxy
  template:
    metadata:
      labels:
        app: auth-proxy
    spec:
      volumes:
        - name: nginx-config
          configMap:
            name: nginx-auth-proxy-config
        - name: auth-volume
          secret:
            secretName: auth-proxy-credentials
      containers:
        - name: nginx
          image: nginx:alpine
          ports:
            - containerPort: 80
          volumeMounts:
            - name: nginx-config
              mountPath: /etc/nginx/nginx.conf
              subPath: nginx.conf
            - name: auth-volume
              mountPath: /etc/nginx/auth
              readOnly: true

---
# auth-proxy-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: auth-proxy-service
spec:
  type: LoadBalancer
  ports:
    - port: 80
      targetPort: 80
      protocol: TCP
  selector:
    app: auth-proxy
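
The Secret above stores the whole `.htpasswd` file as one base64 string, so it is not obvious from the manifest which username nginx will accept. The sketch below (an illustration, not part of the PR) decodes such a value and lists the usernames it contains; the helper names `htpasswd_users` and `encode_htpasswd` are hypothetical. Decoding the committed value shows the stored user is `elotl`, which may be worth checking against the `myuser:elotl` comment.

```python
import base64
from typing import List


def htpasswd_users(secret_value_b64: str) -> List[str]:
    """Return the usernames found in a base64-encoded .htpasswd file."""
    decoded = base64.b64decode(secret_value_b64).decode("utf-8")
    users = []
    for line in decoded.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Each entry has the form "username:password-hash".
        username, _, _password_hash = line.partition(":")
        users.append(username)
    return users


def encode_htpasswd(file_content: str) -> str:
    """Base64-encode .htpasswd content for use under the Secret's `data` key."""
    return base64.b64encode(file_content.encode("utf-8")).decode("ascii")


# Value copied from the Secret in this diff.
SECRET_VALUE = "ZWxvdGw6JGFwcjEkRmtKeUFMWjMkYjd5WXdBdmhHbmtTSjN2QTdCOXlGMAo="

if __name__ == "__main__":
    print(htpasswd_users(SECRET_VALUE))  # prints ['elotl']
```

Because base64 is an encoding rather than encryption, anyone with read access to the repository can recover the password hash this way.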
23 changes: 23 additions & 0 deletions demo/llm.chatui.service/pv-and-pvc.yaml
@@ -0,0 +1,23 @@
apiVersion: v1
kind: PersistentVolume
metadata:
  name: simple-chat-pv
spec:
  capacity:
    storage: 20Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: /mnt/data/simple-chat-logs
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: simple-chat-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
57 changes: 57 additions & 0 deletions demo/llm.chatui.service/simple-chat.yaml
@@ -0,0 +1,57 @@
apiVersion: apps/v1
kind: Deployment
metadata:
  name: simple-chat
  labels:
    app: simple-chat
spec:
  replicas: 1
  selector:
    matchLabels:
      app: simple-chat
  template:
    metadata:
      labels:
        app: simple-chat
        elotl-luna: "true"
      annotations:
        node.elotl.co/instance-type-regexp: "^(t3.xlarge|n2-standard-4)$"
    spec:
      containers:
        - name: chat
          image: elotl/llm-chat:v1.3.12
          imagePullPolicy: Always
          ports:
            - containerPort: 7860
          env:
            - name: RAG_LLM_QUERY_URL
              value: "http://serveragllm-service.default.svc.cluster.local:8000"
            - name: USE_CHATBOT_HISTORY
              value: "True"
          resources:
            requests:
              cpu: "200m"
              memory: "256Mi"
            limits:
              cpu: "500m"
              memory: "512Mi"
          volumeMounts:
            - name: log-storage
              mountPath: /app/logs
      volumes:
        - name: log-storage
          persistentVolumeClaim:
            claimName: simple-chat-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: simple-chat-service
spec:
  selector:
    app: simple-chat
  ports:
    - protocol: TCP
      port: 7860
      targetPort: 7860
  type: ClusterIP
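
The `node.elotl.co/instance-type-regexp` annotation above restricts which instance types may host the pod. The sketch below (an illustration, not part of the PR) checks a few instance-type names against that pattern and against a variant with the dot escaped. It assumes the annotation is evaluated with ordinary regular-expression semantics, which Python's `re` module shares for these constructs; how Luna itself evaluates the pattern is not shown in this diff. `ESCAPED_PATTERN` and the helper `matches` are hypothetical names.

```python
import re

# Pattern copied from the Deployment annotation in this diff.
ANNOTATION_PATTERN = r"^(t3.xlarge|n2-standard-4)$"

# Hypothetical stricter variant: the dot is escaped so it matches only ".".
ESCAPED_PATTERN = r"^(t3\.xlarge|n2-standard-4)$"


def matches(pattern: str, instance_type: str) -> bool:
    """Return True when the instance type satisfies the pattern."""
    return re.search(pattern, instance_type) is not None


if __name__ == "__main__":
    for name in ("t3.xlarge", "n2-standard-4", "t3.2xlarge", "t3axlarge"):
        print(name, matches(ANNOTATION_PATTERN, name), matches(ESCAPED_PATTERN, name))
```

An unescaped `.` matches any single character, so the committed pattern would also accept a name such as `t3axlarge`. No real instance type has that shape, so the difference is cosmetic here, but escaping the dot states the intent exactly.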
2 changes: 1 addition & 1 deletion demo/llm.gpu.service/block_device_mapping.json
@@ -1,6 +1,6 @@
[
{
"DeviceName": "/dev/xvda",
"DeviceName": "/dev/xvda",
"Ebs": {
"DeleteOnTermination": true,
"VolumeSize": 80,
21 changes: 21 additions & 0 deletions demo/llm.gpu.service/block_device_mapping_bottlerocket.json
@@ -0,0 +1,21 @@
[
    {
        "DeviceName": "/dev/xvda",
        "Ebs": {
            "DeleteOnTermination": true,
            "VolumeSize": 80,
            "VolumeType": "gp3",
            "Encrypted": false
        }
    },
    {
        "DeviceName": "/dev/xvdb",
        "Ebs": {
            "DeleteOnTermination": true,
            "VolumeSize": 80,
            "VolumeType": "gp3",
            "Encrypted": false,
            "SnapshotId": "snap-09946d545033d96f7"
        }
    }
]
4 changes: 4 additions & 0 deletions demo/llm.gpu.service/get-user-data.sh
@@ -0,0 +1,4 @@
clustername=$1
region=$2
eksctl get cluster --region $region --name $clustername -o json \
| jq --raw-output '.[] | "settings.kubernetes.api-server = \"" + .Endpoint + "\"\nsettings.kubernetes.cluster-certificate =\"" + .CertificateAuthority.Data + "\"\n"' > user-data.toml
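
The script above turns the JSON printed by `eksctl get cluster` into a two-line `user-data.toml`. To make the transformation easier to follow, here is a rough Python equivalent of the `jq` filter alone (an illustration, not part of the PR). It assumes the input is a JSON array whose objects carry `Endpoint` and `CertificateAuthority.Data`, exactly the fields the filter reads; the function name `render_user_data` and the sample values are hypothetical.

```python
import json


def render_user_data(clusters_json: str) -> str:
    """Build the TOML text that the jq filter writes to user-data.toml."""
    clusters = json.loads(clusters_json)
    chunks = []
    for cluster in clusters:
        endpoint = cluster["Endpoint"]
        certificate = cluster["CertificateAuthority"]["Data"]
        entry = (
            f'settings.kubernetes.api-server = "{endpoint}"\n'
            f'settings.kubernetes.cluster-certificate ="{certificate}"\n'
        )
        # jq --raw-output prints each result string followed by a newline.
        chunks.append(entry + "\n")
    return "".join(chunks)


if __name__ == "__main__":
    # Hypothetical input standing in for the eksctl output.
    sample = json.dumps(
        [
            {
                "Endpoint": "https://example.invalid",
                "CertificateAuthority": {"Data": "QUJD"},
            }
        ]
    )
    print(render_user_data(sample), end="")
```

As in the shell version, the values are inserted without any escaping, which is safe only as long as the endpoint and certificate data contain no double quotes.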