
Adjustments #17


Open · wants to merge 121 commits into main

Conversation

murbans1

  • make chatbox window bigger and resizable
  • add more trim after labels

selvik and others added 30 commits October 18, 2024 11:30
…ith RAG

* Enable creation of a vector DB with text docs downloaded from an S3 bucket. Added a new
  INPUT_TYPE value "text-docs" to the vector DB creation docker container for this.
  These text documents will be used as the RAG dataset.
* Created new versions of the VectorDB creation job and RAG LLM service with two changes:
   - They no longer need the image pull secret, i.e. the registry credentials ("regcred"); the
     corresponding images have been made public.
   - Both these k8s resources run in the default namespace (rather than MODEL_NAMESPACE).
* Moved from the OpenAI API client.completions.create to client.chat.completions.create.
  This required newer langchain packages to be added to the Dockerfile and means that the
  answer is now available in completions.choices[0].message.content (previously it was in
  completions.choices[0].text). See the sketch after this commit message.
* Increased RAG context from 1 doc to 4 docs

Minor:
- Added error handling for OpenAI client creation and S3 client's object access
- Removed 30s sleep after Vector DB retriever is created, since it didn't seem necessary
- Added Env Var VECTOR_DB_S3_BUCKET to specify an S3 bucket instead of prior hard-coded value
- If Env Var MODEL_LLM_SERVER_URL is not defined, we will use a preset default.
- Changed model name from "mosaicml--mpt-7b-chat" to "mosaicml/mpt-7b-chat" since the former
  was producing an error.
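A minimal sketch of the chat-completions call described above, assuming a generic OpenAI-compatible endpoint; the URL default, prompts, and parameter values here are illustrative placeholders rather than the project's exact code:

```python
# Sketch only: illustrates the move to client.chat.completions.create.
# The MODEL_LLM_SERVER_URL default and the prompts are illustrative placeholders.
import os

from openai import OpenAI

client = OpenAI(
    base_url=os.getenv("MODEL_LLM_SERVER_URL", "http://localhost:8000/v1"),
    api_key="not-needed-for-self-hosted",
)

completions = client.chat.completions.create(
    model="mosaicml/mpt-7b-chat",
    messages=[
        {"role": "system", "content": "Answer using only the provided context."},
        {"role": "user", "content": "What is covered in the RAG dataset?"},
    ],
    max_tokens=128,
)

# With the chat API the answer is in .message.content
# (it was in completions.choices[0].text with the legacy completions API).
print(completions.choices[0].message.content)
```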
Update VectorDB creation job, RAG LLM service for Chat-in-a-box with RAG
Add link to user instructions
- Added these configurable values to the RAG LLM service:
    MODEL_ID (DEFAULT=mosaicml/mpt-7b-chat)
    (Optional) RELEVANT_DOCS (DEFAULT = 2)
    (Optional) MAX_TOKENS (DEFAULT=128)
    (Optional) MODEL_TEMPERATURE (DEFAULT=0.01)
- Added these configurable values to the Vector DB creation job:
    (Optional) EMBEDDING_CHUNK_SIZE (DEFAULT=1000)
    (Optional) EMBEDDING_CHUNK_OVERLAP (DEFAULT=100)
    (Optional) EMBEDDING_MODEL_NAME (DEFAULT=sentence-transformers/all-MiniLM-L6-v2)
- Remove duplicate MODEL_ID read
- Handle integer and float embedding and RAG-querying configuration parameters correctly
- Use explicit embedding model for RAG dataset type sitemap
- Update both vectorDB creation job and RAG LLM deploy to use Intel instance types via annotations
- Adding logs of newly added configurable parameters
- Updated k8s manifests to deploy version v1.2 of the vector db creation job and RAG LLM deploy + service.

Fixes:
- Explicit conversion of non-string parameters to int and float (and only if they are non-empty); see the sketch after this commit message
- Remove setting k (number of relevant docs) for retriever.invoke (only applicable for as_retriever)
- Removed unused methods
Make Model, Vector store embedding and Vector store Retrieving parameters configurable
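A minimal sketch of how these defaults and the int/float conversions could be wired up; the helper below is illustrative, not the repo's actual code:

```python
# Illustrative only: read the configurable values above with defaults and explicit
# int/float conversion, converting only when the env var is non-empty.
import os


def env_or_default(name, default, cast=str):
    value = os.getenv(name, "")
    return cast(value) if value.strip() else default


MODEL_ID = env_or_default("MODEL_ID", "mosaicml/mpt-7b-chat")
RELEVANT_DOCS = env_or_default("RELEVANT_DOCS", 2, int)
MAX_TOKENS = env_or_default("MAX_TOKENS", 128, int)
MODEL_TEMPERATURE = env_or_default("MODEL_TEMPERATURE", 0.01, float)
EMBEDDING_CHUNK_SIZE = env_or_default("EMBEDDING_CHUNK_SIZE", 1000, int)
EMBEDDING_CHUNK_OVERLAP = env_or_default("EMBEDDING_CHUNK_OVERLAP", 100, int)
EMBEDDING_MODEL_NAME = env_or_default(
    "EMBEDDING_MODEL_NAME", "sentence-transformers/all-MiniLM-L6-v2"
)
```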
In order to start the UI within a browser:
- export RAG_LLM_QUERY_URL=<IP of the RAG LLM query service>
- cd dockers/llm.chatui.service
- To install the Gradio Python module, you can use venv:
   python -m venv .venv
   source .venv/bin/activate
   python3 -m pip install gradio
- Start the Python UI in a browser: python simple_chat.py (a minimal sketch of such a chat app follows)
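For context, a minimal sketch of what a Gradio chat front end like simple_chat.py could look like; the request payload and the "answer" response key are assumptions, not the service's actual contract:

```python
# Minimal sketch, not the repo's simple_chat.py: posts the question to the
# RAG LLM query service and shows the answer in a Gradio chat box.
import os

import gradio as gr
import requests

QUERY_URL = os.environ["RAG_LLM_QUERY_URL"]


def ask(question, history):
    # Payload shape and the "answer" key are assumed for illustration.
    resp = requests.post(QUERY_URL, json={"question": question}, timeout=120)
    resp.raise_for_status()
    return resp.json().get("answer", "")


gr.ChatInterface(fn=ask, title="Question-Answering Chatbot").launch()
```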
Re-add accidentally deleted str_to_int method
Convert v0.1.4 to markdown that can be checked in with the repo
Convert Google doc for Install to Markdown in the repo
Add Luna labels for the Vector store creation and RAG llm service
selvik and others added 30 commits January 9, 2025 12:31
…38)

The top-level README was a bit sparse, so I've added links to the main sections of our install docs with a brief intro to this project.
   *  Our "text-docs" system prompt specifically instructs the LLM to answer using only the retrieved context.
   * Updated the "json-format" system prompt to also use only provided context.
- Also added a script to exercise this new method and verify the number
  of files in the user-provided s3 bucket. This will allow us to do
  a quick validation before running the vector DB creation job.

- This script can be invoked like this:

$ uv run check_files_in_s3_bucket.py --bucket $VECTOR_DB_S3_BUCKET --folder zendesk-input  --region us-west-2
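A rough sketch of such a check using boto3; this is not the actual check_files_in_s3_bucket.py, and the bucket/folder values are simply the ones from the example command above:

```python
# Illustrative sketch of counting objects under a prefix in an S3 bucket,
# similar in spirit to check_files_in_s3_bucket.py (not the actual script).
import boto3


def count_files(bucket: str, folder: str, region: str = "us-west-2") -> int:
    s3 = boto3.client("s3", region_name=region)
    paginator = s3.get_paginator("list_objects_v2")
    count = 0
    for page in paginator.paginate(Bucket=bucket, Prefix=folder.rstrip("/") + "/"):
        # Count real objects only, skipping any folder placeholder keys.
        count += sum(1 for obj in page.get("Contents", []) if not obj["Key"].endswith("/"))
    return count


if __name__ == "__main__":
    print(count_files("my-vector-db-bucket", "zendesk-input"))
```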
- Adding query.py to this repo to make testing easier
* Add logging to file

* Send context with response

* Log response context

* Add pvc for storing logs

* Add log rotation and removal
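A hedged sketch of the rotating file logging these commits describe; the log path and handler settings are illustrative:

```python
# Illustrative: log to a file on the mounted PVC with time-based rotation and
# automatic removal of old files (e.g., keep 7 daily files, as in a later change).
import logging
from logging.handlers import TimedRotatingFileHandler

handler = TimedRotatingFileHandler(
    "/logs/chat.log",        # placeholder path on the log PVC
    when="D", interval=1,    # rotate daily
    backupCount=7,           # drop rotated logs older than 7 days
)
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))

logger = logging.getLogger("chat")
logger.setLevel(logging.INFO)
logger.addHandler(handler)

logger.info("response context: %s", "...")
```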
* Hide simple chat behind auth proxy

* Empty-Commit
* Fixes to Chat Auth to correctly route to Chat UI

* Update docs to include setup, configuration and use of Chat UI

* Increase PV size for Chat logs to 20G from 1G

* Increase log persistence to 7 days from 24 hrs

* Install docs typo fix
* Make system prompt for json-format configurable via env var

* Allow system prompt to be configured via env var

* Update Zendesk data prep script to retrieve nested fields

* Change reference to ticket ID from key to ticket

* Update Chat UI from Message to Question

* Changes to Chat UI

* Changes "Chatbot" to "Question-Answering Chatbot"
* Removes context from answer returned from RAG LLM
* Changes slider for rating to stars
* Removes the ChatML end token "<|im_end|>" from the answer

* Use newer version of Gradio to use Ratings component

* Updates to Chat UI

- Remove ChatML end token

* Rename key to ticket in vector db processing

* Updates and corrections to Zendesk data prep

* Update vector db creation image

* Update serve RAG LLM, which used an older hard-coded metadata Unique ID field

* Remove generated answers after Chat ML end token

* Update ServeRAGLLM image and add system prompt env var to container

* Allow chat app to truncate after Chat ML end token

* fix typo in simple chat app logging method

* fix logging formatting

* Only print changes to serve RAG LLM

* Update system prompt to prevent hallucination of new, related questions

* Keep ticket URL generic

* image version updates

* update createvectordb image

* Removed older versions of Zendesk dataset processing scripts
…a label fixes (#52)

* Update system prompt to prevent adding context to generated responses

* Remove context from generated response

* Update zendesk processing to include ticket ID and dates in text before embedding

* Fix missing logging import and add PV-PVC manifests for RAG LLM

* fix unbound string, fix volume mount on ragllm

* Update RAG LLM image

* Update logging in RAG LLM

* Move luna labels to the correct location in the Create Vector DB manifest

* Update prints to logging in RAG LLM

* Change error prints to logging.error in RAG LLM

* Chat container should also be placed by Luna

* update rag llm image
…generated answer (#57)

This PR includes the following changes:

* Handle hallucinated context that gets added to generated answers and is preceded by the label "Content"
* Manually remove new-question hallucinations, which seem to be preceded by the label "Question:"
* Propagate any LLM API errors to the UI (this is otherwise causing a generic API response error to be received and displayed in the UI)
* Add end-user docs for Question-Answering ChatBot
…ted response (#64)

Problem:

With the Microsoft Phi-3 model and JSON processing we noticed that the generated response from the LLM sometimes includes multiple occurrences of the ChatML tokens "im_start" and "im_end". New hallucinated questions are also added to the response. In some cases these newly introduced questions are relevant to the end-user's question; in others they are not.

Solution in this PR:

Currently this will be processed as follows:
* Any content after (and including) the first im_end ChatML token and the keywords "Question:", "Content:" and "Context:" is trimmed.
* Within the remaining content, we replace any occurrence of im_start with a space.
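A minimal sketch of that trimming logic, assuming the literal ChatML token strings; the repo's implementation may differ in detail:

```python
# Illustrative post-processing of a generated answer:
# 1) trim everything from the first stop marker onward,
# 2) replace any leftover im_start tokens with a space.
STOP_MARKERS = ["<|im_end|>", "Question:", "Content:", "Context:"]


def clean_answer(text: str) -> str:
    cut = len(text)
    for marker in STOP_MARKERS:
        idx = text.find(marker)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut].replace("<|im_start|>", " ").strip()


print(clean_answer("It is resolved.<|im_end|>Question: anything else?"))
# -> "It is resolved."
```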
Currently the QA-in-a-box UI only shows the current question typed in by the user and the corresponding answer generated by the LLM. Prospective customers wanted to be able to view prior questions, so they could use context from the generated answers in follow-up questions.

This PR updates the Chat UI to show the history of all questions asked by the user during the current session.

This is implemented by setting chat history to be enabled by default.

Note that this updates only the display. Prior questions and answers are not sent to the LLM as context for a subsequent question.
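A small sketch of how this display-only history can fall out of Gradio's chat component, assuming the UI is built on gr.ChatInterface; the backend call below is a placeholder:

```python
# Sketch: Gradio's chat component stores and displays the per-session history
# itself; the handler ignores `history`, so prior turns are never sent to the
# LLM as additional context.
import gradio as gr


def rag_llm_query(question: str) -> str:
    # Placeholder for the call to the RAG LLM query service.
    return f"(answer to: {question})"


def answer(question, history):
    return rag_llm_query(question)  # history is display-only


gr.ChatInterface(fn=answer).launch()
```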
* Update README with end-user graphic
* Add stack diagram
* Add RAG operation
* Prepare script for processing txt files to common json

* Remove special handling for txt files from db creation

* Remove special handling for txt files from db creation

* Simplify reading env vars

* Prepare script for processing sitemaps to common json

* Remove special handling for sitemaps from db creation

* Remove unused imports

* Add reading config from .env file or env vars and templates (a loader sketch follows after this list)

* Add settings in local setup

* Add settings in k8s setup

* Make sure docker works

* Extract s3 operations

* Use click in k8s version

* Refactor - common config loading func

* Enable running local and s3 modes from one file

* Remove unneeded file

* Add requirements file

* Load s3 files the same way as local dir

* Introduce tween services

* Use services

* Remove unused methods

* Make sure all files are visible for docker

* Get rid of boto3 and move things around

* Fix reading config

* Bring back param removed while refactoring

* Make sure aws creds are available

* Add tests

* Fix s3 mock

* No need to set region

* Improve print

* Fix checking test mode

* Cleanup docker and project file

* Log validation error

* Make sure yaml does not fail when integers are passed

* Cleanup test env file

* Add readme

* Change yaml to expect env vars to have \"
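A hedged sketch of the ".env file or env vars" loading mentioned above; the variable names are taken from elsewhere in this PR and the loader itself is illustrative:

```python
# Illustrative common config loader: values from a .env file are loaded first
# (if present), then plain environment variables take effect as usual.
import os

from dotenv import load_dotenv


def load_config() -> dict:
    load_dotenv()  # no-op when there is no .env file
    return {
        "input_type": os.getenv("INPUT_TYPE", "text-docs"),
        "s3_bucket": os.getenv("VECTOR_DB_S3_BUCKET", ""),
        "chunk_size": int(os.getenv("EMBEDDING_CHUNK_SIZE", "1000")),
        "chunk_overlap": int(os.getenv("EMBEDDING_CHUNK_OVERLAP", "100")),
    }


if __name__ == "__main__":
    print(load_config())
```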
* Update image version

* Update image version

* Update image version
* Make response rating work with history

* Fix lint
* Align metadata with what goes to text

* Remove metadata fields from text

* Move tags to metadata

* Add method for chunking with enriching text with metadata

* Leave space for metadata inside the chunk (see the sketch after this list)

* Use new chunking and skip if value missing

* Improve description

* Eliminate extra spaces more thoroughly
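A minimal sketch of chunking while enriching the text with metadata and reserving space for it inside the chunk, assuming LangChain's RecursiveCharacterTextSplitter; the field names and size budget are illustrative:

```python
# Illustrative: prefix each chunk with a short metadata header and shrink the
# splitter's chunk_size so header + body still fit the embedding chunk budget.
from langchain_text_splitters import RecursiveCharacterTextSplitter

CHUNK_SIZE = 1000
METADATA_BUDGET = 120  # space reserved for the metadata header


def chunk_with_metadata(text: str, metadata: dict) -> list[str]:
    header = f"ticket: {metadata.get('ticket', '')} created: {metadata.get('created_at', '')}\n"
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=CHUNK_SIZE - METADATA_BUDGET,
        chunk_overlap=100,
    )
    return [header + chunk for chunk in splitter.split_text(text)]
```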
* Updates to convert CSV to SQLite DB for query execution

* Add PV to RAG LLM service for mounting SQL DB for text-to-sql search

* Convert zendesk JSON to CSV

* Change QuestionType to SearchType, Update Dockerfile with DB

* Add python deps for text-to-sql in Dockerfile

* Add PV and PVC for support ticket DB for RAG LLM deployment

* Fix retriever creation: Only for VECTOR search type

* Comments cleanup, Fix SQL DB addition

* Text to SQL querying for structured data

- Adds Text to SQL capability for answering questions about user's structured data.
- LLM inferencing is used in two ways in this new workflow:
   1) Converting the user's natural language question into a SQL query
   2) Converting the SQL results into a natural language answer that will be returned to the user
- This PR uses Langchain's text-to-SQL APIs, which require an LLM with function-calling (or tool-calling) ability. So this PR needs to be used with an LLM like RubraAI's Phi-3 model: [https://huggingface.co/rubra-ai/Phi-3-mini-128k-instruct](https://huggingface.co/rubra-ai/Phi-3-mini-128k-instruct). This model takes the base Phi-3 model and fine-tunes it for function calling. A sketch of this flow follows after this list.
- This PR expects the structured data to be mounted to the RAG LLM deployment via a Persistent volume from S3. Instructions on doing this will be added to the [Install docs](https://github.com/elotl/GenAI-infra-stack/blob/main/docs/install.md).
- This new env variable needs to be defined to invoke this feature:
`export SEARCH_TYPE=SQL`
 In this PR, the default search type is temporarily set to SQL. A follow-up PR will allow an incoming question to be automatically classified so that the appropriate search type (SQL or VECTOR) is picked at runtime.
- This feature is only available on EKS and will be extended to other cloud providers in follow-up PRs.
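A rough sketch of the text-to-SQL flow described above, using LangChain's create_sql_query_chain against a SQLite file; the DB path, endpoint, and model name are placeholders, and the repo may use a different LangChain API (for example, the SQL agent):

```python
# Illustrative text-to-SQL flow: (1) NL question -> SQL via the LLM,
# (2) run the SQL against the SQLite DB, (3) LLM turns the rows into an answer.
from langchain.chains import create_sql_query_chain
from langchain_community.utilities import SQLDatabase
from langchain_openai import ChatOpenAI

db = SQLDatabase.from_uri("sqlite:////data/tickets.db")  # placeholder path
llm = ChatOpenAI(
    base_url="http://localhost:8080/v1",                 # port-forwarded model endpoint
    api_key="unused",
    model="rubra-ai/Phi-3-mini-128k-instruct",
)

write_query = create_sql_query_chain(llm, db)
sql = write_query.invoke({"question": "How many tickets are there?"})
rows = db.run(sql)

answer = llm.invoke(
    f"Question: How many tickets are there?\nSQL result: {rows}\n"
    "Answer the question in one sentence using only the SQL result."
)
print(answer.content)
```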

1. Run the RubraAI model in the GenAI infra stack and port-forward the LLM query endpoint:

```
 % kubectl get svc
NAME                                        TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                                         AGE
kuberay-operator                            ClusterIP   10.100.164.141   <none>        8080/TCP                                        10d
kubernetes                                  ClusterIP   10.100.0.1       <none>        443/TCP                                         10d
llm-model-serve-head-svc                    ClusterIP   10.100.235.95    <none>        10001/TCP,8265/TCP,6379/TCP,8080/TCP,8000/TCP   17h
llm-model-serve-raycluster-fhc7j-head-svc   ClusterIP   10.100.202.89    <none>        10001/TCP,8265/TCP,6379/TCP,8080/TCP,8000/TCP   18h
llm-model-serve-serve-svc                   ClusterIP   10.100.63.8      <none>        8000/TCP                                        17h
```

```
selvik@Selvis-MacBook-Pro pynb % kubectl  port-forward svc/llm-model-serve-serve-svc  8080:8000
Forwarding from 127.0.0.1:8080 -> 8000
Forwarding from [::1]:8080 -> 8000
```

2. Run the local version of the text-to-sql script:

```
% python serverragllm_zendesk_csv_sql_local.py
```

3. Send a question to this local endpoint:

```
% cd GenAI-infra-stack/scripts/query
% python query_private_data.py
Type your query here: How many tickets are there?
Answer: {'answer': 'There are 123 tickets.', 'relevant_tickets': ['n/a'], 'sources': ['n/a'], 'context': ''}
```
* Add docker compose for local setup

* Enable serving weaviate from local docker instance

* Add method for creating weaviate db

* Enable creating local weaviate DB (a client sketch follows after this list)

* Serve from local weaviate DB

* Add retriever with score

* Try similarity search with score

* What I run against Phi

* Add simple handling of too big context error

* Use new model

* Add simple enhancement for filtering by ticket id

* Comment out filter

* Make adding metadata to text better for embeddings

* Cleanup texts a bit more

* Try : instead of is

* Add deployment for weaviate

* Pass params to weaviate

* Add deployment for creating weaviate db

* Enable running different apps from one image

* Parsing workaround

* Fix to make it work :)

* Fix imports after rebase

* Fix yaml

* Use one yaml

* Add comments

* Add docs
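A minimal sketch of talking to the local docker-compose Weaviate with the v4 Python client; the collection name and properties are placeholders, and the repo's scripts may use a different client version or a LangChain wrapper:

```python
# Illustrative: connect to the docker-compose Weaviate, create a collection,
# and insert one document with its metadata.
import weaviate
import weaviate.classes.config as wc

client = weaviate.connect_to_local()  # default: http://localhost:8080

tickets = client.collections.create(
    name="Ticket",
    properties=[
        wc.Property(name="text", data_type=wc.DataType.TEXT),
        wc.Property(name="ticket", data_type=wc.DataType.TEXT),
    ],
)
tickets.data.insert({"text": "VPN drops every hour ...", "ticket": "12345"})

client.close()
```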
#71)

This PR includes a question router. This router determines whether an incoming question will be handled by the SQL search or the RAG vector search. RAG vector search is parametrized to enable hybrid search i.e. a combination of vector search and text search.

The question classification module uses a random forest classifier trained against synthetic questions (not included in this repo). The two classes are:
- "aggregation" questions
- "pointed" questions

The SQL search code path is chosen for:
A. all aggregation questions, as well as
B. questions that contain symbols or numerals.

B is included in the SQL search type because, based on our experiments, vector search does not handle alphanumeric words in a question well (for example IP addresses or ticket numbers). A sketch of this routing logic follows below.
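A hedged sketch of that routing logic; the real classifier and its synthetic training data are not in this repo, so the tiny model below is only a stand-in:

```python
# Illustrative router: rule first (symbols/numerals -> SQL), then a random
# forest over TF-IDF features decides "aggregation" vs "pointed".
import re

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Stand-in training data; the real model is trained on synthetic questions.
questions = ["How many tickets were opened last month?", "Why is the VPN failing?"]
labels = ["aggregation", "pointed"]
classifier = make_pipeline(TfidfVectorizer(), RandomForestClassifier()).fit(questions, labels)


def route(question: str) -> str:
    # Questions with numerals or symbols (IP addresses, ticket numbers, ...) go to SQL.
    if re.search(r"[\d#@:/\.-]", question):
        return "SQL"
    return "SQL" if classifier.predict([question])[0] == "aggregation" else "VECTOR"


print(route("What is the status of ticket 4711?"))  # digits present -> "SQL"
```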

Squashed commits: 

* Random forest model for question classification

* Add question router

* Return unique sources

* Update file paths for SQL DB and classification models to containerized value

---------

Co-authored-by: Maciej Urbański <[email protected]>
This PR includes:

    Allow Weaviate's alpha parameter to be configurable (a query sketch follows below)
    Update logging references to use the configured "logger"
    Allow the containerized locations of the SQL DB and question classification models to be easily replaced with local file paths using env vars
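A small sketch of how a configurable alpha might be passed into a hybrid query, shown with the Weaviate v4 client; the env var name and collection are assumptions, and the repo may instead pass alpha through a LangChain retriever:

```python
# Illustrative: alpha balances vector search (alpha=1) against keyword/BM25
# search (alpha=0) in Weaviate hybrid queries; here it comes from an env var.
import os

import weaviate

alpha = float(os.getenv("WEAVIATE_ALPHA", "0.5"))  # env var name is an assumption

client = weaviate.connect_to_local()
tickets = client.collections.get("Ticket")

results = tickets.query.hybrid(query="vpn keeps disconnecting", alpha=alpha, limit=4)
for obj in results.objects:
    print(obj.properties.get("ticket"), obj.properties.get("text", "")[:80])

client.close()
```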

* Allow Weaviate's alpha parameter to be configurable

* Order imports - isort

* Fix formatting - black

* Remove unused field

* Fix use logging instead of print

---------

Co-authored-by: Maciej Urbański <[email protected]>
* Add missing weaviate module to requirements.txt

* Add prompt to show equivalence of customers, clients, requesters and submitters

* fix sql db path

* minor Dockerfile fixes

* Dockerfile updates to speed up builds

* Add model path as env var to RAG LLM container