
Adjustments #17


Open · wants to merge 121 commits into main

Conversation

murbans1

  • make chatbox window bigger and resizable
  • add more trim after labels

selvik and others added 30 commits October 18, 2024 11:30
…ith RAG

* Enable creation of a vector DB with text docs downloaded from an S3 bucket. Added a new
  INPUT_TYPE value "text-docs" to the vector DB creation docker container for this.
  These text documents will be used as the RAG dataset.
* Created new versions of the VectorDB creation job and RAG LLM service with two changes:
   - They no longer need the image pull secret, i.e. the registry credentials ("regcred"); the
     corresponding images have been made public.
   - Both these k8s resources run in the default namespace (rather than MODEL_NAMESPACE).
* Moved from the OpenAI API client.completions.create to client.chat.completions.create.
  This required newer langchain packages to be added to the Dockerfile and means that the
  answer is now available in completions.choices[0].message.content (previously it was in
  completions.choices[0].text). See the sketch after this commit message.
* Increased RAG context from 1 doc to 4 docs

Minor:
- Added error handling for OpenAI client creation and S3 client's object access
- Removed 30s sleep after Vector DB retriever is created, since it didn't seem necessary
- Added Env Var VECTOR_DB_S3_BUCKET to specify an S3 bucket instead of prior hard-coded value
- If Env Var MODEL_LLM_SERVER_URL is not defined, we will use a preset default.
- Changed model name from "mosaicml--mpt-7b-chat" to "mosaicml/mpt-7b-chat" since the former
  was producing an error.
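A minimal sketch of the chat-completions call described above, assuming a generic OpenAI-compatible endpoint; the URL default, prompts, and parameter values here are illustrative placeholders rather than the project's exact code:

```python
# Sketch only: illustrates the move to client.chat.completions.create.
# The MODEL_LLM_SERVER_URL default and the prompts are illustrative placeholders.
import os

from openai import OpenAI

client = OpenAI(
    base_url=os.getenv("MODEL_LLM_SERVER_URL", "http://localhost:8000/v1"),
    api_key="not-needed-for-self-hosted",
)

completions = client.chat.completions.create(
    model="mosaicml/mpt-7b-chat",
    messages=[
        {"role": "system", "content": "Answer using only the provided context."},
        {"role": "user", "content": "What is covered in the RAG dataset?"},
    ],
    max_tokens=128,
)

# With the chat API the answer is in .message.content
# (it was in completions.choices[0].text with the legacy completions API).
print(completions.choices[0].message.content)
```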
Update VectorDB creation job, RAG LLM service for Chat-in-a-box with RAG
Add link to user instructions
- Added these configurable values to the RAG LLM service:
    MODEL_ID (DEFAULT=mosaicml/mpt-7b-chat)
    (Optional) RELEVANT_DOCS (DEFAULT = 2)
    (Optional) MAX_TOKENS (DEFAULT=128)
    (Optional) MODEL_TEMPERATURE (DEFAULT=0.01)
- Added these configurable values to the Vector DB creation job:
    (Optional) EMBEDDING_CHUNK_SIZE (DEFAULT=1000)
    (Optional) EMBEDDING_CHUNK_OVERLAP (DEFAULT=100)
    (Optional) EMBEDDING_MODEL_NAME (DEFAULT=sentence-transformers/all-MiniLM-L6-v2)
- Remove duplicate MODEL_ID read
- Handle integer and float embedding and RAG-querying configuration parameters correctly
- Use explicit embedding model for RAG dataset type sitemap
- Update both vectorDB creation job and RAG LLM deploy to use Intel instance types via annotations
- Adding logs of newly added configurable parameters
- Updated k8s manifests to deploy version v1.2 of the vector db creation job and RAG LLM deploy + service.

Fixes:
- Explicit conversion of non-string parameters to int and float (and only if they are non-empty); see the sketch after this commit message
- Remove setting k (number of relevant docs) for retriever.invoke (only applicable for as_retriever)
- Removed unused methods
Make Model, Vector store embedding and Vector store Retrieving parameters configurable
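A minimal sketch of how these defaults and the int/float conversions could be wired up; the helper below is illustrative, not the repo's actual code:

```python
# Illustrative only: read the configurable values above with defaults and explicit
# int/float conversion, converting only when the env var is non-empty.
import os


def env_or_default(name, default, cast=str):
    value = os.getenv(name, "")
    return cast(value) if value.strip() else default


MODEL_ID = env_or_default("MODEL_ID", "mosaicml/mpt-7b-chat")
RELEVANT_DOCS = env_or_default("RELEVANT_DOCS", 2, int)
MAX_TOKENS = env_or_default("MAX_TOKENS", 128, int)
MODEL_TEMPERATURE = env_or_default("MODEL_TEMPERATURE", 0.01, float)
EMBEDDING_CHUNK_SIZE = env_or_default("EMBEDDING_CHUNK_SIZE", 1000, int)
EMBEDDING_CHUNK_OVERLAP = env_or_default("EMBEDDING_CHUNK_OVERLAP", 100, int)
EMBEDDING_MODEL_NAME = env_or_default(
    "EMBEDDING_MODEL_NAME", "sentence-transformers/all-MiniLM-L6-v2"
)
```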
In order to start the UI within a browser:
- export RAG_LLM_QUERY_URL=<IP of the RAG LLM query service>
- cd dockers/llm.chatui.service
- To install the Gradio Python module, you can use venv:
   python -m venv .venv
   source .venv/bin/activate
   python3 -m pip install gradio
- Start the Python UI in a browser: python simple_chat.py (a minimal sketch of such a chat app follows)
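For context, a minimal sketch of what a Gradio chat front end like simple_chat.py could look like; the request payload and the "answer" response key are assumptions, not the service's actual contract:

```python
# Minimal sketch, not the repo's simple_chat.py: posts the question to the
# RAG LLM query service and shows the answer in a Gradio chat box.
import os

import gradio as gr
import requests

QUERY_URL = os.environ["RAG_LLM_QUERY_URL"]


def ask(question, history):
    # Payload shape and the "answer" key are assumed for illustration.
    resp = requests.post(QUERY_URL, json={"question": question}, timeout=120)
    resp.raise_for_status()
    return resp.json().get("answer", "")


gr.ChatInterface(fn=ask, title="Question-Answering Chatbot").launch()
```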
Re-add accidentally deleted str_to_int method
Convert v0.1.4 to markdown that can be checked in with the repo
Convert Google doc for Install to Markdown in the repo
Add Luna labels for the Vector store creation and RAG llm service
selvik and others added 30 commits January 9, 2025 12:31
…38)

The top-level README was a bit sparse, so I've added links to the main sections of our install docs with a brief intro to this project.
   *  Our "text-docs" system prompt specifically instructs the LLM to answer using only the retrieved context.
   * Updated the "json-format" system prompt to also use only provided context.
- Also added a script to exercise this new method and verify the number
  of files in the user-provided s3 bucket. This will allow us to do
  a quick validation before running the vector DB creation job.

- This script can be invoked like this:

$ uv run check_files_in_s3_bucket.py --bucket $VECTOR_DB_S3_BUCKET --folder zendesk-input  --region us-west-2
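A rough sketch of such a check using boto3; this is not the actual check_files_in_s3_bucket.py, and the bucket/folder values are simply the ones from the example command above:

```python
# Illustrative sketch of counting objects under a prefix in an S3 bucket,
# similar in spirit to check_files_in_s3_bucket.py (not the actual script).
import boto3


def count_files(bucket: str, folder: str, region: str = "us-west-2") -> int:
    s3 = boto3.client("s3", region_name=region)
    paginator = s3.get_paginator("list_objects_v2")
    count = 0
    for page in paginator.paginate(Bucket=bucket, Prefix=folder.rstrip("/") + "/"):
        # Count real objects only, skipping any folder placeholder keys.
        count += sum(1 for obj in page.get("Contents", []) if not obj["Key"].endswith("/"))
    return count


if __name__ == "__main__":
    print(count_files("my-vector-db-bucket", "zendesk-input"))
```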
- Adding query.py to this repo to make testing easier
* Add logging to file

* Send context with response

* Log response context

* Add pvc for storing logs

* Add log rotation and removal
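A hedged sketch of the rotating file logging these commits describe; the log path and handler settings are illustrative:

```python
# Illustrative: log to a file on the mounted PVC with time-based rotation and
# automatic removal of old files (e.g., keep 7 daily files, as in a later change).
import logging
from logging.handlers import TimedRotatingFileHandler

handler = TimedRotatingFileHandler(
    "/logs/chat.log",        # placeholder path on the log PVC
    when="D", interval=1,    # rotate daily
    backupCount=7,           # drop rotated logs older than 7 days
)
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))

logger = logging.getLogger("chat")
logger.setLevel(logging.INFO)
logger.addHandler(handler)

logger.info("response context: %s", "...")
```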
* Hide simple chat behind auth proxy

* Empty-Commit
* Fixes to Chat Auth to correctly route to Chat UI

* Update docs to include setup, configuration and use of Chat UI

* Increase PV size for Chat logs to 20G from 1G

* Increase log persistence to 7 days from 24 hrs

* Install docs typo fix
* Make system prompt for json-format configurable via env var

* Allow system prompt to be configured via env var

* Update Zendesk data prep script to retrieve nested fields

* Change reference to ticket ID from key to ticket

* Update Chat UI from Message to Question

* Changes to Chat UI

* Changes "Chatbot" to "Question-Answering Chatbot"
* Removes context from answer returned from RAG LLM
* Changes slider for rating to stars
* Removes the ChatML end token "<|im_end|>" from the answer

* Use newer version of Gradio to use Ratings component

* Updates to Chat UI

- Remove ChatML end token

* Rename key to ticket in vector db processing

* Updates and corrections to Zendesk data prep

* Update vector db creation image

* Update serve RAG LLM, which used an older hard-coded metadata Unique ID field

* Remove generated answers after Chat ML end token

* Update ServeRAGLLM image and add system prompt env var to container

* Allow chat app to truncate after Chat ML end token

* fix typo in simple chat app logging method

* fix logging formatting

* Only print changes to serve RAG LLM

* Update system prompt to prevent hallucination of new, related questions

* Keep ticket URL generic

* image version updates

* update createvectordb image

* Removed older versions of Zendesk dataset processing scripts
…a label fixes (#52)

* Update system prompt to prevent adding context to generated responses

* Remove context from generated response

* Update zendesk processing to include ticket ID and dates in text before embedding

* Fix missing logging import and add PV-PVC manifests for RAG LLM

* fix unbound string, fix volume mount on ragllm

* Update RAG LLM image

* Update logging in RAG LLM

* Move luna labels to the correct location in the Create Vector DB manifest

* Update prints to logging in RAG LLM

* Change error prints to logging.error in RAG LLM

* Chat container should also be placed by Luna

* update rag llm image
…generated answer (#57)

This PR includes the following changes:

* Handle hallucinated context that gets added to generated answers and is preceded by the label "Content"
* Manually remove new-question hallucinations, which seem to be preceded by the label "Question:"
* Propagate any LLM API errors to the UI (this is otherwise causing a generic API response error to be received and displayed in the UI)
* Add end-user docs for Question-Answering ChatBot
…ted response (#64)

Problem:

With the Microsoft Phi-3 model and JSON processing we noticed that the generated response from the LLM sometimes includes multiple occurrences of the ChatML tokens "im_start" and "im_end". New hallucinated questions are also added to the response. In some cases these newly introduced questions are relevant to the end-user's question; in others they are not.

Solution in this PR:

Currently this will be processed as follows:
* Any content after (and including) the first im_end ChatML token and the keywords "Question:", "Content:" and "Context:" is trimmed.
* Within the remaining content, we replace any occurrence of im_start with a space.
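A minimal sketch of that trimming logic, assuming the literal ChatML token strings; the repo's implementation may differ in detail:

```python
# Illustrative post-processing of a generated answer:
# 1) trim everything from the first stop marker onward,
# 2) replace any leftover im_start tokens with a space.
STOP_MARKERS = ["<|im_end|>", "Question:", "Content:", "Context:"]


def clean_answer(text: str) -> str:
    cut = len(text)
    for marker in STOP_MARKERS:
        idx = text.find(marker)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut].replace("<|im_start|>", " ").strip()


print(clean_answer("It is resolved.<|im_end|>Question: anything else?"))
# -> "It is resolved."
```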
Currently the QA-in-a-box UI only shows the current question typed in by the user and the corresponding answer generated by the LLM. Prospective customers wanted to be able to view prior questions, so they could use context from the generated answers in follow-up questions.

This PR updates the Chat UI to show the history of all questions asked by the user during the current session.

This is implemented by setting chat history to be enabled by default.

Note that this updates only the display. Prior questions and answers are not sent to the LLM as context for a subsequent question.
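A small sketch of how this display-only history can fall out of Gradio's chat component, assuming the UI is built on gr.ChatInterface; the backend call below is a placeholder:

```python
# Sketch: Gradio's chat component stores and displays the per-session history
# itself; the handler ignores `history`, so prior turns are never sent to the
# LLM as additional context.
import gradio as gr


def rag_llm_query(question: str) -> str:
    # Placeholder for the call to the RAG LLM query service.
    return f"(answer to: {question})"


def answer(question, history):
    return rag_llm_query(question)  # history is display-only


gr.ChatInterface(fn=answer).launch()
```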
* Update README with end-user graphic
* Add stack diagram
* Add RAG operation
* Prepare script for processing txt files to common json

* Remove special handling for txt files from db creation

* Remove special handling for txt files from db creation

* Simplify reading env vars

* Prepare script for processing sitemaps to common json

* Remove special handling for sitemaps from db creation

* Remove unused imports

* Add reading config from .env file or env vars and templates (a loader sketch follows after this list)

* Add settings in local setup

* Add settings in k8s setup

* Make sure docker works

* Extract s3 operations

* Use click in k8s version

* Refactor - common config loading func

* Enable running local and s3 modes from one file

* Remove unneeded file

* Add requirements file

* Load s3 files the same way as local dir

* Introduce tween services

* Use services

* Remove unused methods

* Make sure all files are visible for docker

* Get rid of boto3 and move things around

* Fix reading config

* Bring back param removed while refactoring

* Make sure aws creds are available

* Add tests

* Fix s3 mock

* No need to set region

* Improve print

* Fix checking test mode

* Cleanup docker and project file

* Log validation error

* Make sure yaml does not fail when integers are passed

* Cleanup test env file

* Add readme

* Change yaml to expect env vars to have \"
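A hedged sketch of the ".env file or env vars" loading mentioned above; the variable names are taken from elsewhere in this PR and the loader itself is illustrative:

```python
# Illustrative common config loader: values from a .env file are loaded first
# (if present), then plain environment variables take effect as usual.
import os

from dotenv import load_dotenv


def load_config() -> dict:
    load_dotenv()  # no-op when there is no .env file
    return {
        "input_type": os.getenv("INPUT_TYPE", "text-docs"),
        "s3_bucket": os.getenv("VECTOR_DB_S3_BUCKET", ""),
        "chunk_size": int(os.getenv("EMBEDDING_CHUNK_SIZE", "1000")),
        "chunk_overlap": int(os.getenv("EMBEDDING_CHUNK_OVERLAP", "100")),
    }


if __name__ == "__main__":
    print(load_config())
```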
* Update image version

* Update image version

* Update image version
* Make response rating work with history

* Fix lint
* Align metadata with what goes to text

* Remove metadata fields from text

* Move tags to metadata

* Add method for chunking with enriching text with metadata

* Leave space for metadata inside the chunk (see the sketch after this list)

* Use new chunking and skip if value missing

* Improve description

* Eliminate extra spaces more thoroughly
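A minimal sketch of chunking while enriching the text with metadata and reserving space for it inside the chunk, assuming LangChain's RecursiveCharacterTextSplitter; the field names and size budget are illustrative:

```python
# Illustrative: prefix each chunk with a short metadata header and shrink the
# splitter's chunk_size so header + body still fit the embedding chunk budget.
from langchain_text_splitters import RecursiveCharacterTextSplitter

CHUNK_SIZE = 1000
METADATA_BUDGET = 120  # space reserved for the metadata header


def chunk_with_metadata(text: str, metadata: dict) -> list[str]:
    header = f"ticket: {metadata.get('ticket', '')} created: {metadata.get('created_at', '')}\n"
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=CHUNK_SIZE - METADATA_BUDGET,
        chunk_overlap=100,
    )
    return [header + chunk for chunk in splitter.split_text(text)]
```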
* Updates to convert CSV to SQLite DB for query execution

* Add PV to RAG LLM service for mounting SQL DB for text-to-sql search

* Convert zendesk JSON to CSV

* Change QuestionType to SearchType, Update Dockerfile with DB

* Add python deps for text-to-sql in Dockerfile

* Add PV and PVC for support ticket DB for RAG LLM deployment

* Fix retriever creation: Only for VECTOR search type

* Comments cleanup, Fix SQL DB addition

* Text to SQL querying for structured data

- Adds Text to SQL capability for answering questions about user's structured data.
- LLM inferencing is used in two ways in this new workflow:
   1) Converting the user's natural language question into a SQL query
   2) Converting the SQL results into a natural language answer that will be returned to the user
- This PR uses Langchain's text-to-SQL APIs, which require an LLM with function-calling (or tool-calling) ability. So this PR needs to be used with an LLM like RubraAI's Phi-3 model: [https://huggingface.co/rubra-ai/Phi-3-mini-128k-instruct](https://huggingface.co/rubra-ai/Phi-3-mini-128k-instruct). This model takes the base Phi-3 model and fine-tunes it for function calling. A sketch of this flow follows after this list.
- This PR expects the structured data to be mounted to the RAG LLM deployment via a Persistent volume from S3. Instructions on doing this will be added to the [Install docs](https://github.com/elotl/GenAI-infra-stack/blob/main/docs/install.md).
- This new env variable needs to be defined to invoke this feature:
`export SEARCH_TYPE=SQL`
 In this PR, the default search type is temporarily set to SQL. A follow-up PR will allow an incoming question to be automatically classified so that the appropriate search type (SQL or VECTOR) is picked at runtime.
- This feature is only available on EKS and will be extended to other cloud providers in follow-up PRs.
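A rough sketch of the text-to-SQL flow described above, using LangChain's create_sql_query_chain against a SQLite file; the DB path, endpoint, and model name are placeholders, and the repo may use a different LangChain API (for example, the SQL agent):

```python
# Illustrative text-to-SQL flow: (1) NL question -> SQL via the LLM,
# (2) run the SQL against the SQLite DB, (3) LLM turns the rows into an answer.
from langchain.chains import create_sql_query_chain
from langchain_community.utilities import SQLDatabase
from langchain_openai import ChatOpenAI

db = SQLDatabase.from_uri("sqlite:////data/tickets.db")  # placeholder path
llm = ChatOpenAI(
    base_url="http://localhost:8080/v1",                 # port-forwarded model endpoint
    api_key="unused",
    model="rubra-ai/Phi-3-mini-128k-instruct",
)

write_query = create_sql_query_chain(llm, db)
sql = write_query.invoke({"question": "How many tickets are there?"})
rows = db.run(sql)

answer = llm.invoke(
    f"Question: How many tickets are there?\nSQL result: {rows}\n"
    "Answer the question in one sentence using only the SQL result."
)
print(answer.content)
```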

1. Run the RubraAI model in the GenAI infra stack and port-forward the LLM query endpoint:

```
 % kubectl get svc
NAME                                        TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                                         AGE
kuberay-operator                            ClusterIP   10.100.164.141   <none>        8080/TCP                                        10d
kubernetes                                  ClusterIP   10.100.0.1       <none>        443/TCP                                         10d
llm-model-serve-head-svc                    ClusterIP   10.100.235.95    <none>        10001/TCP,8265/TCP,6379/TCP,8080/TCP,8000/TCP   17h
llm-model-serve-raycluster-fhc7j-head-svc   ClusterIP   10.100.202.89    <none>        10001/TCP,8265/TCP,6379/TCP,8080/TCP,8000/TCP   18h
llm-model-serve-serve-svc                   ClusterIP   10.100.63.8      <none>        8000/TCP                                        17h
```

```
selvik@Selvis-MacBook-Pro pynb % kubectl  port-forward svc/llm-model-serve-serve-svc  8080:8000
Forwarding from 127.0.0.1:8080 -> 8000
Forwarding from [::1]:8080 -> 8000
```

2. Run the local version of the text-to-sql script:

```
% python serverragllm_zendesk_csv_sql_local.py
```

3. Send a question to this local endpoint:

```
% cd GenAI-infra-stack/scripts/query
% python query_private_data.py
Type your query here: How many tickets are there?
Answer: {'answer': 'There are 123 tickets.', 'relevant_tickets': ['n/a'], 'sources': ['n/a'], 'context': ''}
```
* Add docker compose for local setup

* Enable serving weaviate from local docker instance

* Add method for creating weaviate db

* Enable creating local weaviate DB (a client sketch follows after this list)

* Serve from local weaviate DB

* Add retriever with score

* Try similarity search with score

* What I run against Phi

* Add simple handling of too big context error

* Use new model

* Add simple enhancement for filtering by ticket id

* Comment out filter

* Make adding metadata to text better for embeddings

* Cleanup texts a bit more

* Try : instead of is

* Add deployment for weaviate

* Pass params to weaviate

* Add deployment for creating weaviate db

* Enable running different apps from one image

* Parsing workaround

* Fix to make it work :)

* Fix imports after rebase

* Fix yaml

* Use one yaml

* Add comments

* Add docs
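A minimal sketch of talking to the local docker-compose Weaviate with the v4 Python client; the collection name and properties are placeholders, and the repo's scripts may use a different client version or a LangChain wrapper:

```python
# Illustrative: connect to the docker-compose Weaviate, create a collection,
# and insert one document with its metadata.
import weaviate
import weaviate.classes.config as wc

client = weaviate.connect_to_local()  # default: http://localhost:8080

tickets = client.collections.create(
    name="Ticket",
    properties=[
        wc.Property(name="text", data_type=wc.DataType.TEXT),
        wc.Property(name="ticket", data_type=wc.DataType.TEXT),
    ],
)
tickets.data.insert({"text": "VPN drops every hour ...", "ticket": "12345"})

client.close()
```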
#71)

This PR includes a question router. This router determines whether an incoming question will be handled by the SQL search or the RAG vector search. RAG vector search is parametrized to enable hybrid search i.e. a combination of vector search and text search.

The question classification module uses a random forest classifier trained against synthetic questions (not included in this repo). The two classes are:
- "aggregation" questions
- "pointed" questions

The SQL search code path is chosen for:
A. all aggregation questions, as well as
B. questions that contain symbols or numerals.

B is included in the SQL search type because, based on our experiments, vector search does not handle alphanumeric words in a question well (for example IP addresses or ticket numbers). A sketch of this routing logic follows below.
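A hedged sketch of that routing logic; the real classifier and its synthetic training data are not in this repo, so the tiny model below is only a stand-in:

```python
# Illustrative router: rule first (symbols/numerals -> SQL), then a random
# forest over TF-IDF features decides "aggregation" vs "pointed".
import re

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Stand-in training data; the real model is trained on synthetic questions.
questions = ["How many tickets were opened last month?", "Why is the VPN failing?"]
labels = ["aggregation", "pointed"]
classifier = make_pipeline(TfidfVectorizer(), RandomForestClassifier()).fit(questions, labels)


def route(question: str) -> str:
    # Questions with numerals or symbols (IP addresses, ticket numbers, ...) go to SQL.
    if re.search(r"[\d#@:/\.-]", question):
        return "SQL"
    return "SQL" if classifier.predict([question])[0] == "aggregation" else "VECTOR"


print(route("What is the status of ticket 4711?"))  # digits present -> "SQL"
```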

Squashed commits: 

* Random forest model for question classification

* Add question router

* Return unique sources

* Update file paths for SQL DB and classification models to containerized value

---------

Co-authored-by: Maciej Urbański <[email protected]>
This PR includes:

    Allow Weaviate's alpha parameter to be configurable (a query sketch follows below)
    Update logging references to use the configured "logger"
    Allow the containerized locations of the SQL DB and question classification models to be easily replaced with local file paths using env vars
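A small sketch of how a configurable alpha might be passed into a hybrid query, shown with the Weaviate v4 client; the env var name and collection are assumptions, and the repo may instead pass alpha through a LangChain retriever:

```python
# Illustrative: alpha balances vector search (alpha=1) against keyword/BM25
# search (alpha=0) in Weaviate hybrid queries; here it comes from an env var.
import os

import weaviate

alpha = float(os.getenv("WEAVIATE_ALPHA", "0.5"))  # env var name is an assumption

client = weaviate.connect_to_local()
tickets = client.collections.get("Ticket")

results = tickets.query.hybrid(query="vpn keeps disconnecting", alpha=alpha, limit=4)
for obj in results.objects:
    print(obj.properties.get("ticket"), obj.properties.get("text", "")[:80])

client.close()
```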

* Allow Weaviate's alpha parameter to be configurable

* Order imports - isort

* Fix formatting - black

* Remove unused field

* Fix use logging instead of print

---------

Co-authored-by: Maciej Urbański <[email protected]>
* Add missing weaviate module to requirements.txt

* Add prompt to show equivalence of customers, clients, requesters and submitters

* fix sql db path

* minor Dockerfile fixes

* Dockerfile updates to speed up builds

* Add model path as env var to RAG LLM container