Adjustments #17
Open
murbans1 wants to merge 121 commits into loftyoutcome:main from elotl:adjustments
Conversation
murbans1 commented on Mar 11, 2025
- make chatbox window bigger and resizable
- add more trim after labels
…ith RAG

* Enable creation of a vector DB from text docs downloaded from an S3 bucket. Added a new INPUT_TYPE value "text-docs" to the vector DB creation docker container for this. These text documents will be used as the RAG dataset.
* Created new versions of the VectorDB creation job and RAG LLM service with two changes:
  - They no longer need the image pull secret (registry credentials, "regcred"); the corresponding images have been made public.
  - Both of these k8s resources run in the default namespace (rather than MODEL_NAMESPACE).
* Moved from the OpenAI API client.completions.create to client.chat.completions.create. This required newer langchain packages to be added to the Dockerfile, and the answer is now available in completions.choices[0].message.content (previously in completions.choices[0].text).
* Increased RAG context from 1 doc to 4 docs.

Minor:
- Added error handling for OpenAI client creation and the S3 client's object access.
- Removed the 30s sleep after the Vector DB retriever is created, since it didn't seem necessary.
- Added env var VECTOR_DB_S3_BUCKET to specify an S3 bucket instead of the prior hard-coded value.
- If env var MODEL_LLM_SERVER_URL is not defined, a preset default is used.
- Changed the model name from "mosaicml--mpt-7b-chat" to "mosaicml/mpt-7b-chat" since the former was producing an error.
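As a rough illustration of the API change described above, a chat completions call looks like the following sketch; the base URL, prompt text, and fallback values are placeholders, not values taken from this repo.

```python
# Minimal sketch (not the repo's actual code): calling the chat completions API
# against an OpenAI-compatible endpoint. The default URL and prompt strings are
# illustrative placeholders.
import os
from openai import OpenAI

client = OpenAI(
    base_url=os.environ.get("MODEL_LLM_SERVER_URL", "http://localhost:8000/v1"),
    api_key="not-needed-for-local-serving",
)

completions = client.chat.completions.create(
    model="mosaicml/mpt-7b-chat",
    messages=[
        {"role": "system", "content": "Answer using only the provided context."},
        {"role": "user", "content": "What is in the RAG dataset?"},
    ],
    max_tokens=128,
)

# With the chat API the answer lives in .message.content instead of .text
print(completions.choices[0].message.content)
```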
Update VectorDB creation job, RAG LLM service for Chat-in-a-box with RAG
Add link to user instructions
- Added these configurable values to the RAG LLM service:
  - MODEL_ID (DEFAULT=mosaicml/mpt-7b-chat)
  - (Optional) RELEVANT_DOCS (DEFAULT=2)
  - (Optional) MAX_TOKENS (DEFAULT=128)
  - (Optional) MODEL_TEMPERATURE (DEFAULT=0.01)
- Added these configurable values to the Vector DB creation job:
  - (Optional) EMBEDDING_CHUNK_SIZE (DEFAULT=1000)
  - (Optional) EMBEDDING_CHUNK_OVERLAP (DEFAULT=100)
  - (Optional) EMBEDDING_MODEL_NAME (DEFAULT=sentence-transformers/all-MiniLM-L6-v2)
- Remove duplicate MODEL_ID read
- Handle integer and float embedding and RAG querying configuration parameters correctly
- Use an explicit embedding model for the RAG dataset type sitemap
- Update both the vectorDB creation job and the RAG LLM deploy to use Intel instance types via annotations
- Add logs of the newly added configurable parameters
- Updated k8s manifests to deploy version v1.2 of the vector db creation job and RAG LLM deploy + service.

Fixes:
- Explicit conversion of non-string parameters to int and float (and only if the value is non-empty)
- Remove setting k (number of relevant docs) for retriever.invoke (only applicable for as_retriever)
- Removed unused methods
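For illustration, reading these optional values with explicit int/float conversion only when a non-empty value is set might look like the sketch below; the helper name is hypothetical, not the repo's actual code.

```python
# Illustrative sketch only: optional configuration from env vars, casting
# non-string parameters to int/float only when a non-empty value is provided.
import os


def env_or_default(name, default, cast=str):
    value = os.environ.get(name, "")
    # Only cast when the variable is set to a non-empty string;
    # otherwise fall back to the default.
    return cast(value) if value.strip() else default


MODEL_ID = env_or_default("MODEL_ID", "mosaicml/mpt-7b-chat")
RELEVANT_DOCS = env_or_default("RELEVANT_DOCS", 2, int)
MAX_TOKENS = env_or_default("MAX_TOKENS", 128, int)
MODEL_TEMPERATURE = env_or_default("MODEL_TEMPERATURE", 0.01, float)
EMBEDDING_CHUNK_SIZE = env_or_default("EMBEDDING_CHUNK_SIZE", 1000, int)
EMBEDDING_CHUNK_OVERLAP = env_or_default("EMBEDDING_CHUNK_OVERLAP", 100, int)
```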
Make model, vector store embedding, and vector store retrieval parameters configurable
To start the UI within a browser:
- `export RAG_LLM_QUERY_URL=<IP of the RAG LLM query service>`
- `cd dockers/llm.chatui.service`
- To install the Gradio python module, you can use venv: `python -m venv .venv`, `source .venv/bin/activate`, `python3 -m pip install gradio`
- Start the Python UI in a browser: `python simple_chat.py`
Re-add accidentally deleted str_to_int method
Add a simple chat UI
Convert v0.1.4 to markdown that can be checked in with the repo
Convert Google doc for Install to Markdown in the repo
Add Luna labels for the Vector store creation and RAG llm service
Simple chat deployment
…38) The top-level README was a bit sparse, so I've added links to the main sections of our install docs with a brief intro to this project.
* Our "text-docs" system prompt specifically instructs the LLM to answer using only the retrieved context.
* Updated the "json-format" system prompt to also use only provided context.
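For illustration only, a context-only system prompt along these lines might read as follows; the exact wording used in the repo is not reproduced here.

```python
# Hypothetical example of a context-only system prompt; not the repo's actual wording.
SYSTEM_PROMPT = (
    "You are a question-answering assistant. Answer the user's question using "
    "ONLY the retrieved context provided below. If the context does not contain "
    "the answer, say you do not know. Do not invent new questions or content."
)
```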
- Also added a script to exercise this new method and verify the number of files in the user-provided S3 bucket. This will allow us to do a quick validation before running the vector DB creation job.
- This script can be invoked like this: `uv run check_files_in_s3_bucket.py --bucket $VECTOR_DB_S3_BUCKET --folder zendesk-input --region us-west-2`
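As a sketch of the idea (not the repo's check_files_in_s3_bucket.py), counting the objects under a folder in an S3 bucket could be done with boto3's paginator:

```python
# Rough sketch of an S3 file-count check like the one described above,
# using boto3's list_objects_v2 paginator. Argument handling is illustrative.
import argparse

import boto3


def count_objects(bucket: str, folder: str, region: str) -> int:
    s3 = boto3.client("s3", region_name=region)
    paginator = s3.get_paginator("list_objects_v2")
    count = 0
    for page in paginator.paginate(Bucket=bucket, Prefix=folder):
        count += len(page.get("Contents", []))
    return count


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--bucket", required=True)
    parser.add_argument("--folder", required=True)
    parser.add_argument("--region", default="us-west-2")
    args = parser.parse_args()
    print(count_objects(args.bucket, args.folder, args.region))
```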
…or users to download (#43)
- Adding query.py to this repo to make it easier for testing
* Add logging to file
* Send context with response
* Log response context
* Add pvc for storing logs
* Add log rotation and removal
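A file-logging setup with rotation and removal along these lines could look like the following sketch; the log path and retention values are placeholders, not the service's actual configuration.

```python
# Illustrative sketch of file logging with rotation and removal using the
# standard library; the log path and retention settings are placeholders.
import logging
from logging.handlers import TimedRotatingFileHandler

handler = TimedRotatingFileHandler(
    "/var/log/ragllm/chat.log",  # e.g. a path on the mounted PVC
    when="D",        # rotate daily
    interval=1,
    backupCount=7,   # keep 7 rotated files; older ones are removed
)
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))

logger = logging.getLogger("ragllm")
logger.setLevel(logging.INFO)
logger.addHandler(handler)

logger.info("question=%s answer=%s context=%s", "…", "…", "…")
```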
* Hide simple chat behind auth proxy
* Empty-Commit
* Fixes to Chat Auth to correctly route to Chat UI
* Update docs to include setup, configuration and use of Chat UI
* Increase PV size for Chat logs to 20G from 1G
* Increase log persistence to 7 days from 24 hrs
* Install docs typo fix
* Make system prompt for json-format configurable via env var
* Allow system prompt to be configured via env var
* Update Zendesk data prep script to retrieve nested fields
* Change reference to ticket ID from key to ticket
* Update Chat UI from Message to Question
* Changes to Chat UI:
  * Changes "Chatbot" to "Question-Answering Chatbot"
  * Removes context from answer returned from RAG LLM
  * Changes slider for rating to stars
  * Removes the ChatML end token from the answer "<|im_end|>"
* Use newer version of Gradio to use Ratings component
* Updates to Chat UI - Remove ChatML end token
* Rename key to ticket in vector db processing
* Updates and corrections to Zendesk data prep
* Update vector db creation image
* Update serverag llm which uses older hard-coded metadata Unique ID field
* Remove generated answers after ChatML end token
* Update ServeRAGLLM image and add system prompt env var to container
* Allow chat app to truncate after ChatML end token
* Fix typo in simple chat app logging method
* Fix logging formatting
* Only print changes to serve RAG LLM
* Update system prompt to prevent hallucination of new, related questions
* Keep ticket URL generic
* Image version updates
* Update createvectordb image
* Removed older versions of Zendesk dataset processing scripts
…a label fixes (#52)

* Update system prompt to prevent adding context to generated responses
* Remove context from generated response
* Update zendesk processing to include ticket ID and dates in text before embedding
* Fix missing logging import and add PV-PVC manifests for RAG LLM
* Fix unbound string, fix volume mount on ragllm
* Update RAG LLM image
* Update logging in RAG LLM
* Move luna labels to the correct location in the Create Vector DB manifest
* Update prints to logging in RAG LLM
* Change error prints to logging.error in RAG LLM
* Chat container should also be placed by Luna
* Update rag llm image
…generated answer (#57)

This PR includes the following changes:
* Handle hallucinated context being added to generated answers and preceded by the label "Content"
* Manually remove new-question hallucination, which seems to be preceded by the label "Question:"
* Propagate any LLM API errors to the UI (this is otherwise causing a generic API response error to be received and displayed in the UI)
* Add end-user docs for Question-Answering ChatBot
…ted response (#64)

Problem: With the Microsoft Phi-3 model and JSON processing we noticed that the generated response from the LLM (sometimes) includes multiple occurrences of the ChatML tokens "im_start" and "im_end". New hallucinated questions are also added to the response. In certain cases these newly introduced questions (by the LLM) are relevant to the end-user's question; in some instances they are not related.

Solution in this PR: the response is currently processed as follows:
* Any content after (and including) the first im_end ChatML token and the keywords "Question:", "Content:" and "Context:" is trimmed.
* Within the remaining content, we replace any occurrence of im_start with a space.
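A post-processing step along these lines might look like the sketch below; the marker list and function name are assumptions for illustration, not necessarily the repo's exact implementation.

```python
# Illustrative sketch of the trimming described above; marker strings and the
# function name are assumptions, not the repo's exact code.
CUT_MARKERS = ["<|im_end|>", "Question:", "Content:", "Context:"]


def clean_generated_answer(text: str) -> str:
    # Trim everything after (and including) the first occurrence of any marker.
    cut = len(text)
    for marker in CUT_MARKERS:
        idx = text.find(marker)
        if idx != -1:
            cut = min(cut, idx)
    text = text[:cut]
    # Replace any leftover im_start tokens with a space.
    return text.replace("<|im_start|>", " ").strip()


print(clean_generated_answer("The fix is to restart the pod.<|im_end|>Question: What else?"))
```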
Currently the QA-in-a-box UI only shows the current question typed in by the user and the corresponding answer generated by the LLM. Prospective customers wanted to be able to view prior questions, so they could use context from a generated answer in a follow-up question. This PR updates the Chat UI to show the history of all questions asked by the user during the current session, implemented by enabling chat history by default. Note that this updates only the display: prior questions and answers are not sent to the LLM as context for a subsequent question.
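As a rough sketch (not the repo's simple_chat.py), a Gradio chat UI that keeps the session's question/answer history visible while still sending only the current question to the backend might look like this; the request format against the query service is an assumption.

```python
# Sketch only: a Gradio chat UI that displays session history but sends only
# the current question to the RAG LLM query service. The endpoint request
# shape is an illustrative assumption.
import os

import gradio as gr
import requests

QUERY_URL = os.environ.get("RAG_LLM_QUERY_URL", "http://localhost:8000")


def answer(message, history):
    # Only the current question is sent; `history` is used by Gradio for display.
    resp = requests.get(QUERY_URL, params={"question": message}, timeout=60)
    return resp.json().get("answer", "")


gr.ChatInterface(fn=answer, title="Question-Answering Chatbot").launch()
```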
* Update README with end-user graphic
* Add stack diagram
* Add RAG operation
* Prepare script for processing txt files to common json
* Remove special handling for txt files from db creation
* Remove special handling for txt files from db creation
* Simplify reading env vars
* Prepare script for processing sitemaps to common json
* Remove special handling for sitemaps from db creation
* Remove not used imports
* Add reading config from .env file or env vars and templates
* Add settings in local setup
* Add settings in k8s setup
* Make sure docker works
* Extract s3 operations
* Use click in k8s version
* Refactor - common config loading func
* Enable running local and s3 modes from one file
* Remove not needed file
* Add requirements file
* Load s3 files the same way as local dir
* Introduce tween services
* Use services
* Remove not used methods
* Make sure all files are visible for docker
* Get rid of boto3 and move things around
* Fix reading config
* Bring back param removed while refactoring
* Make sure aws creds are available
* Add tests
* Fix s3 mock
* No need to set region
* Improve print
* Fix checking test mode
* Cleanup docker and project file
* Log validation error
* Make sure yaml does not fail when integers passed
* Cleanup test env file
* Add readme
* Change yaml expect env vars to have \"
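A common config-loading function of the kind listed above (reading from a .env file when present, otherwise from env vars) could be sketched as follows; the variable names and defaults are illustrative assumptions, not the repo's actual settings module.

```python
# Sketch of a common config-loading function: values from a .env file are
# merged in, with existing environment variables taking precedence.
import os

from dotenv import load_dotenv  # python-dotenv


def load_config(env_file: str = ".env") -> dict:
    load_dotenv(env_file, override=False)
    return {
        "input_type": os.environ.get("INPUT_TYPE", "text-docs"),
        "s3_bucket": os.environ.get("VECTOR_DB_S3_BUCKET", ""),
        "embedding_model": os.environ.get(
            "EMBEDDING_MODEL_NAME", "sentence-transformers/all-MiniLM-L6-v2"
        ),
    }


config = load_config()
```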
* Update image version
* Update image version
* Update image version
* Make response rating work with history
* Fix lint
* Align metadata with what goes to text
* Remove metadata fields from text
* Move tags to metadata
* Add method for chunking with enriching text with metadata
* Leave space for metadata inside the chunk
* Use new chunking and skip if value missing
* Improve description
* Better eliminate extra spaces
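A chunking step that enriches each chunk with metadata while leaving room for it in the chunk-size budget could look like the sketch below; the header format and field names are assumptions for illustration.

```python
# Sketch of chunking that prepends ticket metadata to each chunk while
# reserving space for that header within the chunk-size budget.
from langchain_text_splitters import RecursiveCharacterTextSplitter

CHUNK_SIZE = 1000
CHUNK_OVERLAP = 100


def chunk_with_metadata(text: str, metadata: dict) -> list[str]:
    header = f"Ticket: {metadata.get('ticket', '')} Created: {metadata.get('created_at', '')}\n"
    # Reserve space for the metadata header so enriched chunks stay within CHUNK_SIZE.
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=CHUNK_SIZE - len(header),
        chunk_overlap=CHUNK_OVERLAP,
    )
    return [header + chunk for chunk in splitter.split_text(text)]
```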
* Updates to convert CSV to SQLite DB for query execution
* Add PV to RAG LLM service for mounting SQL DB for text-to-sql search
* Convert zendesk JSON to CSV
* Change QuestionType to SearchType, Update Dockerfile with DB
* Add python deps for text-to-sql in Dockerfile
* Add PV and PVC for support ticket DB for RAG LLM deployment
* Fix retriever creation: Only for VECTOR search type
* Comments cleanup, Fix SQL DB addition
* Text to SQL querying for structured data

- Adds Text to SQL capability for answering questions about the user's structured data.
- LLM inferencing is used in two ways in this new workflow:
  1) Converting the user's natural language question into a SQL query
  2) Converting the SQL results into a natural language answer that will be returned to the user
- This PR uses Langchain's text-to-SQL APIs, which require the use of an LLM that has function-calling (or tool-calling) ability. So this PR needs to be used with an LLM like RubraAI's Phi-3 model: [https://huggingface.co/rubra-ai/Phi-3-mini-128k-instruct](https://huggingface.co/rubra-ai/Phi-3-mini-128k-instruct). This model takes the base Phi-3 model and fine-tunes it for function calling.
- This PR expects the structured data to be mounted to the RAG LLM deployment via a Persistent Volume from S3. Instructions on doing this will be added to the [Install docs](https://github.com/elotl/GenAI-infra-stack/blob/main/docs/install.md).
- This new env variable needs to be defined to invoke this feature: `export SEARCH_TYPE=SQL`. In this PR, the default search is set to SQL temporarily. A follow-up PR will allow an incoming question to be automatically classified and the appropriate search type (SQL or VECTOR) to be picked at runtime.
- This feature is only available on EKS and will be extended to other cloud providers in follow-up PRs.

1. Run the RubraAI model in the GenAI infra stack and port-forward the LLM query endpoint:
```
% kubectl get svc
NAME                                        TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                                         AGE
kuberay-operator                            ClusterIP   10.100.164.141   <none>        8080/TCP                                        10d
kubernetes                                  ClusterIP   10.100.0.1       <none>        443/TCP                                         10d
llm-model-serve-head-svc                    ClusterIP   10.100.235.95    <none>        10001/TCP,8265/TCP,6379/TCP,8080/TCP,8000/TCP   17h
llm-model-serve-raycluster-fhc7j-head-svc   ClusterIP   10.100.202.89    <none>        10001/TCP,8265/TCP,6379/TCP,8080/TCP,8000/TCP   18h
llm-model-serve-serve-svc                   ClusterIP   10.100.63.8      <none>        8000/TCP                                        17h
```
```
selvik@Selvis-MacBook-Pro pynb % kubectl port-forward svc/llm-model-serve-serve-svc 8080:8000
Forwarding from 127.0.0.1:8080 -> 8000
Forwarding from [::1]:8080 -> 8000
```
2. Run the local version of the text-to-sql script: `python serverragllm_zendesk_csv_sql_local.py`
3. Send a question to this local endpoint:
   `% cd GenAI-infra-stack/scripts/query`
   `% python query_private_data.py`
   `Type your query here: How many tickets are there?`
   `Answer: {'answer': 'There are 123 tickets.', 'relevant_tickets': ['n/a'], 'sources': ['n/a'], 'context': ''}`
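For orientation, the two LLM uses described above (question → SQL, SQL result → natural-language answer) can be sketched with LangChain's text-to-SQL helpers as follows; the SQLite path, endpoint URL, and model name are illustrative assumptions, and this is not the repo's serverragllm script.

```python
# Sketch of a LangChain text-to-SQL flow; paths and endpoints are assumptions.
from langchain_community.utilities import SQLDatabase
from langchain.chains import create_sql_query_chain
from langchain_openai import ChatOpenAI

db = SQLDatabase.from_uri("sqlite:////data/support_tickets.db")
llm = ChatOpenAI(
    base_url="http://localhost:8080/v1",  # e.g. the port-forwarded serve endpoint
    api_key="not-needed",
    model="rubra-ai/Phi-3-mini-128k-instruct",
)

# 1) Convert the natural-language question into a SQL query.
write_query = create_sql_query_chain(llm, db)
sql = write_query.invoke({"question": "How many tickets are there?"})

# 2) Execute the SQL and ask the LLM to phrase the result as an answer.
result = db.run(sql)
answer = llm.invoke(
    f"Question: How many tickets are there?\nSQL result: {result}\n"
    "Answer the question in one sentence using only the SQL result."
)
print(answer.content)
```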
* Add docker compose for local setup
* Enable serving weaviate from local docker instance
* Add method for creating weaviate db
* Enable creating local weaviate DB
* Serve from local weaviate DB
* Add retriever with score
* Try similarity search with score
* What I run against Phi
* Add simple handling of too big context error
* Use new model
* Add simple enhancement for filtering by ticket id
* Comment out filter
* Make adding metadata to text better for embeddings
* Cleanup texts a bit more
* Try : instead of is
* Add deployment for weaviate
* Pass params to weaviate
* Add deployment for creating weaviate db
* Enable running different apps from one image
* Parsing workaround
* Fix to make it work :)
* Fix imports after rebase
* Fix yaml
* Use one yaml
* Add comments
* Add docs
#71) This PR includes a question router. This router determines whether an incoming question will be handled by the SQL search or the RAG vector search. RAG vector search is parametrized to enable hybrid search, i.e. a combination of vector search and text search.

The question classification module uses a random forest classifier trained against synthetic questions (not included in this repo). The two classes are:
- "aggregation" questions
- "pointed" questions

The SQL search code path is chosen for:
A. all aggregation questions, as well as
B. questions that have symbols or numerals in them.

B is included in the SQL search type because, based on our experiments, vector search does not handle alphanumeric words in a question such as IP addresses, ticket numbers, etc.

Squashed commits:
* Random forest model for question classification
* Add question router
* Return unique sources
* Update file paths for SQL DB and classification models to containerized value

---------

Co-authored-by: Maciej Urbański <[email protected]>
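A routing decision along these lines might look like the sketch below; the model path, the assumption that the saved object is a full text-classification pipeline, and the numeral check are illustrative, not the trained router shipped in the image.

```python
# Sketch of the routing rule described above (not the repo's router): SQL
# search for "aggregation" questions or questions containing numerals
# (a proxy for IDs, IP addresses, ticket numbers), otherwise vector search.
import re

import joblib  # assumed to load a trained random forest text-classification pipeline

classifier = joblib.load("/models/question_classifier.joblib")  # hypothetical path


def choose_search_type(question: str) -> str:
    has_numerals = bool(re.search(r"\d", question))
    predicted = classifier.predict([question])[0]  # "aggregation" or "pointed"
    if predicted == "aggregation" or has_numerals:
        return "SQL"
    return "VECTOR"


print(choose_search_type("How many tickets were opened last week?"))
```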
This PR includes:
- Allow Weaviate's alpha parameter to be configurable
- Update logging references to configured "logger"
- Allowed containerized location of SQL DB and question classification models to be easily replaced with local file paths using env vars

* Allow Weaviate's alpha parameter to be configurable
* Order imports - isort
* Fix formatting - black
* Remove not used field
* Fix use logging instead of print

---------

Co-authored-by: Maciej Urbański <[email protected]>
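For illustration, making the hybrid-search alpha configurable via an env var might look like this sketch using the Weaviate v4 Python client; the collection name, default value, and connection details are assumptions, not the repo's setup.

```python
# Sketch only: reading Weaviate's hybrid-search alpha from an env var and
# passing it to a hybrid query. Collection name and connection are assumptions.
import os

import weaviate

WEAVIATE_ALPHA = float(os.environ.get("WEAVIATE_ALPHA", "0.5") or "0.5")

client = weaviate.connect_to_local()  # or a custom connection when running in-cluster
tickets = client.collections.get("Ticket")

# alpha=0 -> pure keyword (BM25) search, alpha=1 -> pure vector search
results = tickets.query.hybrid(
    query="How do I reset my password?",
    alpha=WEAVIATE_ALPHA,
    limit=4,
)
for obj in results.objects:
    print(obj.properties)

client.close()
```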
* Add missing weaviate module to requirements.txt
* Add prompt to show equivalence of customers, clients, requesters and submitters
* Fix sql db path
* Minor Dockerfile fixes
* Dockerfile updates to speed up builds
* Add model path as env var to RAG LLM container