📄 SmartDoc Assistant – RAG-based PDF QA Chatbot

SmartDoc Assistant is a document Q&A assistant built with LangGraph, LangChain, and Gemini (Google Generative AI). Users upload a PDF file and ask questions about its contents; answers are generated with Retrieval-Augmented Generation (RAG).

✅ Uses Google Embeddings + Gemini LLM
✅ Summarizes & answers questions based on document context
✅ Fully in-memory (privacy-friendly: no permanent file storage)
✅ Deployed via Streamlit Cloud (Free Tier)

🚀 Live Demo

SmartDoc Assistant 👉 Try it on Streamlit

🛠 Tech Stack

Layer	Tech
UI	Streamlit
LLM	Google Generative AI (Gemini 2.5 Flash)
Embeddings	Google Generative AI Embeddings (`embedding-001`)
Vector DB	FAISS (In-memory)
Graph Flow	LangGraph
Framework	LangChain
PDF Parser	PyMuPDF
Language	Python 3.11
Hosting	Streamlit Cloud (free tier)

📦 Project Structure

smartdoc-assistant/
├── backend/
│   ├── chains.py              # LLMs and chain setup
│   ├── config.py              # Constants and config
│   └── rag_utils.py           # PDF loading, embedding, vectorstore utils
├── frontend/
│   └── streamlit_app.py       # Main Streamlit UI
├── langgraph_app.py           # LangGraph workflow (input -> retrieval -> output)
├── test_embed_and_store.py    # Simple CLI test for embedding logic
├── temp/                      # Temporary file store (auto-cleared)
├── .env                       # Google API Key & Config
├── requirements.txt           # Project dependencies
└── README.md                  # You're reading it!

📁 How to Run Locally

# Clone repo
git clone https://github.com/deepak4siriboyina/smartdoc-assistant.git
cd smartdoc-assistant

# Create virtual environment
python -m venv virtenvt
virtenvt\Scripts\activate          # Windows (PowerShell)
# source virtenvt/bin/activate     # macOS/Linux equivalent

# Install dependencies
pip install -r requirements.txt

# Set your API Key
echo GOOGLE_API_KEY=your-api-key > .env

# Run the app
streamlit run frontend/streamlit_app.py
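
The `.env` step above only writes the key to disk; the app still has to read it at startup. The sketch below shows one way that lookup could work using only the standard library. The helper names `load_env_file` and `get_api_key` are hypothetical, and the project itself may rely on a package such as python-dotenv instead.

```python
import os


def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines from a .env file into a dict.

    Hypothetical helper for illustration; blank lines, comment lines,
    and lines without '=' are skipped.
    """
    values = {}
    if not os.path.exists(path):
        return values
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Drop optional surrounding quotes around the value.
            values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_api_key(path=".env"):
    """Prefer the process environment, then fall back to the .env file."""
    key = os.environ.get("GOOGLE_API_KEY") or load_env_file(path).get("GOOGLE_API_KEY")
    if not key:
        raise RuntimeError("GOOGLE_API_KEY is not set; add it to .env or the environment")
    return key
```

Failing fast with a clear error when the key is missing tends to be easier to debug than an authentication failure on the first API call.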

🧠 How It Works

User uploads a PDF file through the Streamlit UI.
The PDF is parsed, chunked, and embedded using Google's embedding-001 model.
The chunks are stored in a temporary in-memory FAISS vector store.
When the user asks a question, the LangGraph flow is triggered:
- The flow runs three steps: input → retrieve → answer.
- A retriever fetches the relevant chunks, and Gemini 2.5 Flash answers using RetrievalQA.
All Q&A pairs are saved in the session, can be viewed via a dropdown, and can be downloaded as .txt or .csv.
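
The steps above can be sketched in plain Python to show the shape of the flow. This is a simplified stand-in, not the project's code: a bag-of-words counter replaces the Google embedding model, a sorted list replaces the FAISS index, and `answer_fn` is a placeholder for the Gemini call. All function names and the chunk sizes are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping character windows (sizes are illustrative)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    stop = max(len(text) - overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, stop, step)]


def embed(text):
    """Toy embedding: lowercase word counts. Stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(chunks):
    """In-memory 'vector store': a list of (chunk, vector) pairs."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(index, question, k=2):
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def run_flow(index, question, answer_fn):
    """input -> retrieve -> answer, passing a state dict between the steps."""
    state = {"question": question}
    state["context"] = retrieve(index, state["question"])
    state["answer"] = answer_fn(state["question"], state["context"])
    return state


if __name__ == "__main__":
    text = "FAISS keeps vectors in memory. " * 10 + "Gemini writes the answer."
    index = build_index(chunk_text(text))
    # Placeholder "LLM": simply echoes the top retrieved chunk.
    state = run_flow(index, "Where are vectors kept?", lambda q, ctx: ctx[0])
    print(state["answer"])
```

Passing a single state dict from step to step mirrors how a graph-based workflow threads state through its nodes, which keeps each step independently testable.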

✨ Features

📄 Upload any PDF and ask questions interactively.
⚙️ Temporary in-memory processing – uploaded files are not stored persistently.
🧠 Uses Google's Gemini 2.5 Flash model for fast responses.
🗂️ Expandable chat history with full Q&A transcripts.
⏬ One-click download of chat history.
✅ Lightweight, free to run, and private by design.
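
The chat history download can be illustrated with a short sketch. It assumes the session keeps history as a list of (question, answer) tuples; the function names are hypothetical, and the actual app may format its exports differently. In a Streamlit UI, the returned strings could be handed to `st.download_button`.

```python
import csv
import io


def history_to_txt(history):
    """Render (question, answer) pairs as a plain-text transcript."""
    blocks = [f"Q{i}: {q}\nA{i}: {a}" for i, (q, a) in enumerate(history, start=1)]
    return "\n\n".join(blocks)


def history_to_csv(history):
    """Render (question, answer) pairs as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["question", "answer"])
    # csv.writer quotes fields containing commas, quotes, or newlines.
    writer.writerows(history)
    return buffer.getvalue()
```

Using the `csv` module rather than joining strings by hand keeps answers that contain commas or line breaks intact when the file is reopened.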

📤 Deployment

Streamlit Frontend → Streamlit Cloud

🔐 Data Privacy

All uploaded PDFs are processed in-memory and deleted after embedding.
No document data is permanently stored.
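
One way to get the delete-after-embedding behavior is to scope the uploaded file to a single function call. The sketch below is an assumption about how this could be done, not the project's actual code: `process_upload` is a hypothetical helper, and `handler` stands for whatever parses and embeds the PDF.

```python
import os
import tempfile


def process_upload(pdf_bytes, handler):
    """Write uploaded bytes to a temp file, run handler(path), then delete the file.

    The finally block removes the file even if the handler raises.
    """
    fd, path = tempfile.mkstemp(suffix=".pdf")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(pdf_bytes)
        return handler(path)
    finally:
        if os.path.exists(path):
            os.remove(path)
```

Only the handler's return value (for example, an in-memory vector index) outlives the call, so nothing from the upload remains on disk afterwards.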

🙌 Credits

🧑‍💻 Author

Deepak Siriboyina – LinkedIn
