This project implements a search engine using Natural Language Processing techniques in Python with a Streamlit web interface.
- Query tokenization
- Stop word removal
- Stemming
- KMP search algorithm
- Normalization and TF-IDF ranking
- Web page text extraction
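
To illustrate the KMP search step listed above, here is a minimal sketch of Knuth-Morris-Pratt substring matching in plain Python. The function names `build_failure_table` and `kmp_search` are hypothetical and are not taken from this project's code; the project's actual implementation may differ.

```python
def build_failure_table(pattern: str) -> list[int]:
    """For each prefix of `pattern`, store the length of the longest
    proper prefix that is also a suffix of that prefix."""
    table = [0] * len(pattern)
    k = 0  # length of the currently matched prefix
    for i in range(1, len(pattern)):
        # Fall back to shorter borders until the next character matches.
        while k > 0 and pattern[i] != pattern[k]:
            k = table[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        table[i] = k
    return table


def kmp_search(text: str, pattern: str) -> list[int]:
    """Return the start index of every occurrence of `pattern` in `text`,
    including overlapping ones."""
    if not pattern:
        return []
    table = build_failure_table(pattern)
    matches = []
    k = 0  # number of pattern characters matched so far
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = table[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = table[k - 1]  # keep going to find overlapping matches
    return matches


if __name__ == "__main__":
    print(kmp_search("search engines search text", "search"))
```

Because the failure table lets the scan avoid re-reading characters of the text, the search runs in time proportional to the combined length of the text and the pattern.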
- Clone the repository
- Install dependencies: `pip install -r requirements.txt`
- Prepare your `links.txt` file with web page URLs to search

Run the app with `streamlit run main.py`.

- Streamlit
- NLTK
- BeautifulSoup
- urllib3
- Tokenizes search queries
- Removes stop words
- Applies Porter Stemming
- Searches across predefined web pages
- Ranks results using normalization and TF-IDF
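
The steps above can be sketched with the standard library alone. This is an illustrative outline under several assumptions, not the project's code: `STOP_WORDS` is a tiny hand-picked subset rather than the NLTK stop word list, stemming is omitted to keep the example dependency-free, "normalization" is assumed to mean dividing term counts by document length, and the names `preprocess` and `rank` are hypothetical.

```python
import math
import re
from collections import Counter

# Assumption: a tiny illustrative subset, not the NLTK stop word list.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "is", "to", "for"}


def preprocess(text):
    """Lowercase, tokenize on alphanumeric runs, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def rank(query, documents):
    """Return (document index, score) pairs sorted by descending TF-IDF score."""
    doc_tokens = [preprocess(d) for d in documents]
    n_docs = len(documents)

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in doc_tokens:
        df.update(set(tokens))

    query_terms = preprocess(query)
    scores = []
    for idx, tokens in enumerate(doc_tokens):
        counts = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if counts[term] == 0:
                continue
            tf = counts[term] / len(tokens)  # length-normalized term frequency
            idf = math.log(n_docs / df[term]) + 1.0  # smoothed so common terms still count
            score += tf * idf
        scores.append((idx, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    pages = [
        "the cat sat on the mat",
        "dogs chase the cat",
        "a report on stock markets",
    ]
    for idx, score in rank("cat mat", pages):
        print(idx, round(score, 3))
```

In this sketch a page that contains more of the query terms, and rarer ones, scores higher, while pages with no matching terms score zero.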
Contributions are welcome! Please check the outstanding issues and feel free to open a pull request.