hawkh/searchengine

Advanced Search Engine with NLP

Overview

This project implements a search engine using Natural Language Processing techniques in Python with a Streamlit web interface.

Features

  • Query tokenization
  • Stop word removal
  • Stemming
  • KMP search algorithm
  • Normalization and TF-IDF ranking
  • Web page text extraction
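As an illustration of the KMP search feature listed above, here is a minimal sketch of Knuth-Morris-Pratt substring search. The function names `build_failure_table` and `kmp_search` are hypothetical and are not taken from this repository; the project's own implementation may differ.

```python
def build_failure_table(pattern: str) -> list[int]:
    """failure[i] is the length of the longest proper prefix of
    pattern[:i + 1] that is also a suffix of it."""
    failure = [0] * len(pattern)
    k = 0  # length of the currently matched prefix
    for i in range(1, len(pattern)):
        # Fall back to shorter prefixes until one can be extended.
        while k > 0 and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k
    return failure


def kmp_search(text: str, pattern: str) -> list[int]:
    """Return the start index of every occurrence of pattern in text,
    including overlapping ones."""
    if not pattern:
        return []
    failure = build_failure_table(pattern)
    matches = []
    k = 0  # number of pattern characters matched so far
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = failure[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = failure[k - 1]  # keep going to find overlapping matches
    return matches


print(kmp_search("abababa", "aba"))  # [0, 2, 4]
```

The failure table lets the scan avoid re-reading text characters after a mismatch, so the search runs in time linear in the combined length of the text and the pattern.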

Installation

  1. Clone the repository
  2. Install dependencies:
pip install -r requirements.txt
  3. Prepare your links.txt file with web page URLs to search
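The exact format of links.txt is not documented here. The sketch below assumes one URL per line and shows one way such a file could be read. The helper name `load_links`, the skipping of blank lines and `#` comment lines, and the removal of duplicates are all assumptions made for illustration.

```python
from typing import Iterable
from urllib.parse import urlparse


def load_links(lines: Iterable[str]) -> list[str]:
    """Return the http(s) URLs found in lines, in order, without duplicates."""
    urls = []
    seen = set()
    for line in lines:
        candidate = line.strip()
        # Assumption: blank lines and lines starting with '#' are ignored.
        if not candidate or candidate.startswith("#"):
            continue
        parsed = urlparse(candidate)
        is_web_url = parsed.scheme in ("http", "https") and bool(parsed.netloc)
        if is_web_url and candidate not in seen:
            seen.add(candidate)
            urls.append(candidate)
    return urls


# Usage, assuming links.txt sits next to the script:
# with open("links.txt", encoding="utf-8") as handle:
#     urls = load_links(handle)
```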

Usage

streamlit run main.py

Dependencies

  • Streamlit
  • NLTK
  • BeautifulSoup
  • urllib3

Methodology

  • Tokenizes search queries
  • Removes stop words
  • Applies Porter Stemming
  • Searches across predefined web pages
  • Ranks results using normalization and TF-IDF
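The steps above can be sketched roughly as follows. This is a dependency-free illustration, not the project's code: `STOP_WORDS` is a tiny made-up list, `preprocess` and `rank` are hypothetical names, and "normalization" is interpreted here as dividing term counts by document length, which is an assumption. Stemming is left as a pluggable `stem` argument; passing `nltk.stem.PorterStemmer().stem` would apply Porter stemming.

```python
import math
import re
from collections import Counter
from typing import Callable

# Assumption: a tiny illustrative stop list; NLTK ships a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "is", "to", "for", "with"}


def preprocess(text: str, stem: Callable[[str], str] = lambda t: t) -> list[str]:
    """Lowercase, tokenize, drop stop words, then stem each token."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]


def rank(query: str, documents: list[str],
         stem: Callable[[str], str] = lambda t: t) -> list[tuple[int, float]]:
    """Return (document index, score) pairs, highest TF-IDF score first."""
    docs_tokens = [preprocess(doc, stem) for doc in documents]
    n_docs = len(documents)

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in docs_tokens:
        df.update(set(tokens))

    query_terms = preprocess(query, stem)
    scores = []
    for idx, tokens in enumerate(docs_tokens):
        counts = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if counts[term] == 0:
                continue
            tf = counts[term] / len(tokens)  # length-normalized frequency
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1  # smoothed IDF
            score += tf * idf
        scores.append((idx, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


docs = [
    "The cat sat on the mat",
    "Dogs chase the cat and the cat runs",
    "Stock markets fell today",
]
print([idx for idx, _ in rank("cat", docs)])  # [1, 0, 2]
```

The smoothed IDF keeps the weight positive even for a term that appears in every document, so frequent query terms still contribute a little to the score.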

🤝 Contributing

Contributions are welcome! Please check the outstanding issues and feel free to open a pull request.
