Hack-a-Bot-Scraper 🤖

A comprehensive web scraping tool designed to automatically collect hackathon information from multiple popular hackathon platforms. This tool helps developers, students, and hackathon enthusiasts stay updated with the latest hackathon opportunities by scraping and storing event details in a MongoDB database.

🎯 Features

Multi-Platform Support: Scrapes hackathons from multiple popular platforms:
- Devpost: One of the largest hackathon hosting platforms
- AllHackathons: Comprehensive hackathon listing website
- Hack2Skill: Platform for skill-based hackathons and competitions
Intelligent Data Extraction: Extracts comprehensive hackathon details including:
- Event name and description
- Start and end dates
- Registration deadlines
- Event mode (Online/Offline/Hybrid)
- Location information
- Prize amounts and participant counts
- Event images and URLs
- Tags and categories
- Timeline information
Database Integration: Stores all scraped data in MongoDB with duplicate prevention
Anti-Bot Protection: Uses undetected-chromedriver to reduce the chance of being blocked by anti-bot measures
Error Handling: Robust error handling and logging for reliable operation
Modular Design: Each scraper is independently developed and can be run separately

🏗️ Project Structure

Hack-a-Bot-scraper/
├── LICENSE
├── README.md
└── Scraper/
    ├── __init__.py
    ├── __main__.py          # Main entry point
    ├── db.py               # Database operations
    ├── Procfile           # Heroku deployment configuration
    ├── requirements.txt   # Python dependencies
    └── web/               # Individual scrapers
        ├── __init__.py
        ├── allhackathon_scraper.py    # AllHackathons.com scraper
        ├── devpost_scraper.py         # Devpost.com scraper
        └── Hack2skill_scraper.py      # Hack2Skill.com scraper

🚀 Installation

Prerequisites

Python 3.7+
Chrome/Chromium browser installed
MongoDB database (local or cloud)

Setup

Clone the repository:

git clone https://github.com/yourusername/Hack-a-Bot-scraper.git
cd Hack-a-Bot-scraper

Install dependencies:
```
pip install -r Scraper/requirements.txt
```
Install Chrome and ChromeDriver:
- Install Google Chrome:
  Download and install the latest version of Google Chrome from the official website.
- Download ChromeDriver:
  Download the ChromeDriver version that matches your installed Chrome browser from the ChromeDriver Downloads page.
- Extract and Set Up ChromeDriver:
  1. Extract the downloaded chromedriver.exe to a folder, e.g., C:\chromedriver.
  2. Add the folder path to your Windows PATH environment variable:
    - Press Win + S, search for "Environment Variables", and open "Edit the system environment variables".
    - Click "Environment Variables".
    - Under "System variables", find and select the Path variable, then click "Edit".
    - Click "New" and enter the path to your ChromeDriver folder (e.g., C:\chromedriver).
    - Click "OK" to save and close all dialogs.
- Verify Installation:
  - Open a new Command Prompt and run:
```
chromedriver --version
```
  - You should see the installed ChromeDriver version displayed.
Note: Ensure that the ChromeDriver version matches your installed Chrome browser version for compatibility.

Environment Configuration: Create a .env file in the root directory:

MONGO_URI=mongodb://localhost:27017/HackBot
# Or for MongoDB Atlas:
# MONGO_URI=mongodb+srv://username:password@cluster.mongodb.net/HackBot
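
As a rough illustration of how the connection string might be read at startup, the sketch below resolves `MONGO_URI` from the environment and checks its scheme. The helper name `get_mongo_uri` and the fallback default are assumptions made for this example, not code taken from `db.py`. If python-dotenv is installed, its `load_dotenv()` is called first so that values from `.env` are picked up.

```python
import os
from urllib.parse import urlparse

try:
    # python-dotenv is listed in the project's dependencies; it is optional here.
    from dotenv import load_dotenv
except ImportError:
    load_dotenv = None

DEFAULT_URI = "mongodb://localhost:27017/HackBot"  # same value as the sample .env


def get_mongo_uri(env=None):
    """Return the MongoDB URI from the environment (hypothetical helper)."""
    if env is None:
        if load_dotenv is not None:
            load_dotenv()  # copies .env entries into os.environ if a .env file exists
        env = os.environ
    uri = env.get("MONGO_URI", DEFAULT_URI)
    scheme = urlparse(uri).scheme
    if scheme not in ("mongodb", "mongodb+srv"):
        raise ValueError(f"Unsupported MONGO_URI scheme: {scheme!r}")
    return uri
```

Passing a plain dict as `env` skips the `.env` lookup, which is convenient when checking the function in isolation.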

Database Setup:
- Ensure MongoDB is running (local installation) or configure MongoDB Atlas
- The application will automatically create the required database and collections

🎮 Usage

Running All Scrapers

cd Scraper
python __main__.py

Running Individual Scrapers

You can also run specific scrapers independently:

# Run Devpost scraper only
from Scraper.web import devpost_scraper
devpost_scraper.main()

# Run AllHackathons scraper only  
from Scraper.web import allhackathon_scraper
scraper = allhackathon_scraper.HackClub()
scraper.scrape_hackathons()

# Run Hack2Skill scraper only
from Scraper.web import Hack2skill_scraper
Hack2skill_scraper.main()
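
If several scrapers are run together from a custom script, isolating each one keeps a failure on one site from stopping the rest. The following is a minimal sketch of that idea. `run_scrapers` is a hypothetical helper, and the callables passed to it stand in for entry points such as `devpost_scraper.main`.

```python
import logging

logger = logging.getLogger("hackabot.runner")


def run_scrapers(scrapers):
    """Run each (name, callable) pair; return {name: True/False} for success."""
    results = {}
    for name, run in scrapers:
        try:
            run()
            results[name] = True
        except Exception:
            # Log the traceback and continue with the next scraper.
            logger.exception("Scraper %s failed", name)
            results[name] = False
    return results
```

A call might look like `run_scrapers([("DEVPOST", devpost_scraper.main), ("HACK2SKILL", Hack2skill_scraper.main)])`, with the returned dict showing which sources completed.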

📊 Data Schema

Each hackathon entry stored in the database contains:

{
    'name': str,           # Hackathon name
    'url': str,            # Event URL
    'image': str,          # Event image URL
    'start': str,          # Start date
    'end': str,            # End date
    'mode': str,           # Online/Offline/Hybrid
    'location': str,       # Event location
    'website': str,        # Source website (DEVPOST/ALLHACKATHONS/HACK2SKILL)
    'new': bool,           # Flag for new entries
    'prize_amount': str,   # Prize information (optional)
    'participants': str,   # Participant count (optional)
    'tags': list,          # Event tags/categories (optional)
    'timeline': str,       # Event timeline (optional)
    'deadline': str        # Registration deadline (optional)
}
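
One way to keep entries consistent with this schema is to check them before saving. The sketch below is an assumption about how such a check could look; `REQUIRED_FIELDS`, `OPTIONAL_FIELDS`, and `validate_entry` are illustrative names and not part of the project. Treating the first nine fields as required follows from the "(optional)" annotations in the schema.

```python
REQUIRED_FIELDS = {
    "name": str, "url": str, "image": str, "start": str, "end": str,
    "mode": str, "location": str, "website": str, "new": bool,
}
OPTIONAL_FIELDS = {
    "prize_amount": str, "participants": str, "tags": list,
    "timeline": str, "deadline": str,
}


def validate_entry(entry):
    """Return a list of problems found in one hackathon dict (empty if valid)."""
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in entry:
            problems.append(f"missing required field: {field}")
        elif not isinstance(entry[field], expected):
            problems.append(f"{field} should be {expected.__name__}")
    for field, expected in OPTIONAL_FIELDS.items():
        # Optional fields are only checked when they are present.
        if field in entry and not isinstance(entry[field], expected):
            problems.append(f"{field} should be {expected.__name__}")
    return problems
```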

🛠️ Dependencies

selenium: Web automation and scraping
undetected-chromedriver: Anti-detection Chrome driver
pymongo: MongoDB database operations
python-dotenv: Environment variable management
dnspython: DNS resolution for MongoDB
urllib3: HTTP library for web requests

🐳 Deployment

The project includes a Procfile for easy deployment on platforms like Heroku:

# Deploy to Heroku
heroku create your-app-name
heroku config:set MONGO_URI=your_mongodb_connection_string
git push heroku main

🔧 Configuration

Chrome Driver Options

The scrapers use various Chrome options for optimal performance:

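--no-sandbox: Disable Chrome's sandbox (often required in containers, at the cost of reduced isolation)
--disable-dev-shm-usage: Write shared memory files to /tmp instead of /dev/shm, avoiding crashes where /dev/shm is small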
--disable-gpu: Disable GPU hardware acceleration
--window-size=1920,1080: Set browser window size
--disable-blink-features=AutomationControlled: Hide automation indicators
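
To show how these flags might be wired together, here is a small sketch. The names `CHROME_FLAGS`, `apply_flags`, and `build_driver` are hypothetical and the actual scrapers may configure the driver differently. `apply_flags` accepts any options object with an `add_argument()` method, which both Selenium's Chrome options and `undetected_chromedriver.ChromeOptions` provide.

```python
CHROME_FLAGS = [
    "--no-sandbox",
    "--disable-dev-shm-usage",
    "--disable-gpu",
    "--window-size=1920,1080",
    "--disable-blink-features=AutomationControlled",
]


def apply_flags(options, flags=CHROME_FLAGS):
    """Add each flag to an options object that exposes add_argument()."""
    for flag in flags:
        options.add_argument(flag)
    return options


def build_driver():
    """Create a driver; imported lazily so this module loads without a browser."""
    import undetected_chromedriver as uc  # third-party; listed in Dependencies
    return uc.Chrome(options=apply_flags(uc.ChromeOptions()))
```

Keeping the flags in one list makes it easier to adjust them per environment, for example dropping `--no-sandbox` on a desktop machine.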

Database Configuration

The database module (db.py) provides functions for:

save_hackathons(): Insert new hackathon data
get_hackathons(): Retrieve existing hackathons by website
delete_hackathons(): Remove specific hackathon entries
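
The duplicate prevention mentioned under Features can be pictured as comparing freshly scraped entries against what is already stored. The sketch below works on plain lists so it runs without a database. `filter_new` is a hypothetical helper, and using the `url` field as the unique key is an assumption, since this README does not say which field `db.py` compares.

```python
def filter_new(scraped, existing):
    """Return scraped entries whose url is not already stored, flagged as new."""
    seen = {entry["url"] for entry in existing}
    fresh = []
    for entry in scraped:
        if entry["url"] in seen:
            continue  # already stored, or repeated within this scrape
        seen.add(entry["url"])
        fresh.append({**entry, "new": True})
    return fresh
```

With pymongo, a unique index on the same field (`collection.create_index("url", unique=True)`) would enforce the same rule at the database level.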

🤝 Contributing

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

Adding New Scrapers

To add a new hackathon platform scraper:

Create a new file in Scraper/web/ directory
Implement the scraping logic following the existing patterns
Add database integration using the provided db.py functions
Update __main__.py to include the new scraper
Update this README with the new platform information
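
A new scraper module might follow a shape like the one below. This is a sketch under assumptions: the site label `EXAMPLESITE`, the functions `parse_cards` and `main`, and the `fetch` and `save` callbacks are all hypothetical, and the real `db.py` function signatures may differ. Raw cards are passed in as dicts so the parsing step can be exercised without a browser.

```python
WEBSITE = "EXAMPLESITE"  # hypothetical source label, like DEVPOST or HACK2SKILL


def parse_cards(raw_cards):
    """Map raw scraped values onto the fields listed under Data Schema."""
    entries = []
    for card in raw_cards:
        if not card.get("title") or not card.get("link"):
            continue  # skip cards missing the essentials
        entries.append({
            "name": card["title"].strip(),
            "url": card["link"],
            "image": card.get("image", ""),
            "start": card.get("start", ""),
            "end": card.get("end", ""),
            "mode": card.get("mode", ""),
            "location": card.get("location", ""),
            "website": WEBSITE,
            "new": True,
        })
    return entries


def main(fetch, save):
    """fetch() returns raw cards; save(entries) persists them (e.g. via db.py)."""
    entries = parse_cards(fetch())
    if entries:
        save(entries)
    return len(entries)
```

Separating fetching from parsing means layout changes on the target site usually touch only the fetch step.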

📝 License

This project is licensed under the MIT License. The full terms are in the LICENSE file.

⚠️ Disclaimer

This tool is designed for educational and personal use. Please ensure you comply with the terms of service of the websites being scraped. The developers are not responsible for any misuse of this tool.

🐛 Known Issues & Troubleshooting

Chrome Driver Issues: Ensure Chrome/Chromium is installed and up to date
Database Connection: Verify MongoDB is running and connection string is correct
Rate Limiting: The scrapers include delays to respect website rate limits
Element Not Found: Websites may change their structure over time, so the affected scraper's selectors may need updating
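
A simple way to pair polite delays with recovery from transient failures is to retry with exponential backoff. The sketch below shows one possible form. `with_retries` and its parameters are hypothetical and are not taken from the scrapers themselves. The `sleep` argument is injectable so the logic can be checked without actually waiting.

```python
import time


def with_retries(action, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call action(); on failure wait base_delay * 2**n seconds and try again."""
    for attempt in range(attempts):
        try:
            return action()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
```

Wrapping a single page load or element lookup in `with_retries` keeps the retry policy in one place instead of scattering sleeps through each scraper.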

📈 Future Enhancements

Add more hackathon platforms (MLH, HackerEarth, etc.)
Implement scheduling for automatic periodic scraping
Add email notifications for new hackathons
Create a web dashboard for viewing scraped data
Implement data filtering and search capabilities
Add export functionality (CSV, JSON)

📞 Support

For questions, issues, or contributions, please open an issue on GitHub or connect with the developer through their LinkedIn profile.
