# Hack-a-Bot Scraper

A comprehensive web scraping tool designed to automatically collect hackathon information from multiple popular hackathon platforms. This tool helps developers, students, and hackathon enthusiasts stay updated with the latest hackathon opportunities by scraping and storing event details in a MongoDB database.
## Features

- **Multi-Platform Support**: Scrapes hackathons from multiple popular platforms:
  - **Devpost**: One of the largest hackathon hosting platforms
  - **AllHackathons**: Comprehensive hackathon listing website
  - **Hack2Skill**: Platform for skill-based hackathons and competitions
- **Intelligent Data Extraction**: Extracts comprehensive hackathon details, including:
  - Event name and description
  - Start and end dates
  - Registration deadlines
  - Event mode (Online/Offline/Hybrid)
  - Location information
  - Prize amounts and participant counts
  - Event images and URLs
  - Tags and categories
  - Timeline information
- **Database Integration**: Stores all scraped data in MongoDB with duplicate prevention
- **Anti-Bot Protection**: Uses an undetected Chrome driver to bypass anti-bot measures
- **Error Handling**: Robust error handling and logging for reliable operation
- **Modular Design**: Each scraper is independently developed and can be run separately
## Project Structure

```
Hack-a-Bot-scraper/
├── LICENSE
├── README.md
└── Scraper/
    ├── __init__.py
    ├── __main__.py              # Main entry point
    ├── db.py                    # Database operations
    ├── Procfile                 # Heroku deployment configuration
    ├── requirements.txt         # Python dependencies
    └── web/                     # Individual scrapers
        ├── __init__.py
        ├── allhackathon_scraper.py   # AllHackathons.com scraper
        ├── devpost_scraper.py        # Devpost.com scraper
        └── Hack2skill_scraper.py     # Hack2Skill.com scraper
```
## Prerequisites

- Python 3.7+
- Chrome/Chromium browser installed
- MongoDB database (local or cloud)
## Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/yourusername/Hack-a-Bot-scraper.git
   cd Hack-a-Bot-scraper
   ```

2. Install dependencies:

   ```bash
   pip install -r Scraper/requirements.txt
   ```

3. Install Chrome and ChromeDriver:

   - **Install Google Chrome**: Download and install the latest version of Google Chrome from the official website.
   - **Download ChromeDriver**: Download the ChromeDriver version that matches your installed Chrome browser from the ChromeDriver downloads page.
   - **Extract and set up ChromeDriver** (Windows):
     - Extract the downloaded `chromedriver.exe` to a folder, e.g., `C:\chromedriver`.
     - Add the folder path to your Windows `PATH` environment variable:
       - Press `Win + S`, search for "Environment Variables", and open "Edit the system environment variables".
       - Click "Environment Variables".
       - Under "System variables", select the `Path` variable and click "Edit".
       - Click "New" and enter the path to your ChromeDriver folder (e.g., `C:\chromedriver`).
       - Click "OK" to save and close all dialogs.
   - **Verify the installation**: Open a new Command Prompt and run:

     ```bash
     chromedriver --version
     ```

     You should see the installed ChromeDriver version displayed.

   > **Note**: Ensure that the ChromeDriver version matches your installed Chrome browser version for compatibility.
## Configuration

1. **Environment configuration**: Create a `.env` file in the root directory:

   ```env
   MONGO_URI=mongodb://localhost:27017/HackBot
   # Or for MongoDB Atlas:
   # MONGO_URI=mongodb+srv://username:password@cluster.mongodb.net/HackBot
   ```

2. **Database setup**:
   - Ensure MongoDB is running (local installation) or configure MongoDB Atlas
   - The application will automatically create the required database and collections
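Once the `.env` file exists, the connection string can be read in the usual way. A minimal sketch, assuming the scrapers load the file via `python-dotenv` into `os.environ`; the helper name and the local-instance fallback are illustrative assumptions, not the project's actual code:

```python
import os

# Hypothetical helper: read MONGO_URI from the environment, falling back
# to a local MongoDB instance when nothing has been configured.
# In the real project, python-dotenv's load_dotenv() would populate
# os.environ from the .env file before this runs.
def get_mongo_uri(default="mongodb://localhost:27017/HackBot"):
    return os.environ.get("MONGO_URI", default)
```

`pymongo.MongoClient(get_mongo_uri())` would then open the connection lazily on first use.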
## Usage

Run all scrapers via the main entry point:

```bash
cd Scraper
python __main__.py
```

You can also run specific scrapers independently:

```python
# Run the Devpost scraper only
from Scraper.web import devpost_scraper
devpost_scraper.main()

# Run the AllHackathons scraper only
from Scraper.web import allhackathon_scraper
scraper = allhackathon_scraper.HackClub()
scraper.scrape_hackathons()

# Run the Hack2Skill scraper only
from Scraper.web import Hack2skill_scraper
Hack2skill_scraper.main()
```

## Data Schema

Each hackathon entry stored in the database contains:
```python
{
    'name': str,          # Hackathon name
    'url': str,           # Event URL
    'image': str,         # Event image URL
    'start': str,         # Start date
    'end': str,           # End date
    'mode': str,          # Online/Offline/Hybrid
    'location': str,      # Event location
    'website': str,       # Source website (DEVPOST/ALLHACKATHONS/HACK2SKILL)
    'new': bool,          # Flag for new entries
    'prize_amount': str,  # Prize information (optional)
    'participants': str,  # Participant count (optional)
    'tags': list,         # Event tags/categories (optional)
    'timeline': str,      # Event timeline (optional)
    'deadline': str       # Registration deadline (optional)
}
```

## Dependencies

- selenium: Web automation and scraping
- undetected-chromedriver: Anti-detection Chrome driver
- pymongo: MongoDB database operations
- python-dotenv: Environment variable management
- dnspython: DNS resolution for MongoDB
- urllib3: HTTP library for web requests
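A small validator can sanity-check a scraped entry against the data schema above before it reaches the database. This helper is hypothetical (the project does not ship it); the field names and types are taken from the schema:

```python
# Required fields and their types, taken from the data schema above.
REQUIRED_FIELDS = {
    "name": str, "url": str, "image": str, "start": str, "end": str,
    "mode": str, "location": str, "website": str, "new": bool,
}

def validate_hackathon(entry):
    """Return a list of problems; an empty list means the entry looks valid."""
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in entry:
            problems.append(f"missing field: {field}")
        elif not isinstance(entry[field], expected):
            problems.append(f"{field!r} should be {expected.__name__}")
    return problems
```

Running this before every insert makes structure changes on the scraped sites fail loudly instead of silently writing partial records.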
## Deployment

The project includes a Procfile for easy deployment on platforms like Heroku:

```bash
# Deploy to Heroku
heroku create your-app-name
heroku config:set MONGO_URI=your_mongodb_connection_string
git push heroku main
```

## Chrome Configuration

The scrapers use various Chrome options for optimal performance:

- `--no-sandbox`: Bypass the OS security model
- `--disable-dev-shm-usage`: Overcome limited shared-memory resources
- `--disable-gpu`: Disable GPU hardware acceleration
- `--window-size=1920,1080`: Set the browser window size
- `--disable-blink-features=AutomationControlled`: Hide automation indicators
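The flags above can be gathered in one place so every scraper applies the same set. A sketch that works with any options object exposing `add_argument()`, such as undetected-chromedriver's `ChromeOptions`; the helper name is an assumption, not the project's actual API:

```python
# The Chrome flags listed above, collected so each scraper can reuse them.
CHROME_FLAGS = [
    "--no-sandbox",
    "--disable-dev-shm-usage",
    "--disable-gpu",
    "--window-size=1920,1080",
    "--disable-blink-features=AutomationControlled",
]

def apply_chrome_flags(options):
    """Add every flag to an options object that exposes add_argument()."""
    for flag in CHROME_FLAGS:
        options.add_argument(flag)
    return options
```

With undetected-chromedriver this would look roughly like `driver = uc.Chrome(options=apply_chrome_flags(uc.ChromeOptions()))`.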
## Database Operations

The database module (`db.py`) provides functions for:

- `save_hackathons()`: Insert new hackathon data
- `get_hackathons()`: Retrieve existing hackathons by website
- `delete_hackathons()`: Remove specific hackathon entries
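The duplicate prevention mentioned earlier typically amounts to keying entries on a unique field such as the event URL. The in-memory class below is an illustrative stand-in for `db.py` (the real module talks to MongoDB via pymongo) that mirrors the three function names above:

```python
class HackathonStore:
    """In-memory stand-in for db.py, keyed on event URL to skip duplicates."""

    def __init__(self):
        self._by_url = {}

    def save_hackathons(self, entries):
        """Insert entries, skipping any URL already stored; return the count saved."""
        saved = 0
        for entry in entries:
            if entry["url"] not in self._by_url:  # duplicate prevention
                self._by_url[entry["url"]] = entry
                saved += 1
        return saved

    def get_hackathons(self, website):
        """Return all stored entries scraped from the given source website."""
        return [e for e in self._by_url.values() if e["website"] == website]

    def delete_hackathons(self, urls):
        """Remove the entries with the given URLs, ignoring unknown ones."""
        for url in urls:
            self._by_url.pop(url, None)
```

With pymongo, the same idea is usually expressed as an upsert keyed on `url` rather than an in-memory dict.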
## Contributing

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
### Adding a New Scraper

To add a new hackathon platform scraper:

1. Create a new file in the `Scraper/web/` directory
2. Implement the scraping logic following the existing patterns
3. Add database integration using the provided `db.py` functions
4. Update `__main__.py` to include the new scraper
5. Update this README with the new platform information
## License

This project is licensed under the MIT License; the full terms are in the LICENSE file.

## Disclaimer

This tool is designed for educational and personal use. Please ensure you comply with the terms of service of the websites being scraped. The developers are not responsible for any misuse of this tool.
## Troubleshooting

- **Chrome driver issues**: Ensure Chrome/Chromium is installed and up to date
- **Database connection**: Verify that MongoDB is running and the connection string is correct
- **Rate limiting**: The scrapers include delays to respect website rate limits
- **Element not found**: Websites may change their structure over time, so the scrapers' selectors may need updates
## Future Enhancements

- Add more hackathon platforms (MLH, HackerEarth, etc.)
- Implement scheduling for automatic periodic scraping
- Add email notifications for new hackathons
- Create a web dashboard for viewing scraped data
- Implement data filtering and search capabilities
- Add export functionality (CSV, JSON)
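The export item on the roadmap can be prototyped with the standard library alone. A sketch, where the function names are hypothetical and the field names come from the data schema above:

```python
import csv
import json

def export_json(entries, path):
    """Dump scraped entries to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(entries, f, indent=2, ensure_ascii=False)

def export_csv(entries, path,
               fields=("name", "url", "start", "end", "mode", "location", "website")):
    """Write selected schema fields of each entry to a CSV file.

    extrasaction="ignore" drops fields not listed (e.g. tags); entries
    missing a listed field get an empty cell.
    """
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(fields), extrasaction="ignore")
        writer.writeheader()
        writer.writerows(entries)
```

Either function could be wired into `__main__.py` after a scraping run to snapshot the freshly collected entries.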
## Contact

For questions, issues, or contributions, please open an issue on GitHub or connect with the developer on LinkedIn.