- Snap Packages:
  Selenium does not seem to work with browsers installed as Snap packages, and recent Ubuntu releases install Firefox as a Snap by default. Therefore, do not use the Ubuntu distribution in combination with this project; Mint and Debian have proven to work well.
- Firefox via Flatpak:
  Using Firefox via Flatpak has not been tested with this project.
- Google Chrome:
  Chrome generally works with Selenium, but its integration may be less stable.
- Twitter Login:
  To use Twitter lists, you must be logged in. Lists are the only reliably chronological views. The best approach is to log in to Twitter with a new profile in Firefox and then copy this profile to the target server (for example, a Raspberry Pi). Adjust the profile path accordingly in the script. You can find your Firefox profile by navigating to about:profiles. The bot can also work without login, but in that case the page must be publicly accessible without authentication.
This project enables Twitter data crawling without using the official Twitter API. All retrieved tweets are automatically forwarded through two modules:
- Telegram Bot:
  With optional filtering (e.g., by specific keywords, lines, or locations).
- Mastodon Bot:
  Simple forwarding of tweets to Mastodon.
Additionally, there is a Control Bot for Telegram, allowing you to manage chat IDs, filter terms, and control the bot.
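The optional keyword filtering can be pictured as a simple substring match. The following is a minimal sketch and not the project's actual filter: the function name `matches_filter`, the tweet dictionaries with a `text` key, and the example terms are all assumptions made for illustration.

```python
def matches_filter(tweet_text: str, filter_terms: list[str]) -> bool:
    """Return True if the tweet should be forwarded.

    An empty filter list means "forward everything"; otherwise at least
    one term must occur in the text (case-insensitive substring match).
    """
    if not filter_terms:
        return True
    lowered = tweet_text.casefold()
    return any(term.casefold() in lowered for term in filter_terms)


# Hypothetical tweets, e.g. from a public-transport account.
new_tweets = [
    {"text": "Line U2 delayed between Alexanderplatz and Zoo"},
    {"text": "Good morning, all lines are running on schedule"},
]

# Keep only tweets that mention one of the (hypothetical) filter terms.
forwarded = [t for t in new_tweets if matches_filter(t["text"], ["U2", "S41"])]
```

A substring match is the simplest choice; it would also match a term inside a longer word, so a stricter filter might compare whole words instead.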
- Python (including pip)
- The following Python modules (installable via pip):
  - See requirements.txt
Ensure Python and pip are installed. Then, install the required modules:
    pip install -r requirements.txt

If you want to crawl Twitter data without logging in, make the following changes in the twitter_bot.py file:
- Comment out:
      # firefox_profile = webdriver.FirefoxProfile(firefox_profile_path)
      # firefox_options.profile = firefox_profile
- In the def main() function:
  - Uncomment:
        driver = webdriver.Firefox(options=firefox_options)
  - Comment out:
        # driver = webdriver.Firefox(options=firefox_options, firefox_profile=firefox_profile_path)
- Additionally:
  Comment out the delete_temp_files() function, as it is probably not needed in this mode.
- Adjust the firefox_profile_path value in twitter_bot.py to access protected or personalized pages.
- You can find your profile name under about:profiles in Firefox.
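As an alternative to opening about:profiles, the profile directories can also be listed from Firefox's profiles.ini file. This is a rough sketch, assuming a regular (non-Snap, non-Flatpak) Linux installation where that file lives in ~/.mozilla/firefox; the helper name `list_firefox_profiles` is hypothetical and not part of the project.

```python
import configparser
from pathlib import Path


def list_firefox_profiles(firefox_dir: Path) -> dict[str, Path]:
    """Map profile names to profile directories.

    Reads profiles.ini inside firefox_dir. Returns an empty dict if the
    file does not exist (ConfigParser.read silently skips missing files).
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(firefox_dir / "profiles.ini")

    profiles: dict[str, Path] = {}
    for section in parser.sections():
        # Only [Profile0], [Profile1], ... describe profiles.
        if not section.startswith("Profile"):
            continue
        entry = parser[section]
        if "Path" not in entry:
            continue
        path = Path(entry["Path"])
        # Relative paths are relative to the directory holding profiles.ini.
        if entry.get("IsRelative", "1") == "1":
            path = firefox_dir / path
        profiles[entry.get("Name", section)] = path
    return profiles


# Assumed default location on Linux; adjust for other setups.
for name, path in list_firefox_profiles(Path.home() / ".mozilla" / "firefox").items():
    print(name, "->", path)
```

The printed directory is the kind of value that firefox_profile_path is expected to point to.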
- Add Twitter Pages:
  Enter the Twitter page you want to capture tweets from in twitter_bot.py.
- Disable Unnecessary Modules:
  Comment out the calls to the Telegram or Mastodon bots in def main() if you do not need them:
      # await telegram_bot.main(new_tweets)
      # mastodon_bot.main(new_tweets)
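Instead of commenting lines in and out, the two forwarding calls could also be guarded by switches. The sketch below only illustrates that idea and is not how the project is written: `ENABLE_TELEGRAM`, `ENABLE_MASTODON`, `forward`, and the two stand-in functions are hypothetical, with the stand-ins taking the place of `telegram_bot.main` and `mastodon_bot.main`.

```python
import asyncio

# Hypothetical switches; set to False to skip a module.
ENABLE_TELEGRAM = True
ENABLE_MASTODON = False


async def send_to_telegram(new_tweets):
    """Stand-in for the awaited telegram_bot.main(new_tweets) call."""
    return f"telegram: {len(new_tweets)} tweet(s)"


def send_to_mastodon(new_tweets):
    """Stand-in for the synchronous mastodon_bot.main(new_tweets) call."""
    return f"mastodon: {len(new_tweets)} tweet(s)"


async def forward(new_tweets):
    """Forward tweets only to the modules that are switched on."""
    results = []
    if ENABLE_TELEGRAM:
        results.append(await send_to_telegram(new_tweets))
    if ENABLE_MASTODON:
        results.append(send_to_mastodon(new_tweets))
    return results


print(asyncio.run(forward(["example tweet"])))
```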
- Telegram:
  Get your API keys via BotFather and enter them into the respective files.
- Mastodon:
  The API key can be found in your instance's settings (under Development). Make sure the required permissions are granted; if the permissions are changed, the API key must be regenerated. Also, specify your instance in the script. The Gemini API is used to generate free alt texts for images.
- Gemini API (For Testing Purposes):
  Add your Gemini API key to your ~/.bashrc. Open the file with:
      nano ~/.bashrc
  and add the line:
      export GOOGLE_API_KEY="YOURAPIKEY"
  You can get a free Gemini API key here: Gemini API Key.
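Once exported, the key is available to Python through the process environment. A minimal sketch, assuming the key is read under the name GOOGLE_API_KEY as in the export line above; how the project itself reads the key is not shown here, and `load_gemini_key` is a hypothetical helper.

```python
import os


def load_gemini_key(var_name: str = "GOOGLE_API_KEY") -> str:
    """Return the API key from the environment or fail with a clear message."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"Environment variable {var_name} is not set. "
            "Add the export line to ~/.bashrc and open a new shell."
        )
    return key


# Usage (raises RuntimeError if the variable is missing):
# api_key = load_gemini_key()
```

Failing early with an explicit message makes a missing key easier to spot than an authentication error from the API later on.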
Run the bot in the appropriate directory for testing:
    python twitter_bot.py
- Note:
  Selenium usually tries to install the correct Geckodriver for Firefox automatically. If this does not work, download the Geckodriver manually:
  - x64 & ARM: Geckodriver Releases
  Extract Geckodriver and copy it to the system directory:
      sudo cp geckodriver /usr/local/bin/geckodriver
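To confirm that the manually installed Geckodriver is found, a quick check from Python can help. This is a small sketch using only the standard library; `find_geckodriver` is a hypothetical helper name.

```python
import shutil


def find_geckodriver():
    """Return the full path of geckodriver if it is on PATH, else None."""
    return shutil.which("geckodriver")


path = find_geckodriver()
if path is None:
    print("geckodriver not found on PATH; check /usr/local/bin and file permissions")
else:
    print(f"geckodriver found at {path}")
```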
If using the Telegram bot, add your API key to telegram_controll_bot.py.
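It is recommended to use an absolute path instead of DATA_FILE = 'data.json', because a relative path is resolved against the current working directory, which can differ between a manual run and a system service. Do not forget to apply this change in telegram_bot.py as well.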
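One way to get an absolute path without hard-coding a user directory is to derive it from the script's own location. This is a sketch under the assumption that data.json sits next to the scripts; `BASE_DIR` is a name introduced here for illustration.

```python
from pathlib import Path

# Directory that contains this script. Falls back to the current working
# directory when __file__ is not defined (e.g. in an interactive session).
try:
    BASE_DIR = Path(__file__).resolve().parent
except NameError:
    BASE_DIR = Path.cwd()

# Absolute path, independent of the directory the bot is started from.
DATA_FILE = BASE_DIR / "data.json"
```

If both scripts live in the same directory, using the same two lines in each makes them resolve to the same file.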
To run the bot continuously in the background, set it up as a system service:
- Create a service file:
      sudo nano /etc/systemd/system/twitter_bot.service
- Add the following content, adjusting YOURUSER and YOURAPIKEY:
      [Unit]
      Description=twitter_bot
      After=network.target

      [Service]
      Environment="GEMINI_API_KEY=YOURAPIKEY"
      WorkingDirectory=/home/YOURUSER/bots
      ExecStart=/home/YOURUSER/bots/venv/bin/python3 /home/YOURUSER/bots/twitter_bot.py
      Restart=always
      RestartSec=10
      User=YOURUSER
      Group=YOURUSER

      [Install]
      WantedBy=multi-user.target
  The variable name set in Environment= must match the one the script reads; note that the ~/.bashrc example uses GOOGLE_API_KEY.
- Reload system services:
      sudo systemctl daemon-reload
- Start and enable the service:
      sudo systemctl start twitter_bot.service
      sudo systemctl enable twitter_bot.service
- Set up telegram_controll_bot similarly.
Congratulations – the bot should now be running successfully!
Special thanks to shaikhsajid1111, whose project helped me understand how to use CSS selectors to extract tweets. That project is particularly useful for beginners who want to crawl profiles, even though chronological sorting is often no longer available there. My approach using Twitter lists offers more flexibility.
Best of luck using the Selenium Twitter Webcrawler!