*This repository is a companion to the article [Advanced Web Scraping: Bypassing captcha, "403 Forbidden," and more](http://sangaline.com/post/advanced-web-scraping).
Please refer to the article for further details.*

This is a [scrapy](https://scrapy.org/) web scraper for the fictional Zipru torrent site.
It is designed to bypass four distinct anti-scraping mechanisms:

1. User agent filtering.
2. Obfuscated JavaScript redirects.
3. Captchas.
4. Header consistency checks.

The scraper is not actually functional because Zipru is not a real site.
The code, however, is otherwise complete and can easily be adapted to work on other sites.
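
As a rough illustration of how the first and fourth mechanisms listed above might be approached in Scrapy, the sketch below sets a browser-like user agent and matching default headers in a project's `settings.py`. `USER_AGENT` and `DEFAULT_REQUEST_HEADERS` are standard Scrapy settings, but the specific values shown are assumptions chosen for illustration and are not taken from this repository.

```python
# settings.py (illustrative values, not this repository's actual configuration)

# Assumed browser user agent string. Scrapy's default user agent identifies
# itself as Scrapy, which a simple user agent filter may reject.
USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36"
)

# Headers sent with every request unless an individual request overrides them.
# Keeping them plausible for the browser named in USER_AGENT is one way to
# avoid obvious mismatches that a header consistency check could flag.
DEFAULT_REQUEST_HEADERS = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.8",
}
```

The JavaScript redirects and captchas are not addressed by settings alone; the article linked above covers them.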