Commit 930a12a

Add a README.
1 parent 297ed4a commit 930a12a

1 file changed: +15 -0 lines changed

README.md

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+# Advanced Web Scraping Tutorial Project
+
+*This repository is a companion to the article [Advanced Web Scraping: Bypassing captcha, "403 Forbidden," and more](http://sangaline.com/post/advanced-web-scraping).
+Please refer to the article for further details.*
+
+This is a [scrapy](https://scrapy.org/) web scraper for the fictional Zipru torrent site.
+It is designed to bypass four distinct anti-scraping mechanisms:
+
+1. User agent filtering.
+2. Obfuscated javascript redirects.
+3. Captchas.
+4. Header consistency checks.
+
+The scraper is not actually functional because Zipru is not a real site.
+The code, however, is otherwise complete and can easily be adapted to work on other sites.
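
As a rough illustration of the first and fourth items in the README's list (user agent filtering and header consistency checks), the sketch below shows one way a client might send the same browser-like header set on every request. It uses only Python's standard library rather than scrapy, and it is not taken from the repository or the companion article: the `BROWSER_HEADERS` values, the `build_request` helper, and the `example.com` URL are all assumptions made for illustration.

```python
from urllib.request import Request

# Hypothetical header set modelled on a desktop browser. These values are
# illustrative assumptions, not the ones used by the repository.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:115.0) Gecko/20100101 Firefox/115.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
}


def build_request(url, headers=BROWSER_HEADERS):
    """Return a Request carrying the same browser-like headers on every call."""
    # Copy the mapping so callers cannot mutate the shared defaults.
    return Request(url, headers=dict(headers))


# example.com is a placeholder; constructing a Request makes no network call.
request = build_request("https://example.com/")
```

In a scrapy project the equivalent would more likely live in the project settings or a downloader middleware; the point here is only that the user agent and the accompanying headers are chosen together and kept identical across requests.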
