Challenge 6: Bypass IP Filter with Scrapoxy
Goal
As in the pagination challenge, each page lists a few persons. However, you cannot retrieve all the pages from a single IP address.
Use Scrapoxy to bypass the IP restriction.
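On the Scrapy side, "using Scrapoxy" roughly means sending each request to a local proxy endpoint, which then forwards it through cloud instances that have different IP addresses. The sketch below only illustrates that idea and does not replace the official integration tutorial. The class name RouteThroughProxyMiddleware and the address http://127.0.0.1:8888 are assumptions made for illustration; check your own Scrapoxy configuration for the real endpoint.

```python
# Hypothetical Scrapy downloader middleware (illustration only).
# The proxy address below is an assumption, not taken from the challenge.
SCRAPOXY_PROXY_URL = "http://127.0.0.1:8888"


class RouteThroughProxyMiddleware:
    """Send every outgoing request through a single proxy endpoint."""

    def __init__(self, proxy_url=SCRAPOXY_PROXY_URL):
        self.proxy_url = proxy_url

    def process_request(self, request, spider):
        # Scrapy's built-in HttpProxyMiddleware honours request.meta["proxy"].
        # setdefault keeps any proxy that was already set on the request.
        request.meta.setdefault("proxy", self.proxy_url)
        return None  # let Scrapy continue processing the request
```

To try it, the class would be registered in DOWNLOADER_MIDDLEWARES with an order below 750, so that it runs before Scrapy's HttpProxyMiddleware.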
Start
git checkout scrapoxy
Instructions
  1. Install Scrapoxy by following the Quick Start
  2. Use the AWS provider with your credentials in the eu-west-1 region (please create your own account)
  3. Start at most 2 instances in Scrapoxy
  4. Follow the tutorial: Integrate Scrapoxy to Scrapy
  5. Start the scraper (see Start instructions) and check item_scraped_count in the log.
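To check the result of step 5 without reading the whole log by eye, a small helper can pull item_scraped_count out of the stats that Scrapy dumps at the end of a run. This is a minimal sketch: the function name and the log excerpt are made up for illustration, and the value 100 simply mirrors the "/ 100" total shown on the sample page.

```python
import re


def item_scraped_count(log_text):
    """Return the last item_scraped_count found in a Scrapy log, or None."""
    matches = re.findall(r"'item_scraped_count':\s*(\d+)", log_text)
    return int(matches[-1]) if matches else None


# Made-up log excerpt in the shape of Scrapy's end-of-run stats dump.
sample_log = """
[scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'finish_reason': 'finished',
 'item_scraped_count': 100,
 'log_count/INFO': 10}
"""

print(item_scraped_count(sample_log))
```

A count lower than the total number of persons suggests that some pages were still blocked by the IP filter.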
Solution
git checkout .
git checkout scrapoxy-soluce

Persons 0 - 2 / 100

Name: Mr Aaron Willer
Birth year:
Death year: 1912
Gender: M
Marital status:
Spouse:
Ticket class: 3
Ticket number: 3410
Ticket price: 8.14
Residence:
Job:
Companions count: 0
Cabin:
Embarked in: Cherbourg
Destination: Chicago Illinois United States
Died in the Titanic: Yes
Body recovered: No
Rescue boat number:

Name: Mr Albert Augustsson
Birth year: 1889
Death year: 1912
Gender: M
Marital status:
Spouse:
Ticket class: 3
Ticket number: 347468
Ticket price: 7.17
Residence: Krakoryd Småland Sweden
Job: General Labourer
Companions count: 0
Cabin:
Embarked in: Southampton
Destination: Bloomington Indiana United States
Died in the Titanic: Yes
Body recovered: No
Rescue boat number: