Using Scrapy

Abstract

After a lengthy introduction to Beautiful Soup and custom scrapers, it’s time to look at Scrapy: the website scraping tool for Python.

1. https://docs.scrapy.org/en/latest/intro/install.html#intro-install
2. www.w3.org/TR/selectors/
3. www.w3.org/TR/xpath/all/
4. Our client was once banned from Stack Overflow (SO) for sending too many requests per minute. Around 100 software developers had a hard time without SO.
5. https://github.com/scrapy/scrapy/pull/3039
6. In CPython 3.6, dictionaries preserve the insertion order of their keys by default (an implementation detail in 3.6, a language guarantee from Python 3.7 on). This means every time you run your spider on the same CPython 3.6 implementation, the order of the columns will stay the same.
7. https://en.wikipedia.org/wiki/Dbm
8. https://github.com/google/leveldb
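
The point of note 6 can be illustrated with a minimal sketch. It is not Scrapy's exporter code; it only uses the standard library's csv module to show how a dictionary's key order becomes the column order of an export. The helper export_csv and the field names title, price, and url are hypothetical and do not come from the chapter.

```python
import csv
import io


def export_csv(items):
    """Write dict items to CSV text, taking the column order from the first item."""
    buffer = io.StringIO()
    # The field names follow the insertion order of the first item's keys.
    writer = csv.DictWriter(
        buffer, fieldnames=list(items[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(items)
    return buffer.getvalue()


# Hypothetical scraped item; the field names are illustrative assumptions.
item = {}
item["title"] = "Example product"
item["price"] = "9.99"
item["url"] = "https://example.com/p/1"

header = export_csv([item]).splitlines()[0]
print(header)
```

Because the keys are inserted in the same order on every run, the header line comes out in the same order every time on CPython 3.6 and later.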

Authors

Gábor László Hajba

Hajba, G.L. (2018). Using Scrapy. In: Website Scraping with Python. Apress, Berkeley, CA. https://doi.org/10.1007/978-1-4842-3925-4_4

DOI: https://doi.org/10.1007/978-1-4842-3925-4_4
Published: 15 September 2018
Publisher Name: Apress, Berkeley, CA
Print ISBN: 978-1-4842-3924-7
Online ISBN: 978-1-4842-3925-4
eBook Packages: Professional and Applied Computing, Apress Access Books, Professional and Applied Computing (R0)