Getting Structured Data from Internet: Web Scraping and Rest APIs
About the Book
- 1. Introduction to web scraping: why is it essential, and who uses it?
- 2. Intro to web services to get structured data
- 2.1 Getting data from Twitter APIs
- 2.2 Getting stock market data from Alpha Vantage
- 3. Web scraping in Python using the Beautiful Soup library
- 3.1 Tags and structure of HTML documents
- 3.2 Cascading style sheets (CSS)
- 3.3 Building a first scraper with Beautiful Soup
- 3.4 Scraping an HTML table into a pandas DataFrame
- 3.5 Scraping XML files from clinicaltrials.gov
- 5. Advanced Topics
- 5.1 Boilerplate text removal
- 5.2 Solving captchas
- 5.3 Extracting top keywords and summarizing text from scraped documents
- 5.4 Extracting names and entities from scraped documents