Getting Structured Data from Internet: Web Scraping and Rest APIs
About the Book
Note: this book is now available to order from Apress, with a lot of extra content, under the title "Getting structured data from internet: Running Web Crawlers/Scrapers on a Big Data Production Scale".
This book teaches you web scraping so you can quickly collect large amounts of freely available web data in a structured format. You'll learn to write Python scripts that not only access free APIs to get structured data from websites such as Twitter, but also scrape data from HTML and JavaScript pages and convert it into Excel, CSV, or a SQL database of your choice. We go beyond the basics of web scraping and cover advanced topics such as natural language processing and text analytics to extract top keywords, text summaries, names of people and places, email addresses, and other contact details from a page. All the code used in the book will be available to help you understand the concepts in practice and write your own web scraper.
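To make the "HTML page to CSV" idea concrete, here is a minimal sketch of that kind of conversion. It is not taken from the book: the book's chapters use Beautiful Soup and pandas, while this sketch assumes only the Python standard library so that it runs without extra installs. The class name `TableParser`, the function `html_table_to_csv`, and the sample table are all hypothetical, chosen for illustration.

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row.

    Hypothetical helper for illustration; it assumes one simple,
    non-nested table in the input.
    """

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row currently being built, or None
        self._cell = None   # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_csv(html):
    """Return the table rows found in `html` as CSV text."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(parser.rows)
    return buffer.getvalue()


# Made-up sample data, not scraped from any real site.
SAMPLE_HTML = """
<table>
  <tr><th>Ticker</th><th>Price</th></tr>
  <tr><td>AAA</td><td>10.5</td></tr>
  <tr><td>BBB</td><td>7.25</td></tr>
</table>
"""

if __name__ == "__main__":
    print(html_table_to_csv(SAMPLE_HTML), end="")
```

In practice the HTML would come from a downloaded page rather than an inline string, and a dedicated parsing library would handle malformed or nested markup more robustly than this sketch does.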
Table of Contents
- 1. Introduction to web scraping: Why is web scraping essential and who uses web scraping?
- 2. Intro to web services to get structured data
  - 2.1 Getting data from Twitter APIs
  - 2.2 Getting stock market data from Alpha Vantage
- 3. Web scraping in Python using the Beautiful Soup library
  - 3.1 Tags and structure of HTML documents
  - 3.2 Cascading style sheets (CSS)
  - 3.3 Building a first scraper with Beautiful Soup
  - 3.4 Scraping an HTML table into a pandas DataFrame
  - 3.5 Scraping XML files from clinicaltrials.gov
- 4. Using Selenium to scrape JavaScript pages
- 5. Advanced Topics
  - 5.1 Boilerplate text removal
  - 5.2 Solving CAPTCHAs
  - 5.3 Extracting top keywords and text summaries from scraped documents
  - 5.4 Extracting names and entities from scraped documents