Scraping - getting a computer to capture information from online sources - is one of the most powerful techniques for data-savvy journalists who want to get to the story first, or find exclusives that no one else has spotted. Faster than FOI and more detailed than advanced search techniques, scraping also allows you to grab data that organisations would rather you didn’t have - and put it into a form that allows you to get answers.
Scraping for Journalists introduces you to a range of scraping techniques - from very simple scraping techniques which are no more complicated than a spreadsheet formula, to more complex challenges such as scraping databases or hundreds of documents. At every stage you'll see results - but you'll also be building towards more ambitious and powerful tools.
You’ll be scraping within 5 minutes of reading the first chapter - but more importantly you'll be learning key principles and techniques for dealing with scraping problems.
Unlike general books about programming languages, everything in this book has a direct application for journalism, and each principle of programming is related to their application in scraping for newsgathering. And unlike standalone guides and blog posts that cover particular tools or techniques, this book aims to give you skills that you can apply in new situations and with new tools.
Within 45 days of purchase you can get a 100% refund on any Leanpub purchase, in two clicks. We process the refunds manually, so they may take a few days to show up. See full terms.
If you buy a Leanpub book you get all the updates to the book for free! All books are available in PDF, EPUB (for iPad) and MOBI (for Kindle). There is no DRM. There is no risk, just guaranteed happiness or your money back.
Re: Python’s regex library