About the Book
Scraping - getting a computer to capture information from online sources - is one of the most powerful techniques for data-savvy journalists who want to get to the story first, or find exclusives that no one else has spotted. Faster than FOI and more detailed than advanced search techniques, scraping also allows you to grab data that organisations would rather you didn’t have - and put it into a form that allows you to get answers.
Scraping for Journalists introduces you to a range of scraping techniques - from very simple ones that are no more complicated than a spreadsheet formula, to more complex challenges such as scraping databases or hundreds of documents. At every stage you'll see results - but you'll also be building towards more ambitious and powerful tools.
You’ll be scraping within 5 minutes of reading the first chapter - but more importantly you'll be learning key principles and techniques for dealing with scraping problems.
Unlike general books about programming languages, everything in this book has a direct application for journalism, and each principle of programming is related to its application in scraping for newsgathering. And unlike standalone guides and blog posts that cover particular tools or techniques, this book aims to give you skills that you can apply in new situations and with new tools.
About the Author
Paul Bradshaw runs the MA in Online Journalism at Birmingham City University, and is a Visiting Professor at City University’s School of Journalism in London. He publishes the Online Journalism Blog, and is the founder of investigative journalism website HelpMeInvestigate. He has written for journalism.co.uk, Press Gazette, the Guardian and Telegraph’s data blogs, InPublishing, Nieman Reports and the Poynter Institute in the US. He is the co-author of the Online Journalism Handbook with former Financial Times web editor Liisa Rohumaa, and of Magazine Editing (3rd Edition) with John Morrish. Other books to which Bradshaw has contributed include Investigative Journalism (second edition); Web Journalism: A New Form of Citizenship; and Citizen Journalism: Global Perspectives.
Bradshaw has been named in Journalism.co.uk’s list of the leading innovators in journalism and media and Poynter’s most influential people in social media. In 2010, he was shortlisted for Multimedia Publisher of the Year.
In addition to teaching and writing, Paul acts as a consultant and trainer to a number of organisations on social media and data journalism. You can find him on Twitter at @paulbradshaw.