Email the Author
You can use this page to email Gábor László Hajba about Website Scraping with Python.
About the Book
New version by Apress
In 2018 I teamed up with Apress and we released an updated version of this book. You can find it on Amazon: https://amzn.to/2Dkl4gI
This book is the follow-up to my previous one: "XML processing and website scraping in Java". There I looked at ways and tools to process XML and HTML in Java, did some performance comparisons, and introduced some new programming concepts to make things even better.
In this book I take a closer look at website scraping with two tools in common use today: BeautifulSoup and Scrapy.
I re-create the sample application from the Java book, this time in Python, use both tools for parsing, and show examples of how to export CSV files in Python.
As a bonus, I will compare the runtimes of the two tools, tweak them where possible, and give a quick introduction to plotting the runtimes as charts.
Until it is finished, you can buy the book for a discounted price. The final book will be around $35.
I will write about the following topics in this book:
- BeautifulSoup
- Scrapy
- Performance comparison
- Plotting in Python
- Functional programming with Python
- Parallel code execution with Python
- Sample application to gather Amazon data
- Other real-life projects (source code will be added to the package soon)
- Update for Scrapy's release and Python 3 (coming soon)
About the Author
Gábor László Hajba is a Senior Consultant at EBCONT enterprise technologies in Vienna, Austria, with core competencies in Java and Python. He is responsible for designing and developing solutions for customer needs in the enterprise software world.
In 2018, Apress released his book Website Scraping with Python -- Using BeautifulSoup and Scrapy, which started here, as a LeanPub book back in 2014.
In his free time, he's a husband, the father of a lovely little girl, and an aspiring bass player.