Email the Author

You can use this page to email Jay M. Patel about Getting Structured Data from Internet: Web Scraping and Rest APIs.


About the Book

Note: this book is now available to order from Apress with a lot of extra content, under the title "Getting structured data from internet: Running Web Crawlers/Scrapers on a Big Data Production Scale".

This book will teach you web scraping so you can quickly collect large amounts of freely available data from the web in a structured format. You'll learn Python scripts not only to access free APIs that return structured data from websites such as Twitter, but also to scrape data from any HTML and JavaScript page and convert it into Excel, CSV, or a SQL database of your choice. We will go beyond the basics of web scraping and cover advanced topics such as natural language processing and text analytics to extract top keywords, text summaries, names of people and places, email addresses, and other contact details from a page. All the code used in the book will be available to help you understand the concepts in practice and write your own web scraper.


About the Author

Jay M. Patel

My name is Jay M. Patel and I am a full-time freelance software developer and data scientist specializing in data mining, web crawling/scraping, and natural language processing (NLP) projects. Please check out my consulting page for details on how to hire me for your project.

I worked at the US Environmental Protection Agency (US EPA) for about five years before quitting in 2018 to consult full time and bootstrap my startup, Specrom Analytics, which applies AI algorithms to marketing, social listening, and creating alternative financial datasets.

During my time at the US EPA, I designed text mining and NLP algorithms to extract useful insights from hundreds of thousands of documents that were part of regulatory filings from companies. I also led one of the first research teams within the agency to use Apache Spark-based workflows for traditional cheminformatics applications such as chemical similarity and quantitative structure-activity relationships. We also developed recurrent neural networks and more advanced LSTM models in TensorFlow for chemical SMILES generation. Please check out my Google Scholar for a full list of my research papers and presentations.

I graduated with a bachelor's degree in chemical engineering from UDCT, India, and an M.S. in computational chemistry from the University of Georgia, Athens, GA, USA. Check out my CV for more information.

My blog posts here will focus on digital marketing, alternative financial datasets, my current work, data science, and my experiences as a startup founder. I also have a couple of book projects in the works and one published book; please check it out here for more info.

In my free time, I also volunteer in the Dangs district in India, helping the tribal community build homes and get clean water and sanitation.

Connect with me on LinkedIn or GitHub, or email me at jay@jaympatel.com with any questions.


Leanpub is copyright © 2010-2025 Ruboss Technology Corp.
All rights reserved.
