Email the Author

You can use this page to email Shoaib Burq and Dr. Kashif Rasul about Apache Spark & Geodata.


About the Book

In this book I will quickly introduce you to the Apache Spark stack and then get into the meat of performing a full-featured geospatial analysis. Using OpenStreetMap data as our base, our end goal will be to find the most cultural city in Western Europe!

That's right! In this book we will develop our own Cultural Weight Algorithm (TM) :) and apply it to a set of major cities in Europe. The data will be analyzed using Apache Spark and in the process we will learn the following phases of Big Data projects:

  • Consuming: Retrieving raw data from REST APIs (OpenStreetMap).
  • Preparation: Data exploration and schema creation of geospatial data.
  • Summarize: We will query data by location, perform spatial operations such as finding overlapping geospatial features, join datasets by location (also known as spatial joins), and finally obtain location-based summary statistics to arrive at our answer regarding the cultural capital of Europe.
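The three phases above can be sketched in plain Python before bringing Spark into the picture. This is a minimal illustration under stated assumptions, not the book's Cultural Weight Algorithm: the sample records, the bounding boxes, and the names `Feature`, `prepare`, `city_of`, and `summarize` are all hypothetical, and the "spatial join" here is a simple point-in-bounding-box test rather than a true polygon operation.

```python
from collections import Counter
from dataclasses import dataclass

# Consuming: a hypothetical, hard-coded sample shaped like the JSON an
# OpenStreetMap-style REST API might return (no network call is made).
RAW_ELEMENTS = [
    {"id": 1, "lat": 52.52, "lon": 13.40, "tags": {"amenity": "theatre"}},
    {"id": 2, "lat": 52.50, "lon": 13.35, "tags": {"tourism": "museum"}},
    {"id": 3, "lat": 48.86, "lon": 2.34, "tags": {"tourism": "museum"}},
    {"id": 4, "lat": 40.00, "lon": -30.00, "tags": {"amenity": "theatre"}},
    {"id": 5, "lat": 48.85, "lon": 2.35, "tags": {}},
]

# Rough, illustrative bounding boxes: (south, west, north, east).
CITY_BOXES = {
    "Berlin": (52.3, 13.0, 52.7, 13.8),
    "Paris": (48.8, 2.2, 48.9, 2.5),
}


@dataclass(frozen=True)
class Feature:
    """Preparation: a small explicit schema for one cultural feature."""
    osm_id: int
    lat: float
    lon: float
    kind: str


def prepare(elements):
    """Keep only elements carrying an amenity or tourism tag."""
    features = []
    for el in elements:
        tags = el.get("tags", {})
        kind = tags.get("amenity") or tags.get("tourism")
        if kind:
            features.append(Feature(el["id"], el["lat"], el["lon"], kind))
    return features


def city_of(feature, boxes):
    """Spatial join by location: return the first city box containing the point."""
    for name, (south, west, north, east) in boxes.items():
        if south <= feature.lat <= north and west <= feature.lon <= east:
            return name
    return None


def summarize(features, boxes):
    """Summarize: count features per city, skipping points outside every box."""
    counts = Counter()
    for feature in features:
        city = city_of(feature, boxes)
        if city is not None:
            counts[city] += 1
    return counts


features = prepare(RAW_ELEMENTS)
counts = summarize(features, CITY_BOXES)
print(counts.most_common())  # [('Berlin', 2), ('Paris', 1)]
```

In a Spark setting the same shape would typically become a distributed load, a schema definition, and a join followed by an aggregation; the plain-Python version only shows the logic of each phase.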

Here's a summary of the book. I hope you will join us on this journey of exploring one of the most exciting technology stacks to come out of the good folks at UC Berkeley.

Why Spark?

Spark has quickly overtaken Hadoop as the front-runner in big data analysis technologies. There are a number of reasons for this, such as its support for a developer-friendly interactive mode, its polyglot interface in Scala, Java, Python, and R, and the full stack of algorithmic libraries that these language ecosystems offer.

Out of the box, Spark includes a powerful set of tools, such as the ability to write SQL queries, perform streaming analytics, run machine learning algorithms, and even tackle graph-parallel computations, but what really stands out is its usability.

With its interactive shells (in both Scala and Python), it makes prototyping big data applications a breeze.

Why PySpark?

PySpark provides integrated API bindings around Spark and lets you use the Python ecosystem on every node of the Spark cluster, relying on Python's pickle serialization. More importantly, it gives access to Python’s rich ecosystem of machine learning libraries such as Scikit-Learn and data processing libraries such as Pandas.
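To make the role of pickle concrete, here is a minimal sketch using only the standard library. It shows the round trip a Python object makes when it is turned into bytes and rebuilt elsewhere; the `record` contents are hypothetical, and the serializers Spark actually uses are more involved than this single call pair.

```python
import pickle

# A hypothetical record of the kind a Python worker might receive.
record = {"city": "Berlin", "lat": 52.52, "lon": 13.40, "tags": ["museum", "theatre"]}

# Serialize to bytes, as would be needed to ship the object between processes.
payload = pickle.dumps(record)

# Rebuild an equal, but distinct, object from those bytes.
restored = pickle.loads(payload)

print(isinstance(payload, bytes), restored == record, restored is record)
# True True False
```

The point of the sketch is only that ordinary Python objects survive the trip unchanged, which is what lets familiar Python data structures be used on the worker nodes.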

Throughout this book I am going to use a Docker container with the relevant libraries. Don't worry if you don't know Docker; I walk you through setting up and running Docker too.


About the Authors

Shoaib Burq

@sabman

I am a Geospatial Applications Developer and have worked on projects ranging from developing geocoders for Australian emergency response agencies to underwater mapping for marine exploration. My last startup was a geospatial database as a service with an easy-to-use API for developers to build mobile and geolocation apps.

Dr. Kashif Rasul

@krasul

I have a PhD in Mathematics from the Freie Universität Berlin, and in parallel to this I have been working as a software developer in the area of location-based services and geospatial web application development. Shoaib and I were among the first developers to use Ruby/Rails with the open source geospatial stack at the time, and we have extensive experience in this area. I have also worked on PostGIS, the geospatial extension to PostgreSQL, and developed APIs on top of it to be consumed by mobile applications for geo-fencing and real-time triggers. We have also presented in depth on developing geospatial web applications at conferences like RailsConf and FOSS4G, and at local meetups.

You can follow me on GitHub: kashif or Twitter: @krasul

Zaeem Burq


*   *   *

Leanpub is copyright © 2010-2025 Ruboss Technology Corp.
All rights reserved.
