Mastering PySpark: Spark RDDs vs DataFrames vs SparkSQL

Last updated on 2018-02-02

About the Book

This book shows how to solve a range of use cases with PySpark, the Spark Python API that exposes the Spark programming model to Python. It demonstrates how to use Resilient Distributed Datasets (RDDs), DataFrames, and SparkSQL to answer the same kinds of questions.

About the Author


Fisseha is a data scientist who loves continuous learning. He enjoys challenging and complex tasks in data analysis, data mining, machine learning, and data visualization. Fisseha holds a PhD in atmospheric physics from Johns Hopkins University.
