This book shows how to solve a range of use cases with PySpark, the Spark Python API that exposes the Spark programming model to Python. It uses Resilient Distributed Datasets (RDDs), DataFrames, and SparkSQL to answer the same kinds of questions.
Fisseha is a data scientist who loves continuous learning. He enjoys challenging and complex data analysis, data mining, machine learning, and data visualization tasks. Fisseha holds a PhD in atmospheric physics from Johns Hopkins University.