Spark Tutorials with Scala

The Beginner's Guide

About the Book

Want to learn Apache Spark with Scala?  Looking for a place to begin?

In this book, Apache Spark with Scala tutorials are presented from a wide variety of perspectives.  

The approach is hands-on with access to source code downloads and screencasts of running examples.  Get ready to learn by examples!

Who is this for?

This book is suitable for beginners with no Spark or Scala experience who have some background in programming or databases.  It is a beginner's book, but it is not for people who are brand new to development or data engineering.  It is designed to help readers augment their existing skills in order to advance their careers or build better data-intensive products.

What You’ll Learn

For just $13, you’ll gain a great real-world understanding of how to use Spark with Scala. You will also learn the following:

  • How to use Spark from Scala
  • How Spark compares with Hadoop
  • Core Spark constructs: Resilient Distributed Datasets, Transformations, and Actions
  • Running two types of Spark clusters
  • Deploying Scala applications to Spark clusters
  • Using Spark SQL with Scala, including CSV, JSON, and relational databases
  • Building a custom, Scala-based Spark Streaming application
  • Writing and running automated tests for Spark applications
  • Building a custom Spark Machine Learning application
  • Using Spark with Amazon S3
  • Using Cassandra from Spark

By the end of this book, you'll be confident and productive using Spark with Scala in a variety of circumstances.

Why Spark and Scala?

Using Spark from a functional and object-oriented language like Scala is changing the way "big data" applications are built and deployed.  Moreover, this is just the beginning of a paradigm shift in data engineering and data science.

Now and for the foreseeable future, companies will compete on their ability to process huge volumes of data and on the proprietary algorithms they use to create competitive advantages.  But how will this be accomplished?  Spark and Scala are two prominent tools for the job.

Stay ahead of the curve and get in now.  Begin by learning Spark with Scala through tutorial examples.  

Bonus Resources: Code Samples and Screencasts

Code samples are provided in a GitHub repository to download and use for learning or within your own projects.  

Also, links to video screencasts of the author running examples and explaining tutorials are available from within the book.

If you have any questions or comments, please don't hesitate to get in touch.

  • Categories

    • Scala
    • Databases
    • Software Engineering
  • Installments completed

    8 / 10

About the Author

Todd McGrath

Todd is a software veteran of 20 years.  He spent 6 years in Silicon Valley at 3 startups in the 1990s before moving to Costa Rica to learn how to surf.  Afterward, he ran a custom software development company for over 10 years before joining another Silicon Valley VC-backed startup in 2012.  These days you can usually find Todd at one of his kids' extracurricular events, up north at the cabin, software consulting, or building courses and books.

Table of Contents

Before We Begin

Objectives and Expectations

Assumptions

Formatting

Beyond this Book

What, Why, How

What is Apache Spark?

Why Spark?

Fundamentals of Apache Spark

How to Be Productive with Spark?

Apache Spark Ecosystem Components

Conclusion What about Hadoop?

Spark RDDs A Two Minute Guide For Beginners

What is a Spark RDD?

How are Spark RDDs created?

Why Spark RDDs?

When to use Spark RDDs?

Apache Spark The Building Blocks

Overview

Requirements

Spark with Scala First Tutorial

Spark Context and Resilient Distributed Datasets

Actions and Transformations

Looking Ahead

Apache Spark: Examples Of Transformations

Transformations Part 1

Transformations Part 2

Transformations Part 3

Apache Spark: Examples Of Actions

Conclusion

Spark Clusters

Apache Spark Cluster Part 1: Run Standalone

Running a Spark Standalone Cluster

Spark Cluster Part 2: Deploy Scala Program To Spark Cluster

Requirements

Steps to Deploy Scala Program to Spark Cluster

Conclusion

Further Reference

Spark SQL with Scala

SQL

DataFrames

Datasets

Looking ahead

Spark SQL CSV Examples

Overview

Methodology

Spark SQL CSV Example Tutorial Part 1

Spark SQL CSV Example Tutorial Part 2

Spark SQL JSON Examples

Overview

Methodology

Spark SQL JSON Example Tutorial Part 1

Spark SQL JSON Example Tutorial Part 2

Spark SQL MySQL Example With JDBC

Overview

Requirements

Quick Setup

Methodology

Spark SQL with MySQL (JDBC) Example Tutorial

Conclusion Spark SQL with MySQL (JDBC)

Spark Streaming with Scala

DStreams

Architecture and Abstraction

Transformations

Input Sources

Checkpointing

Streaming Processing Guarantees

Streaming UI

Performance Considerations

Spark Streaming With Scala

Overview

Steps

Making and Running Our Own NetworkWordCount

Steps

Spark Streaming With Scala Part 1 Conclusion

Spark Streaming – Let’s Stream From Slack

Spark Streaming Example Overview

Resources

Spark Streaming Automated Testing With Scala

Pre-requisites

Overview

Steps

Conclusion

Additional Resources

Spark Machine Learning

Overview

Apache Spark Machine Learning Example With Scala

Apache Spark Machine Learning Example

Apache Spark Machine Learning Scala Source Code Review

Resources

Special Recipes

Spark With Amazon S3

Apache Spark with Amazon S3 Examples

Example Load Text File from S3 Written from Hadoop Library

S3 from Spark Text File Interoperability

References

Apache Spark, Cassandra And Game Of Thrones

Overview

Requirements

Steps

Conclusion

Spark Cassandra Tutorial Resources

Looking Ahead and Thanks Again!
