Spark Tutorials with Scala
The Beginner's Guide
About the Book
Want to learn Apache Spark with Scala? Looking for a place to begin?
In this book, Apache Spark with Scala tutorials are presented from a wide variety of perspectives.
The approach is hands-on, with access to source code downloads and screencasts of running examples. Get ready to learn by example!
Who is this for?
This book is suitable for beginners with no Spark or Scala experience who have some background in programming and/or databases. It's a beginner book, but not one for people brand new to development or data engineering. It is designed to help people augment their existing skills to advance their careers and/or make better data-intensive products.
What You’ll Learn
For just $13, you’ll gain a great real-world understanding of how to use Spark with Scala. You will also learn the following:
- How to use Spark from Scala
- Comparison of Spark and Hadoop
- Core Spark constructs: Resilient Distributed Datasets, Transformations, and Actions
- Running Two Types of Spark Clusters
- Deploying Scala applications to Spark Clusters
- Spark SQL with Scala including CSV, JSON, and relational databases
- Building a custom, Scala-based Spark Streaming application
- Writing and running automated tests for Spark applications
- Building a custom Spark Machine Learning application
- Spark with Amazon S3
- Using Cassandra from Spark
By the end of this book, you'll be confident and productive using Spark with Scala in a variety of circumstances.
Why Spark and Scala?
Using Spark from a functional and object-oriented language like Scala is changing the way "big data" applications are built and deployed. Moreover, this is just the beginning of a paradigm shift in data engineering and data science.
Now and in the foreseeable future, companies will compete on their ability to process huge volumes of data and on the proprietary algorithms that give them a competitive advantage. But how will this be accomplished? Two prominent tools are Spark and Scala.
Stay ahead of the curve and get in now. Begin by learning Spark with Scala through tutorial examples.
Bonus Resources: Code Samples and Screencasts
Code samples are provided in a GitHub repository to download and use for learning or within your own projects.
Also, links to video screencasts of the author running examples and explaining tutorials are available from within the book.
If you have any questions or comments, please don't hesitate to get in touch.
Table of Contents
Before We Begin 7
Objectives and Expectations 7
Assumptions 7
Formatting 8
Beyond this Book 8
What, Why, How 9
What is Apache Spark? 9
Why Spark? 9
Fundamentals of Apache Spark 9
How to Be Productive with Spark? 10
Apache Spark Ecosystem Components 10
Conclusion What about Hadoop? 10
Spark RDDs A Two Minute Guide For Beginners 11
What is a Spark RDD? 11
How are Spark RDDs created? 11
Why Spark RDDs? 11
When to use Spark RDDs? 12
Apache Spark The Building Blocks 13
Overview 13
Requirements 13
Spark with Scala First Tutorial 13
Spark Context and Resilient Distributed Datasets 15
Actions and Transformations 16
Looking Ahead 17
Apache Spark: Examples Of Transformations 18
Transformations Part 1 18
Transformations Part 2 22
Transformations Part 3 23
Apache Spark: Examples Of Actions 26
Conclusion 30
Spark Clusters 31
Apache Spark Cluster Part 1: Run Standalone 31
Running a Spark Standalone Cluster 31
Spark Cluster Part 2: Deploy Scala Program To Spark Cluster 35
Requirements 35
Steps to Deploy Scala Program to Spark Cluster 35
Conclusion 37
Further Reference 37
Spark SQL with Scala 38
SQL 38
DataFrames 38
Datasets 38
Looking ahead 38
Spark SQL CSV Examples 39
Overview 39
Methodology 39
Spark SQL CSV Example Tutorial Part 1 39
Spark SQL CSV Example Tutorial Part 2 41
Spark SQL JSON Examples 43
Overview 43
Methodology 43
Spark SQL JSON Example Tutorial Part 1 43
Spark SQL JSON Example Tutorial Part 2 44
Spark SQL MySQL Example With JDBC 47
Overview 47
Requirements 47
Quick Setup 47
Methodology 48
Spark SQL with MySQL (JDBC) Example Tutorial 48
Conclusion Spark SQL with MySQL (JDBC) 49
Spark Streaming with Scala 50
DStreams 50
Architecture and Abstraction 50
Transformations 50
Input Sources 51
Checkpointing 51
Streaming Processing Guarantees 51
Streaming UI 51
Performance Considerations 51
Spark Streaming With Scala 52
Overview 52
Steps 52
Making and Running Our Own NetworkWordCount 52
Steps 52
Spark Streaming With Scala Part 1 Conclusion 53
Spark Streaming – Let’s Stream From Slack 54
Spark Streaming Example Overview 54
Resources 61
Spark Streaming Automated Testing With Scala 63
Pre-requisites 63
Overview 63
Steps 63
Conclusion 69
Additional Resources 69
Spark Machine Learning 70
Overview 70
Apache Spark Machine Learning Example With Scala 70
Apache Spark Machine Learning Example 71
Apache Spark Machine Learning Scala Source Code Review 71
Resources 75
Special Recipes 76
Spark With Amazon S3 77
Apache Spark with Amazon S3 Examples 77
Example Load Text File from S3 Written from Hadoop Library 78
S3 from Spark Text File Interoperability 79
References 79
Apache Spark, Cassandra And Game Of Thrones 80
Overview 80
Requirements 80
Steps 80
Conclusion 86
Spark Cassandra Tutorial Resources 86
Looking Ahead and Thanks Again! 87