Spark Tutorials with Scala
The Beginner's Guide
About the Book
Want to learn Apache Spark with Scala? Looking for a place to begin?
In this book, Apache Spark with Scala tutorials are presented from a wide variety of perspectives.
The approach is hands-on, with source code downloads and screencasts of running examples. Get ready to learn by example!
Who is this for?
This book is suitable for beginners with no Spark or Scala experience but some background in programming and/or databases. It's a beginner book, but not for people brand new to development or data engineering. It is designed to help people augment their existing skills to advance their careers and/or build better data-intensive products.
What You’ll Learn
For just $13, you’ll gain a great real-world understanding of how to use Spark with Scala. You will also learn the following:
- How to use Spark from Scala
- Comparison of Spark and Hadoop
- Core Spark constructs: Resilient Distributed Datasets, Transformations, and Actions
- Running Two Types of Spark Clusters
- Deploying Scala applications to Spark Clusters
- Spark SQL with Scala, including CSV, JSON, and relational databases
- Custom Scala-based Spark Streaming application
- Writing and running automated tests for Spark applications
- Building a custom Spark Machine Learning application
- Spark with Amazon S3
- Using Cassandra from Spark
By the end of this book, you'll be confident and productive using Spark with Scala in a variety of circumstances.
Why Spark and Scala?
Using Spark from a functional and object-oriented language like Scala is changing the way "big data" applications are built and deployed. Moreover, this is just the beginning of a paradigm shift in data engineering and data science.
Now and in the foreseeable future, companies will compete on their ability to process huge volumes of data and to apply proprietary algorithms that create competitive advantages. But how will this be accomplished? Two prominent tools are Spark and Scala.
Stay ahead of the curve and get in now. Begin by learning Spark with Scala through tutorial examples.
Bonus Resources: Code Samples and Screencasts
Code samples are provided in a GitHub repository to download and use for learning or within your own projects.
Also, links to video screencasts of the author running examples and explaining tutorials are available from within the book. If you have any questions or comments, please don't hesitate to get in touch.
Before We Begin
Objectives and Expectations
Beyond this Book
What, Why, How
What is Apache Spark?
Why Spark?
Fundamentals of Apache Spark
How to Be Productive with Spark?
Apache Spark Ecosystem Components
Conclusion What about Hadoop?
Spark RDDs: A Two Minute Guide For Beginners
What is a Spark RDD?
How are Spark RDDs created?
Why Spark RDDs?
When to use Spark RDDs?
Apache Spark: The Building Blocks
Spark with Scala First Tutorial
Spark Context and Resilient Distributed Datasets
Actions and Transformations
Looking Ahead
Apache Spark: Examples Of Transformations
Transformations Part 1
Transformations Part 2
Transformations Part 3
Apache Spark: Examples Of Actions
Spark Clusters
Apache Spark Cluster Part 1: Run Standalone
Running a Spark Standalone Cluster
Spark Cluster Part 2: Deploy Scala Program To Spark Cluster
Steps to Deploy Scala Program to Spark Cluster
Further Reference
Spark SQL with Scala
Looking ahead
Spark SQL CSV Examples
Spark SQL CSV Example Tutorial Part 1
Spark SQL CSV Example Tutorial Part 2
Spark SQL JSON Examples
Spark SQL JSON Example Tutorial Part 1
Spark SQL JSON Example Tutorial Part 2
Spark SQL MySQL Example With JDBC
Quick Setup
Spark SQL with MySQL (JDBC) Example Tutorial
Conclusion Spark SQL with MySQL (JDBC)
Spark Streaming with Scala
Architecture and Abstraction
Input Sources
Streaming Processing Guarantees
Streaming UI
Performance Considerations
Spark Streaming With Scala
Making and Running Our Own NetworkWordCount
Spark Streaming With Scala Part 1 Conclusion
Spark Streaming – Let’s Stream From Slack
Spark Streaming Example Overview
Spark Streaming Automated Testing With Scala
Additional Resources
Spark Machine Learning
Apache Spark Machine Learning Example With Scala
Apache Spark Machine Learning Example
Apache Spark Machine Learning Scala Source Code Review
Special Recipes
Spark With Amazon S3
Apache Spark with Amazon S3 Examples
Example Load Text File from S3 Written from Hadoop Library
S3 from Spark Text File Interoperability
Apache Spark, Cassandra And Game Of Thrones
Spark Cassandra Tutorial Resources
Looking Ahead and Thanks Again!