Exploratory Data Analysis with R
About the Book
This book covers the essential exploratory techniques for summarizing data with R. These techniques are typically applied before formal modeling commences and can help inform the development of more complex statistical models. Exploratory techniques are also important for eliminating or sharpening potential hypotheses about the world that can be addressed by the data you have. We will cover in detail the plotting systems in R as well as some of the basic principles of constructing informative data graphics. We will also cover some of the common multivariate statistical techniques used to visualize high-dimensional data.
If you are interested in a printed copy of this book, you can purchase one at Lulu.
Some of the topics we cover are:
- Making exploratory graphs
- Principles of analytic graphics
- Plotting systems and graphics devices in R
- The base and ggplot2 plotting systems in R
- Clustering methods
- Dimension reduction techniques
Packages
The Book
This package contains just the book in PDF, EPUB, or MOBI formats.
Formats: PDF, EPUB, WEB
Language: English
The Book + Datasets + R Code Files
This package contains the book and R code files corresponding to each of the chapters in the book. The package also contains the datasets used in all of the chapters so that the code can be fully executed.
Includes:
- Datasets
- R Code Files
Formats: PDF, EPUB, WEB
Language: English
The Book + Lecture Videos (HD) + Datasets + R Code Files
This package includes the book, high-definition (720p) lecture video files corresponding to each chapter, the datasets, and the R code files for all chapters. The videos are licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license.
Includes:
- Datasets
- R Code Files
- Lecture Videos (HD)
Formats: PDF, EPUB, WEB
Language: English
Table of Contents
- 1. Stay in Touch!
- 2. Preface
- 3. Getting Started with R
- 3.1 Installation
- 3.2 Getting started with the R interface
- 4. Managing Data Frames with the dplyr package
- 4.1 Data Frames
- 4.2 The dplyr Package
- 4.3 dplyr Grammar
- 4.4 Installing the dplyr package
- 4.5 select()
- 4.6 filter()
- 4.7 arrange()
- 4.8 rename()
- 4.9 mutate()
- 4.10 group_by()
- 4.11 %>%
- 4.12 Summary
- 5. Exploratory Data Analysis Checklist
- 5.1 Formulate your question
- 5.2 Read in your data
- 5.3 Check the packaging
- 5.4 Run str()
- 5.5 Look at the top and the bottom of your data
- 5.6 Check your “n”s
- 5.7 Validate with at least one external data source
- 5.8 Try the easy solution first
- 5.9 Challenge your solution
- 5.10 Follow up questions
- 6. Principles of Analytic Graphics
- 6.1 Show comparisons
- 6.2 Show causality, mechanism, explanation, systematic structure
- 6.3 Show multivariate data
- 6.4 Integrate evidence
- 6.5 Describe and document the evidence
- 6.6 Content, Content, Content
- 6.7 References
- 7. Exploratory Graphs
- 7.1 Characteristics of exploratory graphs
- 7.2 Air Pollution in the United States
- 7.3 Getting the Data
- 7.4 Simple Summaries: One Dimension
- 7.5 Five Number Summary
- 7.6 Boxplot
- 7.7 Histogram
- 7.8 Overlaying Features
- 7.9 Barplot
- 7.10 Simple Summaries: Two Dimensions and Beyond
- 7.11 Multiple Boxplots
- 7.12 Multiple Histograms
- 7.13 Scatterplots
- 7.14 Scatterplot - Using Color
- 7.15 Multiple Scatterplots
- 7.16 Summary
- 8. Plotting Systems
- 8.1 The Base Plotting System
- 8.2 The Lattice System
- 8.3 The ggplot2 System
- 8.4 References
- 9. Graphics Devices
- 9.1 The Process of Making a Plot
- 9.2 How Does a Plot Get Created?
- 9.3 Graphics File Devices
- 9.4 Multiple Open Graphics Devices
- 9.5 Copying Plots
- 9.6 Summary
- 10. The Base Plotting System
- 10.1 Base Graphics
- 10.2 Simple Base Graphics
- 10.3 Some Important Base Graphics Parameters
- 10.4 Base Plotting Functions
- 10.5 Base Plot with Regression Line
- 10.6 Multiple Base Plots
- 10.7 Summary
- 11. Plotting and Color in R
- 11.1 Colors 1, 2, and 3
- 11.2 Connecting colors with data
- 11.3 Color Utilities in R
- 11.4 colorRamp()
- 11.5 colorRampPalette()
- 11.6 RColorBrewer Package
- 11.7 Using the RColorBrewer palettes
- 11.8 The smoothScatter() function
- 11.9 Adding transparency
- 11.10 Summary
- 12. Hierarchical Clustering
- 12.1 Hierarchical clustering
- 12.2 How do we define close?
- 12.3 Example: Euclidean distance
- 12.4 Example: Manhattan distance
- 12.5 Example: Hierarchical clustering
- 12.6 Prettier dendrograms
- 12.7 Merging points: Complete
- 12.8 Merging points: Average
- 12.9 Using the heatmap() function
- 12.10 Notes and further resources
- 13. K-Means Clustering
- 13.1 Illustrating the K-means algorithm
- 13.2 Stopping the algorithm
- 13.3 Using the kmeans() function
- 13.4 Building heatmaps from K-means solutions
- 13.5 Notes and further resources
- 14. Dimension Reduction
- 14.1 Matrix data
- 14.2 Patterns in rows and columns
- 14.3 Related problem
- 14.4 SVD and PCA
- 14.5 Unpacking the SVD: u and v
- 14.6 SVD for data compression
- 14.7 Components of the SVD - Variance explained
- 14.8 Relationship to principal components
- 14.9 What if we add a second pattern?
- 14.10 Dealing with missing values
- 14.11 Example: Face data
- 14.12 Notes and further resources
- 15. The ggplot2 Plotting System: Part 1
- 15.1 The Basics: qplot()
- 15.2 Before You Start: Label Your Data
- 15.3 ggplot2 “Hello, world!”
- 15.4 Modifying aesthetics
- 15.5 Adding a geom
- 15.6 Histograms
- 15.7 Facets
- 15.8 Case Study: MAACS Cohort
- 15.9 Summary of qplot()
- 16. The ggplot2 Plotting System: Part 2
- 16.1 Basic Components of a ggplot2 Plot
- 16.2 Example: BMI, PM2.5, Asthma
- 16.3 Building Up in Layers
- 16.4 First Plot with Point Layer
- 16.5 Adding More Layers: Smooth
- 16.6 Adding More Layers: Facets
- 16.7 Modifying Geom Properties
- 16.8 Modifying Labels
- 16.9 Customizing the Smooth
- 16.10 Changing the Theme
- 16.11 More Complex Example
- 16.12 A Quick Aside about Axis Limits
- 16.13 Resources
- 17. Data Analysis Case Study: Changes in Fine Particle Air Pollution in the U.S.
- 17.1 Synopsis
- 17.2 Loading and Processing the Raw Data
- 17.3 Results
- 18. About the Author