Exploratory Data Analysis with R
About the Book
This book covers the essential exploratory techniques for summarizing data with R. These techniques are typically applied before formal modeling commences and can help inform the development of more complex statistical models. Exploratory techniques are also important for eliminating or sharpening potential hypotheses about the world that can be addressed by the data you have. We will cover in detail the plotting systems in R as well as some of the basic principles of constructing informative data graphics. We will also cover some of the common multivariate statistical techniques used to visualize high-dimensional data.
If you are interested in a printed copy of this book, you can purchase one at Lulu.
Some of the topics we cover are:
- Making exploratory graphs
- Principles of analytic graphics
- Plotting systems and graphics devices in R
- The base and ggplot2 plotting systems in R
- Clustering methods
- Dimension reduction techniques
Packages
The Book
This package contains just the book in PDF, EPUB, or MOBI formats.
Formats (English): PDF, EPUB, MOBI, WEB
The Book + Datasets + R Code Files
This package contains the book and R code files corresponding to each of the chapters in the book. The package also contains the datasets used in all of the chapters so that the code can be fully executed.
Includes: Datasets, R Code Files
Formats (English): PDF, EPUB, MOBI, WEB
The Book + Lecture Videos (HD) + Datasets + R Code Files
This package includes the book, high-definition lecture video files (720p) corresponding to each of the chapters, and the datasets and R code files for all chapters. The videos are licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license.
Includes: Datasets, R Code Files, Lecture Videos (HD)
Formats (English): PDF, EPUB, MOBI, WEB
Table of Contents
- 1. Stay in Touch!
- 2. Preface
- 3. Getting Started with R
  - 3.1 Installation
  - 3.2 Getting started with the R interface
- 4. Managing Data Frames with the dplyr package
  - 4.1 Data Frames
  - 4.2 The dplyr Package
  - 4.3 dplyr Grammar
  - 4.4 Installing the dplyr package
  - 4.5 select()
  - 4.6 filter()
  - 4.7 arrange()
  - 4.8 rename()
  - 4.9 mutate()
  - 4.10 group_by()
  - 4.11 %>%
  - 4.12 Summary
- 5. Exploratory Data Analysis Checklist
  - 5.1 Formulate your question
  - 5.2 Read in your data
  - 5.3 Check the packaging
  - 5.4 Run str()
  - 5.5 Look at the top and the bottom of your data
  - 5.6 Check your “n”s
  - 5.7 Validate with at least one external data source
  - 5.8 Try the easy solution first
  - 5.9 Challenge your solution
  - 5.10 Follow up questions
- 6. Principles of Analytic Graphics
  - 6.1 Show comparisons
  - 6.2 Show causality, mechanism, explanation, systematic structure
  - 6.3 Show multivariate data
  - 6.4 Integrate evidence
  - 6.5 Describe and document the evidence
  - 6.6 Content, Content, Content
  - 6.7 References
- 7. Exploratory Graphs
  - 7.1 Characteristics of exploratory graphs
  - 7.2 Air Pollution in the United States
  - 7.3 Getting the Data
  - 7.4 Simple Summaries: One Dimension
  - 7.5 Five Number Summary
  - 7.6 Boxplot
  - 7.7 Histogram
  - 7.8 Overlaying Features
  - 7.9 Barplot
  - 7.10 Simple Summaries: Two Dimensions and Beyond
  - 7.11 Multiple Boxplots
  - 7.12 Multiple Histograms
  - 7.13 Scatterplots
  - 7.14 Scatterplot - Using Color
  - 7.15 Multiple Scatterplots
  - 7.16 Summary
- 8. Plotting Systems
  - 8.1 The Base Plotting System
  - 8.2 The Lattice System
  - 8.3 The ggplot2 System
  - 8.4 References
- 9. Graphics Devices
  - 9.1 The Process of Making a Plot
  - 9.2 How Does a Plot Get Created?
  - 9.3 Graphics File Devices
  - 9.4 Multiple Open Graphics Devices
  - 9.5 Copying Plots
  - 9.6 Summary
- 10. The Base Plotting System
  - 10.1 Base Graphics
  - 10.2 Simple Base Graphics
  - 10.3 Some Important Base Graphics Parameters
  - 10.4 Base Plotting Functions
  - 10.5 Base Plot with Regression Line
  - 10.6 Multiple Base Plots
  - 10.7 Summary
- 11. Plotting and Color in R
  - 11.1 Colors 1, 2, and 3
  - 11.2 Connecting colors with data
  - 11.3 Color Utilities in R
  - 11.4 colorRamp()
  - 11.5 colorRampPalette()
  - 11.6 RColorBrewer Package
  - 11.7 Using the RColorBrewer palettes
  - 11.8 The smoothScatter() function
  - 11.9 Adding transparency
  - 11.10 Summary
- 12. Hierarchical Clustering
  - 12.1 Hierarchical clustering
  - 12.2 How do we define close?
  - 12.3 Example: Euclidean distance
  - 12.4 Example: Manhattan distance
  - 12.5 Example: Hierarchical clustering
  - 12.6 Prettier dendrograms
  - 12.7 Merging points: Complete
  - 12.8 Merging points: Average
  - 12.9 Using the heatmap() function
  - 12.10 Notes and further resources
- 13. K-Means Clustering
  - 13.1 Illustrating the K-means algorithm
  - 13.2 Stopping the algorithm
  - 13.3 Using the kmeans() function
  - 13.4 Building heatmaps from K-means solutions
  - 13.5 Notes and further resources
- 14. Dimension Reduction
  - 14.1 Matrix data
  - 14.2 Patterns in rows and columns
  - 14.3 Related problem
  - 14.4 SVD and PCA
  - 14.5 Unpacking the SVD: u and v
  - 14.6 SVD for data compression
  - 14.7 Components of the SVD - Variance explained
  - 14.8 Relationship to principal components
  - 14.9 What if we add a second pattern?
  - 14.10 Dealing with missing values
  - 14.11 Example: Face data
  - 14.12 Notes and further resources
- 15. The ggplot2 Plotting System: Part 1
  - 15.1 The Basics: qplot()
  - 15.2 Before You Start: Label Your Data
  - 15.3 ggplot2 “Hello, world!”
  - 15.4 Modifying aesthetics
  - 15.5 Adding a geom
  - 15.6 Histograms
  - 15.7 Facets
  - 15.8 Case Study: MAACS Cohort
  - 15.9 Summary of qplot()
- 16. The ggplot2 Plotting System: Part 2
  - 16.1 Basic Components of a ggplot2 Plot
  - 16.2 Example: BMI, PM2.5, Asthma
  - 16.3 Building Up in Layers
  - 16.4 First Plot with Point Layer
  - 16.5 Adding More Layers: Smooth
  - 16.6 Adding More Layers: Facets
  - 16.7 Modifying Geom Properties
  - 16.8 Modifying Labels
  - 16.9 Customizing the Smooth
  - 16.10 Changing the Theme
  - 16.11 More Complex Example
  - 16.12 A Quick Aside about Axis Limits
  - 16.13 Resources
- 17. Data Analysis Case Study: Changes in Fine Particle Air Pollution in the U.S.
  - 17.1 Synopsis
  - 17.2 Loading and Processing the Raw Data
  - 17.3 Results
- 18. About the Author