Exploratory Data Analysis with R
Last updated on 2016-07-20
About the Book
This book covers the essential exploratory techniques for summarizing data with R. These techniques are typically applied before formal modeling commences and can help inform the development of more complex statistical models. Exploratory techniques are also important for eliminating or sharpening potential hypotheses about the world that can be addressed by the data you have. We will cover in detail the plotting systems in R as well as some of the basic principles of constructing informative data graphics. We will also cover some of the common multivariate statistical techniques used to visualize high-dimensional data.
If you are interested in a printed copy of this book, you can purchase one at Lulu.
Some of the topics we cover are:
 Making exploratory graphs
 Principles of analytic graphics
 Plotting systems and graphics devices in R
 The base and ggplot2 plotting systems in R
 Clustering methods
 Dimension reduction techniques
Packages
The Book
This package contains just the book in PDF, EPUB, or MOBI formats.
English: PDF, EPUB, MOBI, APP
The Book + Datasets + R Code Files
This package contains the book and R code files corresponding to each of the chapters in the book. The package also contains the datasets used in all of the chapters so that the code can be fully executed.
Includes: Datasets, R Code Files
English: PDF, EPUB, MOBI, APP
The Book + Lecture Videos (HD) + Datasets + R Code Files
This package includes the book, high-definition lecture video files (720p) corresponding to each of the chapters, datasets, and R code files for all chapters. The videos are licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license.
Includes: Datasets, R Code Files, Lecture Videos (HD)
English: PDF, EPUB, MOBI, APP
Table of Contents
 1. Stay in Touch!
 2. Preface

3. Getting Started with R
 3.1 Installation
 3.2 Getting started with the R interface

4. Managing Data Frames with the dplyr package
 4.1 Data Frames
 4.2 The dplyr Package
 4.3 dplyr Grammar
 4.4 Installing the dplyr package
 4.5 select()
 4.6 filter()
 4.7 arrange()
 4.8 rename()
 4.9 mutate()
 4.10 group_by()
 4.11 %>%
 4.12 Summary

5. Exploratory Data Analysis Checklist
 5.1 Formulate your question
 5.2 Read in your data
 5.3 Check the packaging
 5.4 Run str()
 5.5 Look at the top and the bottom of your data
 5.6 Check your “n”s
 5.7 Validate with at least one external data source
 5.8 Try the easy solution first
 5.9 Challenge your solution
 5.10 Follow up questions

6. Principles of Analytic Graphics
 6.1 Show comparisons
 6.2 Show causality, mechanism, explanation, systematic structure
 6.3 Show multivariate data
 6.4 Integrate evidence
 6.5 Describe and document the evidence
 6.6 Content, Content, Content
 6.7 References

7. Exploratory Graphs
 7.1 Characteristics of exploratory graphs
 7.2 Air Pollution in the United States
 7.3 Getting the Data
 7.4 Simple Summaries: One Dimension
 7.5 Five Number Summary
 7.6 Boxplot
 7.7 Histogram
 7.8 Overlaying Features
 7.9 Barplot
 7.10 Simple Summaries: Two Dimensions and Beyond
 7.11 Multiple Boxplots
 7.12 Multiple Histograms
 7.13 Scatterplots
 7.14 Scatterplot - Using Color
 7.15 Multiple Scatterplots
 7.16 Summary

8. Plotting Systems
 8.1 The Base Plotting System
 8.2 The Lattice System
 8.3 The ggplot2 System
 8.4 References

9. Graphics Devices
 9.1 The Process of Making a Plot
 9.2 How Does a Plot Get Created?
 9.3 Graphics File Devices
 9.4 Multiple Open Graphics Devices
 9.5 Copying Plots
 9.6 Summary

10. The Base Plotting System
 10.1 Base Graphics
 10.2 Simple Base Graphics
 10.3 Some Important Base Graphics Parameters
 10.4 Base Plotting Functions
 10.5 Base Plot with Regression Line
 10.6 Multiple Base Plots
 10.7 Summary

11. Plotting and Color in R
 11.1 Colors 1, 2, and 3
 11.2 Connecting colors with data
 11.3 Color Utilities in R
 11.4 colorRamp()
 11.5 colorRampPalette()
 11.6 RColorBrewer Package
 11.7 Using the RColorBrewer palettes
 11.8 The smoothScatter() function
 11.9 Adding transparency
 11.10 Summary

12. Hierarchical Clustering
 12.1 Hierarchical clustering
 12.2 How do we define close?
 12.3 Example: Euclidean distance
 12.4 Example: Manhattan distance
 12.5 Example: Hierarchical clustering
 12.6 Prettier dendrograms
 12.7 Merging points: Complete
 12.8 Merging points: Average
 12.9 Using the heatmap() function
 12.10 Notes and further resources

13. K-Means Clustering
 13.1 Illustrating the K-means algorithm
 13.2 Stopping the algorithm
 13.3 Using the kmeans() function
 13.4 Building heatmaps from K-means solutions
 13.5 Notes and further resources

14. Dimension Reduction
 14.1 Matrix data
 14.2 Patterns in rows and columns
 14.3 Related problem
 14.4 SVD and PCA
 14.5 Unpacking the SVD: u and v
 14.6 SVD for data compression
 14.7 Components of the SVD - Variance explained
 14.8 Relationship to principal components
 14.9 What if we add a second pattern?
 14.10 Dealing with missing values
 14.11 Example: Face data
 14.12 Notes and further resources

15. The ggplot2 Plotting System: Part 1
 15.1 The Basics: qplot()
 15.2 Before You Start: Label Your Data
 15.3 ggplot2 “Hello, world!”
 15.4 Modifying aesthetics
 15.5 Adding a geom
 15.6 Histograms
 15.7 Facets
 15.8 Case Study: MAACS Cohort
 15.9 Summary of qplot()

16. The ggplot2 Plotting System: Part 2
 16.1 Basic Components of a ggplot2 Plot
 16.2 Example: BMI, PM2.5, Asthma
 16.3 Building Up in Layers
 16.4 First Plot with Point Layer
 16.5 Adding More Layers: Smooth
 16.6 Adding More Layers: Facets
 16.7 Modifying Geom Properties
 16.8 Modifying Labels
 16.9 Customizing the Smooth
 16.10 Changing the Theme
 16.11 More Complex Example
 16.12 A Quick Aside about Axis Limits
 16.13 Resources

17. Data Analysis Case Study: Changes in Fine Particle Air Pollution in the U.S.
 17.1 Synopsis
 17.2 Loading and Processing the Raw Data
 17.3 Results
 18. About the Author
The Leanpub 45-day 100% Happiness Guarantee
Within 45 days of purchase you can get a 100% refund on any Leanpub purchase, in two clicks.