Exploratory Data Analysis with R
Minimum price: Free!
Suggested price: $15.00

About the Book

This book covers the essential exploratory techniques for summarizing data with R. These techniques are typically applied before formal modeling commences and can help inform the development of more complex statistical models. Exploratory techniques are also important for eliminating or sharpening potential hypotheses about the world that can be addressed by the data you have. We will cover in detail the plotting systems in R as well as some of the basic principles of constructing informative data graphics. We will also cover some of the common multivariate statistical techniques used to visualize high-dimensional data.

If you are interested in a printed copy of this book, you can purchase one at Lulu.

Some of the topics we cover are:

  • Making exploratory graphs
  • Principles of analytic graphics
  • Plotting systems and graphics devices in R
  • The base and ggplot2 plotting systems in R
  • Clustering methods
  • Dimension reduction techniques

About the Author

Roger D. Peng

Roger D. Peng is a Professor of Statistics and Data Sciences at the University of Texas, Austin. Previously, he was Professor of Biostatistics at the Johns Hopkins Bloomberg School of Public Health. His research focuses on the development of statistical methods for addressing environmental health problems and on developing tools for doing better data analysis. He is the author of the popular book R Programming for Data Science and 10 other books on data science and statistics. He is also the co-creator of the Johns Hopkins Data Science Specialization, the Simply Statistics blog where he writes about statistics for the public, the Not So Standard Deviations podcast with Hilary Parker, and The Effort Report podcast with Elizabeth Matsui. Roger is a Fellow of the American Statistical Association and is the recipient of the Mortimer Spiegelman Award from the American Public Health Association, which honors a statistician who has made outstanding contributions to public health. He can be found on Twitter and GitHub at @rdpeng.

Packages

The Book

This package contains just the book in PDF, EPUB, or MOBI formats.

  • PDF
  • EPUB
  • WEB
  • English

Minimum price: Free!
Suggested price: $15.00

The Book + Datasets + R Code Files

This package contains the book and R code files corresponding to each of the chapters in the book. The package also contains the datasets used in all of the chapters so that the code can be fully executed.

Includes:

  • Datasets
  • R Code Files
  • PDF
  • EPUB
  • WEB
  • English

Minimum price: $15.00
Suggested price: $25.00

The Book + Lecture Videos (HD) + Datasets + R Code Files

This package includes the book, high definition lecture video files (720p) corresponding to each of the chapters, datasets and R code files for all chapters. The videos are licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license.

Includes:

  • Datasets
  • R Code Files
  • Lecture Videos (HD)
  • PDF
  • EPUB
  • WEB
  • English

Minimum price: $30.00
Suggested price: $35.00

Table of Contents

  • 1. Stay in Touch!
  • 2. Preface
  • 3. Getting Started with R
    • 3.1 Installation
    • 3.2 Getting started with the R interface
  • 4. Managing Data Frames with the dplyr package
    • 4.1 Data Frames
    • 4.2 The dplyr Package
    • 4.3 dplyr Grammar
    • 4.4 Installing the dplyr package
    • 4.5 select()
    • 4.6 filter()
    • 4.7 arrange()
    • 4.8 rename()
    • 4.9 mutate()
    • 4.10 group_by()
    • 4.11 %>%
    • 4.12 Summary
  • 5. Exploratory Data Analysis Checklist
    • 5.1 Formulate your question
    • 5.2 Read in your data
    • 5.3 Check the packaging
    • 5.4 Run str()
    • 5.5 Look at the top and the bottom of your data
    • 5.6 Check your “n”s
    • 5.7 Validate with at least one external data source
    • 5.8 Try the easy solution first
    • 5.9 Challenge your solution
    • 5.10 Follow up questions
  • 6. Principles of Analytic Graphics
    • 6.1 Show comparisons
    • 6.2 Show causality, mechanism, explanation, systematic structure
    • 6.3 Show multivariate data
    • 6.4 Integrate evidence
    • 6.5 Describe and document the evidence
    • 6.6 Content, Content, Content
    • 6.7 References
  • 7. Exploratory Graphs
    • 7.1 Characteristics of exploratory graphs
    • 7.2 Air Pollution in the United States
    • 7.3 Getting the Data
    • 7.4 Simple Summaries: One Dimension
    • 7.5 Five Number Summary
    • 7.6 Boxplot
    • 7.7 Histogram
    • 7.8 Overlaying Features
    • 7.9 Barplot
    • 7.10 Simple Summaries: Two Dimensions and Beyond
    • 7.11 Multiple Boxplots
    • 7.12 Multiple Histograms
    • 7.13 Scatterplots
    • 7.14 Scatterplot - Using Color
    • 7.15 Multiple Scatterplots
    • 7.16 Summary
  • 8. Plotting Systems
    • 8.1 The Base Plotting System
    • 8.2 The Lattice System
    • 8.3 The ggplot2 System
    • 8.4 References
  • 9. Graphics Devices
    • 9.1 The Process of Making a Plot
    • 9.2 How Does a Plot Get Created?
    • 9.3 Graphics File Devices
    • 9.4 Multiple Open Graphics Devices
    • 9.5 Copying Plots
    • 9.6 Summary
  • 10. The Base Plotting System
    • 10.1 Base Graphics
    • 10.2 Simple Base Graphics
    • 10.3 Some Important Base Graphics Parameters
    • 10.4 Base Plotting Functions
    • 10.5 Base Plot with Regression Line
    • 10.6 Multiple Base Plots
    • 10.7 Summary
  • 11. Plotting and Color in R
    • 11.1 Colors 1, 2, and 3
    • 11.2 Connecting colors with data
    • 11.3 Color Utilities in R
    • 11.4 colorRamp()
    • 11.5 colorRampPalette()
    • 11.6 RColorBrewer Package
    • 11.7 Using the RColorBrewer palettes
    • 11.8 The smoothScatter() function
    • 11.9 Adding transparency
    • 11.10 Summary
  • 12. Hierarchical Clustering
    • 12.1 Hierarchical clustering
    • 12.2 How do we define close?
    • 12.3 Example: Euclidean distance
    • 12.4 Example: Manhattan distance
    • 12.5 Example: Hierarchical clustering
    • 12.6 Prettier dendrograms
    • 12.7 Merging points: Complete
    • 12.8 Merging points: Average
    • 12.9 Using the heatmap() function
    • 12.10 Notes and further resources
  • 13. K-Means Clustering
    • 13.1 Illustrating the K-means algorithm
    • 13.2 Stopping the algorithm
    • 13.3 Using the kmeans() function
    • 13.4 Building heatmaps from K-means solutions
    • 13.5 Notes and further resources
  • 14. Dimension Reduction
    • 14.1 Matrix data
    • 14.2 Patterns in rows and columns
    • 14.3 Related problem
    • 14.4 SVD and PCA
    • 14.5 Unpacking the SVD: u and v
    • 14.6 SVD for data compression
    • 14.7 Components of the SVD - Variance explained
    • 14.8 Relationship to principal components
    • 14.9 What if we add a second pattern?
    • 14.10 Dealing with missing values
    • 14.11 Example: Face data
    • 14.12 Notes and further resources
  • 15. The ggplot2 Plotting System: Part 1
    • 15.1 The Basics: qplot()
    • 15.2 Before You Start: Label Your Data
    • 15.3 ggplot2 “Hello, world!”
    • 15.4 Modifying aesthetics
    • 15.5 Adding a geom
    • 15.6 Histograms
    • 15.7 Facets
    • 15.8 Case Study: MAACS Cohort
    • 15.9 Summary of qplot()
  • 16. The ggplot2 Plotting System: Part 2
    • 16.1 Basic Components of a ggplot2 Plot
    • 16.2 Example: BMI, PM2.5, Asthma
    • 16.3 Building Up in Layers
    • 16.4 First Plot with Point Layer
    • 16.5 Adding More Layers: Smooth
    • 16.6 Adding More Layers: Facets
    • 16.7 Modifying Geom Properties
    • 16.8 Modifying Labels
    • 16.9 Customizing the Smooth
    • 16.10 Changing the Theme
    • 16.11 More Complex Example
    • 16.12 A Quick Aside about Axis Limits
    • 16.13 Resources
  • 17. Data Analysis Case Study: Changes in Fine Particle Air Pollution in the U.S.
    • 17.1 Synopsis
    • 17.2 Loading and Processing the Raw Data
    • 17.3 Results
  • 18. About the Author

The Leanpub 60-day 100% Happiness Guarantee

Within 60 days of purchase you can get a 100% refund on any Leanpub purchase, in two clicks.

Free Updates. DRM Free.

If you buy a Leanpub book, you get free updates for as long as the author updates the book! Many authors use Leanpub to publish their books in-progress, while they are writing them. All readers get free updates, regardless of when they bought the book or how much they paid (including free).

Most Leanpub books are available in PDF (for computers) and EPUB (for phones, tablets and Kindle). The formats that a book includes are shown at the top right corner of this page.

Finally, Leanpub books don't have any DRM copy-protection nonsense, so you can easily read them on any supported device.