Exploratory Data Analysis with R
About the Book
This book covers the essential exploratory techniques for summarizing data with R. These techniques are typically applied before formal modeling commences and can help inform the development of more complex statistical models. Exploratory techniques are also important for eliminating or sharpening potential hypotheses about the world that can be addressed by the data you have. We will cover in detail the plotting systems in R as well as some of the basic principles of constructing informative data graphics. We will also cover some of the common multivariate statistical techniques used to visualize high-dimensional data.
If you are interested in a printed copy of this book, you can purchase one at Lulu.
Some of the topics we cover are:
- Making exploratory graphs
- Principles of analytic graphics
- Plotting systems and graphics devices in R
- The base and ggplot2 plotting systems in R
- Clustering methods
- Dimension reduction techniques
The Book
This package contains just the book in PDF, EPUB, or MOBI formats.
The Book + Datasets + R Code Files
This package contains the book and R code files corresponding to each of the chapters in the book. The package also contains the datasets used in all of the chapters so that the code can be fully executed.
The Book + Lecture Videos (HD) + Datasets + R Code Files
This package includes the book, high-definition (720p) lecture video files corresponding to each of the chapters, and the datasets and R code files for all chapters. The videos are licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license.
- 1. Stay in Touch!
- 2. Preface
3. Getting Started with R
- 3.1 Installation
- 3.2 Getting started with the R interface
4. Managing Data Frames with the
- 4.1 Data Frames
- 4.4 Installing the
- 4.12 Summary
5. Exploratory Data Analysis Checklist
- 5.1 Formulate your question
- 5.2 Read in your data
- 5.3 Check the packaging
- 5.5 Look at the top and the bottom of your data
- 5.6 Check your “n”s
- 5.7 Validate with at least one external data source
- 5.8 Try the easy solution first
- 5.9 Challenge your solution
- 5.10 Follow up questions
6. Principles of Analytic Graphics
- 6.1 Show comparisons
- 6.2 Show causality, mechanism, explanation, systematic structure
- 6.3 Show multivariate data
- 6.4 Integrate evidence
- 6.5 Describe and document the evidence
- 6.6 Content, Content, Content
- 6.7 References
7. Exploratory Graphs
- 7.1 Characteristics of exploratory graphs
- 7.2 Air Pollution in the United States
- 7.3 Getting the Data
- 7.4 Simple Summaries: One Dimension
- 7.5 Five Number Summary
- 7.6 Boxplot
- 7.7 Histogram
- 7.8 Overlaying Features
- 7.9 Barplot
- 7.10 Simple Summaries: Two Dimensions and Beyond
- 7.11 Multiple Boxplots
- 7.12 Multiple Histograms
- 7.13 Scatterplots
- 7.14 Scatterplot - Using Color
- 7.15 Multiple Scatterplots
- 7.16 Summary
8. Plotting Systems
- 8.1 The Base Plotting System
- 8.2 The Lattice System
- 8.3 The ggplot2 System
- 8.4 References
9. Graphics Devices
- 9.1 The Process of Making a Plot
- 9.2 How Does a Plot Get Created?
- 9.3 Graphics File Devices
- 9.4 Multiple Open Graphics Devices
- 9.5 Copying Plots
- 9.6 Summary
10. The Base Plotting System
- 10.1 Base Graphics
- 10.2 Simple Base Graphics
- 10.3 Some Important Base Graphics Parameters
- 10.4 Base Plotting Functions
- 10.5 Base Plot with Regression Line
- 10.6 Multiple Base Plots
- 10.7 Summary
11. Plotting and Color in R
- 11.1 Colors 1, 2, and 3
- 11.2 Connecting colors with data
- 11.3 Color Utilities in R
- 11.6 RColorBrewer Package
- 11.7 Using the RColorBrewer palettes
- 11.9 Adding transparency
- 11.10 Summary
12. Hierarchical Clustering
- 12.1 Hierarchical clustering
- 12.2 How do we define close?
- 12.3 Example: Euclidean distance
- 12.4 Example: Manhattan distance
- 12.5 Example: Hierarchical clustering
- 12.6 Prettier dendrograms
- 12.7 Merging points: Complete
- 12.8 Merging points: Average
- 12.9 Using the
- 12.10 Notes and further resources
13. K-Means Clustering
- 13.1 Illustrating the K-means algorithm
- 13.2 Stopping the algorithm
- 13.3 Using the
- 13.4 Building heatmaps from K-means solutions
- 13.5 Notes and further resources
14. Dimension Reduction
- 14.1 Matrix data
- 14.2 Patterns in rows and columns
- 14.3 Related problem
- 14.4 SVD and PCA
- 14.5 Unpacking the SVD: u and v
- 14.6 SVD for data compression
- 14.7 Components of the SVD - Variance explained
- 14.8 Relationship to principal components
- 14.9 What if we add a second pattern?
- 14.10 Dealing with missing values
- 14.11 Example: Face data
- 14.12 Notes and further resources
15. The ggplot2 Plotting System: Part 1
- 15.1 The Basics:
- 15.2 Before You Start: Label Your Data
- 15.3 ggplot2 “Hello, world!”
- 15.4 Modifying aesthetics
- 15.5 Adding a geom
- 15.6 Histograms
- 15.7 Facets
- 15.8 Case Study: MAACS Cohort
- 15.9 Summary of qplot()
16. The ggplot2 Plotting System: Part 2
- 16.1 Basic Components of a ggplot2 Plot
- 16.2 Example: BMI, PM2.5, Asthma
- 16.3 Building Up in Layers
- 16.4 First Plot with Point Layer
- 16.5 Adding More Layers: Smooth
- 16.6 Adding More Layers: Facets
- 16.7 Modifying Geom Properties
- 16.8 Modifying Labels
- 16.9 Customizing the Smooth
- 16.10 Changing the Theme
- 16.11 More Complex Example
- 16.12 A Quick Aside about Axis Limits
- 16.13 Resources
17. Data Analysis Case Study: Changes in Fine Particle Air Pollution in the U.S.
- 17.1 Synopsis
- 17.2 Loading and Processing the Raw Data
- 17.3 Results
- 18. About the Author