Exploratory Data Analysis with R
About the Book
This book covers the essential exploratory techniques for summarizing data with R. These techniques are typically applied before formal modeling commences and can help inform the development of more complex statistical models. Exploratory techniques are also important for eliminating or sharpening potential hypotheses about the world that can be addressed by the data you have. We will cover in detail the plotting systems in R as well as some of the basic principles of constructing informative data graphics. We will also cover some of the common multivariate statistical techniques used to visualize high-dimensional data.
If you are interested in a printed copy of this book, you can purchase one at Lulu.
Some of the topics we cover are:
- Making exploratory graphs
- Principles of analytic graphics
- Plotting systems and graphics devices in R
- The base and ggplot2 plotting systems in R
- Clustering methods
- Dimension reduction techniques
This package contains just the book in PDF, EPUB, or MOBI formats.
The Book + Datasets + R Code Files
This package contains the book and R code files corresponding to each of the chapters in the book. The package also contains the datasets used in all of the chapters so that the code can be fully executed.
The Book + Lecture Videos (HD) + Datasets + R Code Files
This package includes the book, high-definition lecture video files (720p) corresponding to each of the chapters, and the datasets and R code files for all chapters. The videos are licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license.
1. Stay in Touch!
2. Preface
3. Getting Started with R
- 3.1 Installation
- 3.2 Getting started with the R interface
4. Managing Data Frames with the
- 4.1 Data Frames
- 4.4 Installing the
- 4.12 Summary
5. Exploratory Data Analysis Checklist
- 5.1 Formulate your question
- 5.2 Read in your data
- 5.3 Check the packaging
- 5.5 Look at the top and the bottom of your data
- 5.6 Check your “n”s
- 5.7 Validate with at least one external data source
- 5.8 Try the easy solution first
- 5.9 Challenge your solution
- 5.10 Follow up questions
6. Principles of Analytic Graphics
- 6.1 Show comparisons
- 6.2 Show causality, mechanism, explanation, systematic structure
- 6.3 Show multivariate data
- 6.4 Integrate evidence
- 6.5 Describe and document the evidence
- 6.6 Content, Content, Content
- 6.7 References
7. Exploratory Graphs
- 7.1 Characteristics of exploratory graphs
- 7.2 Air Pollution in the United States
- 7.3 Getting the Data
- 7.4 Simple Summaries: One Dimension
- 7.5 Five Number Summary
- 7.6 Boxplot
- 7.7 Histogram
- 7.8 Overlaying Features
- 7.9 Barplot
- 7.10 Simple Summaries: Two Dimensions and Beyond
- 7.11 Multiple Boxplots
- 7.12 Multiple Histograms
- 7.13 Scatterplots
- 7.14 Scatterplot - Using Color
- 7.15 Multiple Scatterplots
- 7.16 Summary
8. Plotting Systems
- 8.1 The Base Plotting System
- 8.2 The Lattice System
- 8.3 The ggplot2 System
- 8.4 References
9. Graphics Devices
- 9.1 The Process of Making a Plot
- 9.2 How Does a Plot Get Created?
- 9.3 Graphics File Devices
- 9.4 Multiple Open Graphics Devices
- 9.5 Copying Plots
- 9.6 Summary
10. The Base Plotting System
- 10.1 Base Graphics
- 10.2 Simple Base Graphics
- 10.3 Some Important Base Graphics Parameters
- 10.4 Base Plotting Functions
- 10.5 Base Plot with Regression Line
- 10.6 Multiple Base Plots
- 10.7 Summary
11. Plotting and Color in R
- 11.1 Colors 1, 2, and 3
- 11.2 Connecting colors with data
- 11.3 Color Utilities in R
- 11.6 RColorBrewer Package
- 11.7 Using the RColorBrewer palettes
- 11.9 Adding transparency
- 11.10 Summary
12. Hierarchical Clustering
- 12.1 Hierarchical clustering
- 12.2 How do we define close?
- 12.3 Example: Euclidean distance
- 12.4 Example: Manhattan distance
- 12.5 Example: Hierarchical clustering
- 12.6 Prettier dendrograms
- 12.7 Merging points: Complete
- 12.8 Merging points: Average
- 12.9 Using the
- 12.10 Notes and further resources
13. K-Means Clustering
- 13.1 Illustrating the K-means algorithm
- 13.2 Stopping the algorithm
- 13.3 Using the
- 13.4 Building heatmaps from K-means solutions
- 13.5 Notes and further resources
14. Dimension Reduction
- 14.1 Matrix data
- 14.2 Patterns in rows and columns
- 14.3 Related problem
- 14.4 SVD and PCA
- 14.5 Unpacking the SVD: u and v
- 14.6 SVD for data compression
- 14.7 Components of the SVD - Variance explained
- 14.8 Relationship to principal components
- 14.9 What if we add a second pattern?
- 14.10 Dealing with missing values
- 14.11 Example: Face data
- 14.12 Notes and further resources
15. The ggplot2 Plotting System: Part 1
- 15.1 The Basics:
- 15.2 Before You Start: Label Your Data
- 15.3 ggplot2 “Hello, world!”
- 15.4 Modifying aesthetics
- 15.5 Adding a geom
- 15.6 Histograms
- 15.7 Facets
- 15.8 Case Study: MAACS Cohort
- 15.9 Summary of qplot()
16. The ggplot2 Plotting System: Part 2
- 16.1 Basic Components of a ggplot2 Plot
- 16.2 Example: BMI, PM2.5, Asthma
- 16.3 Building Up in Layers
- 16.4 First Plot with Point Layer
- 16.5 Adding More Layers: Smooth
- 16.6 Adding More Layers: Facets
- 16.7 Modifying Geom Properties
- 16.8 Modifying Labels
- 16.9 Customizing the Smooth
- 16.10 Changing the Theme
- 16.11 More Complex Example
- 16.12 A Quick Aside about Axis Limits
- 16.13 Resources
17. Data Analysis Case Study: Changes in Fine Particle Air Pollution in the U.S.
- 17.1 Synopsis
- 17.2 Loading and Processing the Raw Data
- 17.3 Results
18. About the Author