Data Wrangling with R
About the Book
Welcome to Data Wrangling with R! In this book, I will help you learn the essentials of preprocessing data with the R programming language so you can quickly and easily turn noisy data into usable information. Data wrangling, also commonly referred to as data munging, transformation, manipulation, or janitor work, can be a painstakingly laborious process. In fact, it's been stated that up to 80% of data analysis is spent on cleaning and preparing data. However, because it is a prerequisite to the rest of the data analysis workflow (visualization, analysis, reporting), it's essential that you become fluent and efficient in data wrangling techniques.
This book will guide you through the data wrangling process and give you a solid foundation for working with data in R. My goal is to teach you how to easily wrangle your data, so you can spend more time focused on understanding the content of your data via visualization, analysis, and reporting. By the time you finish reading this book, you will have learned:
- How to work with the different types of data such as numerics, characters, regular expressions, factors, and dates
- The differences between the data structures and how to create, add components to, and subset each one
- How to acquire and parse data from sources you may not have been able to access before, for example by scraping web pages
- How to develop your own functions and use loop control structures to reduce code redundancy
- How to use pipe operators to simplify your code and make it more readable
- How to reshape the layout of your data, and manipulate, summarize, and join data sets
- Many base R functions, along with some of the latest data wrangling packages such as tidyr, dplyr, httr, stringr, lubridate, readr, rvest, magrittr, xlsx, readxl, and others
In essence, you will have the data wrangling toolbox required for modern day data analysis.
Table of Contents
Preface
- Who this Book is For
- What You Need For this Book
- Reader Feedback
- Colophon
Introduction
- The Role of Data Wrangling
Introduction to R
- Open Source
- Flexibility
- Community
R Basics
- Assignment & Evaluation
- Vectorization
- Getting help
- Workspace
- Working with packages
- Style guide
Working with Different Types of Data in R
Dealing with Numbers
- Integer vs. Double
- Generating sequence of non-random numbers
- Generating sequence of random numbers
- Setting the seed for reproducible random numbers
- Comparing numeric values
- Rounding numbers
Dealing with Character Strings
- Character string basics
- String manipulation with base R
- String manipulation with stringr
- Set operations for character strings
Dealing with Regular Expressions
- Regex Syntax
- Regex Functions
- Additional resources
Dealing with Factors
- Creating, converting & inspecting factors
- Ordering levels
- Revalue levels
- Dropping levels
Dealing with Dates
- Getting current date & time
- Converting strings to dates
- Extract & manipulate parts of dates
- Creating date sequences
- Calculations with dates
- Dealing with time zones & daylight savings
- Additional resources
Managing Data Structures in R
Data Structure Basics
- Identifying the Structure
- Attributes
Managing Vectors
- Creating
- Adding on to
- Adding attributes
- Subsetting
Managing Lists
- Creating
- Adding on to
- Adding attributes
- Subsetting
Managing Matrices
- Creating
- Adding on to
- Adding attributes
- Subsetting
Managing Data Frames
- Creating
- Adding on to
- Adding attributes
- Subsetting
Dealing with Missing Values
- Testing for missing values
- Recoding missing values
- Excluding missing values
Importing, Scraping, and Exporting Data with R
Importing Data
- Reading data from text files
- Reading data from Excel files
- Load data from saved R object file
- Additional resources
Scraping Data
- Importing tabular and Excel files stored online
- Scraping HTML text
- Scraping HTML table data
- Working with APIs
- Additional Resources
Exporting Data
- Writing data to text files
- Writing data to Excel files
- Saving data as an R object file
- Additional resources
Creating Efficient & Readable Code in R
Functions
- Function Components
- Arguments
- Scoping Rules
- Lazy Evaluation
- Returning Multiple Outputs from a Function
- Dealing with Invalid Parameters
- Saving and Sourcing Functions
- Additional Resources
Loop Control Statements
- Basic control statements (i.e. if, for, while, etc.)
- Apply family
- Other useful “loop-like” functions
- Additional Resources
Simplify Your Code with %>%
- Pipe (%>%) Operator
- Additional Functions
- Additional Pipe Operators
- Additional Resources
Shaping & Transforming Your Data with R
Reshaping Your Data with tidyr
- Making wide data long
- Making long data wide
- Splitting a single column into multiple columns
- Combining multiple columns into a single column
- Additional tidyr functions
- Sequencing your tidyr operations
- Additional resources
Transforming Your Data with dplyr
- Selecting variables of interest
- Filtering rows
- Grouping data by categorical variables
- Performing summary statistics on variables
- Arranging variables by value
- Joining datasets
- Creating new variables
- Additional resources