Data Wrangling with R


This book is no longer available for sale.

Last updated on 2016-02-28

About the Book

Welcome to Data Wrangling with R! In this book, I will help you learn the essentials of preprocessing data with the R programming language so you can quickly and easily turn noisy data into usable pieces of information. Data wrangling, also commonly referred to as data munging, transformation, manipulation, or janitor work, can be a painstakingly laborious process. In fact, it's been stated that up to 80% of the time in a data analysis is spent cleaning and preparing data. However, because it is a prerequisite to the rest of the data analysis workflow (visualization, analysis, reporting), it's essential that you become fluent and efficient in data wrangling techniques.

This book will guide you through the data wrangling process and give you a solid foundation for working with data in R. My goal is to teach you how to easily wrangle your data, so you can spend more time focused on understanding the content of your data via visualization, analysis, and reporting. By the time you finish reading this book, you will have learned:

  • How to work with different types of data, such as numerics, characters, regular expressions, factors, and dates
  • The differences among the data structures and how to create, add components to, and subset each one
  • How to acquire and parse data from sources you may not have been able to access before, such as websites via web scraping
  • How to develop your own functions and use loop control structures to reduce code redundancy
  • How to use pipe operators to simplify your code and make it more readable
  • How to reshape the layout of your data, and manipulate, summarize, and join data sets
  • How to use many base R functions along with some of the latest data wrangling packages, such as tidyr, dplyr, httr, stringr, lubridate, readr, rvest, magrittr, xlsx, readxl, and others

In essence, you will have the data wrangling toolbox required for modern-day data analysis.

About the Author

Bradley C. Boehmke

Brad Boehmke is an Operations Research Analyst at Headquarters Air Force Materiel Command, Studies and Analyses Division. He is also an Assistant Professor in the Operational Sciences department at the Air Force Institute of Technology. His research interests are in the areas of cost analysis, economic modeling, decision analysis, and developing applied modeling applications through the R statistical language.

Table of Contents

  • Preface
    • Who this Book is For
    • What You Need For this Book
    • Reader Feedback
    • Colophon
  • Introduction
    • The Role of Data Wrangling
    • Introduction to R
      • Open Source
      • Flexibility
      • Community
    • R Basics
      • Assignment & Evaluation
      • Vectorization
      • Getting help
      • Workspace
      • Working with packages
      • Style guide
  • Working with Different Types of Data in R
    • Dealing with Numbers
      • Integer vs. Double
      • Generating sequence of non-random numbers
      • Generating sequence of random numbers
      • Setting the seed for reproducible random numbers
      • Comparing numeric values
      • Rounding numbers
    • Dealing with Character Strings
      • Character string basics
      • String manipulation with base R
      • String manipulation with stringr
      • Set operations for character strings
    • Dealing with Regular Expressions
      • Regex Syntax
      • Regex Functions
      • Additional resources
    • Dealing with Factors
      • Creating, converting & inspecting factors
      • Ordering levels
      • Revalue levels
      • Dropping levels
    • Dealing with Dates
      • Getting current date & time
      • Converting strings to dates
      • Extract & manipulate parts of dates
      • Creating date sequences
      • Calculations with dates
      • Dealing with time zones & daylight savings
      • Additional resources
  • Managing Data Structures in R
    • Data Structure Basics
      • Identifying the Structure
      • Attributes
    • Managing Vectors
      • Creating
      • Adding on to
      • Adding attributes
      • Subsetting
    • Managing Lists
      • Creating
      • Adding on to
      • Adding attributes
      • Subsetting
    • Managing Matrices
      • Creating
      • Adding on to
      • Adding attributes
      • Subsetting
    • Managing Data Frames
      • Creating
      • Adding on to
      • Adding attributes
      • Subsetting
    • Dealing with Missing Values
      • Testing for missing values
      • Recoding missing values
      • Excluding missing values
  • Importing, Scraping, and Exporting Data with R
    • Importing Data
      • Reading data from text files
      • Reading data from Excel files
      • Load data from saved R object file
      • Additional resources
    • Scraping Data
      • Importing tabular and Excel files stored online
      • Scraping HTML text
      • Scraping HTML table data
      • Working with APIs
      • Additional Resources
    • Exporting Data
      • Writing data to text files
      • Writing data to Excel files
      • Saving data as an R object file
      • Additional resources
  • Creating Efficient & Readable Code in R
    • Functions
      • Function Components
      • Arguments
      • Scoping Rules
      • Lazy Evaluation
      • Returning Multiple Outputs from a Function
      • Dealing with Invalid Parameters
      • Saving and Sourcing Functions
      • Additional Resources
    • Loop Control Statements
      • Basic control statements (i.e. if, for, while, etc.)
      • Apply family
      • Other useful “loop-like” functions
      • Additional Resources
    • Simplify Your Code with %>%
      • Pipe (%>%) Operator
      • Additional Functions
      • Additional Pipe Operators
      • Additional Resources
  • Shaping & Transforming Your Data with R
    • Reshaping Your Data with tidyr
      • Making wide data long
      • Making long data wide
      • Splitting a single column into multiple columns
      • Combining multiple columns into a single column
      • Additional tidyr functions
      • Sequencing your tidyr operations
      • Additional resources
    • Transforming Your Data with dplyr
      • Selecting variables of interest
      • Filtering rows
      • Grouping data by categorical variables
      • Performing summary statistics on variables
      • Arranging variables by value
      • Joining datasets
      • Creating new variables
      • Additional resources
