Effective Data Wrangling and Exploration with R

About the Book

Data wrangling is one of the most important steps in data science and analytics; it is often claimed to take between 80% and 90% of an analyst’s time. It goes by many names, including data munging, data manipulation, data preparation and data transformation. Just as data wrangling has many names, it also has many definitions. Below we look at two of the most important ones:

Trifacta, a leading provider of data wrangling software of the same name, defines data wrangling as:

“Data wrangling is the process of cleaning, structuring and enriching raw data into a desired format for better decision making in less time”.

Gartner, which uses the term data preparation, defines it as:

“Data preparation is an iterative-agile process for exploring, combining, cleaning, and transforming raw data into curated datasets for self-service data integration, data science, data discovery, and BI/analytics”.

From these definitions, we can deduce that data wrangling is the process of converting raw data from one form into another that is appropriate for the task at hand. In analytics, we rarely receive data in the form and shape needed for our analysis. Most often, we have to transform, clean, enrich and explore the data before moving on to the analysis itself.

Data wrangling involves:

  • Importing and exporting data: to and from CSV files, Excel, databases, etc.
  • Cleaning data: identifying and dealing with missing data, outliers, and duplicates
  • Manipulating text and categorical data
  • Manipulating dates
  • Encoding and enriching data
  • Manipulating columns and rows
  • Splitting, applying and combining data (split-apply-combine)
  • Merging data
  • Reshaping data
  • Grouping and aggregating data
  • Exploring data
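
As a rough illustration of a few of these steps (cleaning, manipulating dates and text, and grouping and aggregating), here is a minimal sketch in base R. The `sales` data frame and its column names are invented for this example and do not come from the book; the book may well use different packages or approaches.

```r
# Hypothetical raw data (invented for illustration): it contains one exact
# duplicate row, one missing amount, and dates stored as text.
sales <- data.frame(
  region = c("north", "south", "south", "north", "east"),
  date   = c("2021-01-05", "2021-01-06", "2021-01-06", "2021-02-10", "2021-02-11"),
  amount = c(120, 80, 80, NA, 200),
  stringsAsFactors = FALSE
)

# Cleaning: drop exact duplicate rows, then rows with a missing amount.
clean <- sales[!duplicated(sales), ]
clean <- clean[!is.na(clean$amount), ]

# Manipulating dates and text: parse the date column, derive a year-month
# label, and standardise the region names to upper case.
clean$date   <- as.Date(clean$date)
clean$month  <- format(clean$date, "%Y-%m")
clean$region <- toupper(clean$region)

# Grouping and aggregating: total amount per region.
totals <- aggregate(amount ~ region, data = clean, FUN = sum)
print(totals)
```

Each step here uses only functions that ship with R; packages such as dplyr or data.table offer alternative ways to express the same operations.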

Data exploration is, and should be, the initial step of any data analysis project. It is a small-scale form of data analysis in which we use both descriptive statistics and data visualization techniques to better understand our dataset. In traditional analysis and research, we know exactly what we are after (that is, the hypothesis is known) before collecting data. In exploratory analysis, the process is reversed: we assume little or nothing about the outcome of the analysis and instead explore the data to arrive at a meaningful insight or hypothesis. Data exploration involves:

  • looking at the structure and size of the data
  • looking at the completeness and correctness of the data
  • looking at the possible relationships that may exist between data elements
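
The three checks above can be sketched in base R as follows. This is only an illustrative example: the `df` data frame and its `height` and `weight` columns are made up, and real exploration would usually involve plots as well.

```r
# Hypothetical data set (invented for illustration), with one missing height.
df <- data.frame(
  height = c(150, 160, 170, 180, NA),
  weight = c(50, 58, 66, 74, 80)
)

# Structure and size: column types and the number of rows and columns.
str(df)
dims <- dim(df)

# Completeness and correctness: missing values per column, plus summary
# statistics that help spot implausible minimum or maximum values.
missing <- colSums(is.na(df))
print(missing)
print(summary(df))

# Possible relationships: correlation between the two columns, computed
# only on rows where both values are present.
r <- cor(df$height, df$weight, use = "complete.obs")
print(r)
```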

As can be observed, the boundary between data exploration and data wrangling is blurred, because both rely on data cleaning techniques to ensure that the data is correct and complete before analysis.

This book is all about data wrangling and exploration as important steps leading up to data analysis.

How This Book Is Structured

The book is divided into seven parts:

  • Part 1: Programming with R (chapters 1 to 15)
  • Part 2: Import and export data (chapters 16 to 18)
  • Part 3: String and categorical data manipulation (chapters 19 to 21)
  • Part 4: Date manipulation (chapters 22 to 24)
  • Part 5: Data manipulation (chapters 25 to 28)
  • Part 6: Data cleaning (chapters 29 to 30)
  • Part 7: Data exploration (chapters 31 to 32)

Table of Contents

  • Part 1: Programming with R (chapters 1 to 15)
      Chapter 1: Introduction
      Chapter 2: Variables and Data types
      Chapter 3: Operators
      Chapter 4: Data Structures I - Atomic Vectors
      Chapter 5: Data Structures II - Matrices and Arrays
      Chapter 6: Data Structures III - factors
      Chapter 7: Data Structures IV - Recursive Vectors (lists)
      Chapter 8: Data Structures V - data frames
      Chapter 9: Control flows
      Chapter 10: Functions I - Built-in functions
      Chapter 11: Functions II - User-defined functions
      Chapter 12: Importing and exporting data
      Chapter 13: Packages
      Chapter 14: Introduction to plotting with base graphics
      Chapter 15: Statistical plots with base graphics
  • Part 2: Import and export data (chapters 16 to 18)
      Chapter 16: Import and export data from a delimited text file
      Chapter 17: Import and export data from Excel
      Chapter 18: Import and export data from statistical software files and others
  • Part 3: String and categorical data manipulation (chapters 19 to 21)
      Chapter 19: String manipulation with base R
      Chapter 20: String manipulation with stringr
      Chapter 21: Manipulating categorical data with forcats
  • Part 4: Date manipulation (chapters 22 to 24)
      Chapter 22: Date manipulation with base R
      Chapter 23: Date Manipulation with chron
      Chapter 24: Date Manipulation with lubridate
  • Part 5: Data manipulation (chapters 25 to 28)
      Chapter 25: Data Manipulation with Base R
      Chapter 26: Data Manipulation with dplyr and tidyr
      Chapter 27: Data Manipulation with data.table
      Chapter 28: Data Manipulation with SQL in R
  • Part 6: Data cleaning (chapters 29 to 30)
      Chapter 29: Detecting and dealing with missing values and outliers
      Chapter 30: Dealing with duplicate values
  • Part 7: Data exploration (chapters 31 to 32)
      Chapter 31: Intro to plotting with ggplot2
      Chapter 32: Statistical plots with ggplot2
