Business Intelligence with R

From Acquiring Data to Pattern Exploration

About the Book

This book is now nearly 2 years old, and a lot has changed in the R ecosystem in that time, so I am dropping the suggested price to free. I have no current plans to update it, though if someone wanted to partner with me, I'd love to join forces!

The growth of R has been phenomenal over the past several years, reaching out past academia and research labs into the daily activities of business and industry. More and more line-level analysts are supplementing their SQL or Excel skills with R, and more and more businesses are hiring analysts with R skills to bridge the increasing gap between business needs and on-hand analytic resources.

While there are already many books that explore the use of R in statistics and data science, many of them are set as a series of case studies or are written predominantly for academic audiences as textbooks. There are also many R cookbooks, but they tend to cover the breadth of R's capabilities and options. This book aims to provide a cookbook of R recipes that are specifically of use in the daily workflow of data scientists and analysts in business and industry.

Business Intelligence with R is a practical, hands-on overview of many of the major BI/analytic tasks that can be accomplished with R. It is not meant to be exhaustive--there is always more than one way to accomplish a given task in R, so this book aims to provide the simplest and/or most robust approaches to meet daily workflow needs. It can serve as the go-to desk reference for the professional analyst who needs to get things done in R.

From setting up a project under version control to creating an interactive dashboard of a data product, this book will provide you with the pieces to put it all together.

About the Author

Dwight Barry

Dwight Barry is a Lead Data Scientist at Seattle Children's Hospital in Seattle, Washington, USA. 

Table of Contents

    • About
      • About the Author
      • Session Info
      • Version Info
      • Code and Data
      • Cover Image
      • Proceeds
      • Contact
    • install.packages
    • Introduction
      • Overview
      • Conventions
      • What you need
      • Acknowledgments
      • Website/Code
      • Happy coding!
    • Chapter 1: An entire project in a few lines of code
      • The analytics problem
      • Set up
      • Acquire data
      • Wrangling data
      • Analytics
        • Explore the data
        • Run a forecasting model
      • Reporting
        • Create an interactive HTML plot
      • Documenting the project
      • Summary
    • Chapter 2: Getting Data
      • Working with files
        • Reading flat files from disk or the web
        • Reading big files with data.table
        • Unzipping files within R
        • Reading Excel files
        • Creating a dataframe from the clipboard or direct entry
        • Reading XML files
        • Reading JSON files
      • Working with databases
        • Connecting to a database
        • Creating data frames from a database
        • Disconnecting from a database
        • Creating a SQLite database inside R
        • Creating a dataframe from a SQLite database
      • Getting data from the web
        • Working through a proxy
        • Scraping data from a web table
        • Working with APIs
      • Creating fake data to test code
      • Writing files to disk
    • Chapter 3: Cleaning and Preparing Data
      • Understanding your data
        • Identifying data types and overall structure
        • Identifying (and ordering) factor levels
        • Identifying unique values or duplicates
        • Sorting
      • Cleaning up
        • Converting between data types
        • Cleaning up a web-scraped table: Regular expressions and more
        • Missing data: Converting dummy NA values
        • Rounding
        • Concatenation
      • Merging dataframes
        • Joins
        • Unions and bindings
      • Subsetting: filter and select from a dataframe
      • Creating a derived column
      • Peeking at the outcome with dplyr
      • Reshaping a dataframe between wide and long
        • Reshaping: melt and cast
        • Reshaping: Crossing variables with ~ and +
        • Summarizing while reshaping
      • Piping with %>%: Stringing it together in dplyr
    • Chapter 4: Know Thy Data—Exploratory Data Analysis
      • Creating summary plots
        • Everything at once: ggpairs
        • Create histograms of all numeric variables in one plot
        • A better “pairs” plot
        • Mosaic plot matrix: “Scatterplot” matrix for categorical data
      • Plotting univariate distributions
        • Histograms and density plots
        • Bar and dot plots
        • Plotting multiple univariate distributions with faceting
      • Plotting bivariate and comparative distributions
        • Double density plots
        • Boxplots
        • Beanplots
        • Scatterplots and marginal distributions
        • Mosaic plots
        • Multiple bivariate comparisons with faceting
      • Pareto charts
      • Plotting survey data
      • Obtaining summary and conditional statistics
      • Finding the mode or local maxima/minima
      • Inference on summary statistics
        • Confidence intervals
        • Tolerance intervals
      • Dealing with missing data
        • Visualizing missing data
        • Imputation for missing values
    • Chapter 5: Effect Sizes
      • Overview
      • Effect sizes: Measuring differences between groups
        • Basic differences
        • Standardized differences
        • Determining the probability of a difference
      • Effect sizes: Measuring similarities between groups
        • Correlation
        • Bootstrapping BCa CIs for non-parametric correlation
        • Determining the probability of a correlation
        • Partial correlations
        • Polychoric and polyserial correlation for ordinal data
        • Associations between categorical variables
        • Cohen’s kappa for comparisons of agreement
        • Regression coefficient
        • R²: Proportion of variance explained
    • Chapter 6: Trends and Time
      • Describing trends in non-temporal data
        • Smoothed trends
        • Quantile trends
        • Simple linear trends
        • Segmented linear trends
        • The many flavors of regression
        • Plotting regression coefficients
      • Working with temporal data
        • Calculate a mean or correlation in circular time (clock time)
        • Plotting time-series data
        • Detecting autocorrelation
        • Plotting monthly and seasonal patterns
        • Plotting seasonal adjustment on the fly
        • Decomposing time series into components
        • Using spectral analysis to identify periodicity
        • Plotting survival curves
        • Evaluating quality with control charts
        • Identifying possible breakpoints in a time series
        • Exploring relationships between time series: cross-correlation
        • Basic forecasting
    • Chapter 7: A Dog’s Breakfast of Dataviz
      • Plotting multivariate distributions
        • Heatmaps
        • Creating calendar heatmaps
        • Parallel coordinates plots
        • Peeking at multivariate data with dplyr and a bubblechart
      • Plotting a table
      • Interactive dataviz
        • Basic interactive plots
        • Scatterplot matrix
        • Motionchart: a moving bubblechart
        • Interactive parallel coordinates
        • Interactive tables with DT
      • Making maps in R
        • Basic point maps
        • Choropleth maps
        • Choropleth mapping with the American Community Survey
        • Using shapefiles and raw data
        • Using ggmap for point data and heatmaps
        • Interactive maps with leaflet
        • Why map with R?
    • Chapter 8: Pattern Discovery and Dimension Reduction
      • Mapping multivariate relationships
        • Non-metric multidimensional scaling (nMDS)
        • Diagnostics for nMDS results
        • Vector mapping influential variables over the nMDS plot
        • Contour mapping influential variables over the nMDS plot
        • Principal Components Analysis (PCA)
        • nMDS for Categories: Correspondence Analysis
      • Cluster analysis
        • Grouping observations with hierarchical clustering
        • Plotting a cluster dendrogram with ggplot
        • Exploring hierarchical clustering of nMDS results
        • How to partition the results of a hierarchical cluster analysis
        • Identifying and describing group membership with kMeans and PAM
        • How to choose an optimal number of clusters with bootstrapping
        • Determining optimal numbers of clusters with model-based clustering
        • Identifying group membership with irregular clusters
        • Variable selection in cluster analysis
        • Error-checking cluster results with known outcomes
      • Exploring outliers
        • Identifying outliers with distance functions
        • Identifying outliers with the local outlier factor
        • Anomaly detection
        • Extreme value analysis
      • Finding associations in shopping carts
    • Chapter 9: Reporting and Dashboarding
      • Output formats for .Rmd documents
      • Dashboards
        • Simple dashboarding with R Markdown
        • Dashboarding and reporting with flexdashboard
      • Reports and technical memos
      • Slide decks
      • purl: scraping the raw code from your .Rmd files
      • Shiny apps
      • Tweaking the YAML headers
    • Appendix 1: Setting up projects and using “Make” files
      • Setting up a project with RStudio and Git
        • Committing your changes
        • Rolling back to a previous state
        • Packaging a project
      • “Make” files in R
    • Appendix 2: .Rmd File for the Chapter 1 final report example
    • Appendix 3: .R file for Chapter 9 Shiny app example
    • Appendix 4: .Rmd file for the Chapter 9 flexdashboard example
    • Appendix 5: R Markdown Quick Reference
    • Notes
