Business Intelligence with R

From Acquiring Data to Pattern Exploration

About the Book

This book is now nearly 2 years old, and a lot has changed in the R ecosystem in that time, so I am dropping the suggested price to free. I don't have any current plans to update it, though if someone wants to partner with me on an update, I'd love to join forces!

The growth of R has been phenomenal over the past several years, reaching out past academia and research labs into the daily activities of business and industry. More and more line-level analysts are supplementing their SQL or Excel skills with R, and more and more businesses are hiring analysts with R skills to bridge the increasing gap between business needs and on-hand analytic resources.

While there are already many books that explore the use of R in statistics and data science, many of them are set as a series of case studies or are written predominantly for academic audiences as textbooks. There are also many R cookbooks, but they tend to cover the breadth of R's capabilities and options. This book aims to provide a cookbook of R recipes that are specifically of use in the daily workflow of data scientists and analysts in business and industry.

Business Intelligence with R is a practical, hands-on overview of many of the major BI/analytic tasks that can be accomplished with R. It is not meant to be exhaustive: there is always more than one way to accomplish a given task in R, so this book aims to provide the simplest and/or most robust approaches to meet daily workflow needs. It can serve as the go-to desk reference for the professional analyst who needs to get things done in R.

From setting up a project under version control to creating an interactive dashboard of a data product, this book will provide you with the pieces to put it all together.

  Categories

    • Management
    • Software
    • Textbooks
    • R
    • Graphics
    • Data Science
    • Education
    • Reference
    • Programming Cookbooks

About the Author

Dwight Barry

Dwight Barry is a Lead Data Scientist at Seattle Children's Hospital in Seattle, Washington, USA. 

Table of Contents

    • About
        • About the Author
        • Session Info
        • Version Info
        • Code and Data
        • Cover Image
        • Proceeds
        • Contact
    • install.packages
    • Introduction
      • Overview
      • Conventions
      • What you need
      • Acknowledgments
      • Website/Code
      • Happy coding!
    • Chapter 1: An entire project in a few lines of code
      • The analytics problem
      • Set up
      • Acquire data
      • Wrangling data
      • Analytics
        • Explore the data
        • Run a forecasting model
      • Reporting
        • Create an interactive HTML plot
      • Documenting the project
      • Summary
    • Chapter 2: Getting Data
      • Working with files
        • Reading flat files from disk or the web
        • Reading big files with data.table
        • Unzipping files within R
        • Reading Excel files
        • Creating a dataframe from the clipboard or direct entry
        • Reading XML files
        • Reading JSON files
      • Working with databases
        • Connecting to a database
        • Creating data frames from a database
        • Disconnecting from a database
        • Creating a SQLite database inside R
        • Creating a dataframe from a SQLite database
      • Getting data from the web
        • Working through a proxy
        • Scraping data from a web table
        • Working with APIs
      • Creating fake data to test code
      • Writing files to disk
    • Chapter 3: Cleaning and Preparing Data
      • Understanding your data
        • Identifying data types and overall structure
        • Identifying (and ordering) factor levels
        • Identifying unique values or duplicates
        • Sorting
      • Cleaning up
        • Converting between data types
        • Cleaning up a web-scraped table: Regular expressions and more
        • Missing data: Converting dummy NA values
        • Rounding
        • Concatenation
      • Merging dataframes
        • Joins
        • Unions and bindings
      • Subsetting: filter and select from a dataframe
      • Creating a derived column
      • Peeking at the outcome with dplyr
      • Reshaping a dataframe between wide and long
        • Reshaping: melt and cast
        • Reshaping: Crossing variables with ~ and +
        • Summarizing while reshaping
      • Piping with %>%: Stringing it together in dplyr
    • Chapter 4: Know Thy Data—Exploratory Data Analysis
      • Creating summary plots
        • Everything at once: ggpairs
        • Create histograms of all numeric variables in one plot
        • A better “pairs” plot
        • Mosaic plot matrix: “Scatterplot” matrix for categorical data
      • Plotting univariate distributions
        • Histograms and density plots
        • Bar and dot plots
        • Plotting multiple univariate distributions with faceting
      • Plotting bivariate and comparative distributions
        • Double density plots
        • Boxplots
        • Beanplots
        • Scatterplots and marginal distributions
        • Mosaic plots
        • Multiple bivariate comparisons with faceting
      • Pareto charts
      • Plotting survey data
      • Obtaining summary and conditional statistics
      • Finding the mode or local maxima/minima
      • Inference on summary statistics
        • Confidence intervals
        • Tolerance intervals
      • Dealing with missing data
        • Visualizing missing data
        • Imputation for missing values
    • Chapter 5: Effect Sizes
      • Overview
      • Effect sizes: Measuring differences between groups
        • Basic differences
        • Standardized differences
        • Determining the probability of a difference
      • Effect sizes: Measuring similarities between groups
        • Correlation
        • Bootstrapping BCa CIs for non-parametric correlation
        • Determining the probability of a correlation
        • Partial correlations
        • Polychoric and polyserial correlation for ordinal data
        • Associations between categorical variables
        • Cohen’s kappa for comparisons of agreement
        • Regression coefficient
        • R²: Proportion of variance explained
    • Chapter 6: Trends and Time
      • Describing trends in non-temporal data
        • Smoothed trends
        • Quantile trends
        • Simple linear trends
        • Segmented linear trends
        • The many flavors of regression
        • Plotting regression coefficients
      • Working with temporal data
        • Calculate a mean or correlation in circular time (clock time)
        • Plotting time-series data
        • Detecting autocorrelation
        • Plotting monthly and seasonal patterns
        • Plotting seasonal adjustment on the fly
        • Decomposing time series into components
        • Using spectral analysis to identify periodicity
        • Plotting survival curves
        • Evaluating quality with control charts
        • Identifying possible breakpoints in a time series
        • Exploring relationships between time series: cross-correlation
        • Basic forecasting
    • Chapter 7: A Dog’s Breakfast of Dataviz
      • Plotting multivariate distributions
        • Heatmaps
        • Creating calendar heatmaps
        • Parallel coordinates plots
        • Peeking at multivariate data with dplyr and a bubblechart
      • Plotting a table
      • Interactive dataviz
        • Basic interactive plots
        • Scatterplot matrix
        • Motionchart: a moving bubblechart
        • Interactive parallel coordinates
        • Interactive tables with DT
      • Making maps in R
        • Basic point maps
        • Choropleth maps
        • Choropleth mapping with the American Community Survey
        • Using shapefiles and raw data
        • Using ggmap for point data and heatmaps
        • Interactive maps with leaflet
        • Why map with R?
    • Chapter 8: Pattern Discovery and Dimension Reduction
      • Mapping multivariate relationships
        • Non-metric multidimensional scaling (nMDS)
        • Diagnostics for nMDS results
        • Vector mapping influential variables over the nMDS plot
        • Contour mapping influential variables over the nMDS plot
        • Principal Components Analysis (PCA)
        • nMDS for Categories: Correspondence Analysis
      • Cluster analysis
        • Grouping observations with hierarchical clustering
        • Plotting a cluster dendrogram with ggplot
        • Exploring hierarchical clustering of nMDS results
        • How to partition the results of a hierarchical cluster analysis
        • Identifying and describing group membership with kMeans and PAM
        • How to choose an optimal number of clusters with bootstrapping
        • Determining optimal numbers of clusters with model-based clustering
        • Identifying group membership with irregular clusters
        • Variable selection in cluster analysis
        • Error-checking cluster results with known outcomes
      • Exploring outliers
        • Identifying outliers with distance functions
        • Identifying outliers with the local outlier factor
        • Anomaly detection
        • Extreme value analysis
      • Finding associations in shopping carts
    • Chapter 9: Reporting and Dashboarding
        • Output formats for .Rmd documents
      • Dashboards
        • Simple dashboarding with R Markdown
        • Dashboarding and reporting with flexdashboard
      • Reports and technical memos
      • Slide decks
      • purl: scraping the raw code from your .Rmd files
      • Shiny apps
      • Tweaking the YAML headers
    • Appendix 1: Setting up projects and using “Make” files
      • Setting up a project with RStudio and Git
        • Committing your changes
        • Rolling back to a previous state
        • Packaging a project
      • “Make” files in R
    • Appendix 2: .Rmd File for the Chapter 1 final report example
    • Appendix 3: .R file for Chapter 9 Shiny app example
    • Appendix 4: .Rmd file for the Chapter 9 flexdashboard example
    • Appendix 5: R Markdown Quick Reference
  • Notes

