Business Intelligence with R
From Acquiring Data to Pattern Exploration
About the Book
This book is now nearly two years old, and much has changed in the R ecosystem in that time, so I am dropping the suggested price to free. I have no current plans to update it, though if someone wanted to partner with me, I'd love to join forces!
The growth of R has been phenomenal over the past several years, reaching out past academia and research labs into the daily activities of business and industry. More and more line-level analysts are supplementing their SQL or Excel skills with R, and more and more businesses are hiring analysts with R skills to bridge the increasing gap between business needs and on-hand analytic resources.
While there are already many books that explore the use of R in statistics and data science, many of them are set as a series of case studies or are written predominantly for academic audiences as textbooks. There are also many R cookbooks, but they tend to cover the breadth of R's capabilities and options. This book aims to provide a cookbook of R recipes that are specifically of use in the daily workflow of data scientists and analysts in business and industry.
Business Intelligence with R is a practical, hands-on overview of many of the major BI/analytic tasks that can be accomplished with R. It is not meant to be exhaustive: there is always more than one way to accomplish a given task in R, so this book aims to provide the simplest or most robust approaches to meet daily workflow needs. It can serve as the go-to desk reference for the professional analyst who needs to get things done in R.
From setting up a project under version control to creating an interactive dashboard of a data product, this book will provide you with the pieces to put it all together.
Table of Contents
About
- About the Author
- Session Info
- Version Info
- Code and Data
- Cover Image
- Proceeds
- Contact
- install.packages
Introduction
- Overview
- Conventions
- What you need
- Acknowledgments
- Website/Code
- Happy coding!
Chapter 1: An entire project in a few lines of code
- The analytics problem
- Set up
- Acquire data
- Wrangling data
Analytics
- Explore the data
- Run a forecasting model
Reporting
- Create an interactive HTML plot
- Documenting the project
- Summary
Chapter 2: Getting Data
Working with files
- Reading flat files from disk or the web
- Reading big files with data.table
- Unzipping files within R
- Reading Excel files
- Creating a dataframe from the clipboard or direct entry
- Reading XML files
- Reading JSON files
Working with databases
- Connecting to a database
- Creating data frames from a database
- Disconnecting from a database
- Creating a SQLite database inside R
- Creating a dataframe from a SQLite database
Getting data from the web
- Working through a proxy
- Scraping data from a web table
- Working with APIs
- Creating fake data to test code
- Writing files to disk
Chapter 3: Cleaning and Preparing Data
Understanding your data
- Identifying data types and overall structure
- Identifying (and ordering) factor levels
- Identifying unique values or duplicates
- Sorting
Cleaning up
- Converting between data types
- Cleaning up a web-scraped table: Regular expressions and more
- Missing data: Converting dummy NA values
- Rounding
- Concatenation
Merging dataframes
- Joins
- Unions and bindings
- Subsetting: filter and select from a dataframe
- Creating a derived column
Peeking at the outcome with dplyr
Reshaping a dataframe between wide and long
- Reshaping: melt and cast
- Reshaping: Crossing variables with ~ and +
- Summarizing while reshaping
- Piping with %>%: Stringing it together in dplyr
Chapter 4: Know Thy Data—Exploratory Data Analysis
Creating summary plots
- Everything at once: ggpairs
- Create histograms of all numeric variables in one plot
- A better “pairs” plot
- Mosaic plot matrix: “Scatterplot” matrix for categorical data
Plotting univariate distributions
- Histograms and density plots
- Bar and dot plots
- Plotting multiple univariate distributions with faceting
Plotting bivariate and comparative distributions
- Double density plots
- Boxplots
- Beanplots
- Scatterplots and marginal distributions
- Mosaic plots
- Multiple bivariate comparisons with faceting
- Pareto charts
- Plotting survey data
- Obtaining summary and conditional statistics
- Finding the mode or local maxima/minima
Inference on summary statistics
- Confidence intervals
- Tolerance intervals
Dealing with missing data
- Visualizing missing data
- Imputation for missing values
Chapter 5: Effect Sizes
- Overview
Effect sizes: Measuring differences between groups
- Basic differences
- Standardized differences
- Determining the probability of a difference
Effect sizes: Measuring similarities between groups
- Correlation
- Bootstrapping BCa CIs for non-parametric correlation
- Determining the probability of a correlation
- Partial correlations
- Polychoric and polyserial correlation for ordinal data
- Associations between categorical variables
- Cohen’s kappa for comparisons of agreement
- Regression coefficient
- R²: Proportion of variance explained
Chapter 6: Trends and Time
Describing trends in non-temporal data
- Smoothed trends
- Quantile trends
- Simple linear trends
- Segmented linear trends
- The many flavors of regression
- Plotting regression coefficients
Working with temporal data
- Calculate a mean or correlation in circular time (clock time)
- Plotting time-series data
- Detecting autocorrelation
- Plotting monthly and seasonal patterns
- Plotting seasonal adjustment on the fly
- Decomposing time series into components
- Using spectral analysis to identify periodicity
- Plotting survival curves
- Evaluating quality with control charts
- Identifying possible breakpoints in a time series
- Exploring relationships between time series: cross-correlation
- Basic forecasting
Chapter 7: A Dog’s Breakfast of Dataviz
Plotting multivariate distributions
- Heatmaps
- Creating calendar heatmaps
- Parallel coordinates plots
- Peeking at multivariate data with dplyr and a bubblechart
- Plotting a table
Interactive dataviz
- Basic interactive plots
- Scatterplot matrix
- Motionchart: a moving bubblechart
- Interactive parallel coordinates
- Interactive tables with DT
Making maps in R
- Basic point maps
- Choropleth maps
- Choropleth mapping with the American Community Survey
- Using shapefiles and raw data
- Using ggmap for point data and heatmaps
- Interactive maps with leaflet
- Why map with R?
Chapter 8: Pattern Discovery and Dimension Reduction
Mapping multivariate relationships
- Non-metric multidimensional scaling (nMDS)
- Diagnostics for nMDS results
- Vector mapping influential variables over the nMDS plot
- Contour mapping influential variables over the nMDS plot
- Principal Components Analysis (PCA)
- nMDS for Categories: Correspondence Analysis
Cluster analysis
- Grouping observations with hierarchical clustering
- Plotting a cluster dendrogram with ggplot
- Exploring hierarchical clustering of nMDS results
- How to partition the results of a hierarchical cluster analysis
- Identifying and describing group membership with kMeans and PAM
- How to choose an optimal number of clusters with bootstrapping
- Determining optimal numbers of clusters with model-based clustering
- Identifying group membership with irregular clusters
- Variable selection in cluster analysis
- Error-checking cluster results with known outcomes
Exploring outliers
- Identifying outliers with distance functions
- Identifying outliers with the local outlier factor
- Anomaly detection
- Extreme value analysis
- Finding associations in shopping carts
Chapter 9: Reporting and Dashboarding
- Output formats for .Rmd documents
Dashboards
- Simple dashboarding with R Markdown
- Dashboarding and reporting with flexdashboard
- Reports and technical memos
- Slide decks
- purl: scraping the raw code from your .Rmd files
- Shiny apps
- Tweaking the YAML headers
Appendix 1: Setting up projects and using “Make” files
Setting up a project with RStudio and Git
- Committing your changes
- Rolling back to a previous state
- Packaging a project
- “Make” files in R
- Appendix 2: .Rmd File for the Chapter 1 final report example
- Appendix 3: .R file for Chapter 9 Shiny app example
- Appendix 4: .Rmd file for the Chapter 9 flexdashboard example
- Appendix 5: R Markdown Quick Reference
- Notes