Business Intelligence with R
From Acquiring Data to Pattern Exploration
About the Book
This book is now nearly 2 years old, and a lot has changed in the R ecosystem in that time, so I am dropping the suggested price to free. I don't have any current plans to update it, though if someone wants to partner with me, I'd love to join forces!
The growth of R has been phenomenal over the past several years, reaching out past academia and research labs into the daily activities of business and industry. More and more line-level analysts are supplementing their SQL or Excel skills with R, and more and more businesses are hiring analysts with R skills to bridge the increasing gap between business needs and on-hand analytic resources.
While there are already many books that explore the use of R in statistics and data science, many of them are set as a series of case studies or are written predominantly for academic audiences as textbooks. There are also many R cookbooks, but they tend to cover the breadth of R's capabilities and options. This book aims to provide a cookbook of R recipes that are specifically of use in the daily workflow of data scientists and analysts in business and industry.
Business Intelligence with R is a practical, hands-on overview of many of the major BI/analytic tasks that can be accomplished with R. It is not meant to be exhaustive--there is always more than one way to accomplish a given task in R, so this book aims to provide the simplest and/or most robust approaches to meet daily workflow needs. It can serve as the go-to desk reference for the professional analyst who needs to get things done in R.
From setting up a project under version control to creating an interactive dashboard of a data product, this book will provide you with the pieces to put it all together.
Table of Contents
About
- About the Author
- Session Info
- Version Info
- Code and Data
- Cover Image
- Proceeds
- Contact
- install.packages
Introduction
- Overview
- Conventions
- What you need
- Acknowledgments
- Website/Code
- Happy coding!
Chapter 1: An entire project in a few lines of code
- The analytics problem
- Set up
- Acquire data
- Wrangling data
Analytics
- Explore the data
- Run a forecasting model
Reporting
- Create an interactive HTML plot
- Documenting the project
- Summary
Chapter 2: Getting Data
Working with files
- Reading flat files from disk or the web
- Reading big files with data.table
- Unzipping files within R
- Reading Excel files
- Creating a dataframe from the clipboard or direct entry
- Reading XML files
- Reading JSON files
Working with databases
- Connecting to a database
- Creating data frames from a database
- Disconnecting from a database
- Creating a SQLite database inside R
- Creating a dataframe from a SQLite database
Getting data from the web
- Working through a proxy
- Scraping data from a web table
- Working with APIs
- Creating fake data to test code
- Writing files to disk
Chapter 3: Cleaning and Preparing Data
Understanding your data
- Identifying data types and overall structure
- Identifying (and ordering) factor levels
- Identifying unique values or duplicates
- Sorting
Cleaning up
- Converting between data types
- Cleaning up a web-scraped table: Regular expressions and more
- Missing data: Converting dummy NA values
- Rounding
- Concatenation
Merging dataframes
- Joins
- Unions and bindings
- Subsetting: filter and select from a dataframe
- Creating a derived column
- Peeking at the outcome with dplyr
Reshaping a dataframe between wide and long
- Reshaping: melt and cast
- Reshaping: Crossing variables with ~ and +
- Summarizing while reshaping
- Piping with %>%: Stringing it together in dplyr
Chapter 4: Know Thy Data—Exploratory Data Analysis
Creating summary plots
- Everything at once: ggpairs
- Create histograms of all numeric variables in one plot
- A better “pairs” plot
- Mosaic plot matrix: “Scatterplot” matrix for categorical data
Plotting univariate distributions
- Histograms and density plots
- Bar and dot plots
- Plotting multiple univariate distributions with faceting
Plotting bivariate and comparative distributions
- Double density plots
- Boxplots
- Beanplots
- Scatterplots and marginal distributions
- Mosaic plots
- Multiple bivariate comparisons with faceting
- Pareto charts
- Plotting survey data
- Obtaining summary and conditional statistics
- Finding the mode or local maxima/minima
Inference on summary statistics
- Confidence intervals
- Tolerance intervals
Dealing with missing data
- Visualizing missing data
- Imputation for missing values
Chapter 5: Effect Sizes
- Overview
Effect sizes: Measuring differences between groups
- Basic differences
- Standardized differences
- Determining the probability of a difference
Effect sizes: Measuring similarities between groups
- Correlation
- Bootstrapping BCa CIs for non-parametric correlation
- Determining the probability of a correlation
- Partial correlations
- Polychoric and polyserial correlation for ordinal data
- Associations between categorical variables
- Cohen’s kappa for comparisons of agreement
- Regression coefficient
- R²: Proportion of variance explained
Chapter 6: Trends and Time
Describing trends in non-temporal data
- Smoothed trends
- Quantile trends
- Simple linear trends
- Segmented linear trends
- The many flavors of regression
- Plotting regression coefficients
Working with temporal data
- Calculate a mean or correlation in circular time (clock time)
- Plotting time-series data
- Detecting autocorrelation
- Plotting monthly and seasonal patterns
- Plotting seasonal adjustment on the fly
- Decomposing time series into components
- Using spectral analysis to identify periodicity
- Plotting survival curves
- Evaluating quality with control charts
- Identifying possible breakpoints in a time series
- Exploring relationships between time series: cross-correlation
- Basic forecasting
Chapter 7: A Dog’s Breakfast of Dataviz
Plotting multivariate distributions
- Heatmaps
- Creating calendar heatmaps
- Parallel coordinates plots
- Peeking at multivariate data with dplyr and a bubblechart
- Plotting a table
Interactive dataviz
- Basic interactive plots
- Scatterplot matrix
- Motionchart: a moving bubblechart
- Interactive parallel coordinates
- Interactive tables with DT
Making maps in R
- Basic point maps
- Choropleth maps
- Choropleth mapping with the American Community Survey
- Using shapefiles and raw data
- Using ggmap for point data and heatmaps
- Interactive maps with leaflet
- Why map with R?
Chapter 8: Pattern Discovery and Dimension Reduction
Mapping multivariate relationships
- Non-metric multidimensional scaling (nMDS)
- Diagnostics for nMDS results
- Vector mapping influential variables over the nMDS plot
- Contour mapping influential variables over the nMDS plot
- Principal Components Analysis (PCA)
- nMDS for Categories: Correspondence Analysis
Cluster analysis
- Grouping observations with hierarchical clustering
- Plotting a cluster dendrogram with ggplot
- Exploring hierarchical clustering of nMDS results
- How to partition the results of a hierarchical cluster analysis
- Identifying and describing group membership with kMeans and PAM
- How to choose an optimal number of clusters with bootstrapping
- Determining optimal numbers of clusters with model-based clustering
- Identifying group membership with irregular clusters
- Variable selection in cluster analysis
- Error-checking cluster results with known outcomes
Exploring outliers
- Identifying outliers with distance functions
- Identifying outliers with the local outlier factor
- Anomaly detection
- Extreme value analysis
- Finding associations in shopping carts
Chapter 9: Reporting and Dashboarding
- Output formats for .Rmd documents
Dashboards
- Simple dashboarding with R Markdown
- Dashboarding and reporting with flexdashboard
- Reports and technical memos
- Slide decks
- purl: scraping the raw code from your .Rmd files
- Shiny apps
- Tweaking the YAML headers
Appendix 1: Setting up projects and using “Make” files
Setting up a project with RStudio and Git
- Committing your changes
- Rolling back to a previous state
- Packaging a project
- “Make” files in R
- Appendix 2: .Rmd File for the Chapter 1 final report example
- Appendix 3: .R file for Chapter 9 Shiny app example
- Appendix 4: .Rmd file for the Chapter 9 flexdashboard example
- Appendix 5: R Markdown Quick Reference
- Notes