Introduction to Data Science

Last updated on 2019-03-17

About the Book

The demand for skilled data science practitioners in industry, academia, and government is growing rapidly. This book introduces concepts and skills that can help you tackle real-world data analysis challenges. It covers concepts from probability, statistical inference, linear regression, and machine learning. It also helps you develop skills such as R programming, data wrangling with dplyr, data visualization with ggplot2, algorithm building with caret, file organization with the UNIX/Linux shell, version control with Git and GitHub, and reproducible document preparation with knitr and R Markdown. The book is divided into six parts: R; Data Visualization; Data Wrangling; Probability, Inference and Regression with R; Machine Learning; and Productivity Tools. Each part has several chapters, and each chapter is meant to be presented as one lecture. The book includes dozens of exercises distributed across most chapters.

About the Author

Rafael A Irizarry

Rafael Irizarry is a Professor of Biostatistics and Computational Biology at the Dana-Farber Cancer Institute and a Professor of Biostatistics at the Harvard T.H. Chan School of Public Health. For the past 17 years, Dr. Irizarry’s research has focused on the analysis of genomics data.

Table of Contents

Part I R

1 Installing R and RStudio 

2 Getting Started with R and RStudio 

3 R Basics 

4 Programming basics 

5 The tidyverse

6 Importing data 

Part II Data Visualization 

7 Introduction to data visualization

8 ggplot2 

9 Visualizing data distributions 

10 Data visualization in practice 

11 Data visualization principles 

12 Robust summaries 

Part III Statistics with R 

13 Introduction to Statistics with R 

14 Probability 

15 Random variables 

16 Statistical Inference 

17 Statistical models 

18 Regression

19 Linear Models

20 Association is not causation

Part IV Data Wrangling

21 Introduction to Data Wrangling

22 Reshaping data

23 Joining tables

24 Web Scraping

25 String Processing

26 Parsing Dates and Times

27 Text mining

Part V Machine Learning

28 Introduction to Machine Learning

29 Smoothing

30 Cross validation

31 The caret package

32 Examples of algorithms

33 Machine learning in practice

34 Large datasets

35 Clustering

Part VI Productivity tools 

36 Introduction to productivity tools

37 Accessing the terminal and installing Git

38 Organizing with Unix

39 Git and GitHub

40 Reproducible projects with RStudio and R markdown
