About the Book
The demand for skilled data science practitioners in industry, academia, and government is growing rapidly. This book introduces concepts and skills that can help you tackle real-world data analysis challenges. It covers concepts from probability, statistical inference, linear regression, and machine learning. It also helps you develop skills such as R programming, data wrangling with dplyr, data visualization with ggplot2, algorithm building with caret, file organization with the UNIX/Linux shell, version control with Git and GitHub, and reproducible document preparation with knitr and R Markdown. The book is divided into six parts: R; Data Visualization; Data Wrangling; Probability, Inference, and Regression with R; Machine Learning; and Productivity Tools. Each part has several chapters, each meant to be presented as one lecture. The book includes dozens of exercises distributed across most chapters.
About the Author
Rafael Irizarry is a Professor of Biostatistics and Computational Biology at the Dana-Farber Cancer Institute and of Biostatistics at the Harvard T.H. Chan School of Public Health. For the past 17 years, Dr. Irizarry’s research has focused on the analysis of genomics data.