About the Book
The unprecedented advance in digital technology during the second half of the 20th century has produced a measurement revolution that is transforming science. In the life sciences, data analysis is now part of practically every research project. Genomics, in particular, is being driven by new measurement technologies that permit us to observe certain molecular entities for the first time. These observations are leading to discoveries analogous to the identification of microorganisms and the other breakthroughs made possible by the invention of the microscope. Prime examples of these technologies are microarrays and next-generation sequencing. This book covers several of the statistical concepts and data analysis skills needed to succeed in data-driven life science research, progressing from relatively basic concepts, such as computing p-values, to advanced topics in the analysis of high-throughput data.
Whereas most statistics textbooks focus on mathematics, this book focuses on using a computer to perform data analysis. Instead of explaining the mathematics and theory and then showing examples, we start by stating a practical data-related challenge. The book also includes the computer code that solves the problem and helps illustrate the concepts behind the solution. By running the code yourself and watching data generation and analysis happen live, you will develop a better intuition for the concepts, the mathematics, and the theory. The book was created using R Markdown, and we make all of its code available to the reader. This means that readers can replicate every figure and analysis in the book.
About the Authors
Rafael Irizarry is a Professor of Biostatistics and Computational Biology at the Dana-Farber Cancer Institute and a Professor of Biostatistics at the Harvard T.H. Chan School of Public Health. For the past 17 years, Dr. Irizarry’s research has focused on the analysis of genomics data.
Michael Love is an Assistant Professor in the Departments of Biostatistics and Genetics at the University of North Carolina at Chapel Hill. Dr. Love uses statistical models to discover biologically relevant patterns in genomic datasets, and develops open-source statistical software for the Bioconductor Project.