Data Analysis for the Life Sciences

About the Book

The unprecedented advance in digital technology during the second half of the 20th century has produced a measurement revolution that is transforming science. In the life sciences, data analysis is now part of practically every research project. Genomics, in particular, is being driven by new measurement technologies that permit us to observe certain molecular entities for the first time. These observations are leading to discoveries analogous to identifying microorganisms and other breakthroughs permitted by the invention of the microscope. Choice examples of these technologies are microarrays and next-generation sequencing. This book covers several of the statistical concepts and data analytic skills needed to succeed in data-driven life science research. We go from relatively basic concepts related to computing p-values to advanced topics related to analyzing high-throughput data.

While statistics textbooks focus on mathematics, this book focuses on using a computer to perform data analysis. Instead of explaining the mathematics and theory and then showing examples, we start by stating a practical data-related challenge. The book also includes the computer code that provides a solution to the problem and helps illustrate the concepts behind the solution. By running the code yourself, and seeing data generation and analysis happen live, you will get a better intuition for the concepts, the mathematics, and the theory. The book was created using R Markdown, and we make all of this code available to the reader. This means that readers can replicate all the figures and analyses used to create the book.


About the Authors

Rafael A Irizarry

Rafael Irizarry is a Professor of Biostatistics and Computational Biology at the Dana-Farber Cancer Institute and a Professor of Biostatistics at the Harvard T.H. Chan School of Public Health. For the past 17 years, Dr. Irizarry's research has focused on the analysis of genomics data.

Michael I Love

Michael Love is an Assistant Professor in the Departments of Biostatistics and Genetics at the University of North Carolina at Chapel Hill. Dr. Love uses statistical models to discover biologically relevant patterns in genomic datasets, and develops open-source statistical software for the Bioconductor Project.

About the Contributors

Alexandra Nones

Alexandra proofread the book in its various stages.

Heather Sternshein

Heather helped coordinate the online course that gave birth to this book.

Karl Broman

Karl contributed the "plots to avoid" section.

Stephanie Hicks

Stephanie contributed some of the exercises.

Table of Contents

  • Acknowledgements
  • Introduction
    • What Does This Book Cover?
    • How Is This Book Different?
  • Getting Started
    • Installing R
    • Installing RStudio
    • Learn R Basics
    • Installing Packages
    • Importing Data into R
    • Brief Introduction to dplyr
    • Mathematical Notation
  • Inference
    • Introduction
    • Random Variables
    • The Null Hypothesis
    • Distributions
    • Probability Distribution
    • Normal Distribution
    • Populations, Samples and Estimates
    • Central Limit Theorem and t-distribution
    • Central Limit Theorem in Practice
    • t-tests in Practice
    • The t-distribution in Practice
    • Confidence Intervals
    • Power Calculations
    • Monte Carlo Simulation
    • Parametric Simulations for the Observations
    • Permutation Tests
    • Association Tests
  • Exploratory Data Analysis
    • Quantile Quantile Plots
    • Boxplots
    • Scatterplots And Correlation
    • Stratification
    • Bi-variate Normal Distribution
    • Plots To Avoid
    • Misunderstanding Correlation (Advanced)
    • Robust Summaries
    • Wilcoxon Rank Sum Test
  • Matrix Algebra
    • Motivating Examples
    • Matrix Notation
    • Solving System of Equations
    • Vectors, Matrices and Scalars
    • Matrix Operations
    • Examples
  • Linear Models
    • The Design Matrix
    • The Mathematics Behind lm()
    • Standard Errors
    • Interactions and Contrasts
    • Linear Model with Interactions
    • Analysis of variance
    • Co-linearity
    • Rank
    • Removing Confounding
    • The QR Factorization (Advanced)
    • Going Further
  • Inference For High Dimensional Data
    • Introduction
    • Inference in Practice
    • Procedures
    • Error Rates
    • The Bonferroni Correction
    • False Discovery Rate
    • Direct Approach to FDR and q-values (Advanced)
    • Basic Exploratory Data Analysis
  • Statistical Models
    • The Binomial Distribution
    • The Poisson Distribution
    • Maximum Likelihood Estimation
    • Distributions for Positive Continuous Values
    • Bayesian Statistics
    • Hierarchical Models
  • Distance and Dimension Reduction
    • Introduction
    • Euclidean Distance
    • Distance in High Dimensions
    • Dimension Reduction Motivation
    • Singular Value Decomposition
    • Projections
    • Rotations
    • Multi-Dimensional Scaling Plots
    • Principal Component Analysis
  • Basic Machine Learning
    • Clustering
    • Conditional Probabilities and Expectations
    • Smoothing
    • Bin Smoothing
    • Loess
    • Class Prediction
    • Cross-validation
  • Batch Effects
    • Confounding
    • Confounding: High-throughput Example
    • Discovering Batch Effects with EDA
    • Gene Expression Data
    • Motivation for Statistical Approaches
    • Adjusting for Batch Effects with Linear Models
    • Factor Analysis
    • Modeling Batch Effects with Factor Analysis
