Data Analysis for the Life Sciences
About the Book
The unprecedented advance in digital technology during the second half of the 20th century has produced a measurement revolution that is transforming science. In the life sciences, data analysis is now part of practically every research project. Genomics, in particular, is being driven by new measurement technologies that permit us to observe certain molecular entities for the first time. These observations are leading to discoveries analogous to identifying microorganisms and other breakthroughs permitted by the invention of the microscope. Choice examples of these technologies are microarrays and next-generation sequencing. This book covers several of the statistical concepts and data analytic skills needed to succeed in data-driven life science research. We go from relatively basic concepts related to computing p-values to advanced topics related to analyzing high-throughput data.
While statistics textbooks focus on mathematics, this book focuses on using a computer to perform data analysis. Instead of explaining the mathematics and theory and then showing examples, we start by stating a practical data-related challenge. The book also includes the computer code that provides a solution to the problem and helps illustrate the concepts behind the solution. By running the code yourself, and seeing data generation and analysis happen live, you will get a better intuition for the concepts, the mathematics, and the theory. The book was created using R Markdown, and we make all of this code available to the reader. This means that readers can replicate all the figures and analyses used to create the book.
Table of Contents
- Acknowledgements
- Introduction
  - What Does This Book Cover?
  - How Is This Book Different?
- Getting Started
  - Installing R
  - Installing RStudio
  - Learn R Basics
  - Installing Packages
  - Importing Data into R
  - Brief Introduction to dplyr
  - Mathematical Notation
- Inference
  - Introduction
  - Random Variables
  - The Null Hypothesis
  - Distributions
  - Probability Distribution
  - Normal Distribution
  - Populations, Samples and Estimates
  - Central Limit Theorem and t-distribution
  - Central Limit Theorem in Practice
  - t-tests in Practice
  - The t-distribution in Practice
  - Confidence Intervals
  - Power Calculations
  - Monte Carlo Simulation
  - Parametric Simulations for the Observations
  - Permutation Tests
  - Association Tests
- Exploratory Data Analysis
  - Quantile Quantile Plots
  - Boxplots
  - Scatterplots And Correlation
  - Stratification
  - Bi-variate Normal Distribution
  - Plots To Avoid
  - Misunderstanding Correlation (Advanced)
  - Robust Summaries
  - Wilcoxon Rank Sum Test
- Matrix Algebra
  - Motivating Examples
  - Matrix Notation
  - Solving System of Equations
  - Vectors, Matrices and Scalars
  - Matrix Operations
  - Examples
- Linear Models
  - The Design Matrix
  - The Mathematics Behind lm()
  - Standard Errors
  - Interactions and Contrasts
  - Linear Model with Interactions
  - Analysis of variance
  - Co-linearity
  - Rank
  - Removing Confounding
  - The QR Factorization (Advanced)
  - Going Further
- Inference For High Dimensional Data
  - Introduction
  - Inference in Practice
  - Procedures
  - Error Rates
  - The Bonferroni Correction
  - False Discovery Rate
  - Direct Approach to FDR and q-values (Advanced)
  - Basic Exploratory Data Analysis
- Statistical Models
  - The Binomial Distribution
  - The Poisson Distribution
  - Maximum Likelihood Estimation
  - Distributions for Positive Continuous Values
  - Bayesian Statistics
  - Hierarchical Models
- Distance and Dimension Reduction
  - Introduction
  - Euclidean Distance
  - Distance in High Dimensions
  - Dimension Reduction Motivation
  - Singular Value Decomposition
  - Projections
  - Rotations
  - Multi-Dimensional Scaling Plots
  - Principal Component Analysis
- Basic Machine Learning
  - Clustering
  - Conditional Probabilities and Expectations
  - Smoothing
  - Bin Smoothing
  - Loess
  - Class Prediction
  - Cross-validation
- Batch Effects
  - Confounding
  - Confounding: High-throughput Example
  - Discovering Batch Effects with EDA
  - Gene Expression Data
  - Motivation for Statistical Approaches
  - Adjusting for Batch Effects with Linear Models
  - Factor Analysis
  - Modeling Batch Effects with Factor Analysis