Data Analysis for the Life Sciences
This book is 100% complete
Completed on 2015-09-23
About the Book
The unprecedented advance in digital technology during the second half of the 20th century has produced a measurement revolution that is transforming science. In the life sciences, data analysis is now part of practically every research project. Genomics, in particular, is being driven by new measurement technologies that permit us to observe certain molecular entities for the first time. These observations are leading to discoveries analogous to identifying microorganisms and other breakthroughs permitted by the invention of the microscope. Choice examples of these technologies are microarrays and next-generation sequencing. This book will cover several of the statistical concepts and data analytic skills needed to succeed in data-driven life science research. We go from relatively basic concepts related to computing p-values to advanced topics related to analyzing high-throughput data.
While statistics textbooks focus on mathematics, this book focuses on using a computer to perform data analysis. Instead of explaining the mathematics and theory, and then showing examples, we start by stating a practical data-related challenge. This book also includes the computer code that provides a solution to the problem and helps illustrate the concepts behind the solution. By running the code yourself, and seeing data generation and analysis happen live, you will get a better intuition for the concepts, the mathematics, and the theory. The book was created using the R Markdown language and we make all this code available to the reader. This means that readers can replicate all the figures and analyses used to create the book.
About the Contributors
Table of Contents
 Acknowledgements

Introduction
 What Does This Book Cover?
 How Is This Book Different?

Getting Started
 Installing R
 Installing RStudio
 Learn R Basics
 Installing Packages
 Importing Data into R

Brief Introduction to dplyr
 Mathematical Notation

Inference
 Introduction
 Random Variables
 The Null Hypothesis
 Distributions
 Probability Distribution
 Normal Distribution
 Populations, Samples and Estimates
 Central Limit Theorem and t-distribution
 Central Limit Theorem in Practice
 t-tests in Practice
 The t-distribution in Practice
 Confidence Intervals
 Power Calculations
 Monte Carlo Simulation
 Parametric Simulations for the Observations
 Permutation Tests
 Association Tests

Exploratory Data Analysis
 Quantile Quantile Plots
 Boxplots
 Scatterplots And Correlation
 Stratification
 Bivariate Normal Distribution
 Plots To Avoid
 Misunderstanding Correlation (Advanced)
 Robust Summaries
 Wilcoxon Rank Sum Test

Matrix Algebra
 Motivating Examples
 Matrix Notation
 Solving System of Equations
 Vectors, Matrices and Scalars
 Matrix Operations
 Examples

Linear Models
 The Design Matrix
 The Mathematics Behind lm()
 Standard Errors
 Interactions and Contrasts
 Linear Model with Interactions
 Analysis of Variance
 Collinearity
 Rank
 Removing Confounding
 The QR Factorization (Advanced)
 Going Further

Inference For High Dimensional Data
 Introduction
 Inference in Practice
 Procedures
 Error Rates
 The Bonferroni Correction
 False Discovery Rate
 Direct Approach to FDR and q-values (Advanced)
 Basic Exploratory Data Analysis

Statistical Models
 The Binomial Distribution
 The Poisson Distribution
 Maximum Likelihood Estimation
 Distributions for Positive Continuous Values
 Bayesian Statistics
 Hierarchical Models

Distance and Dimension Reduction
 Introduction
 Euclidean Distance
 Distance in High Dimensions
 Dimension Reduction Motivation
 Singular Value Decomposition
 Projections
 Rotations
 Multi-Dimensional Scaling Plots
 Principal Component Analysis

Basic Machine Learning
 Clustering
 Conditional Probabilities and Expectations
 Smoothing
 Bin Smoothing
 Loess
 Class Prediction
 Cross-validation

Batch Effects
 Confounding
 Confounding: High-throughput Example
 Discovering Batch Effects with EDA
 Gene Expression Data
 Motivation for Statistical Approaches
 Adjusting for Batch Effects with Linear Models
 Factor Analysis
 Modeling Batch Effects with Factor Analysis