###### Methods in Biostatistics with R

This book is 90% complete

Last updated on 2018-07-23

# About the Book

Biostatistics is easy to teach poorly. Too often, books focus on methodology with no emphasis on programming and practical implementation. In contrast, books focused on R programming and visualization rarely discuss the foundational topics that data analysts need to make decisions, evaluate analytic tools, and prepare for new and unforeseen challenges. This book bridges that divide, which had no reason to exist in the first place. The book is unapologetic about its focus on Biostatistics, that is, Statistics with Biological, Public Health, and Medical applications, though we think it could be used successfully for large Statistics and Data Science courses. Data and code can be downloaded here: https://github.com/muschellij2/biostatmethods

#### Table of Contents

# 1 Introduction

## 1.1 Biostatistics

## 1.2 Mathematical prerequisites

## 1.3 R

# 2 Introduction to R

## 2.1 R and RStudio

## 2.2 Reading R code

## 2.3 R Syntax and Jargon

## 2.4 Objects

## 2.5 Assignment

## 2.6 Data Types

## 2.7 Data Containers

## 2.8 Logical Operations

## 2.9 Subsetting

## 2.10 Reassignment

## 2.11 Libraries and Packages

## 2.12 dplyr, ggplot2, and the tidyverse

## 2.13 Problems

# 3 Probability, random variables, distributions

## 3.1 Experiments

## 3.2 An intuitive introduction to the bootstrap

## 3.3 Probability

## 3.4 Probability calculus

## 3.5 Sampling in R

## 3.6 Random variables

## 3.7 Probability mass

## 3.8 Probability density function

## 3.9 Cumulative distribution function

## 3.10 Quantiles

## 3.11 Problems

## 3.12 Supplementary R training

# 4 Mean and Variance

## 4.1 Mean or expected value

## 4.2 Sample mean and bias

## 4.3 Variance, standard deviation, coefficient of variation

## 4.4 Variance interpretation: Chebyshev’s inequality

## 4.5 Supplementary R training

## 4.6 Problems

# 5 Random vectors, independence, covariance, and sample mean

## 5.1 Random vectors

## 5.2 Independent events and variables

## 5.3 Covariance and correlation

## 5.4 Variance of sums of variables

## 5.5 Sample variance

## 5.6 Mixture of distributions

## 5.7 Problems

# 6 Conditional distribution, Bayes’ rule, ROC

## 6.1 Conditional probabilities

## 6.2 Bayes rule

## 6.3 ROC and AUC

## 6.4 Problems

# 7 Likelihood

## 7.1 Likelihood definition and interpretation

## 7.2 Maximum likelihood

## 7.3 Interpreting likelihood ratios

## 7.4 Likelihood for multiple parameters

## 7.5 Profile likelihood

## 7.6 Problems

# 8 Data visualization

## 8.1 Standard visualization tools

## 8.2 Problems

# 9 Approximation results and confidence intervals

## 9.1 Limits

## 9.2 Law of Large Numbers (LLN)

## 9.3 Central Limit Theorem (CLT)

## 9.4 Confidence intervals

## 9.5 Problems

# 10 The χ² and t distributions

## 10.1 The χ² distribution

## 10.2 Confidence intervals for the variance of a Normal

## 10.3 Student’s t distribution

## 10.4 Confidence intervals for Normal means

## 10.5 Problems

# 11 t and F tests

## 11.1 Independent group t confidence intervals

## 11.2 t intervals for unequal variances

## 11.3 t-tests and confidence intervals in R

## 11.4 The F distribution

## 11.5 Confidence intervals and testing for variance ratios of Normal distributions

## 11.6 Problems

# 12 Data Resampling Techniques

## 12.1 The jackknife

## 12.2 Bootstrap

## 12.3 Problems

# 13 Taking logs of data

## 13.1 Brief review

## 13.2 Taking logs of data

## 13.3 Interpreting logged data

## 13.4 Inference for the Geometric Mean

## 13.5 Summary

## 13.6 Problems

# 14 Interval estimation for binomial probabilities

## 14.1 Introduction

## 14.2 The Wald interval

## 14.3 Bayesian intervals

## 14.4 Connections with the Agresti/Coull interval

## 14.5 Conducting Bayesian inference

## 14.6 The exact, Clopper-Pearson method

## 14.7 Confidence intervals in R

## 14.8 Problems

# 15 Building a Figure in ggplot2

## 15.1 The qplot function

## 15.2 The ggplot function

## 15.3 Making plots better

## 15.4 Make the Axes/Labels Bigger

## 15.5 Make the Labels Full Names

## 15.6 Making a better legend

## 15.7 Legend INSIDE the plot

## 15.8 Saving figures: devices

## 15.9 Interactive graphics with one function

## 15.10 Conclusions

## 15.11 Problems

# 16 Hypothesis testing

## 16.1 Introduction

## 16.2 General hypothesis tests

## 16.3 Connection with confidence intervals

## 16.4 Data Example

## 16.5 P-values

## 16.6 Discussion

## 16.7 Problems

# 17 Power

## 17.1 Introduction

## 17.2 Standard normal power calculations

## 17.3 Power for the t test

## 17.4 Discussion

## 17.5 Problems

# 18 R Programming in the Tidyverse

## 18.1 Data objects in the tidyverse: tibbles

## 18.2 dplyr: pliers for manipulating data

## 18.3 Grouping data

## 18.4 Summarizing grouped

## 18.5 Merging Data Sets

## 18.6 Left Join

## 18.7 Right Join

## 18.8 Right Join: Switching arguments

## 18.9 Full Join

## 18.10 Reshaping Data Sets

## 18.11 Recoding Variables

## 18.12 Cleaning strings: the stringr package

## 18.13 Problems

# 19 Sample size calculations

## 19.1 Introduction

## 19.2 Sample size calculation for continuous data

## 19.3 Sample size calculation for binary data

## 19.4 Sample size calculations using exact tests

## 19.5 Sample size calculation with preliminary data

## 19.6 Problems

# 20 References
