### Methods in Biostatistics with R

###### A Rigorous and Practical Treatment of Biostatistics Foundations using R

# About the Book

Biostatistics is easy to teach poorly. Too often, books focus on methodology with no emphasis on programming and practical implementation. Conversely, books focused on R programming and visualization rarely discuss the foundational topics that data analysts need to make decisions, evaluate analytic tools, and prepare for new and unforeseen challenges. This book bridges that divide, which had no reason to exist in the first place. It is unapologetic about its focus on Biostatistics, that is, Statistics with biological, public health, and medical applications, though we think it could be used successfully for large Statistics and Data Science courses. Data and code can be downloaded from https://github.com/muschellij2/biostatmethods

#### Table of Contents

# 1 Introduction

## 1.1 Biostatistics

## 1.2 Mathematical prerequisites

## 1.3 R

# 2 Introduction to R

## 2.1 R and RStudio

## 2.2 Reading R code

## 2.3 R Syntax and Jargon

## 2.4 Objects

## 2.5 Assignment

## 2.6 Data Types

## 2.7 Data Containers

## 2.8 Logical Operations

## 2.9 Subsetting

## 2.10 Reassignment

## 2.11 Libraries and Packages

## 2.12 dplyr, ggplot2, and the tidyverse

## 2.13 Problems

# 3 Probability, random variables, distributions

## 3.1 Experiments

## 3.2 An intuitive introduction to the bootstrap

## 3.3 Probability

## 3.4 Probability calculus

## 3.5 Sampling in R

## 3.6 Random variables

## 3.7 Probability mass

## 3.8 Probability density function

## 3.9 Cumulative distribution function

## 3.10 Quantiles

## 3.11 Problems

## 3.12 Supplementary R training

# 4 Mean and Variance

## 4.1 Mean or expected value

## 4.2 Sample mean and bias

## 4.3 Variance, standard deviation, coefficient of variation

## 4.4 Variance interpretation: Chebyshev’s inequality

## 4.5 Supplementary R training

## 4.6 Problems

# 5 Random vectors, independence, covariance, and sample mean

## 5.1 Random vectors

## 5.2 Independent events and variables

## 5.3 Covariance and correlation

## 5.4 Variance of sums of variables

## 5.5 Sample variance

## 5.6 Mixture of distributions

## 5.7 Problems

# 6 Conditional distribution, Bayes’ rule, ROC

## 6.1 Conditional probabilities

## 6.2 Bayes’ rule

## 6.3 ROC and AUC

## 6.4 Problems

# 7 Likelihood

## 7.1 Likelihood definition and interpretation

## 7.2 Maximum likelihood

## 7.3 Interpreting likelihood ratios

## 7.4 Likelihood for multiple parameters

## 7.5 Profile likelihood

## 7.6 Problems

# 8 Data visualization

## 8.1 Standard visualization tools

## 8.2 Problems

# 9 Approximation results and confidence intervals

## 9.1 Limits

## 9.2 Law of Large Numbers (LLN)

## 9.3 Central Limit Theorem (CLT)

## 9.4 Confidence intervals

## 9.5 Problems

# 10 The χ² and t distributions

## 10.1 The χ² distribution

## 10.2 Confidence intervals for the variance of a Normal

## 10.3 Student’s t distribution

## 10.4 Confidence intervals for Normal means

## 10.5 Problems

# 11 t and F tests

## 11.1 Independent group t confidence intervals

## 11.2 t intervals for unequal variances

## 11.3 t-tests and confidence intervals in R

## 11.4 The F distribution

## 11.5 Confidence intervals and testing for variance ratios of Normal distributions

## 11.6 Problems

# 12 Data Resampling Techniques

## 12.1 The jackknife

## 12.2 Bootstrap

## 12.3 Problems

# 13 Taking logs of data

## 13.1 Brief review

## 13.2 Taking logs of data

## 13.3 Interpreting logged data

## 13.4 Inference for the Geometric Mean

## 13.5 Summary

## 13.6 Problems

# 14 Interval estimation for binomial probabilities

## 14.1 Introduction

## 14.2 The Wald interval

## 14.3 Bayesian intervals

## 14.4 Connections with the Agresti/Coull interval

## 14.5 Conducting Bayesian inference

## 14.6 The exact, Clopper-Pearson method

## 14.7 Confidence intervals in R

## 14.8 Problems

# 15 Building a Figure in ggplot2

## 15.1 The qplot function

## 15.2 The ggplot function

## 15.3 Making plots better

## 15.4 Make the Axes/Labels Bigger

## 15.5 Make the Labels to be full names

## 15.6 Making a better legend

## 15.7 Legend INSIDE the plot

## 15.8 Saving figures: devices

## 15.9 Interactive graphics with one function

## 15.10 Conclusions

## 15.11 Problems

# 16 Hypothesis testing

## 16.1 Introduction

## 16.2 General hypothesis tests

## 16.3 Connection with confidence intervals

## 16.4 Data Example

## 16.5 P-values

## 16.6 Discussion

## 16.7 Problems

# 17 Power

## 17.1 Introduction

## 17.2 Standard normal power calculations

## 17.3 Power for the t test

## 17.4 Discussion

## 17.5 Problems

# 18 R Programming in the Tidyverse

## 18.1 Data objects in the tidyverse: tibbles

## 18.2 dplyr: pliers for manipulating data

## 18.3 Grouping data

## 18.4 Summarizing grouped data

## 18.5 Merging Data Sets

## 18.6 Left Join

## 18.7 Right Join

## 18.8 Right Join: Switching arguments

## 18.9 Full Join

## 18.10 Reshaping Data Sets

## 18.11 Recoding Variables

## 18.12 Cleaning strings: the stringr package

## 18.13 Problems

# 19 Sample size calculations

## 19.1 Introduction

## 19.2 Sample size calculation for continuous data

## 19.3 Sample size calculation for binary data

## 19.4 Sample size calculations using exact tests

## 19.5 Sample size calculation with preliminary data

## 19.6 Problems

# 20 References
