Methods in Biostatistics with R

This book is 90% complete

Last updated on 2018-07-23

About the Book

Biostatistics is easy to teach poorly. Too often, books focus on methodology with no emphasis on programming and practical implementation. In contrast, books focused on R programming and visualization rarely discuss the foundational topics that data analysts need to make decisions, evaluate analytic tools, and prepare for new and unforeseen challenges. This book bridges that divide, which had no reason to exist in the first place. The book is unapologetic about its focus on Biostatistics, that is, Statistics with Biological, Public Health, and Medical applications, though we think it could be used successfully for large Statistical and Data Science courses. Data and code can be downloaded here:

Table of Contents

1      Introduction

1.1  Biostatistics

1.2  Mathematical prerequisites

1.3  R

2      Introduction to R

2.1  R and RStudio

2.2  Reading R code

2.3  R Syntax and Jargon

2.4  Objects

2.5  Assignment

2.6  Data Types

2.7  Data Containers

2.8  Logical Operations

2.9  Subsetting

2.10 Reassignment

2.11 Libraries and Packages

2.12 dplyr, ggplot2, and the tidyverse

2.13 Problems

3      Probability, random variables, distributions

3.1  Experiments

3.2  An intuitive introduction to the bootstrap

3.3  Probability

3.4  Probability calculus

3.5  Sampling in R

3.6  Random variables

3.7  Probability mass

3.8  Probability density function

3.9  Cumulative distribution function

3.10 Quantiles

3.11 Problems

3.12 Supplementary R training

4      Mean and Variance

4.1  Mean or expected value

4.2  Sample mean and bias

4.3  Variance, standard deviation, coefficient of variation

4.4  Variance interpretation: Chebyshev’s inequality

4.5  Supplementary R training

4.6  Problems

5      Random vectors, independence, covariance, and sample mean

5.1  Random vectors

5.2  Independent events and variables

5.3  Covariance and correlation

5.4  Variance of sums of variables

5.5  Sample variance

5.6  Mixture of distributions

5.7  Problems

6      Conditional distribution, Bayes’ rule, ROC

6.1  Conditional probabilities

6.2  Bayes’ rule

6.3  ROC and AUC

6.4  Problems

7      Likelihood

7.1  Likelihood definition and interpretation

7.2  Maximum likelihood

7.3  Interpreting likelihood ratios

7.4  Likelihood for multiple parameters

7.5  Profile likelihood

7.6  Problems

8      Data visualization

8.1  Standard visualization tools

8.2  Problems

9      Approximation results and confidence intervals

9.1  Limits

9.2  Law of Large Numbers (LLN)

9.3  Central Limit Theorem (CLT)

9.4  Confidence intervals

9.5  Problems

10   The χ² and t distributions

10.1 The χ² distribution

10.2 Confidence intervals for the variance of a Normal

10.3 Student’s t distribution

10.4 Confidence intervals for Normal means

10.5 Problems

11   t and F tests

11.1 Independent group t confidence intervals

11.2 t intervals for unequal variances

11.3 t-tests and confidence intervals in R

11.4 The F distribution

11.5 Confidence intervals and testing for variance ratios of Normal distributions

11.6 Problems

12   Data Resampling Techniques

12.1 The jackknife

12.2 Bootstrap

12.3 Problems

13   Taking logs of data

13.1 Brief review

13.2 Taking logs of data

13.3 Interpreting logged data

13.4 Inference for the Geometric Mean

13.5 Summary

13.6 Problems

14   Interval estimation for binomial probabilities

14.1 Introduction

14.2 The Wald interval

14.3 Bayesian intervals

14.4 Connections with the Agresti/Coull interval

14.5 Conducting Bayesian inference

14.6 The exact, Clopper-Pearson method

14.7 Confidence intervals in R

14.8 Problems

15   Building a Figure in ggplot2

15.1 The qplot function

15.2 The ggplot function

15.3 Making plots better

15.4 Make the Axes/Labels Bigger

15.5 Make the Labels to be full names

15.6 Making a better legend

15.7 Legend INSIDE the plot

15.8 Saving figures: devices

15.9 Interactive graphics with one function

15.10 Conclusions

15.11 Problems

16   Hypothesis testing

16.1 Introduction

16.2 General hypothesis tests

16.3 Connection with confidence intervals

16.4 Data Example

16.5 P-values

16.6 Discussion

16.7 Problems

17   Power

17.1 Introduction

17.2 Standard normal power calculations

17.3 Power for the t test

17.4 Discussion

17.5 Problems

18   R Programming in the Tidyverse

18.1 Data objects in the tidyverse: tibbles

18.2 dplyr: pliers for manipulating data

18.3 Grouping data

18.4 Summarizing grouped data

18.5 Merging Data Sets

18.6 Left Join

18.7 Right Join

18.8 Right Join: Switching arguments

18.9 Full Join

18.10 Reshaping Data Sets

18.11 Recoding Variables

18.12 Cleaning strings: the stringr package

18.13 Problems

19   Sample size calculations

19.1 Introduction

19.2 Sample size calculation for continuous data

19.3 Sample size calculation for binary data

19.4 Sample size calculations using exact tests

19.5 Sample size calculation with preliminary data

19.6 Problems

20   References

About the Authors

Ciprian Crainiceanu

Ciprian Crainiceanu, PhD, received his doctorate in statistics from Cornell University in 2003 and is a Professor of Biostatistics at Johns Hopkins University. He has taught the Master's-level Methods in Biostatistics course using and expanding on materials borrowed from Dr. Caffo, who, in turn, distilled materials developed over many years by other Johns Hopkins University Biostatistics faculty. Dr. Crainiceanu is a generalist who likes to work in many different scientific areas. He has specialized in wearable and implantable technology (WIT) with application to health studies, and in neuroimaging, especially structural magnetic resonance imaging (MRI) and computed tomography (CT) with application to clinical studies. Drs. Crainiceanu and Caffo are the co-founders and co-directors of the Statistical Methods and Applications for Research in Technology (SMART) research group.

Brian Caffo

Brian Caffo, PhD, is a professor in the Department of Biostatistics at the Johns Hopkins Bloomberg School of Public Health. Along with Roger Peng and Jeff Leek, Dr. Caffo created the Data Science Specialization on Coursera. Dr. Caffo is a leading expert in statistics and biostatistics and is the recipient of the PECASE award, the highest honor given by the US Government for early career scientists and engineers.

John Muschelli

I am an Assistant Scientist in the Department of Biostatistics at Johns Hopkins Bloomberg School of Public Health. I focus on data science, including teaching a few courses and creating a number of R packages, and the analysis of neuroimaging data.
