Methods in Biostatistics with R

A Rigorous and Practical Treatment of Biostatistics Foundations using R

Brian Caffo,

John Muschelli, and

Ciprian Crainiceanu

The book provides a modern look at introductory Biostatistical concepts and the associated computational tools using the latest developments in computation and visualization in the R language environment. The book includes practical data analysis based on datasets that can be downloaded here: https://github.com/muschellij2/biostatmethods.

Formats: PDF, EPUB

Readers: 2,145

Pages: 592

About the Book

Biostatistics is easy to teach poorly. Too often, books focus on methodology with no emphasis on programming and practical implementation. In contrast, books focused on R programming and visualization rarely discuss the foundational topics that give data analysts the infrastructure they need to make decisions, evaluate analytic tools, and prepare for new and unforeseen challenges. This book bridges that divide, which had no reason to exist in the first place. The book is unapologetic about its focus on Biostatistics, that is, Statistics with Biological, Public Health, and Medical applications, though we think it could be used successfully for large Statistical and Data Science courses. Data and code can be downloaded here: https://github.com/muschellij2/biostatmethods

Installments completed: 1 / 3

About the Authors

Brian Caffo

Brian Caffo, PhD, is a professor in the Department of Biostatistics at the Johns Hopkins Bloomberg School of Public Health. Along with Roger Peng and Jeff Leek, Dr. Caffo created the Data Science Specialization on Coursera. Dr. Caffo is a leading expert in statistics and biostatistics and is the recipient of the PECASE award, the highest honor given by the US Government for early career scientists and engineers.

John Muschelli

I am an Assistant Scientist in the Department of Biostatistics at the Johns Hopkins Bloomberg School of Public Health. I focus on data science, which includes teaching a few courses and creating a number of R packages, and on the analysis of neuroimaging data.

Ciprian Crainiceanu

Ciprian Crainiceanu, PhD, received his doctorate in statistics from Cornell University in 2003 and is a Professor of Biostatistics at Johns Hopkins University. He has taught the Master's-level Methods in Biostatistics course, using and expanding on materials borrowed from Dr. Caffo, who, in turn, distilled materials developed over many years by other Johns Hopkins University Biostatistics faculty. Dr. Crainiceanu is a generalist who likes to work in many different scientific areas. He has specialized in wearable and implantable technology (WIT) with application to health studies and in neuroimaging, especially structural magnetic resonance imaging (MRI) and computed tomography (CT), with application to clinical studies. Drs. Crainiceanu and Caffo are the co-founders and co-directors of the Statistical Methods and Applications for Research in Technology ([SMART](http://www.smart-stats.org/)) research group.

Table of Contents

1 Introduction

1.1 Biostatistics

1.2 Mathematical prerequisites

1.3 R

2 Introduction to R

2.1 R and RStudio

2.2 Reading R code

2.3 R Syntax and Jargon

2.4 Objects

2.5 Assignment

2.6 Data Types

2.7 Data Containers

2.8 Logical Operations

2.9 Subsetting

2.10 Reassignment

2.11 Libraries and Packages

2.12 dplyr, ggplot2, and the tidyverse

2.13 Problems

3 Probability, random variables, distributions

3.1 Experiments

3.2 An intuitive introduction to the bootstrap

3.3 Probability

3.4 Probability calculus

3.5 Sampling in R

3.6 Random variables

3.7 Probability mass

3.8 Probability density function

3.9 Cumulative distribution function

3.10 Quantiles

3.11 Problems

3.12 Supplementary R training

4 Mean and Variance

4.1 Mean or expected value

4.2 Sample mean and bias

4.3 Variance, standard deviation, coefficient of variation

4.4 Variance interpretation: Chebyshev’s inequality

4.5 Supplementary R training

4.6 Problems

5 Random vectors, independence, covariance, and sample mean

5.1 Random vectors

5.2 Independent events and variables

5.3 Covariance and correlation

5.4 Variance of sums of variables

5.5 Sample variance

5.6 Mixture of distributions

5.7 Problems

6 Conditional distribution, Bayes’ rule, ROC

6.1 Conditional probabilities

6.2 Bayes’ rule

6.3 ROC and AUC

6.4 Problems

7 Likelihood

7.1 Likelihood definition and interpretation

7.2 Maximum likelihood

7.3 Interpreting likelihood ratios

7.4 Likelihood for multiple parameters

7.5 Profile likelihood

7.6 Problems

8 Data visualization

8.1 Standard visualization tools

8.2 Problems

9 Approximation results and confidence intervals

9.1 Limits

9.2 Law of Large Numbers (LLN)

9.3 Central Limit Theorem (CLT)

9.4 Confidence intervals

9.5 Problems

10 The χ² and t distributions

10.1 The χ² distribution

10.2 Confidence intervals for the variance of a Normal

10.3 Student’s t distribution

10.4 Confidence intervals for Normal means

10.5 Problems

11 t and F tests

11.1 Independent group t confidence intervals

11.2 t intervals for unequal variances

11.3 t-tests and confidence intervals in R

11.4 The F distribution

11.5 Confidence intervals and testing for variance ratios of Normal distributions

11.6 Problems

12 Data Resampling Techniques

12.1 The jackknife

12.2 Bootstrap

12.3 Problems

13 Taking logs of data

13.1 Brief review

13.2 Taking logs of data

13.3 Interpreting logged data

13.4 Inference for the Geometric Mean

13.5 Summary

13.6 Problems

14 Interval estimation for binomial probabilities

14.1 Introduction

14.2 The Wald interval

14.3 Bayesian intervals

14.4 Connections with the Agresti/Coull interval

14.5 Conducting Bayesian inference

14.6 The exact, Clopper-Pearson method

14.7 Confidence intervals in R

14.8 Problems

15 Building a Figure in ggplot2

15.1 The qplot function

15.2 The ggplot function

15.3 Making plots better

15.4 Make the Axes/Labels Bigger

15.5 Make the Labels to be full names

15.6 Making a better legend

15.7 Legend INSIDE the plot

15.8 Saving figures: devices

15.9 Interactive graphics with one function

15.10 Conclusions

15.11 Problems

16 Hypothesis testing

16.1 Introduction

16.2 General hypothesis tests

16.3 Connection with confidence intervals

16.4 Data Example

16.5 P-values

16.6 Discussion

16.7 Problems

17 Power

17.1 Introduction

17.2 Standard normal power calculations

17.3 Power for the t test

17.4 Discussion

17.5 Problems

18 R Programming in the Tidyverse

18.1 Data objects in the tidyverse: tibbles

18.2 dplyr: pliers for manipulating data

18.3 Grouping data

18.4 Summarizing grouped

18.5 Merging Data Sets

18.6 Left Join

18.7 Right Join

18.8 Right Join: Switching arguments

18.9 Full Join

18.10 Reshaping Data Sets

18.11 Recoding Variables

18.12 Cleaning strings: the stringr package

18.13 Problems

19 Sample size calculations

19.1 Introduction

19.2 Sample size calculation for continuous data

19.3 Sample size calculation for binary data

19.4 Sample size calculations using exact tests

19.5 Sample size calculation with preliminary data

19.6 Problems

20 References
