Methods in Biostatistics with R
A Rigorous and Practical Treatment of Biostatistics Foundations using R
About the Book
Biostatistics is easy to teach poorly. Too often, books focus on methodology with no emphasis on programming and practical implementation. In contrast, books focused on R programming and visualization rarely discuss the foundational topics that give data analysts the infrastructure they need to make decisions, evaluate analytic tools, and prepare for new and unforeseen challenges. This book bridges that divide, which had no reason to exist in the first place. The book is unapologetic about its focus on Biostatistics, that is, Statistics with Biological, Public Health, and Medical applications, though we think it could be used successfully for large Statistical and Data Science courses. Data and code can be downloaded here: https://github.com/muschellij2/biostatmethods
Table of Contents
1 Introduction
1.1 Biostatistics
1.2 Mathematical prerequisites
1.3 R
2 Introduction to R
2.1 R and RStudio
2.2 Reading R code
2.3 R Syntax and Jargon
2.4 Objects
2.5 Assignment
2.6 Data Types
2.7 Data Containers
2.8 Logical Operations
2.9 Subsetting
2.10 Reassignment
2.11 Libraries and Packages
2.12 dplyr, ggplot2, and the tidyverse
2.13 Problems
3 Probability, random variables, distributions
3.1 Experiments
3.2 An intuitive introduction to the bootstrap
3.3 Probability
3.4 Probability calculus
3.5 Sampling in R
3.6 Random variables
3.7 Probability mass
3.8 Probability density function
3.9 Cumulative distribution function
3.10 Quantiles
3.11 Problems
3.12 Supplementary R training
4 Mean and Variance
4.1 Mean or expected value
4.2 Sample mean and bias
4.3 Variance, standard deviation, coefficient of variation
4.4 Variance interpretation: Chebyshev’s inequality
4.5 Supplementary R training
4.6 Problems
5 Random vectors, independence, covariance, and sample mean
5.1 Random vectors
5.2 Independent events and variables
5.3 Covariance and correlation
5.4 Variance of sums of variables
5.5 Sample variance
5.6 Mixture of distributions
5.7 Problems
6 Conditional distribution, Bayes’ rule, ROC
6.1 Conditional probabilities
6.2 Bayes’ rule
6.3 ROC and AUC
6.4 Problems
7 Likelihood
7.1 Likelihood definition and interpretation
7.2 Maximum likelihood
7.3 Interpreting likelihood ratios
7.4 Likelihood for multiple parameters
7.5 Profile likelihood
7.6 Problems
8 Data visualization
8.1 Standard visualization tools
8.2 Problems
9 Approximation results and confidence intervals
9.1 Limits
9.2 Law of Large Numbers (LLN)
9.3 Central Limit Theorem (CLT)
9.4 Confidence intervals
9.5 Problems
10 The χ² and t distributions
10.1 The χ² distribution
10.2 Confidence intervals for the variance of a Normal
10.3 Student’s t distribution
10.4 Confidence intervals for Normal means
10.5 Problems
11 t and F tests
11.1 Independent group t confidence intervals
11.2 t intervals for unequal variances
11.3 t-tests and confidence intervals in R
11.4 The F distribution
11.5 Confidence intervals and testing for variance ratios of Normal distributions
11.6 Problems
12 Data Resampling Techniques
12.1 The jackknife
12.2 Bootstrap
12.3 Problems
13 Taking logs of data
13.1 Brief review
13.2 Taking logs of data
13.3 Interpreting logged data
13.4 Inference for the Geometric Mean
13.5 Summary
13.6 Problems
14 Interval estimation for binomial probabilities
14.1 Introduction
14.2 The Wald interval
14.3 Bayesian intervals
14.4 Connections with the Agresti/Coull interval
14.5 Conducting Bayesian inference
14.6 The exact, Clopper-Pearson method
14.7 Confidence intervals in R
14.8 Problems
15 Building a Figure in ggplot2
15.1 The qplot function
15.2 The ggplot function
15.3 Making plots better
15.4 Make the Axes/Labels Bigger
15.5 Make the Labels to be full names
15.6 Making a better legend
15.7 Legend INSIDE the plot
15.8 Saving figures: devices
15.9 Interactive graphics with one function
15.10 Conclusions
15.11 Problems
16 Hypothesis testing
16.1 Introduction
16.2 General hypothesis tests
16.3 Connection with confidence intervals
16.4 Data Example
16.5 P-values
16.6 Discussion
16.7 Problems
17 Power
17.1 Introduction
17.2 Standard normal power calculations
17.3 Power for the t test
17.4 Discussion
17.5 Problems
18 R Programming in the Tidyverse
18.1 Data objects in the tidyverse: tibbles
18.2 dplyr: pliers for manipulating data
18.3 Grouping data
18.4 Summarizing grouped data
18.5 Merging Data Sets
18.6 Left Join
18.7 Right Join
18.8 Right Join: Switching arguments
18.9 Full Join
18.10 Reshaping Data Sets
18.11 Recoding Variables
18.12 Cleaning strings: the stringr package
18.13 Problems
19 Sample size calculations
19.1 Introduction
19.2 Sample size calculation for continuous data
19.3 Sample size calculation for binary data
19.4 Sample size calculations using exact tests
19.5 Sample size calculation with preliminary data
19.6 Problems
20 References