The Hitchhiker's Guide to Linear Models

Based on the famous R programming language

About the Book

This book gets straight to the point. The only things I assume are that you have used spreadsheets at some point and that you are motivated to estimate linear models in R. I do not assume that you know how to install R or the basics of the R programming language.

About the Author

Mauricio 'Pacha' Vargas Sepúlveda

I am a PhD student in Political Science at the University of Toronto. My research interests include International Relations, Canadian Politics and Public Policy, with a focus on the politics of trade agreements and sanctions. I have a Master of Arts in Political Science from the University of Toronto, a Master of Science in Statistics from the Catholic University of Chile, and an Engineering degree from the University of Chile. I can provide my CV (academic/professional) upon request. You can reach me at m.sepulveda+removethis@mail.utoronto.ca (please remove the +removethis).

See my website (pacha.dev) and my blog (pacha.dev/blog).

Reader Testimonials

Claudia Negri-Ribalta

Université Paris 1 Panthéon-Sorbonne

I think it's great that you are teaching them how to write in R. My big problem was always that, the syntax.

Catherine Moez

University of Toronto

Grounded book. I like the UofT-related examples.

Badi H. Baltagi

Syracuse University

This is a lot of work.

Table of Contents

  • I. Preface
  • II. R Setup
    • 1. R and RStudio
      • 1.1. Windows and Mac
      • 1.2. Linux
    • 2. Installing R
      • 2.1. Windows and Mac
      • 2.2. Linux
    • 3. Installing RStudio
      • 3.1. Windows and Mac
      • 3.2. Linux
    • 4. Installing R Packages
      • 4.1. Windows and Mac
      • 4.2. Linux
    • 5. Changing RStudio colors and font
      • 5.1. Windows and Mac
      • 5.2. Linux
    • 6. Installing Quarto
      • 6.1. Windows and Mac
      • 6.2. Linux
  • III. Linear algebra review
    • 1. Using R as a calculator
    • 2. System of linear equations
    • 3. Matrix
    • 4. Transpose matrix
    • 5. Matrix multiplication
    • 6. Matrix representation of a system of linear equations
    • 7. Identity matrix
    • 8. Inverse matrix
    • 9. Solving systems of linear equations
  • IV. Statistics review
    • 1. Using R as a calculator
      • 1.1. Mean
      • 1.2. Variance
      • 1.3. Standard deviation
      • 1.4. Covariance
      • 1.5. Correlation
      • 1.6. Normal distribution
      • 1.7. Poisson distribution
      • 1.8. Student's t-distribution
      • 1.9. Computing probabilities with the normal distribution
      • 1.10. Computing probabilities with the Poisson distribution
      • 1.11. Computing probabilities with the t-distribution
    • 2. Data and dataset
    • 3. Summation
    • 4. Probability
    • 5. Descriptive statistics
    • 6. Distributions
    • 7. Sample size
  • V. Recommended workflow
    • 1. Creating projects
    • 2. Creating scripts
    • 3. Creating notebooks
    • 4. Organizing code sections
    • 5. Customizing notebooks' output
  • VI. Read, Manipulate, and Plot Data
    • 1. The datasauRus dataset in R format
    • 2. The Quality of Government dataset in CSV format
    • 3. The Quality of Government dataset in SAV (SPSS) format
    • 4. The Quality of Government dataset in DTA (Stata) format
    • 5. The Freedom House dataset in XLSX (Excel) format
  • VII. Linear Model with One Explanatory Variable
    • 1. Model specification
      • 1.1. Linear model as correlation
      • 1.2. Linear model as matrix multiplication
      • 1.3. Relation between correlation and matrix multiplication
      • 1.4. Computational note
    • 2. The Galton dataset
    • 3. A word of caution about Galton's work
    • 4. Loading the Galton dataset
    • 5. Estimating linear models' coefficients
    • 6. Logarithmic transformations
    • 7. Plotting model results
    • 8. Linear model does not equal straight line
    • 9. Transforming variables
    • 10. Regression with weights
  • VIII. Linear Model with Multiple Explanatory Variables
    • 1. Model specification
      • 1.1. Root Mean Squared Error and Mean Absolute Error
      • 1.2. RMSE and MAE interpretation
      • 1.3. Coefficient's standard error
      • 1.4. Coefficient's t-statistic
      • 1.5. Coefficient's p-value
      • 1.6. Residual standard error
      • 1.7. Model's multiple R-squared (or unadjusted R-squared)
      • 1.8. Model's adjusted R-squared
      • 1.9. Model's F-statistic
      • 1.10. Error's normality
      • 1.11. Error's homoscedasticity (homogeneous variance)
    • 2. Life expectancy, GDP and well-being in the Quality of Government dataset
    • 3. Estimating linear models' coefficients
    • 4. Model accuracy
    • 5. Model summary
    • 6. Error's assumptions
  • IX. Linear Model with Binary and Categorical Explanatory Variables
    • 1. Model specification with binary variables
      • 1.1. ANOVA is a particular case of a linear model with binary variables
      • 1.2. Corruption and popular vote in the Quality of Government dataset
      • 1.3. Estimating a linear model and ANOVA with one predictor and two categories
      • 1.4. Corruption and regime type in the Quality of Government dataset
      • 1.5. Estimating a linear model and ANOVA with one predictor and multiple categories
      • 1.6. Estimating a linear model with continuous and categorical predictors
      • 1.7. Corruption and interaction variables in the Quality of Government dataset
      • 1.8. Estimating a linear model with binary interactions
      • 1.9. Confidence intervals with binary interactions
      • 1.10. Estimating a linear model with categorical interactions
      • 1.11. Confidence intervals with categorical interactions
    • 2. Model specification with binary interactions
    • 3. Model specification with categorical interactions
  • X. Linear Model with Fixed Effects
    • 1. Year fixed effects
      • 1.1. Model specification
      • 1.2. Corruption and popular vote in the Quality of Government dataset
      • 1.3. Estimating year fixed effects' coefficients
      • 1.4. Estimating country-time fixed effects' coefficients
    • 2. Country fixed effects
    • 3. Country-year fixed effects
  • XI. Generalized Linear Model with One Explanatory Variable
    • 1. Model specification
      • 1.1. Gaussian model
      • 1.2. Poisson model
      • 1.3. Quasi-Poisson model
      • 1.4. Binomial model (or logit model)
    • 2. Model families
  • XII. Generalized Linear Model with Multiple Explanatory Variables
    • 1. Obtaining the original codes and data
    • 2. Loading the original data
    • 3. Ordinary Least Squares
    • 4. Poisson Pseudo Maximum Likelihood
    • 5. Tobit
    • 6. Reporting multiple models
