Regression and Other Stories

Introduction

The code and data provided here fully reproduce the examples and figures in the book, and running them is a good way to see what the code does. Coding styles differ from person to person, and the code here is not meant to be a model of style. The statistical analyses and graphs in the book are intended to be models of good practice, whereas the code is meant to be simple, with minimal dependencies.
For R programming basics see Appendix A of Regression and Other Stories. If you want to learn more, see our recommendations for R programming and visualization with R.
The folders below (ending in /) point to the code (.R and .Rmd) and data folders (.csv or .txt files plus codebooks) on GitHub, and the .html files point to knitted notebooks.
Most examples have cleaned data in a .csv file in the data subfolder for easy experimenting. For completeness and reproducibility, the data subfolders also contain the raw data and a *_setup.R file showing how the data were pre-processed (to do the exercises and follow along with the examples, you don’t need to worry about the setup code). Most data folders also have a codebook explaining the column names.
For easy access to the data sets, there is an R package, rosdata. You can install it with the command remotes::install_github("avehtari/ROS-Examples",subdir = "rpackage"). Then you can access data, for example, with library(rosdata), data(wells), head(wells). You can get the list of data sets with ?rosdata.
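The steps above can be collected into a short script. This is only a sketch: the `has_rosdata` variable is introduced here for illustration, the script deliberately does not install anything by itself, and it assumes that `wells` loads as a data frame once the package is available.

```r
# Check whether rosdata is already installed (has_rosdata is a name chosen
# for this sketch, not something defined by the package).
has_rosdata <- requireNamespace("rosdata", quietly = TRUE)

if (has_rosdata) {
  library(rosdata)
  data(wells)          # load the wells data set into the workspace
  print(head(wells))   # first rows, to check the column names
} else {
  # Installation needs the remotes package and network access.
  message('rosdata is not installed; run ',
          'remotes::install_github("avehtari/ROS-Examples", subdir = "rpackage")')
}
```

Keeping the installation as a separate manual step means the script can be sourced repeatedly without touching the network.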
The notebooks use the rprojroot package to set the project root directory, so there is no need to switch the working directory when running them. The downloaded git repository can be placed anywhere you like, and you can rename the ROS-Examples directory if you wish. When running the code, it is sufficient that the working directory is any directory inside ROS-Examples (or whatever you have renamed it to). Running

library("rprojroot")
root <- has_file(".ROS-Examples-root")$make_fix_file()

will find the file .ROS-Examples-root, which is in the ROS-Examples directory, and will set the full path relative to it. Then, for example,

wells <- read.csv(root("Arsenic/data","wells.csv"))

finds the wells.csv file, no matter where you have placed the ROS-Examples directory or what you have renamed it to. When you switch to another example, there is no need to switch the working directory.
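To see this behaviour without touching the real repository, the following sketch builds a throwaway directory that mimics the layout. The directory name `my-renamed-ROS` and the two-row toy `wells.csv` are assumptions made for illustration only; they are not the real data.

```r
library("rprojroot")

# Throwaway stand-in for a renamed copy of the repository (hypothetical name).
repo <- file.path(tempdir(), "my-renamed-ROS")
dir.create(file.path(repo, "Arsenic", "data"),
           recursive = TRUE, showWarnings = FALSE)

# The marker file that rprojroot searches for.
file.create(file.path(repo, ".ROS-Examples-root"))

# Toy placeholder data, NOT the real wells data.
write.csv(data.frame(x = c(1, 2), y = c(3, 4)),
          file.path(repo, "Arsenic", "data", "wells.csv"),
          row.names = FALSE)

# Start from a subdirectory: the search walks upwards until it finds the marker.
old_wd <- setwd(file.path(repo, "Arsenic"))
root <- has_file(".ROS-Examples-root")$make_fix_file()
setwd(old_wd)

# root() keeps pointing at the project root even after the working
# directory has changed back.
wells <- read.csv(root("Arsenic/data", "wells.csv"))
print(wells)
```

Because `make_fix_file()` fixes the root at the moment it is called, later changes of working directory do not affect the paths that `root()` returns.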

Examples by chapters

1 Introduction

ElectionsEconomy/
- hibbs.html - Predicting presidential vote share from the economy
ElectricCompany/
- electric.html - Analysis of “Electric company” data
Peacekeeping/
- peace.html - Outcomes after civil war in countries with and without United Nations peacekeeping
SimpleCausal/
- causal.html - Simple graphs illustrating regression for causal inference
Helicopters/
- helicopters.html - Example data file for helicopter flying time exercise

2 Data and measurement

HDI/
- hdi.html - Human Development Index - Looking at data in different ways
Pew/
- pew.html - Miscellaneous analyses using raw Pew data
HealthExpenditure/
- healthexpenditure.html - Discovery through graphs of data and models
Names/
- names.html - Names - Distributions of names of American babies
- lastletters.html - Last letters - Distributions of last letters of names of American babies
AgePeriodCohort/
- births.html - Age adjustment
Congress/
- congress_plots.html - Predictive uncertainty for congressional elections

3 Some basic methods in mathematics and probability

Mile/
- mile.html - Trend of record times in the mile run
Metabolic/
- metabolic.html - How to interpret a power law or log-log regression
Earnings/
- height_and_weight.html - Predict weight
CentralLimitTheorem/
- heightweight.html - Illustrate central limit theorem and normal distribution
Stents/
- stents.html - Stents - comparing distributions

4 Generative models and statistical inference

Coverage/
- coverage.html - Example of coverage
Death/
- polls.html - Proportion of American adults supporting the death penalty
Coop/
- riverbay.html - Example of hypothesis testing
Girls/

5 Simulation

ProbabilitySimulation/
- probsim.html - Simulation of probability models
Earnings/
- earnings_bootstrap.html - Bootstrapping to simulate the sampling distribution

6 Background on regression modeling

Simplest/
- simplest.html - Linear regression with a single predictor
- simplest_lm.html - Linear least squares regression with a single predictor
Earnings/
- earnings_regression.html - Predict respondents’ yearly earnings using survey data from 1990.
PearsonLee/
- heights.html - The heredity of height. Published in 1903 by Karl Pearson and Alice Lee.
FakeMidtermFinal/
- simulation.html - Fake dataset of 1000 students’ scores on a midterm and final exam

7 Linear regression with a single predictor

ElectionsEconomy/
- hibbs.html - Predicting presidential vote share from the economy
- hills.html - Present uncertainty in parameter estimates
- hibbs_coverage.html - Checking the coverage of intervals
Simplest/
- simplest.html - Linear regression with a single predictor
- simplest_lm.html - Linear least squares regression with a single predictor

8 Fitting regression models

ElectionsEconomy/
- hibbs.html - Predicting presidential vote share from the economy
Influence/
- influence.html - Influence of individual points in a fitted regression

9 Prediction and Bayesian inference

ElectionsEconomy/
- hibbs.html - Predicting presidential vote share from the economy
- bayes.html - Demonstration of Bayesian information aggregation
SexRatio/
- sexratio.html - Example where an informative prior makes a difference
Earnings/
- height_and_weight.html - Predict weight
- earnings_regression.html - Predict respondents’ yearly earnings using survey data from 1990.

10 Linear regression with multiple predictors

KidIQ/
- kidiq.html - Linear regression with multiple predictors
Earnings/
- height_and_weight.html - Predict weight
Congress/
- congress.html - Predictive uncertainty for congressional elections
NES/
- nes_linear.html - Fitting the same regression to many datasets
Beauty/
- beauty.html - Student evaluations of instructors’ beauty and teaching quality

11 Assumptions, diagnostics, and model evaluation

KidIQ/
- kidiq.html - Linear regression with multiple predictors
- kidiq_loo.html - Linear regression and leave-one-out cross-validation
- kidiq_R2.html - Linear regression and Bayes-R2 and LOO-R2
- kidiq_kcv.html - Linear regression and K-fold cross-validation
Residuals/
- residuals.html - Plotting the data and fitted model
Introclass/
- residual_plots.html - Plot residuals vs. predicted values, or residuals vs. observed values?
Newcomb/
- newcomb.html - Posterior predictive checking of Normal model for Newcomb’s speed of light data
Unemployment/
- unemployment.html - Time series fit and posterior predictive model checking for unemployment series
Rsquared/
- rsquared.html - Bayesian R^2
CrossValidation/
- crossvalidation.html - Demonstration of cross validation
FakeKCV/
- fake_kcv.html - Demonstration of K-fold cross-validation using simulated data
Pyth/

12 Transformations

KidIQ/
- kidiq.html - Linear regression with multiple predictors
Earnings/
- earnings_regression.html - Predict respondents’ yearly earnings using survey data from 1990.
Gay/
- gay_simple.html - Simple models (linear and discretized age) and political attitudes as a function of age
Mesquite/
- mesquite.html - Predicting the yields of mesquite bushes
Student/
- student.html - Models for regression coefficients
Pollution/
- pollution.html - Pollution data.

13 Logistic regression

NES/
- nes_logistic.html - Logistic regression, identifiability, and separation
LogisticPriors/
- logistic_priors.html - Effect of priors in logistic regression
Arsenic/
- arsenic_logistic_building.html - Building a logistic regression model: wells in Bangladesh

14 Working with logistic regression

LogitGraphs/
- logitgraphs.html - Different ways of displaying logistic regression
NES/
- nes_logistic.html - Logistic regression, identifiability, and separation
Rodents/
Arsenic/
- arsenic_logistic_residuals.html - Residual plots for a logistic regression model: wells in Bangladesh
- arsenic_logistic_apc.html - Average predictive comparisons for a logistic regression model: wells in Bangladesh

15 Other generalized linear models

PoissonExample/
- PoissonExample.html - Demonstrate Poisson regression with simulated data.
Roaches/
- roaches.html - Analyse the effect of integrated pest management on reducing cockroach levels in urban apartments
Storable/
- storable.html - Ordered categorical data analysis with a study from experimental economics, on the topic of “storable votes.”
Robit/
- robit.html - Comparison of robit and logit models for binary data
Earnings/
- earnings_compound.html - Compound discrete-continuous model
RiskyBehavior/
- risky.html - Risky behavior data.
NES/
Lalonde/
Congress/
AcademyAwards/

16 Design and sample size decisions

ElectricCompany/
- electric.html - Analysis of “Electric company” data
SampleSize/
- simulation.html - Sample size simulation
FakeMidtermFinal/
- simulation_based_design.html - Fake dataset of a randomized experiment on student grades

17 Poststratification and missing-data imputation

Poststrat/
- poststrat.html - Poststratification after estimation
- poststrat2.html - Poststratification after estimation
Imputation/
- imputation.html - Regression-based imputation for the Social Indicators Survey
- imputation_gg.html - Regression-based imputation for the Social Indicators Survey, dplyr/ggplot version

18 Causal inference basics and randomized experiments

Sesame/
- sesame.html - Causal analysis of Sesame Street experiment

19 Causal inference using regression on the treatment variable

ElectricCompany/
- electric.html - Analysis of “Electric company” data
Incentives/
- incentives.html - Simple analysis of incentives data
Cows/

20 Observational studies with all confounders assumed to be measured

ElectricCompany/
- electric.html - Analysis of “Electric company” data
Childcare/
- childcare.html - Infant Health and Development Program (IHDP) example.

21 More advanced topics in causal inference

Sesame/
- sesame.html - Causal analysis of Sesame Street experiment
Bypass/
ChileSchools/
- chile_schools.html - ChileSchools example.

22 Advanced regression and multilevel models

Golf/
- golf.html - Golf putting accuracy: Fitting a nonlinear model using Stan
Gay/
- gay.html - Nonlinear models (Loess, B-spline, GP-spline, and BART) and political attitudes as a function of age
ElectionsEconomy/
- hibbs.html - Predicting presidential vote share from the economy
Scalability/
- scalability.html - Demonstrate computation speed with 100 000 observations.

Appendix A

Coins/
Mile/
- mile.html - Trend of record times in the mile run
Parabola/
- parabola.html - Demonstration of using Stan for optimization
Restaurant/
- restaurant.html - Demonstration of using Stan for optimization
DifferentSoftware/
- linear.html - Linear regression using different software options

Examples alphabetically

AcademyAwards/
AgePeriodCohort/
- births.html - Age adjustment
Arsenic/
- arsenic_logistic_building.html - Building a logistic regression model: wells in Bangladesh
- arsenic_logistic_residuals.html - Residual plots for a logistic regression model: wells in Bangladesh
- arsenic_logistic_apc.html - Average predictive comparisons for a logistic regression model: wells in Bangladesh
- arsenic_logistic_building_optimizing.html - Building a logistic regression model: wells in Bangladesh. A version with normal approximation at the mode.
Balance/
- treatcontrol.html
Beauty/
- beauty.html - Student evaluations of instructors’ beauty and teaching quality
Bypass/
CausalDiagram/
- diagrams.html - Plot causal diagram
CentralLimitTheorem/
- heightweight.html - Illustrate central limit theorem and normal distribution
Childcare/
- childcare.html - Infant Health and Development Program (IHDP) example.
ChileSchools/
- chile_schools.html - ChileSchools example.
Coins/
Congress/
- congress.html - Predictive uncertainty for congressional elections
- congress_plots.html - Predictive uncertainty for congressional elections
Coop/
- riverbay.html - Example of hypothesis testing
Coverage/
- coverage.html - Example of coverage
Cows/
CrossValidation/
- crossvalidation.html - Demonstration of cross validation
SampleSize/
- simulation.html - Sample size simulation
Death/
- polls.html - Proportion of American adults supporting the death penalty
DifferentSoftware/
- linear.html - Linear regression using different software options
Earnings/
- earnings_regression.html - Predict respondents’ yearly earnings using survey data from 1990.
- earnings_bootstrap.html - Bootstrapping to simulate the sampling distribution
- earnings_compound.html - Compound discrete-continuous model
- height_and_weight.html - Predict weight
ElectionsEconomy/
- bayes.html - Demonstration of Bayesian information aggregation
- hibbs.html - Predicting presidential vote share from the economy
- hills.html - Present uncertainty in parameter estimates
- hibbs_coverage.html - Checking the model-fitting procedure using fake-data simulation.
ElectricCompany/
- electric.html - Analysis of “Electric company” data
FakeKCV/
- fake_kcv.html - Demonstration of K-fold cross-validation using simulated data
FakeMidtermFinal/
- simulation.html - Fake dataset of 1000 students’ scores on a midterm and final exam
- simulation_based_design.html - Fake dataset of a randomized experiment on student grades
FrenchElection/
- ps_primaire.html - French Election data
Gay/
- gay_simple.html - Simple models (linear and discretized age) and political attitudes as a function of age
- gay.html - Nonlinear models (Loess, B-spline, GP-spline, and BART) and political attitudes as a function of age
Girls/
Golf/
- golf.html - Golf putting accuracy: Fitting a nonlinear model using Stan
HDI/
- hdi.html - Human Development Index - Looking at data in different ways
HealthExpenditure/
- healthexpenditure.html - Discovery through graphs of data and models
Helicopters/
- helicopters.html - Example data file for helicopter flying time exercise
Imputation/
- imputation.html - Regression-based imputation for the Social Indicators Survey
- imputation_gg.html - Regression-based imputation for the Social Indicators Survey, dplyr/ggplot version
Incentives/
- incentives.html - Simple analysis of incentives data
Influence/
- influence.html - Influence of individual points in a fitted regression
Interactions/
- interactions.html - Plot interaction example figure
Introclass/
- residual_plots.html - Plot residuals vs. predicted values, or residuals vs. observed values?
KidIQ/
- kidiq.html - Linear regression with multiple predictors
- kidiq_loo.html - Linear regression and leave-one-out cross-validation
- kidiq_R2.html - Linear regression and Bayes-R2 and LOO-R2
- kidiq_kcv.html - Linear regression and K-fold cross-validation
Lalonde/
LogisticPriors/
- logistic_priors.html - Effect of priors in logistic regression
Mesquite/
- mesquite.html - Predicting the yields of mesquite bushes
Metabolic/
- metabolic.html - How to interpret a power law or log-log regression
Mile/
- mile.html - Trend of record times in the mile run
Names/
- names.html - Names - Distributions of names of American babies
- lastletters.html - Last letters - Distributions of last letters of names of American babies
NES/
- nes_linear.html - Fitting the same regression to many datasets
- nes_logistic.html - Logistic regression, identifiability, and separation
Newcomb/
- newcomb.html - Posterior predictive checking of Normal model for Newcomb’s speed of light data
Parabola/
- parabola.html - Demonstration of using Stan for optimization
Peacekeeping/
- peace.html - Outcomes after civil war in countries with and without United Nations peacekeeping
PearsonLee/
- heights.html - The heredity of height. Published in 1903 by Karl Pearson and Alice Lee.
Pew/
- pew.html - Miscellaneous analyses using raw Pew data
PoissonExample/
- poissonexample.html - Demonstrate Poisson regression with simulated data.
Pollution/
- pollution.html - Pollution data.
Poststrat/
- poststrat.html - Poststratification after estimation
- poststrat2.html - Poststratification after estimation
ProbabilitySimulation/
- probsim.html - Simulation of probability models
Pyth/
Redistricting/
Residuals/
- residuals.html - Plotting the data and fitted model
Restaurant/
- restaurant.html - Demonstration of using Stan for optimization
RiskyBehavior/
- risky.html - Risky behavior data.
Roaches/
- roaches.html - Analyse the effect of integrated pest management on reducing cockroach levels in urban apartments
Robit/
- robit.html - Comparison of robit and logit models for binary data
Rodents/
Rsquared/
- rsquared.html - Bayesian R^2
Sesame/
- sesame.html - Causal analysis of Sesame Street experiment
SexRatio/
- sexratio.html - Example where an informative prior makes a difference
SimpleCausal/
- causal.html - Simple graphs illustrating regression for causal inference
Simplest/
- simplest.html - Linear regression with a single predictor
- simplest_lm.html - Linear least squares regression with a single predictor
Stents/
- stents.html - Stents - comparing distributions
Storable/
- storable.html - Ordered categorical data analysis with a study from experimental economics, on the topic of “storable votes.”
Student/
- student.html - Models for regression coefficients
Unemployment/
- unemployment.html - Time series fit and posterior predictive model checking for unemployment series

Download code and data

Tidyverse code

Bill Behrman has revised all the example code to use Tidyverse

brms + tidyverse code

Solomon A. Kurz is revising all the example code to use brms and tidyverse

Working through Regression and other stories

Python code

Ravin Kumar, Tomás Capretto, and Osvaldo Martin are porting ROS examples to Python using bambi (BAyesian Model-Building Interface), which has formula syntax similar to that of rstanarm and brms.

Bambi resources

Julia code

Rob J. Goedman is porting ROS examples to Julia.

RegressionAndOtherStories.jl

Regression and Other Stories - Examples

Andrew Gelman, Jennifer Hill, Aki Vehtari

Page updated: 2022-11-06