Home page for the book Regression and Other Stories by Andrew Gelman, Jennifer Hill, and Aki Vehtari, including the code and data.

Back cover text: *Many textbooks on regression focus on theory and the simplest of examples. Real statistical problems, however, are complex and subtle. This is not a book about the theory of regression. It is a book about how to use regression to solve real problems of comparison, estimation, prediction, and causal inference. It focuses on practical issues such as sample size and missing data and a wide range of goals and techniques. It jumps right in to methods and computer code you can use fresh out of the box.*

The book has been sent to the publisher and will be available in July 2020.

- Sample exams
See also an article Teaching Bayes to Graduate Students in Political Science, Sociology, Public Health, Education, Economics, ...

If you notice an error, submit an issue at https://github.com/avehtari/ROS-Examples/issues or send an email.

The folders below (ending /) point to the code (.R and .Rmd) and data folders on GitHub, and the .html files point to rendered notebooks. Most examples have cleaned data in a CSV file in the data subfolder for easy experimenting. The data subfolders also contain the raw data and a *_setup.R file showing how the data were cleaned.
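As a rough illustration of the layout described above, the following sketch builds the path to a cleaned CSV inside an example's data subfolder and reads it. It assumes a local clone of the repository; the helper names `cleaned_csv_path` and `read_cleaned_csv` are hypothetical, and the exact file name inside each data subfolder has to be checked per example, since not every example necessarily ships a CSV.

```python
import csv
from pathlib import Path


def cleaned_csv_path(repo_root, folder, filename):
    """Build the path <repo_root>/<folder>/data/<filename>.

    Assumes the layout described on this page: each example folder
    has a "data" subfolder holding the cleaned file.
    """
    return Path(repo_root) / folder / "data" / filename


def read_cleaned_csv(repo_root, folder, filename):
    """Read a cleaned CSV into a list of dicts, one per row.

    Values are returned as strings; convert columns as needed.
    """
    path = cleaned_csv_path(repo_root, folder, filename)
    with path.open(newline="") as f:
        return list(csv.DictReader(f))
```

With a clone in a directory named `ROS-Examples`, a call such as `read_cleaned_csv("ROS-Examples", "KidIQ", "kidiq.csv")` would return the rows of that file, assuming a file with that name exists in the folder's data subfolder.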

- ElectionsEconomy/
- hibbs.html - Predicting presidential vote share from the economy

- ElectricCompany/
- electric.html - Analysis of "Electric company" data

- Peacekeeping/
- peace.html - Outcomes after civil war in countries with and without United Nations peacekeeping

- SimpleCausal/
- causal.html - Simple graphs illustrating regression for causal inference

- Helicopters/
- helicopters.html - Example data file for helicopter flying time exercise

- HDI/
- hdi.html - Human Development Index - Looking at data in different ways

- Pew/
- pew.html - Miscellaneous analyses using raw Pew data

- HealthExpenditure/
- healthexpenditure.html - Discovery through graphs of data and models

- Names/
- names.html - Names - Distributions of names of American babies
- lastletters.html - Last letters - Distributions of last letters of names of American babies

- AgePeriodCohort/
- births.html - Age adjustment

- Congress/
- congress_plots.html - Predictive uncertainty for congressional elections

- Mile/
- mile.html - Trend of record times in the mile run

- Metabolic/
- metabolic.html - How to interpret a power law or log-log regression

- Earnings/
- height_and_weight.html - Predict weight

- CentralLimitTheorem/
- heightweight.html - Illustrate central limit theorem and normal distribution

- Stents/
- stents.html - Stents - comparing distributions

- Coverage/
- coverage.html - Example of coverage

- Death/
- polls.html - Proportion of American adults supporting the death penalty

- Coop/
- riverbay.html - Example of hypothesis testing

- Girls/

- ProbabilitySimulation/
- probsim.html - Simulation of probability models

- Earnings/
- earnings_bootstrap.html - Bootstrapping to simulate the sampling distribution

- Simplest/
- simplest.html - Linear regression with a single predictor
- simplest_lm.html - Linear least squares regression with a single predictor

- Earnings/
- earnings_regression.html - Predict respondents' yearly earnings using survey data from 1990.

- PearsonLee/
- heights.html - The heredity of height. Published in 1903 by Karl Pearson and Alice Lee.

- FakeMidtermFinal/
- simulation.html - Fake dataset of 1000 students' scores on a midterm and final exam

- ElectionsEconomy/
- hibbs.html - Predicting presidential vote share from the economy
- hills.html - Present uncertainty in parameter estimates
- hibbs_coverage.html - Checking the coverage of intervals

- Simplest/
- simplest.html - Linear regression with a single predictor
- simplest_lm.html - Linear least squares regression with a single predictor

- ElectionsEconomy/
- hibbs.html - Predicting presidential vote share from the economy

- Influence/
- influence.html - Influence of individual points in a fitted regression

- ElectionsEconomy/
- hibbs.html - Predicting presidential vote share from the economy
- bayes.html - Demonstration of Bayesian information aggregation

- SexRatio/
- sexratio.html - Example where an informative prior makes a difference

- Earnings/
- height_and_weight.html - Predict weight
- earnings_regression.html - Predict respondents' yearly earnings using survey data from 1990.

- KidIQ/
- kidiq.html - Linear regression with multiple predictors

- Earnings/
- height_and_weight.html - Predict weight

- Congress/
- congress.html - Predictive uncertainty for congressional elections

- NES/
- nes_linear.html - Fitting the same regression to many datasets

- Beauty/
- beauty.html - Student evaluations of instructors’ beauty and teaching quality

- KidIQ/
- kidiq.html - Linear regression with multiple predictors
- kidiq_loo.html - Linear regression and leave-one-out cross-validation
- kidiq_R2.html - Linear regression and Bayes-R2 and LOO-R2
- kidiq_kcv.html - Linear regression and K-fold cross-validation

- Residuals/
- residuals.html - Plotting the data and fitted model

- Introclass/
- residual_plots.html - Plot residuals vs. predicted values, or residuals vs. observed values?

- Newcomb/
- newcomb.html - Posterior predictive checking of Normal model for Newcomb's speed of light data

- Unemployment/
- unemployment.html - Time series fit and posterior predictive model checking for unemployment series

- Rsquared/
- rsquared.html - Bayesian R^2

- CrossValidation/
- crossvalidation.html - Demonstration of cross validation

- FakeKCV/
- fake_kcv.html - Demonstration of \(K\)-fold cross-validation using simulated data

- Pyth/

- KidIQ/
- kidiq.html - Linear regression with multiple predictors

- Earnings/
- earnings_regression.html - Predict respondents' yearly earnings using survey data from 1990.

- Gay/
- gay_simple.html - Simple models (linear and discretized age) and political attitudes as a function of age

- Mesquite/
- mesquite.html - Predicting the yields of mesquite bushes

- Student/
- student.html - Models for regression coefficients

- Pollution/
- pollution.html - Pollution data.

- NES/
- nes_logistic.html - Logistic regression, identifiability, and separation

- LogisticPriors/
- logistic_priors.html - Effect of priors in logistic regression

- Arsenic/
- arsenic_logistic_building.html - Building a logistic regression model: wells in Bangladesh

- LogitGraphs/
- logitgraphs.html - Different ways of displaying logistic regression

- NES/
- nes_logistic.html - Logistic regression, identifiability, and separation

- Rodents/
- Arsenic/
- arsenic_logistic_residuals.html - Residual plots for a logistic regression model: wells in Bangladesh
- arsenic_logistic_apc.html - Average predictive comparisons for a logistic regression model: wells in Bangladesh

- PoissonExample/
- PoissonExample.html - Demonstrate Poisson regression with simulated data.

- Roaches/
- roaches.html - Analyse the effect of integrated pest management on reducing cockroach levels in urban apartments

- Storable/
- storable.html - Ordered categorical data analysis with a study from experimental economics, on the topic of "storable votes."

- Earnings/
- earnings_compound.html - Compound discrete-continuous model

- RiskyBehavior/
- risky.html - Risky behavior data.

- NES/
- Lalonde/
- Congress/
- AcademyAwards/

- ElectricCompany/
- electric.html - Analysis of "Electric company" data

- SampleSize/
- simulation.html - Sample size simulation

- FakeMidtermFinal/
- simulation_based_design.html - Fake dataset of a randomized experiment on student grades

- Poststrat/
- poststrat.html - Poststratification after estimation
- poststrat2.html - Poststratification after estimation

- Imputation/
- imputation.html - Regression-based imputation for the Social Indicators Survey
- imputation_gg.html - Regression-based imputation for the Social Indicators Survey, dplyr/ggplot version

- Sesame/
- sesame.html - Causal analysis of Sesame Street experiment

- ElectricCompany/
- electric.html - Analysis of "Electric company" data

- Incentives/
- incentives.html - Simple analysis of incentives data

- Cows/

- ElectricCompany/
- electric.html - Analysis of "Electric company" data

- Childcare/
- childcare.html - Infant Health and Development Program (IHDP) example.

- Sesame/
- sesame.html - Causal analysis of Sesame Street experiment

- Bypass/
- ChileSchools/
- chile_schools.html - ChileSchools example.

- Golf/
- golf.html - Golf putting accuracy: Fitting a nonlinear model using Stan

- Gay/
- gay.html - Nonlinear models (Loess, B-spline, GP-spline, and BART) and political attitudes as a function of age

- ElectionsEconomy/
- hibbs.html - Predicting presidential vote share from the economy

- Scalability/
- scalability.html - Demonstrate computation speed with 100 000 observations.

- Coins/
- Mile/
- mile.html - Trend of record times in the mile run

- Parabola/
- parabola.html - Demonstration of using Stan for optimization

- Restaurant/
- restaurant.html - Demonstration of using Stan for optimization

- DifferentSoftware/
- linear.html - Linear regression using different software options

The example folders and notebooks are listed again below in alphabetical order, following the same conventions as above.

- AcademyAwards/
- AgePeriodCohort/
- births.html - Age adjustment
- Arsenic/
- arsenic_logistic_building.html - Building a logistic regression model: wells in Bangladesh
- arsenic_logistic_residuals.html - Residual plots for a logistic regression model: wells in Bangladesh
- arsenic_logistic_apc.html - Average predictive comparisons for a logistic regression model: wells in Bangladesh
- arsenic_logistic_building_optimizing.html - Building a logistic regression model: wells in Bangladesh. A version with normal approximation at the mode.
- Balance/
- treatcontrol.html
- Beauty/
- beauty.html - Student evaluations of instructors’ beauty and teaching quality
- Bypass/
- CausalDiagram/
- diagrams.html - Plot causal diagram
- CentralLimitTheorem/
- heightweight.html - Illustrate central limit theorem and normal distribution
- Childcare/
- childcare.html - Infant Health and Development Program (IHDP) example.
- ChileSchools/
- chile_schools.html - ChileSchools example.
- Coins/
- Congress/
- congress.html - Predictive uncertainty for congressional elections
- congress_plots.html - Predictive uncertainty for congressional elections
- Coop/
- riverbay.html - Example of hypothesis testing
- Coverage/
- coverage.html - Example of coverage
- Cows/
- CrossValidation/
- crossvalidation.html - Demonstration of cross validation
- SampleSize/
- simulation.html - Sample size simulation
- Death/
- polls.html - Proportion of American adults supporting the death penalty
- DifferentSoftware/
- linear.html - Linear regression using different software options
- Earnings/
- earnings_regression.html - Predict respondents' yearly earnings using survey data from 1990.
- earnings_bootstrap.html - Bootstrapping to simulate the sampling distribution
- earnings_compound.html - Compound discrete-continuous model
- height_and_weight.html - Predict weight
- ElectionsEconomy/
- bayes.html - Demonstration of Bayesian information aggregation
- hibbs.html - Predicting presidential vote share from the economy
- hills.html - Present uncertainty in parameter estimates
- hibbs_coverage.html - Checking the model-fitting procedure using fake-data simulation.
- ElectricCompany/
- electric.html - Analysis of "Electric company" data
- FakeKCV/
- fake_kcv.html - Demonstration of \(K\)-fold cross-validation using simulated data
- FakeMidtermFinal/
- simulation.html - Fake dataset of 1000 students' scores on a midterm and final exam
- simulation_based_design.html - Fake dataset of a randomized experiment on student grades
- FrenchElection/
- ps_primaire.html - French Election data
- Gay/
- gay_simple.html - Simple models (linear and discretized age) and political attitudes as a function of age
- gay.html - Nonlinear models (Loess, B-spline, GP-spline, and BART) and political attitudes as a function of age
- Girls/
- Golf/
- golf.html - Golf putting accuracy: Fitting a nonlinear model using Stan
- HDI/
- hdi.html - Human Development Index - Looking at data in different ways
- HealthExpenditure/
- healthexpenditure.html - Discovery through graphs of data and models
- Helicopters/
- helicopters.html - Example data file for helicopter flying time exercise
- Imputation/
- imputation.html - Regression-based imputation for the Social Indicators Survey
- imputation_gg.html - Regression-based imputation for the Social Indicators Survey, dplyr/ggplot version
- Incentives/
- incentives.html - Simple analysis of incentives data
- Influence/
- influence.html - Influence of individual points in a fitted regression
- Interactions/
- interactions.html - Plot interaction example figure
- Introclass/
- residual_plots.html - Plot residuals vs. predicted values, or residuals vs. observed values?
- KidIQ/
- kidiq.html - Linear regression with multiple predictors
- kidiq_loo.html - Linear regression and leave-one-out cross-validation
- kidiq_R2.html - Linear regression and Bayes-R2 and LOO-R2
- kidiq_kcv.html - Linear regression and K-fold cross-validation
- Lalonde/
- LogisticPriors/
- logistic_priors.html - Effect of priors in logistic regression
- Mesquite/
- mesquite.html - Predicting the yields of mesquite bushes
- Metabolic/
- metabolic.html - How to interpret a power law or log-log regression
- Mile/
- mile.html - Trend of record times in the mile run
- Names/
- names.html - Names - Distributions of names of American babies
- lastletters.html - Last letters - Distributions of last letters of names of American babies
- NES/
- nes_linear.html - Fitting the same regression to many datasets
- nes_logistic.html - Logistic regression, identifiability, and separation
- Newcomb/
- newcomb.html - Posterior predictive checking of Normal model for Newcomb's speed of light data
- Parabola/
- parabola.html - Demonstration of using Stan for optimization
- Peacekeeping/
- peace.html - Outcomes after civil war in countries with and without United Nations peacekeeping
- PearsonLee/
- heights.html - The heredity of height. Published in 1903 by Karl Pearson and Alice Lee.
- Pew/
- pew.html - Miscellaneous analyses using raw Pew data
- PoissonExample/
- poissonexample.html - Demonstrate Poisson regression with simulated data.
- Pollution/
- pollution.html - Pollution data.
- Poststrat/
- poststrat.html - Poststratification after estimation
- poststrat2.html - Poststratification after estimation
- ProbabilitySimulation/
- probsim.html - Simulation of probability models
- Pyth/
- Redistricting/
- Residuals/
- residuals.html - Plotting the data and fitted model
- Restaurant/
- restaurant.html - Demonstration of using Stan for optimization
- RiskyBehavior/
- risky.html - Risky behavior data.
- Roaches/
- roaches.html - Analyse the effect of integrated pest management on reducing cockroach levels in urban apartments
- Rodents/
- Rsquared/
- rsquared.html - Bayesian R^2
- Sesame/
- sesame.html - Causal analysis of Sesame Street experiment
- SexRatio/
- sexratio.html - Example where an informative prior makes a difference
- SimpleCausal/
- causal.html - Simple graphs illustrating regression for causal inference
- Simplest/
- simplest.html - Linear regression with a single predictor
- simplest_lm.html - Linear least squares regression with a single predictor
- Stents/
- stents.html - Stents - comparing distributions
- Storable/
- storable.html - Ordered categorical data analysis with a study from experimental economics, on the topic of "storable votes."
- Student/
- student.html - Models for regression coefficients
- Unemployment/
- unemployment.html - Time series fit and posterior predictive model checking for unemployment series