Linear regression and K-fold cross-validation. See Chapter 11 in Regression and Other Stories.
library("rprojroot")
root<-has_file(".ROS-Examples-root")$make_fix_file()
library("rstanarm")
library("foreign")
# for reproducability
SEED <- 1507
kidiq <- read.csv(root("KidIQ/data","kidiq.csv"))
head(kidiq)
kid_score mom_hs mom_iq mom_work mom_age
1 65 1 121.11753 4 27
2 98 1 89.36188 4 25
3 85 1 115.44316 4 27
4 83 1 99.44964 3 25
5 115 1 92.74571 4 27
6 98 0 107.90184 1 18
stan_fit_3 <- stan_glm(kid_score ~ mom_hs + mom_iq, data=kidiq,
seed=SEED, refresh = 0)
print(stan_fit_3)
stan_glm
family: gaussian [identity]
formula: kid_score ~ mom_hs + mom_iq
observations: 434
predictors: 3
------
Median MAD_SD
(Intercept) 25.8 5.9
mom_hs 5.9 2.2
mom_iq 0.6 0.1
Auxiliary parameter(s):
Median MAD_SD
sigma 18.2 0.6
------
* For help interpreting the printed output see ?print.stanreg
* For info on the priors used see ?prior_summary.stanreg
loo3 <- loo(stan_fit_3)
loo3
Computed from 4000 by 434 log-likelihood matrix
Estimate SE
elpd_loo -1876.1 14.2
p_loo 4.0 0.4
looic 3752.1 28.5
------
Monte Carlo SE of elpd_loo is 0.0.
All Pareto k estimates are good (k < 0.5).
See help('pareto-k-diagnostic') for details.
kcv3 <- kfold(stan_fit_3)
kcv3
Based on 10-fold cross-validation
Estimate SE
elpd_kfold -1875.0 14.1
p_kfold NA NA
kfoldic 3750.0 28.2