Linear regression and K-fold cross-validation. See Chapter 11 in Regression and Other Stories.


Load packages

library("rprojroot")
root<-has_file(".ROS-Examples-root")$make_fix_file()
library("rstanarm")
library("foreign")
# for reproducability
SEED <- 1507

Load children's test scores data

kidiq <- read.csv(root("KidIQ/data","kidiq.csv"))
head(kidiq)
  kid_score mom_hs    mom_iq mom_work mom_age
1        65      1 121.11753        4      27
2        98      1  89.36188        4      25
3        85      1 115.44316        4      27
4        83      1  99.44964        3      25
5       115      1  92.74571        4      27
6        98      0 107.90184        1      18

Bayesian regression with the original predictors

stan_fit_3 <- stan_glm(kid_score ~ mom_hs + mom_iq, data=kidiq,
                       seed=SEED, refresh = 0)
print(stan_fit_3)
stan_glm
 family:       gaussian [identity]
 formula:      kid_score ~ mom_hs + mom_iq
 observations: 434
 predictors:   3
------
            Median MAD_SD
(Intercept) 25.8    5.9  
mom_hs       5.9    2.2  
mom_iq       0.6    0.1  

Auxiliary parameter(s):
      Median MAD_SD
sigma 18.2    0.6  

------
* For help interpreting the printed output see ?print.stanreg
* For info on the priors used see ?prior_summary.stanreg

Leave-one-out cross-validation

loo3 <- loo(stan_fit_3)
loo3

Computed from 4000 by 434 log-likelihood matrix

         Estimate   SE
elpd_loo  -1876.1 14.2
p_loo         4.0  0.4
looic      3752.1 28.5
------
Monte Carlo SE of elpd_loo is 0.0.

All Pareto k estimates are good (k < 0.5).
See help('pareto-k-diagnostic') for details.

K-fold cross-validation

kcv3 <- kfold(stan_fit_3)
kcv3

Based on 10-fold cross-validation

           Estimate   SE
elpd_kfold  -1875.0 14.1
p_kfold          NA   NA
kfoldic      3750.0 28.2
LS0tCnRpdGxlOiAiUmVncmVzc2lvbiBhbmQgT3RoZXIgU3RvcmllczogS2lkSVEgY3Jvc3MtdmFsaWRhdGlvbiIKYXV0aG9yOiAiQW5kcmV3IEdlbG1hbiwgSmVubmlmZXIgSGlsbCwgQWtpIFZlaHRhcmkiCmRhdGU6ICJgciBmb3JtYXQoU3lzLkRhdGUoKSlgIgpvdXRwdXQ6CiAgaHRtbF9kb2N1bWVudDoKICAgIHRoZW1lOiByZWFkYWJsZQogICAgdG9jOiB0cnVlCiAgICB0b2NfZGVwdGg6IDIKICAgIHRvY19mbG9hdDogdHJ1ZQogICAgY29kZV9kb3dubG9hZDogdHJ1ZQotLS0KTGluZWFyIHJlZ3Jlc3Npb24gYW5kIEstZm9sZCBjcm9zcy12YWxpZGF0aW9uLiBTZWUgQ2hhcHRlciAxMSBpbgpSZWdyZXNzaW9uIGFuZCBPdGhlciBTdG9yaWVzLgoKLS0tLS0tLS0tLS0tLQoKCmBgYHtyIHNldHVwLCBpbmNsdWRlPUZBTFNFfQprbml0cjo6b3B0c19jaHVuayRzZXQobWVzc2FnZT1GQUxTRSwgZXJyb3I9RkFMU0UsIHdhcm5pbmc9RkFMU0UsIGNvbW1lbnQ9TkEpCmBgYAoKIyMjIyBMb2FkIHBhY2thZ2VzCgpgYGB7ciB9CmxpYnJhcnkoInJwcm9qcm9vdCIpCnJvb3Q8LWhhc19maWxlKCIuUk9TLUV4YW1wbGVzLXJvb3QiKSRtYWtlX2ZpeF9maWxlKCkKbGlicmFyeSgicnN0YW5hcm0iKQpsaWJyYXJ5KCJmb3JlaWduIikKIyBmb3IgcmVwcm9kdWNhYmlsaXR5ClNFRUQgPC0gMTUwNwpgYGAKCiMjIyMgTG9hZCBjaGlsZHJlbidzIHRlc3Qgc2NvcmVzIGRhdGEKCmBgYHtyIH0Ka2lkaXEgPC0gcmVhZC5jc3Yocm9vdCgiS2lkSVEvZGF0YSIsImtpZGlxLmNzdiIpKQpoZWFkKGtpZGlxKQpgYGAKCiMjIyMgQmF5ZXNpYW4gcmVncmVzc2lvbiB3aXRoIHRoZSBvcmlnaW5hbCBwcmVkaWN0b3JzCgpgYGB7ciB9CnN0YW5fZml0XzMgPC0gc3Rhbl9nbG0oa2lkX3Njb3JlIH4gbW9tX2hzICsgbW9tX2lxLCBkYXRhPWtpZGlxLAogICAgICAgICAgICAgICAgICAgICAgIHNlZWQ9U0VFRCwgcmVmcmVzaCA9IDApCnByaW50KHN0YW5fZml0XzMpCmBgYAoKIyMjIyBMZWF2ZS1vbmUtb3V0IGNyb3NzLXZhbGlkYXRpb24KCmBgYHtyIH0KbG9vMyA8LSBsb28oc3Rhbl9maXRfMykKbG9vMwpgYGAKCiMjIyMgSy1mb2xkIGNyb3NzLXZhbGlkYXRpb24KCmBgYHtyIH0Ka2N2MyA8LSBrZm9sZChzdGFuX2ZpdF8zKQprY3YzCmBgYAoK