Demonstration of \(K\)-fold cross-validation using simulated data. See Chapter 11 in Regression and Other Stories.


Load packages

library("MASS")      # needed for mvrnorm()
library("rstanarm")
library("loo")

Set random seed for reproducibility

SEED <- 1754

Generate fake data

We simulate a \(60\times 30\) matrix representing 30 predictors that are random but not independent; rather, we draw them from a multivariate normal distribution with pairwise correlations of 0.8:

set.seed(SEED)
n <- 60
k <- 30
rho <- 0.8
Sigma <- rho*array(1, c(k,k)) + (1-rho)*diag(k)
X <- mvrnorm(n, rep(0,k), Sigma)
b <- c(c(-1, 1, 2), rep(0,k-3))
y <- X %*% b + rnorm(n)*2
fake <- data.frame(X, y)
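
As a quick sanity check (not part of the original demo), we can verify that the simulated predictors have roughly the intended pairwise correlation:

# mean off-diagonal correlation; should be close to rho = 0.8
C <- cor(X)
round(mean(C[upper.tri(C)]), 2)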

Weakly informative prior

fit_1 <- stan_glm(y ~ ., prior=normal(0, 10, autoscale=FALSE),
                  data=fake, seed=SEED, refresh=0)
(loo_1 <- loo(fit_1))

Computed from 4000 by 60 log-likelihood matrix

         Estimate   SE
elpd_loo   -153.8  5.4
p_loo        28.7  3.6
looic       307.5 10.7
------
Monte Carlo SE of elpd_loo is NA.

Pareto k diagnostic values:
                         Count Pct.    Min. n_eff
(-Inf, 0.5]   (good)     16    26.7%   829       
 (0.5, 0.7]   (ok)       29    48.3%   169       
   (0.7, 1]   (bad)      15    25.0%   21        
   (1, Inf)   (very bad)  0     0.0%   <NA>      
See help('pareto-k-diagnostic') for details.
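
To see which observations are problematic, the loo package provides helpers for inspecting the Pareto \(\hat{k}\) estimates directly (a quick check using loo's pareto_k_ids() and pareto_k_values()):

# indices of observations with Pareto k above the 0.7 threshold
pareto_k_ids(loo_1, threshold = 0.7)
# one k-hat estimate per observation
head(pareto_k_values(loo_1))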

In this case, Pareto smoothed importance sampling LOO (PSIS-LOO) fails, but the diagnostic recognizes this with many high Pareto \(k\) values. We can instead run the slower but more robust \(K\)-fold-CV:

kfold_1 <- kfold(fit_1)
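
By default, rstanarm's kfold() splits the data into K = 10 folds; the number of folds can also be set explicitly, trading computation for accuracy (an equivalent sketch of the call above):

# same as kfold(fit_1); larger K costs more refits but is closer to LOO
kfold_1 <- kfold(fit_1, K = 10)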

An alternative weakly informative prior

The regularized horseshoe prior hs() is weakly informative, stating that it is likely that only a small number of predictors are relevant, but we don't know which ones.
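
The global scale below follows the recommendation \(\tau_0 = \frac{k_0}{k-k_0}\,\frac{1}{\sqrt{n}}\) of Piironen and Vehtari (2017), where \(k_0\) is a prior guess for the number of relevant predictors.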

k0 <- 2 # prior guess for the number of relevant variables
tau0 <- k0/(k-k0) * 1/sqrt(n)
hs_prior <- hs(df=1, global_df=1, global_scale=tau0, slab_scale=3, slab_df=7)
fit_2 <- stan_glm(y ~ ., prior=hs_prior, data=fake, seed=SEED,
                  control=list(adapt_delta=0.995), refresh=200)

SAMPLING FOR MODEL 'continuous' NOW (CHAIN 1).
Chain 1: 
Chain 1: Gradient evaluation took 4.5e-05 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.45 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1: 
Chain 1: 
Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain 1: Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain 1: Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain 1: Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain 1: Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain 1: Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain 1: Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain 1: Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain 1: Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
Chain 1: 
Chain 1:  Elapsed Time: 5.98948 seconds (Warm-up)
Chain 1:                5.81409 seconds (Sampling)
Chain 1:                11.8036 seconds (Total)
Chain 1: 

SAMPLING FOR MODEL 'continuous' NOW (CHAIN 2).
Chain 2: 
Chain 2: Gradient evaluation took 3.5e-05 seconds
Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.35 seconds.
Chain 2: Adjust your expectations accordingly!
Chain 2: 
Chain 2: 
Chain 2: Iteration:    1 / 2000 [  0%]  (Warmup)
Chain 2: Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain 2: Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain 2: Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain 2: Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain 2: Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain 2: Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain 2: Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain 2: Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain 2: Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain 2: Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain 2: Iteration: 2000 / 2000 [100%]  (Sampling)
Chain 2: 
Chain 2:  Elapsed Time: 4.87941 seconds (Warm-up)
Chain 2:                4.88352 seconds (Sampling)
Chain 2:                9.76293 seconds (Total)
Chain 2: 

SAMPLING FOR MODEL 'continuous' NOW (CHAIN 3).
Chain 3: 
Chain 3: Gradient evaluation took 3.4e-05 seconds
Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0.34 seconds.
Chain 3: Adjust your expectations accordingly!
Chain 3: 
Chain 3: 
Chain 3: Iteration:    1 / 2000 [  0%]  (Warmup)
Chain 3: Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain 3: Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain 3: Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain 3: Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain 3: Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain 3: Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain 3: Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain 3: Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain 3: Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain 3: Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain 3: Iteration: 2000 / 2000 [100%]  (Sampling)
Chain 3: 
Chain 3:  Elapsed Time: 4.87019 seconds (Warm-up)
Chain 3:                5.84669 seconds (Sampling)
Chain 3:                10.7169 seconds (Total)
Chain 3: 

SAMPLING FOR MODEL 'continuous' NOW (CHAIN 4).
Chain 4: 
Chain 4: Gradient evaluation took 0.000115 seconds
Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 1.15 seconds.
Chain 4: Adjust your expectations accordingly!
Chain 4: 
Chain 4: 
Chain 4: Iteration:    1 / 2000 [  0%]  (Warmup)
Chain 4: Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain 4: Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain 4: Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain 4: Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain 4: Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain 4: Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain 4: Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain 4: Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain 4: Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain 4: Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain 4: Iteration: 2000 / 2000 [100%]  (Sampling)
Chain 4: 
Chain 4:  Elapsed Time: 4.93557 seconds (Warm-up)
Chain 4:                5.97002 seconds (Sampling)
Chain 4:                10.9056 seconds (Total)
Chain 4: 
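
The elevated adapt_delta=0.995 makes the sampler take smaller steps, because the horseshoe posterior has a difficult geometry that can otherwise produce divergent transitions. A quick check of the HMC diagnostics (a sketch using the underlying stanfit that rstanarm stores in the stanreg object's $stanfit slot):

# reports divergences, max treedepth saturation, and low E-BFMI, if any
rstan::check_hmc_diagnostics(fit_2$stanfit)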
(loo_2 <- loo(fit_2))

Computed from 4000 by 60 log-likelihood matrix

         Estimate   SE
elpd_loo   -143.3  7.9
p_loo        10.8  2.9
looic       286.5 15.9
------
Monte Carlo SE of elpd_loo is NA.

Pareto k diagnostic values:
                         Count Pct.    Min. n_eff
(-Inf, 0.5]   (good)     55    91.7%   390       
 (0.5, 0.7]   (ok)        4     6.7%   225       
   (0.7, 1]   (bad)       1     1.7%   60        
   (1, Inf)   (very bad)  0     0.0%   <NA>      
See help('pareto-k-diagnostic') for details.

PSIS-LOO performs better now, but there is still one bad diagnostic value, so we again run the slower but more robust \(K\)-fold-CV:

kfold_2 <- kfold(fit_2)

Comparison of models

loo_compare(loo_1,loo_2)
      elpd_diff se_diff
fit_2   0.0       0.0  
fit_1 -10.5       5.2  
loo_compare(kfold_1,kfold_2)
      elpd_diff se_diff
fit_2   0.0       0.0  
fit_1 -17.7       5.6  
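
A rough way to judge whether a difference is clearly nonzero is to compare elpd_diff with its standard error; for the \(K\)-fold comparison above the difference is about three standard errors from zero (a small sketch using the matrix returned by loo_compare()):

comp <- loo_compare(kfold_1, kfold_2)
# elpd difference for fit_1 in units of its standard error
comp["fit_1", "elpd_diff"] / comp["fit_1", "se_diff"]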

Because PSIS-LOO fails, the PSIS-LOO comparison underestimates the difference between the models. The Pareto \(k\) diagnostic correctly identified the problem, and the more robust \(K\)-fold-CV shows that using a better prior yields better predictions.
