Setup

Load packages

library("loo")
library("rstanarm")
options(mc.cores = parallel::detectCores())
library("ggplot2")
library("bayesplot")
theme_set(bayesplot::theme_default(base_family = "sans"))

1 Introduction

In this case study, we demonstrate different cross-validation variants for hierarchical/multilevel models. A related video discusses the here used examples starting at 22:53. For time series specific cross-validation, see Bürkner, Gabry and Vehtari (2019).

2 Rats data

Throughout, we will use a simple grouped data. The example data is taken from Section 6 of Gelfand et al. (1990), and concerns 30 young rats whose weights were measured weekly for five consecutive weeks.

Load data

sourceToList = function(file){
  source(file, local = TRUE)
  d = mget(ls())
  d$file = NULL
  d
}
#
rats = sourceToList("rats.data.R")
rats = with(rats, list(
  N = N,
  Npts = length(y),
  rat = rep(1:nrow(y), ncol(y)),
  x = rep(x, each = nrow(y)),
  y = as.numeric(y),
  xbar = xbar
  ))

Create dataframe

dfrats <- with(rats, data.frame(age=x, age_c=x-22, weight=y, rat=rat))
N <- dim(dfrats)[1]

Plot data

pr <- ggplot(data=dfrats, aes(x=age, y=weight)) +
    geom_line(aes(group=rat), color="black", size=0.1) +
    geom_point(color="black", size=2) +
    labs(x="Age (days)", y="Weight (g)", title="Rats data")
pr

Just by looking at the data, it seems that if the rat growth would be modelled with a linear model (up to an age of 36 days). Individual intercepts are likely and possibly also individual slopes.

3 Models

We are going to compare three models: One with population effect only, another with an additional varying intercept term, and a third one with both varying intercept and slope terms.

Simple linear model

fit_1 <- stan_glm(weight ~ age, data=dfrats, refresh=0)

Linear model with hierarchical intercept

fit_2 <- stan_glmer(weight ~ age + (1 | rat), data=dfrats, refresh=0)

Linear model with hierarchical intercept and slope

fit_3 <- stan_glmer(weight ~ age + (age | rat), data=dfrats, refresh=0)

4 Leave-one-out cross-validation

In leave-one-out cross-validation (LOO), one observation is left out at a time and predicted given all the other observations.

pr1 <- pr +
    geom_point(data=dfrats[69,], color="red", size=5, shape=1) +
    ggtitle('Leave-one-out')
pr1

This is useful and valid if we are interested in model fit in general. We would use the model to predict random missing data, or if we were comparing different conditional observation models.

The loo package offers a fast Pareto smoothed importance sampling approximation of LOO (Vehtari, Gelman and Gabry, 2017; Vehtari et al., 2019)

loo_1 <- loo(fit_1)
loo_2 <- loo(fit_2)
loo_3 <- loo(fit_3)

We get warnings that PSIS-LOO is failing for the third model. We can check the details via

loo_3


Computed from 4000 by 150 log-likelihood matrix

         Estimate   SE
elpd_loo   -522.5 12.0
p_loo        53.8  7.3
looic      1045.1 24.0
------
Monte Carlo SE of elpd_loo is NA.

Pareto k diagnostic values:
                         Count Pct.    Min. n_eff
(-Inf, 0.5]   (good)     104   69.3%   995       
 (0.5, 0.7]   (ok)        32   21.3%   155       
   (0.7, 1]   (bad)       11    7.3%   33        
   (1, Inf)   (very bad)   3    2.0%   12        
See help('pareto-k-diagnostic') for details.

As there are only 5 observations per rat, and hierarchical model has 2 rat specific parameters, some of the observations are highly influential and PSIS-LOO is not able to give reliable estimate (if PSIS-LOO fails, WAIC fails, too, but a failure of WAIC is more difficult to diagnose (Vehtari, Gelman and Gabry, 2017))

We can run exact LOO-CV for the failing folds.

(loo_3 <- loo(fit_3, k_threshold=0.7))


Computed from 4000 by 150 log-likelihood matrix

         Estimate   SE
elpd_loo   -523.2 12.4
p_loo        54.4  7.7
looic      1046.4 24.8
------
Monte Carlo SE of elpd_loo is 0.6.

Pareto k diagnostic values:
                         Count Pct.    Min. n_eff
(-Inf, 0.5]   (good)     104   76.5%   995       
 (0.5, 0.7]   (ok)        32   23.5%   155       
   (0.7, 1]   (bad)        0    0.0%   <NA>      
   (1, Inf)   (very bad)   0    0.0%   <NA>      

All Pareto k estimates are ok (k < 0.7).
See help('pareto-k-diagnostic') for details.

We see that PSIS-LOO-estimated elpd_loo for model 3 was too optimistic by 2.6 points. Further its SE was also underestimated.

We can now safely do the model comparison:

loo_compare(loo_1, loo_2, loo_3)

      elpd_diff se_diff
fit_3    0.0       0.0 
fit_2  -23.7       9.5 
fit_1 -109.2      13.4

Model 3 is slightly better than model 2. Model 1 is clearly worst. Knowing all the other observations except one, it is beneficial to have individual intercept and slope terms.

When we have more than two models, it can be easier to understand the uncertainties in the comparison by looking at the model averaging weights based on Bayesian stacking (Yao et al., 2018) which optimizes the model weights so that the combined predictive distribution maximizes the estimate leave-one-out cross-validation performance.

lpd_point <- cbind(loo_1$pointwise[,"elpd_loo"],
                   loo_2$pointwise[,"elpd_loo"],
                   loo_3$pointwise[,"elpd_loo"])
stacking_weights(lpd_point)

Method: stacking
------
       weight
model1 0.000 
model2 0.087 
model3 0.913

The simple linear model has weight 0, but the hierarchical slope model has weight 0.1, which means we can get better predictions averaging over model 2 and 3, instead of just using the model 3. If we were forced to choose just one model, model 3 is providing the best predictive accuracy.

5 K-fold cross-validation

In K-fold cross-validation the data is divided in K blocks. By using different ways to divide the data, we can target for different prediction tasks or assess different model parts.

5.1 Random K-fold approximation of LOO

Sometimes it is possible that very large number of PSIS-LOO folds fail. In this case, performing exact LOO-CV for all of these observations would take too long. We can approximate LOO cross-validation running K-fold-CV with completely random division of data and then looking at the individual CV predictions.

The helper function kfold_split_random can be used to form such a random division. We generate random divisions with K=10 and K=30.

cv10rfolds <- kfold_split_random(K=10, N = N)
cv30rfolds <- kfold_split_random(K=30, N = N)

Let’s illustrate the first of the 30 folds:

prr <- pr +
    geom_point(data=dfrats[cv30rfolds==1,], color="red", size=5, shape=1) +
    ggtitle('Random kfold approximation of LOO')
prr

We use the kfold function for K-fold cross-validation. (The current release version requires the number of folds (K), but a fixed version can infer it from the folds argument.)

cv10r_1 <- rstanarm::kfold(fit_1, K=10, folds = cv10rfolds)
cv10r_2 <- rstanarm::kfold(fit_2, K=10, folds = cv10rfolds)
cv10r_3 <- rstanarm::kfold(fit_3, K=10, folds = cv10rfolds)
cv30r_1 <- rstanarm::kfold(fit_1, K=30, folds = cv30rfolds)
cv30r_2 <- rstanarm::kfold(fit_2, K=30, folds = cv30rfolds)
cv30r_3 <- rstanarm::kfold(fit_3, K=30, folds = cv30rfolds)

Compare models

loo_compare(cv10r_1, cv10r_2, cv10r_3)

      elpd_diff se_diff
fit_3    0.0       0.0 
fit_2  -27.9       7.8 
fit_1 -112.5      10.7

loo_compare(cv30r_1, cv30r_2, cv30r_3)

      elpd_diff se_diff
fit_3    0.0       0.0 
fit_2  -22.7       9.5 
fit_1 -107.1      13.3

The results are similar to LOO. In both cases, the elpd’s are slightly lower than with LOO, and the difference is larger for K=10 and increases with model complexity, which is due to model fitted to less observations than in LOO.

Corresoponing model weights using Bayesian stacking method

lpd_point <- cbind(cv30r_1$pointwise[,"elpd_kfold"],
                   cv30r_2$pointwise[,"elpd_kfold"],
                   cv30r_3$pointwise[,"elpd_kfold"])
stacking_weights(lpd_point)

Method: stacking
------
       weight
model1 0.000 
model2 0.085 
model3 0.915

The model weights are very close to ones with leave-one-out cross-validation, which is expected.

5.2 Stratified K-fold approximation of LOO

The random split might just by chance leave out more than one observation from one rat, which would not be good for approximating LOO in case of hierarchical models. We can further improve K-fold-CV by using stratified resampling which ensures that the relative category frequencies are approximately preserved. In this case, it means that from each rat only up to one observation is left out per fold.

The helper function kfold_split_stratified can be used to form a stratified division.

cv10sfolds <- kfold_split_stratified(K=10, x = dfrats$rat)
cv30sfolds <- kfold_split_stratified(K=30, x = dfrats$rat)

Let’s illustrate the first of the 30 folds:

prs <- pr +
    geom_point(data=dfrats[cv30sfolds==1,], color="red", size=5, shape=1) +
    ggtitle('Stratified K-fold approximation of LOO')
prs

We use the kfold function for K-fold cross-validation.

cv10s_1 <- rstanarm::kfold(fit_1, K=10, folds = cv10sfolds)
cv10s_2 <- rstanarm::kfold(fit_2, K=10, folds = cv10sfolds)
cv10s_3 <- rstanarm::kfold(fit_3, K=10, folds = cv10sfolds)
cv30s_1 <- rstanarm::kfold(fit_1, K=30, folds = cv30sfolds)
cv30s_2 <- rstanarm::kfold(fit_2, K=30, folds = cv30sfolds)
cv30s_3 <- rstanarm::kfold(fit_3, K=30, folds = cv30sfolds)

Compare models

loo_compare(cv10s_1, cv10s_2, cv10s_3)

      elpd_diff se_diff
fit_3    0.0       0.0 
fit_2  -21.7       9.6 
fit_1 -106.6      13.4

loo_compare(cv30s_1, cv30s_2, cv30s_3)

      elpd_diff se_diff
fit_3    0.0       0.0 
fit_2  -22.9       9.5 
fit_1 -108.9      13.4

The results are similar to LOO. In both cases, elpd’s are slightly lower than with LOO, and the difference increases with model complexity, which is due to model fitted to less observations than in LOO. For hierarchical models, the results with K=10 and k=30 are closer to each other than in case of complete random division, as the stratified division balances the data division.

Corresoponing model weights using Bayesian stacking method

lpd_point <- cbind(cv30s_1$pointwise[,"elpd_kfold"],
                   cv30s_2$pointwise[,"elpd_kfold"],
                   cv30s_3$pointwise[,"elpd_kfold"])
stacking_weights(lpd_point)

Method: stacking
------
       weight
model1 0.000 
model2 0.089 
model3 0.911

The model weights are very close to ones with leave-one-out cross-validation, which is expected.

5.3 Grouped K-fold for leave-one-group-out

K-fold cross-validation can also be used for leave-one-group-out cross-validation (LOGO-CV). In our example, each group could represent all observation from a particular rat. LOGO-CV is useful if the future prediction task would be to predict growth curves for new rats, or if we are interested in primarily assessing hierarchical part of the model.

In theory, PSIS could be used to also approximate LOGO-CV. However, in hierarchical models, each group has its own set of parameters and the posterior of those parameters tend to change a lot if all observations in that group are removed, which likely leads to failure of importance sampling. For certain models, quadrature methods could be used to compute integrated (marginalized) importance sampling (Merkle, Furr and Rabe-Hesketh, 2018).

The helper function kfold_split_grouped can be used to form a grouped division. With K=30 we thus perform leave-one-rat-out CV. With K=10 we get faster computation by leaving out 3 rats at a time, but the results are likely to be similar to K=30.

cv10gfolds <- kfold_split_grouped(K = 10, x = dfrats$rat)
cv30gfolds <- kfold_split_grouped(K = 30, x = dfrats$rat)

Let’s illustrate the first of the 30 folds:

prg <- pr +
    geom_point(data=dfrats[cv30gfolds==1,], color="red", size=5, shape=1) +
    ggtitle('Leave-one-rat-out')
prg

We use the kfold function for K-fold cross-validation.

cv10g_1 <- rstanarm::kfold(fit_1, K=10, folds = cv10gfolds)
cv10g_2 <- rstanarm::kfold(fit_2, K=10, folds = cv10gfolds)
cv10g_3 <- rstanarm::kfold(fit_3, K=10, folds = cv10gfolds)
cv30g_1 <- rstanarm::kfold(fit_1, K=30, folds = cv30gfolds)
cv30g_2 <- rstanarm::kfold(fit_2, K=30, folds = cv30gfolds)
cv30g_3 <- rstanarm::kfold(fit_3, K=30, folds = cv30gfolds)

Compare models

loo_compare(cv10g_1, cv10g_2, cv10g_3)

      elpd_diff se_diff
fit_3  0.0       0.0   
fit_2 -2.6       3.7   
fit_1 -5.5       4.8

loo_compare(cv30g_1, cv30g_2, cv30g_3)

      elpd_diff se_diff
fit_3  0.0       0.0   
fit_2 -4.4       4.0   
fit_1 -6.1       4.5

The results are very different than those obtained by LOO. The order of the models is the same, but the differences are much smaller. As there is no rat-specific covariate information, there is not much difference between predicting with the population curve and a normal response distribution with large scale (fit_1) or predicting with uncertain individual curves and and a normal response distribution with small scale (fit_2 and fit_3).

Corresoponing model weights using Bayesian stacking method

lpd_point <- cbind(cv30g_1$pointwise[,"elpd_kfold"],
                   cv30g_2$pointwise[,"elpd_kfold"],
                   cv30g_3$pointwise[,"elpd_kfold"])
stacking_weights(lpd_point)

Method: stacking
------
       weight
model1 0.000 
model2 0.176 
model3 0.824

The model weights are now different than those obtained with LOO. The elpd difference are small, but stacking weights show that if predict mostly with hierarchical intercept and slope model (fit_3), the other models are not able to add much.

In the above model, The SE of the elpd differences (se_diff) was computed without taking into account the grouping structure. A more accurate SE estimate could be obtained by firsting computing the group specific elpds:

cvgfix <- function(cv, cvidx) {
    groupwise=numeric();
    K <- length(unique(cvidx))
    for (i in 1:K) { groupwise[i]=sum(cv$pointwise[cvidx==i,"elpd_kfold"])}
    cv$pointwise <- cbind(elpd_kfolds=groupwise)
    cv$se_elpd_kfold <- sd(groupwise)*sqrt(K)
    cv$estimates[2] <- cv$se_elpd_kfold
    cv
}

Note that, to sum the pointwise elpds for each rat, we use the 30-fold structure even when grouping elpds from 10-fold-CV.

cv10gg_1 <- cvgfix(cv10g_1, cv30gfolds)
cv10gg_2 <- cvgfix(cv10g_2, cv30gfolds)
cv10gg_3 <- cvgfix(cv10g_3, cv30gfolds)
cv30gg_1 <- cvgfix(cv30g_1, cv30gfolds)
cv30gg_2 <- cvgfix(cv30g_2, cv30gfolds)
cv30gg_3 <- cvgfix(cv30g_3, cv30gfolds)

Now we are comparing 30 groupwise elpds, instead of 150 pointwise elpds.

loo_compare(cv10gg_1, cv10gg_2, cv10gg_3)

      elpd_diff se_diff
fit_3  0.0       0.0   
fit_2 -2.6       2.3   
fit_1 -5.5       4.4

loo_compare(cv30gg_1, cv30gg_2, cv30gg_3)

      elpd_diff se_diff
fit_3  0.0       0.0   
fit_2 -4.4       2.6   
fit_1 -6.1       3.9

The groupwise computation doesn’t change the elpd differences, but changes the corresponding SE, which are now smaller. This implies a bit more accuracy in the comparison, but the differences are still small.

Corresoponing model weights using Bayesian stacking method

lpd_point <- cbind(cv30gg_1$pointwise[,"elpd_kfolds"],
                   cv30gg_2$pointwise[,"elpd_kfolds"],
                   cv30gg_3$pointwise[,"elpd_kfolds"])
stacking_weights(lpd_point)

Method: stacking
------
       weight
model1 0.000 
model2 0.000 
model3 1.000

The smaller SEs are reflected also in more certain model weights.

5.4 Grouped K-fold for prediction given initial weight

We can modify cross-validation to different prediction tasks. If in the future we would like to predict growth curves after we have measured the birth weight, we could leave one rat out except for the first observation.

We can use kfold_split_grouped for this purpose and then manually form an extra group for the initial weight. Here, We compute results only for K=10.

cv10xfolds <- kfold_split_grouped(K = 10, x = dfrats$rat)
cv10xfolds[dfrats$age==8] <- 11

Still, the 30-folds have to be used for the rat specific predictions before performing the actual comparison.

cv30xfolds <- kfold_split_grouped(K = 30, x = dfrats$rat)
cv30xfolds[dfrats$age==8] <- 31
prx <- pr +
    geom_point(data=dfrats[cv10xfolds==1,], color="red", size=5, shape=1) +
    ggtitle('Predict given initial weight')
prx

We use the kfold function for K-fold cross-validation.

cv10x_1 <- rstanarm::kfold(fit_1, K=10, folds = cv10xfolds)
cv10x_2 <- rstanarm::kfold(fit_2, K=10, folds = cv10xfolds)
cv10x_3 <- rstanarm::kfold(fit_3, K=10, folds = cv10xfolds)

Compute groupwise elpds:

cvxxfix <- function(cv, cvidx) {
    groupwise=numeric();
    K <- length(unique(cvidx))-1
    for (i in 1:K) { groupwise[i]=sum(cv$pointwise[cvidx==i,"elpd_kfold"])}
    cv$pointwise <- cbind(elpd_kfolds=groupwise)
    cv$se_elpd_kfold <- sd(groupwise)*sqrt(K)
    cv$estimates[2] <- cv$se_elpd_kfold
    cv
}
cv10xx_1 <- cvxxfix(cv10x_1, cv30xfolds)
cv10xx_2 <- cvxxfix(cv10x_2, cv30xfolds)
cv10xx_3 <- cvxxfix(cv10x_3, cv30xfolds)

Compare models

loo_compare(cv10xx_1, cv10xx_2, cv10xx_3)

      elpd_diff se_diff
fit_3   0.0       0.0  
fit_2 -13.1      11.5  
fit_1 -43.5      19.3

The results are different from those obtained by LOO and LOGO. Still, the order of the models is the same. Model 1 is clearly worse. Knowing the initial weight, we get quite similar predictive accuracy when using a common slope instead of varying slopes.

Corresoponing model weights using Bayesian stacking method

lpd_point <- cbind(cv10xx_1$pointwise[,"elpd_kfolds"],
                   cv10xx_2$pointwise[,"elpd_kfolds"],
                   cv10xx_3$pointwise[,"elpd_kfolds"])
stacking_weights(lpd_point)

Method: stacking
------
       weight
model1 0.129 
model2 0.282 
model3 0.589

The model weights are more diffuse now. The stacking weights are not computed from the differences and SEs, but optimized to maximize the estimated predictive performance. The diffuse weights indicate that mixture prediction is better, which might indicate slight misspecification of the observation model.

6 Alternative models for the prediction given initial weight

We may formulate the initial weight approach also in a different way, which then resembles a group-specific covariate approach.

Create dataframe

dfrats2 <- with(rats, data.frame(age=x[x>8], age_c=x[x>8]-25.5, weight=y[x>8], rat=rat[x>8],
                                 initweight_c = rep(y[x==8],4)-mean(y[x==8])))

6.1 Models

Simple linear model

fit2_1 <- stan_glm(weight ~ initweight_c + age_c, data=dfrats2, refresh=0)

Linear model with hierarchical intercept

fit2_2 <- stan_glmer(weight ~ initweight_c + age_c + (1 | rat), data=dfrats2, refresh=0)

Linear model with hierarchical intercept and slope

fit2_3 <- stan_glmer(weight ~ initweight_c + age_c + (age_c | rat), data=dfrats2, refresh=0)

6.2 Grouped K-fold for prediction given initial weight

The helper function kfold_split_grouped can be used to form a grouped division.

cv10g2folds <- kfold_split_grouped(K = 10, x = dfrats2$rat)
cv30g2folds <- kfold_split_grouped(K = 30, x = dfrats2$rat)

We use the kfold function for K-fold cross-validation.

cv10g2_1 <- rstanarm::kfold(fit2_1, K=10, folds = cv10g2folds)
cv10g2_2 <- rstanarm::kfold(fit2_2, K=10, folds = cv10g2folds)
cv10g2_3 <- rstanarm::kfold(fit2_3, K=10, folds = cv10g2folds)

Compute groupwise elpds:

cv10gg2_1 <- cvgfix(cv10g2_1, cv30g2folds)
cv10gg2_2 <- cvgfix(cv10g2_2, cv30g2folds)
cv10gg2_3 <- cvgfix(cv10g2_3, cv30g2folds)

Now we are comparing 30 groupwise elpds, instead of 150 pointwise elpds.

loo_compare(cv10gg2_1, cv10gg2_2, cv10gg2_3)

       elpd_diff se_diff
fit2_3   0.0       0.0  
fit2_2 -18.1       6.2  
fit2_1 -21.4       7.4

Model 3 is clearly the best. When the model includes the initial weight, adding varying intercepts doesn’t improve prediction accuracy, but adding varying slopes does.

Corresoponing model weights using Bayesian stacking method

lpd_point <- cbind(cv10gg2_1$pointwise[,"elpd_kfolds"],
                   cv10gg2_2$pointwise[,"elpd_kfolds"],
                   cv10gg2_3$pointwise[,"elpd_kfolds"])
stacking_weights(lpd_point)

Method: stacking
------
       weight
model1 0.000 
model2 0.060 
model3 0.940

The model with hierarchical intercept and slope (fit2_3) gets again the largest weight and it is likely that slopes are varying.

7 Conclusion

In all comparisons shown in this case study, model 3 was best followed by model 2 while model 1 clearly performed worst. However, depending on the particular cross-validation approach, the differences between models varied.

The stacking model weights complement the results and show also when model averaging could improve the predictive performance.

Throughout this case study, we have used rstanarm for the model fitting. If instead you prefer to use brms, the loo and kfold methods and their primary arguments will continue to work in the same way.

References

Bürkner, P.-C., Gabry, J. and Vehtari, A. (2019) ‘Approximate leave-future-out cross-validation for time series models’, arXiv preprint arXiv:1902.06281. Available at: https://arxiv.org/abs/1902.06281.

Gelfand, A. E., Hills, S. E., Racine-Poon, A. and Smith, A. F. (1990) ‘Illustration of Bayesian inference in normal data models using Gibbs sampling’, Journal of the American Statistical Association, 85(412), pp. 972–985.

Merkle, E. C., Furr, D. and Rabe-Hesketh, S. (2018) ‘Bayesian comparison of latent variable models: Conditional vs marginal likelihoods’, arXiv preprint arXiv:1802.04452. Available at: https://arxiv.org/abs/1802.04452.

Vehtari, A., Gelman, A. and Gabry, J. (2017) ‘Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC’, Statistics and Computing, 27(5), pp. 1413–1432. doi: 10.1007/s11222-016-9696-4.

Vehtari, A., Simpson, D., Gelman, A., Yao, Y. and Gabry, J. (2019) ‘Pareto smoothed importance sampling’, arXiv preprint arXiv:1507.02646. Available at: https://arxiv.org/abs/1507.02646v6.

Yao, Y., Vehtari, A., Simpson, D. and Gelman, A. (2018) ‘Using stacking to average Bayesian predictive distributions (with discussion)’, Bayesian Analysis, 13(3), pp. 917–1003. doi: 10.1214/17-BA1091.

Licenses

Original Computing Environment

sessionInfo()

R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=fi_FI.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=fi_FI.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=fi_FI.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=fi_FI.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] bayesplot_1.8.1 ggplot2_3.3.5   rstanarm_2.21.1 Rcpp_1.0.8     
[5] loo_2.4.1       rmarkdown_2.11  knitr_1.37     

loaded via a namespace (and not attached):
 [1] nlme_3.1-155         matrixStats_0.61.0   xts_0.12.1          
 [4] threejs_0.3.3        rstan_2.21.3         tools_4.1.2         
 [7] backports_1.4.1      bslib_0.3.1          utf8_1.2.2          
[10] R6_2.5.1             DT_0.20              DBI_1.1.2           
[13] colorspace_2.0-2     withr_2.4.3          tidyselect_1.1.1    
[16] gridExtra_2.3        prettyunits_1.1.1    processx_3.5.2      
[19] compiler_4.1.2       cli_3.1.1            shinyjs_2.1.0       
[22] labeling_0.4.2       colourpicker_1.1.1   sass_0.4.0          
[25] scales_1.1.1         dygraphs_1.1.1.6     checkmate_2.0.0     
[28] ggridges_0.5.3       callr_3.7.0          stringr_1.4.0       
[31] digest_0.6.29        StanHeaders_2.21.0-7 minqa_1.2.4         
[34] base64enc_0.1-3      pkgconfig_2.0.3      htmltools_0.5.2     
[37] lme4_1.1-28          fastmap_1.1.0        highr_0.9           
[40] htmlwidgets_1.5.4    rlang_1.0.1          shiny_1.7.1         
[43] farver_2.1.0         jquerylib_0.1.4      generics_0.1.2      
[46] zoo_1.8-9            jsonlite_1.7.3       crosstalk_1.2.0     
[49] gtools_3.9.2         dplyr_1.0.8          inline_0.3.19       
[52] magrittr_2.0.2       Matrix_1.4-0         munsell_0.5.0       
[55] fansi_1.0.2          lifecycle_1.0.1      stringi_1.7.6       
[58] yaml_2.2.2           MASS_7.3-55          pkgbuild_1.3.1      
[61] plyr_1.8.6           grid_4.1.2           parallel_4.1.2      
[64] promises_1.2.0.1     crayon_1.4.2         miniUI_0.1.1.1      
[67] lattice_0.20-45      splines_4.1.2        ps_1.6.0            
[70] pillar_1.7.0         igraph_1.2.11        boot_1.3-28         
[73] markdown_1.1         shinystan_2.5.0      reshape2_1.4.4      
[76] codetools_0.2-18     stats4_4.1.2         rstantools_2.1.1    
[79] glue_1.6.1           evaluate_0.14        RcppParallel_5.1.5  
[82] vctrs_0.3.8          nloptr_1.2.2.3       httpuv_1.6.5        
[85] gtable_0.3.0         purrr_0.3.4          assertthat_0.2.1    
[88] xfun_0.29            mime_0.12            xtable_1.8-4        
[91] later_1.3.0          rsconnect_0.8.25     survival_3.2-13     
[94] tibble_3.1.6         shinythemes_1.2.0    ellipsis_0.3.2

Cross-validation for hierarchical models

Aki Vehtari

First version 2019-03-11. Last modified 2022-02-20.

Setup

1 Introduction

2 Rats data

3 Models

4 Leave-one-out cross-validation

5 K-fold cross-validation

5.1 Random K-fold approximation of LOO

5.2 Stratified K-fold approximation of LOO

5.3 Grouped K-fold for leave-one-group-out

5.4 Grouped K-fold for prediction given initial weight

6 Alternative models for the prediction given initial weight

6.1 Models

6.2 Grouped K-fold for prediction given initial weight

7 Conclusion

References

Licenses

Original Computing Environment