Load packages
library(rstanarm)
library(brms)
library(cmdstanr)
options(mc.cores = parallel::detectCores())
library(loo)
library(ggplot2)
library(bayesplot)
theme_set(bayesplot::theme_default(base_family = "sans"))
This notebook demonstrates cross-validation of a simple misspecified model. In this case, cross-validation is useful for detecting the misspecification.
The example comes from Chapter 8.3 of Gelman and Hill (2007), and the introductory text for the data is from Estimating Generalized Linear Models for Count Data with rstanarm by Jonah Gabry and Ben Goodrich.
We want to make inferences about the efficacy of a certain pest management system at reducing the number of roaches in urban apartments. Here is how Gelman and Hill describe the experiment (pg. 161):
the treatment and control were applied to 160 and 104 apartments, respectively, and the outcome measurement \(y_i\) in each apartment \(i\) was the number of roaches caught in a set of traps. Different apartments had traps for different numbers of days
In addition to an intercept, the regression predictors for the model are the pre-treatment number of roaches roach1, the treatment indicator treatment, and a variable indicating whether the apartment is in a building restricted to elderly residents senior. Because the number of days for which the roach traps were used is not the same for all apartments in the sample, we include it as an exposure (the variable exposure2) by adding \(\ln(u_i)\) to the linear predictor \(\eta_i\); it can be specified using the offset argument to stan_glm.
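Concretely, with exposure \(u_i\) the observation model is

\[ y_i \sim \mathrm{Poisson}(u_i \exp(\eta_i)), \qquad \text{that is,} \qquad \log \mathrm{E}(y_i) = \log u_i + \eta_i . \]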
Load data
data(roaches)
# roach1 is very skewed, so we take a square root
roaches$sqrt_roach1 <- sqrt(roaches$roach1)
Fit with stan_glm
stan_glmp <- stan_glm(y ~ sqrt_roach1 + treatment + senior, offset = log(exposure2),
data = roaches, family = poisson,
prior = normal(0,2.5), prior_intercept = normal(0,5),
chains = 4, cores = 1, seed = 170400963, refresh=0)
Plot posterior
mcmc_areas(as.matrix(stan_glmp), prob_outer = .999)
All the marginals are clearly away from zero.
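As a quick numeric summary we can also print the central posterior intervals (a sketch using rstanarm's posterior_interval; the 95% level is our arbitrary choice here).
# Central 95% posterior intervals for the coefficients;
# none of them overlap zero for this Poisson model
round(posterior_interval(stan_glmp, prob = 0.95), 2)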
We can use Pareto-smoothed importance sampling leave-one-out cross-validation as a model checking tool (Vehtari, Gelman and Gabry, 2017).
(loop <- loo(stan_glmp))
Computed from 4000 by 262 log-likelihood matrix
Estimate SE
elpd_loo -5461.3 695.1
p_loo 256.5 54.4
looic 10922.7 1390.1
------
Monte Carlo SE of elpd_loo is NA.
Pareto k diagnostic values:
Count Pct. Min. n_eff
(-Inf, 0.5] (good) 240 91.6% 132
(0.5, 0.7] (ok) 8 3.1% 47
(0.7, 1] (bad) 7 2.7% 29
(1, Inf) (very bad) 7 2.7% 1
See help('pareto-k-diagnostic') for details.
We got serious warnings; let's plot the Pareto \(k\) values.
plot(loop)
There are several highly influential observations, which indicates potential model misspecification (Vehtari, Gelman and Gabry, 2017).
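To identify the influential observations, we can use the loo package's diagnostic helper (a small sketch; the 0.7 threshold matches the warning level).
# indices of observations with Pareto k above 0.7
pareto_k_ids(loop, threshold = 0.7)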
Before looking in more detail at where the problem is, or fixing it, let's check what cross-validation says about the relevance of the covariates. We form three models by dropping each of the covariates in turn.
stan_glmm1p <- update(stan_glmp, formula = y ~ treatment + senior)
stan_glmm2p <- update(stan_glmp, formula = y ~ sqrt_roach1 + senior)
stan_glmm3p <- update(stan_glmp, formula = y ~ sqrt_roach1 + treatment)
Although the Pareto \(k\) values were very large, we can make a quick test with PSIS-LOO (if the comparison were to show a difference, then PSIS-LOO couldn't be trusted and PSIS-LOO+ or \(K\)-fold-CV would be needed; see Vehtari, Gelman and Gabry, 2017).
loo_compare(loo(stan_glmm1p), loop)
elpd_diff se_diff
stan_glmp 0.0 0.0
stan_glmm1p -3001.2 687.0
loo_compare(loo(stan_glmm2p), loop)
elpd_diff se_diff
stan_glmp 0.0 0.0
stan_glmm2p -231.1 199.6
loo_compare(loo(stan_glmm3p), loop)
elpd_diff se_diff
stan_glmp 0.0 0.0
stan_glmm3p -10.8 91.7
Based on this comparison, the pre-treatment roach count is a relevant covariate. Dropping the treatment or senior covariate makes a large change to elpd, but the associated uncertainty is also large, so cross-validation says these covariates are not necessarily relevant! The posterior marginals are conditional on the model, while cross-validation is more cautious because it does not assume any model for the future data distribution.
It’s also good to remember that, in addition to cross-validation, posterior predictive checks can often detect problems and also provide more information about the reason. Here we test the proportion of zeros predicted by the model and compare it to the observed proportion of zeros.
prop_zero <- function(y) mean(y == 0)
(prop_zero_test1 <- pp_check(stan_glmp, plotfun = "stat", stat = "prop_zero"))
Next we change the Poisson model to a more robust negative-binomial model.
stan_glmnb <- update(stan_glmp, family = neg_binomial_2)
Plot posterior
mcmc_areas(as.matrix(stan_glmnb), prob_outer = .999,
pars = c("(Intercept)","sqrt_roach1","treatment","senior"))
The treatment effect is much closer to zero, and the senior effect has a lot of probability mass on both sides of 0, so it matters which model we use.
We discuss posterior dependencies in more detail in the collinear notebook, but for reference we also plot the paired marginals here.
mcmc_pairs(as.matrix(stan_glmnb),pars = c("(Intercept)","sqrt_roach1","treatment","senior"))
There are some posterior correlations, but nothing that would change our conclusions.
Let’s check the PSIS-LOO Pareto \(k\) diagnostics.
(loonb <- loo(stan_glmnb))
Computed from 4000 by 262 log-likelihood matrix
Estimate SE
elpd_loo -881.8 38.3
p_loo 8.4 3.5
looic 1763.6 76.6
------
Monte Carlo SE of elpd_loo is NA.
Pareto k diagnostic values:
Count Pct. Min. n_eff
(-Inf, 0.5] (good) 260 99.2% 1568
(0.5, 0.7] (ok) 1 0.4% 99
(0.7, 1] (bad) 1 0.4% 140
(1, Inf) (very bad) 0 0.0% <NA>
See help('pareto-k-diagnostic') for details.
All \(\hat{k}\) values are ok, which indicates that the negative-binomial model is much better specified than the Poisson model (for final results it would be good to verify with PSIS-LOO+).
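PSIS-LOO+ refits the model for the few observations flagged by the \(\hat{k}\) diagnostic. In rstanarm this is available through the k_threshold argument of loo(); a sketch (not evaluated in this notebook):
# PSIS-LOO+: exact LOO refits for observations with Pareto k > 0.7
loonb_plus <- loo(stan_glmnb, k_threshold = 0.7)
We can also compare the Poisson and negative-binomial models.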
loo_compare(loop, loonb)
elpd_diff se_diff
stan_glmnb 0.0 0.0
stan_glmp -4579.5 675.0
The negative-binomial model is clearly better than the Poisson model. As Poisson is a special case of the negative-binomial, we could also have seen that the Poisson model is implausible by looking at the posterior of the over-dispersion parameter (which concentrates on very small values).
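For reference, in rstanarm's neg_binomial_2 parameterization the auxiliary parameter reciprocal_dispersion is the \(\phi\) in

\[ \mathrm{Var}(y_i) = \mu_i + \frac{\mu_i^2}{\phi}, \]

so the Poisson model is recovered in the limit \(\phi \to \infty\), and the small values seen below indicate strong over-dispersion.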
mcmc_areas(as.matrix(stan_glmnb), prob_outer = .999,
pars = c("reciprocal_dispersion"))
We next use a posterior predictive check to see whether the improved model also predicts the proportion of zeros well.
(prop_zero_test2 <- pp_check(stan_glmnb, plotfun = "stat", stat = "prop_zero"))
The result looks much better than for the Poisson model.
Finally, let’s check whether cross-validation model comparison agrees on the relevance of the covariates.
stan_glmm1nb <- update(stan_glmm1p, family = neg_binomial_2)
stan_glmm2nb <- update(stan_glmm2p, family = neg_binomial_2)
stan_glmm3nb <- update(stan_glmm3p, family = neg_binomial_2)
loo_compare(loo(stan_glmm1nb), loonb)
elpd_diff se_diff
stan_glmnb 0.0 0.0
stan_glmm1nb -35.2 9.1
loo_compare(loo(stan_glmm2nb), loonb)
elpd_diff se_diff
stan_glmnb 0.0 0.0
stan_glmm2nb -8.3 5.6
loo_compare(loo(stan_glmm3nb), loonb)
elpd_diff se_diff
stan_glmm3nb 0.0 0.0
stan_glmnb -2.3 1.9
The pre-treatment roach count (roach1) has a clear effect. The treatment effect is visible in the marginal posterior, but as discussed in the betablockers demo, cross-validation is not good for detecting weak effects. Based on cross-validation, the senior effect is also not necessarily relevant.
The conclusion from the analysis would then be that the treatment is likely to help, but it is difficult to predict the number of roaches whether or not an apartment is treated.
Sometimes over-dispersion is modelled by adding “random effects” for each individual. The following example illustrates the computational problems with this approach.
roaches$id <- 1:dim(roaches)[1]
Fit with stan_glmer
stan_glmpr <- stan_glmer(y ~ sqrt_roach1 + treatment + senior + (1 | id),
offset = log(exposure2),
data = roaches, family = poisson,
prior = normal(0,2.5), prior_intercept = normal(0,5),
chains = 4, cores = 4, iter = 4000,
seed = 170400963, refresh=0)
Plot posterior
mcmc_areas(as.matrix(stan_glmpr), prob_outer = .999,
pars = c("(Intercept)","sqrt_roach1","treatment","senior"))
The marginals are similar to those from the negative-binomial model, except that the marginal for the senior effect is clearly away from zero.
Let’s check PSIS-LOO.
(loopr <- loo(stan_glmpr))
Computed from 8000 by 262 log-likelihood matrix
Estimate SE
elpd_loo -638.0 24.4
p_loo 173.6 5.1
looic 1276.0 48.9
------
Monte Carlo SE of elpd_loo is NA.
Pareto k diagnostic values:
Count Pct. Min. n_eff
(-Inf, 0.5] (good) 2 0.8% 911
(0.5, 0.7] (ok) 52 19.8% 146
(0.7, 1] (bad) 180 68.7% 17
(1, Inf) (very bad) 28 10.7% 3
See help('pareto-k-diagnostic') for details.
We got serious warnings, which in this case are due to having a “random effect” parameter for each observation. Removing one observation changes the posterior for that random effect so much that importance sampling fails (even with Pareto smoothing). Note that WAIC would fail for the same reason.
We can also plot Pareto \(k\) values.
plot(loopr)
We have a very large number of high \(k\) values, which indicates a very flexible model.
While importance sampling in PSIS-LOO can fail for “random effect” models, we can instead use \(K\)-fold-CV, re-fitting the model 10 times, each time leaving out 10% of the observations. This shows that cross-validation itself is not infeasible for “random effect” models.
(kcvpr <- kfold(stan_glmpr, K=10))
Based on 10-fold cross-validation
Estimate SE
elpd_kfold -878.3 38.2
p_kfold NA NA
kfoldic 1756.6 76.5
The loo package allows comparing PSIS-LOO and \(K\)-fold-CV results.
loo_compare(loonb, kcvpr)
elpd_diff se_diff
stan_glmpr 0.0 0.0
stan_glmnb -3.5 7.2
There is not much difference, and this difference could also be explained by \(K\)-fold-CV using only 90% of the observations for the posteriors, while PSIS-LOO uses 99.6% of the observations. We can check this by running \(K\)-fold-CV also for the negative-binomial model.
(kcvnb <- kfold(stan_glmnb, K=10))
Based on 10-fold cross-validation
Estimate SE
elpd_kfold -886.9 39.4
p_kfold NA NA
kfoldic 1773.9 78.7
loo_compare(kcvnb, kcvpr)
elpd_diff se_diff
stan_glmpr 0.0 0.0
stan_glmnb -8.6 9.1
When both models are assessed using \(K\)-fold-CV, the difference in predictive performance is very small. The models can still have different predictive distributions.
Now that we’ve seen that, based on the robust \(K\)-fold-CV, there is not much difference between the negative-binomial and random-effect Poisson models, we can also check how bad the comparison would have been with PSIS-LOO.
loo_compare(loonb, loopr)
elpd_diff se_diff
stan_glmpr 0.0 0.0
stan_glmnb -243.8 17.5
Had we ignored the Pareto \(k\) warnings, we would have mistakenly concluded that the random effect model is much better. Note that WAIC is (as usual) even worse.
loo_compare(waic(stan_glmnb), waic(stan_glmpr))
elpd_diff se_diff
stan_glmpr 0.0 0.0
stan_glmnb -309.3 18.3
We also do posterior predictive checking for the Poisson model with “random effects”.
(prop_zero_test3 <- pp_check(stan_glmpr, plotfun = "stat", stat = "prop_zero"))
The proportion of zeros in the posterior predictive simulations from the Poisson model with “random effects” is different (less variation) than from the negative-binomial model. The models are similar in that both model over-dispersion, but the form of the over-dispersion model is different.
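To make the difference concrete (a standard result, stated here for reference): if \(y_i \mid z_i \sim \mathrm{Poisson}(\lambda_i e^{z_i})\) with \(z_i \sim \mathrm{normal}(0, \sigma^2)\), then marginally

\[ \mathrm{E}(y_i) = \lambda_i e^{\sigma^2/2}, \qquad \mathrm{Var}(y_i) = \mathrm{E}(y_i) + \mathrm{E}(y_i)^2 \left(e^{\sigma^2} - 1\right), \]

which has the same quadratic mean-variance shape as the negative-binomial's \(\mu + \mu^2/\phi\), but with lognormal rather than gamma mixing, and hence a heavier tail.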
Removing one observation changes the posterior for that random effect so much that importance sampling fails (even with Pareto smoothing). We can get improved stability by integrating the random effect out with something more accurate than importance sampling (Vehtari et al., 2016). If there is only one group- or individual-specific parameter, we can integrate it out easily with adaptive quadrature (2 parameters with 2D adaptive quadrature would likely work, too).
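In equations: instead of evaluating the likelihood at the posterior draws \(z_i^{(s)}\), for each draw \(s\) we compute

\[ p(y_i \mid \theta^{(s)}) = \int p(y_i \mid z_i, \theta^{(s)})\, p(z_i \mid \sigma_z^{(s)})\, dz_i, \]

so that the observation-specific parameter no longer appears in the importance ratios.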
As we can easily integrate out only 1 (or 2) parameters per group / individual, it’s not easy to make a generic approach for rstanarm and brms, and thus I illustrate the approach with direct Stan model code and cmdstanr.
The following Stan model uses an individual-specific intercept term z[i] (with a common prior with scale sigmaz). In the generated quantities block, the usual log_lik computation is replaced with the integrated approach.
poisson_re_int <- "poisson_re_integrate.stan"
writeLines(readLines(poisson_re_int))
// Poisson regression with hierarchical intercept ("random effect")
functions {
real integrand(real z, real notused, array[] real theta,
array[] real Xi, array[] int yi) {
real sigmaz = theta[1];
real offsetti_plus_alpha = theta[2];
int ntheta = num_elements(theta);
vector[ntheta-2] beta = to_vector(theta[3:ntheta]);
return exp(normal_lpdf(z|0,sigmaz)+poisson_log_glm_lpmf(yi | to_row_vector(Xi), z+offsetti_plus_alpha, beta));
}
}
data {
int<lower=0> N; // number of data points
int<lower=0> P; // number of covariates
matrix[N,P] X; // covariates
array[N] int<lower=0> y; // target
vector[N] offsett; // offset (offset variable name is reserved)
}
parameters {
real alpha; // intercept
vector[P] beta; // slope
vector[N] z; // individual intercept ("random effect")
real<lower=0> sigmaz; // prior scale for z
}
model {
// priors
alpha ~ normal(0, 3);
beta ~ normal(0, 3);
z ~ normal(0, sigmaz);
sigmaz ~ normal(0, 1);
// observation model
y ~ poisson_log_glm(X, z+offsett+alpha, beta);
}
generated quantities {
// log_lik for PSIS-LOO
vector[N] log_lik;
for (i in 1:N) {
// z as posterior draws, this would be challenging for PSIS-LOO (and WAIC)
// log_lik[i] = poisson_log_glm_lpmf({y[i]} | X[i,], z[i]+offsett[i]+alpha, beta);
// we can integrate each z[i] out with 1D adaptive quadrature
log_lik[i] = log(integrate_1d(integrand,
negative_infinity(),
positive_infinity(),
append_array({sigmaz}, append_array({offsett[i]+alpha}, to_array_1d(beta))),
to_array_1d(X[i,]),
{y[i]}));
}
}
Compile the model, prepare the data, and sample.
modpri <- cmdstan_model(stan_file = poisson_re_int)
datap <- list(N=dim(roaches)[1], P=3,
offsett=log(roaches$exposure2),
X=roaches[,c('sqrt_roach1','treatment','senior')],
y=roaches$y)
fitpri <- modpri$sample(data = datap, refresh=0, chains=8, parallel_chains=4)
The posterior is similar to the one above for the Poisson model with random effects.
mcmc_areas(as_draws_matrix(fitpri$draws(variables=c('alpha','beta','sigmaz'))), prob_outer = .999)
At the moment of writing this, cmdstanr is missing a CmdStanMCMC-specific loo method, so we define a helper function.
loocmd <- function(fit) {
  # r_eff for loo() should be computed from exp(log_lik), that is, the relative
  # efficiency of the likelihood, not the log-likelihood (see the loo documentation)
  loo(fit$draws("log_lik"), r_eff = relative_eff(exp(fit$draws("log_lik"))))
}
(loopri <- loocmd(fitpri))
Computed from 8000 by 262 log-likelihood matrix
Estimate SE
elpd_loo -876.1 38.2
p_loo 192.0 16.4
looic 1752.2 76.3
------
Monte Carlo SE of elpd_loo is 0.1.
All Pareto k estimates are good (k < 0.5).
See help('pareto-k-diagnostic') for details.
Now PSIS-LOO doesn’t give warnings, and the result is close to the \(K\)-fold-CV result.
Compare to the negative-binomial model.
loo_compare(fitpri=loopri, loonb)
elpd_diff se_diff
fitpri 0.0 0.0
stan_glmnb -5.7 7.5
There is not much difference in the predictive performance, but there is a difference in the posterior, and thus further investigation of the difference might be needed.
As the proportion of zeros is quite high in the data, it is worthwhile to also test a zero-inflated negative-binomial model, which is a mixture of two models:
- a logistic regression to model the proportion of extra zero counts
- a negative-binomial model
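As a sketch of the likelihood, with \(\pi_i\) from the logistic part and \((\mu_i, \phi)\) from the negative-binomial part,

\[ p(y_i) = \pi_i \, \mathrm{I}(y_i = 0) + (1 - \pi_i)\, \mathrm{NegBinomial2}(y_i \mid \mu_i, \phi). \]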
We switch to brms, as rstanarm doesn’t support the zero-inflated negative-binomial model.
brm_glmznb <- brm(bf(y ~ sqrt_roach1 + treatment + senior + offset(log(exposure2)),
zi ~ sqrt_roach1 + treatment + senior + offset(log(exposure2))),
family=zero_inflated_negbinomial(), data=roaches,
prior=set_prior("normal(0,1)"), seed=170400963, refresh=500)
looznb <- loo(brm_glmznb)
loo_compare(loonb, looznb)
elpd_diff se_diff
brm_glmznb 0.0 0.0
stan_glmnb -22.9 6.9
Based on LOO, the zero-inflated negative-binomial model is clearly better.
Posterior predictive checking provides further evidence that zero-inflated negative-binomial is better.
yrepnzb <- posterior_predict(brm_glmznb)
(prop_zero_test4 <- ppc_stat(y=roaches$y, yrepnzb, stat=function(y) mean(y==0)))
The proportion of zeros is similar, but has more variation than in the negative-binomial model. However, there is a much bigger difference in the max count test statistic. We first check the max count PPC for the negative-binomial model.
(max_test_nb <- pp_check(stan_glmnb, plotfun = "stat", stat = "max"))
which shows that the negative-binomial model often predicts max counts 10-100 times larger than the max count in the data (40,000 roaches in one trap is a lot).
The max count PPC for the zero-inflated negative-binomial model
(max_test_znb <- ppc_stat(y=roaches$y, yrepnzb, stat="max"))
is much better, although the max counts can still be 10 times bigger than the max count in the data.
Plot posterior
mcmc_areas(as.matrix(brm_glmznb)[,1:8], prob_outer = .999)
The marginals for the negative-binomial part are similar to the marginals in the plain negative-binomial model. The marginal effects for the logistic part have the opposite sign, as the logistic part models the extra zeros.
Finally, let’s check the cross-validation model comparison to see whether the improved model changes the conclusions about the relevance of the covariates.
brm_glmm1znb <- brm(bf(y ~ treatment + senior + offset(log(exposure2)),
zi ~ treatment + senior + offset(log(exposure2))),
family=zero_inflated_negbinomial(), data=roaches,
prior=set_prior("normal(0,1)"), seed=170400963, refresh=500)
brm_glmm2znb <- brm(bf(y ~ sqrt_roach1 + senior + offset(log(exposure2)),
zi ~ sqrt_roach1 + senior + offset(log(exposure2))),
family=zero_inflated_negbinomial(), data=roaches,
prior=set_prior("normal(0,1)"), seed=170400963, refresh=500)
brm_glmm3znb <- brm(bf(y ~ sqrt_roach1 + treatment + offset(log(exposure2)),
zi ~ sqrt_roach1 + treatment + offset(log(exposure2))),
family=zero_inflated_negbinomial(), data=roaches,
prior=set_prior("normal(0,1)"), seed=170400963, refresh=500)
loo_compare(loo(brm_glmm1znb),looznb)
elpd_diff se_diff
brm_glmznb 0.0 0.0
brm_glmm1znb -57.7 10.1
loo_compare(loo(brm_glmm2znb),looznb)
elpd_diff se_diff
brm_glmznb 0.0 0.0
brm_glmm2znb -8.4 4.9
loo_compare(loo(brm_glmm3znb),looznb)
elpd_diff se_diff
brm_glmm3znb 0.0 0.0
brm_glmznb -1.1 2.5
The pre-treatment roach count (roach1) has a clear effect. The treatment effect is visible in the marginal posterior, but as discussed in the betablockers demo, cross-validation is not that good for detecting weak effects.
Vehtari, A., Gelman, A. and Gabry, J. (2017) ‘Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC’, Statistics and Computing, 27(5), pp. 1413–1432. doi: 10.1007/s11222-016-9696-4.
Vehtari, A., Mononen, T., Tolvanen, V., Sivula, T. and Winther, O. (2016) ‘Bayesian leave-one-out cross-validation approximations for Gaussian latent variable models’, The Journal of Machine Learning Research, 17(1), pp. 3581–3618.
sessionInfo()
R version 4.1.3 (2022-03-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.4 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=fi_FI.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=fi_FI.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=fi_FI.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=fi_FI.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] bayesplot_1.8.1 ggplot2_3.3.5 loo_2.5.1 cmdstanr_0.5.1
[5] brms_2.16.3 rstanarm_2.21.1 Rcpp_1.0.8
loaded via a namespace (and not attached):
[1] minqa_1.2.4 colorspace_2.0-3 ellipsis_0.3.2
[4] ggridges_0.5.3 rsconnect_0.8.25 markdown_1.1
[7] base64enc_0.1-3 farver_2.1.0 rstan_2.29.1.9000
[10] DT_0.20 fansi_1.0.2 mvtnorm_1.1-3
[13] bridgesampling_1.1-2 codetools_0.2-18 splines_4.1.3
[16] knitr_1.37 shinythemes_1.2.0 projpred_2.0.2
[19] jsonlite_1.8.0 nloptr_1.2.2.3 shiny_1.7.1
[22] compiler_4.1.3 backports_1.4.1 assertthat_0.2.1
[25] Matrix_1.4-1 fastmap_1.1.0 cli_3.2.0
[28] later_1.3.0 htmltools_0.5.2 prettyunits_1.1.1
[31] tools_4.1.3 igraph_1.2.11 coda_0.19-4
[34] gtable_0.3.0 glue_1.6.2 reshape2_1.4.4
[37] dplyr_1.0.8 posterior_1.2.0 V8_4.1.0
[40] jquerylib_0.1.4 vctrs_0.3.8 nlme_3.1-157
[43] crosstalk_1.2.0 tensorA_0.36.2 xfun_0.30
[46] stringr_1.4.0 ps_1.6.0 lme4_1.1-28
[49] mime_0.12 miniUI_0.1.1.1 lifecycle_1.0.1
[52] gtools_3.9.2 MASS_7.3-56 zoo_1.8-9
[55] scales_1.2.0 colourpicker_1.1.1 promises_1.2.0.1
[58] Brobdingnag_1.2-7 parallel_4.1.3 inline_0.3.19
[61] shinystan_2.5.0 gamm4_0.2-6 yaml_2.3.5
[64] curl_4.3.2 gridExtra_2.3 sass_0.4.0
[67] StanHeaders_2.29.1.9000 stringi_1.7.6 highr_0.9
[70] dygraphs_1.1.1.6 checkmate_2.0.0 boot_1.3-28
[73] pkgbuild_1.3.1 rlang_1.0.2 pkgconfig_2.0.3
[76] matrixStats_0.61.0 distributional_0.3.0 evaluate_0.15
[79] lattice_0.20-45 purrr_0.3.4 rstantools_2.1.1
[82] htmlwidgets_1.5.4 labeling_0.4.2 processx_3.5.2
[85] tidyselect_1.1.1 plyr_1.8.6 magrittr_2.0.2
[88] R6_2.5.1 generics_0.1.2 DBI_1.1.2
[91] pillar_1.7.0 withr_2.5.0 mgcv_1.8-38
[94] xts_0.12.1 survival_3.2-13 abind_1.4-5
[97] tibble_3.1.6 crayon_1.5.0 utf8_1.2.2
[100] rmarkdown_2.13 grid_4.1.3 data.table_1.14.2
[103] callr_3.7.0 threejs_0.3.3 digest_0.6.29
[106] xtable_1.8-4 httpuv_1.6.5 RcppParallel_5.1.5
[109] stats4_4.1.3 munsell_0.5.0 bslib_0.3.1
[112] shinyjs_2.1.0