**Load packages**

```
library(rstanarm)
library(brms)
library(cmdstanr)
options(mc.cores = parallel::detectCores())
library(loo)
library(ggplot2)
library(bayesplot)
theme_set(bayesplot::theme_default(base_family = "sans"))
```

This notebook demonstrates cross-validation of a simple misspecified model. In this case, cross-validation is useful for detecting the misspecification.

The example comes from Section 8.3 of Gelman and Hill (2007), and the introduction to the data is taken from *Estimating Generalized Linear Models for Count Data with rstanarm* by Jonah Gabry and Ben Goodrich.

We want to make inferences about the efficacy of a certain pest management system at reducing the number of roaches in urban apartments. Here is how Gelman and Hill describe the experiment (p. 161):

> the treatment and control were applied to 160 and 104 apartments, respectively, and the outcome measurement \(y_i\) in each apartment \(i\) was the number of roaches caught in a set of traps. Different apartments had traps for different numbers of days.

In addition to an intercept, the regression predictors for the model are the pre-treatment number of roaches `roach1` (square-root transformed below), the treatment indicator `treatment`, and a variable `senior` indicating whether the apartment is in a building restricted to elderly residents. Because the number of days for which the roach traps were used is not the same for all apartments in the sample, we include it as an exposure, `exposure2`, by adding \(\ln(u_i)\) to the linear predictor \(\eta_i\), where \(u_i\) is the value of `exposure2` for apartment \(i\). This can be specified using the `offset` argument to `stan_glm`.
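To make the offset concrete: with a log link, adding \(\ln(u_i)\) to \(\eta_i\) gives the Poisson mean \(\exp(\eta_i + \ln u_i) = u_i \exp(\eta_i)\), so the expected count scales in proportion to the exposure. The sketch below checks this identity numerically and then fits a Poisson regression with an offset using base R's `glm()` on simulated data. The data, the true coefficients (1 and 0.5), and the names `x`, `u`, and `fit_sim` are hypothetical and chosen only for illustration.

```r
# Hypothetical simulated data, used only to illustrate how an exposure offset works
set.seed(1)
n <- 5000
x <- rnorm(n)                      # a single illustrative predictor
u <- runif(n, min = 1, max = 5)    # exposure, e.g. number of trap-days
eta <- 1 + 0.5 * x                 # linear predictor without the offset

# Adding log(u) to the linear predictor multiplies the Poisson mean by u
mu <- exp(eta + log(u))
stopifnot(isTRUE(all.equal(mu, u * exp(eta))))

y <- rpois(n, lambda = mu)

# Base R's glm() accepts an offset argument analogous to the one used with stan_glm()
fit_sim <- glm(y ~ x, offset = log(u), family = poisson)
coef(fit_sim)  # estimates should be near the simulated values 1 and 0.5
```

Because the offset enters with a fixed coefficient of 1, it is not estimated; only the intercept and slope are.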

**Load data**

```
data(roaches)
# roach1 is very right-skewed, so we use its square root as a predictor
roaches$sqrt_roach1 <- sqrt(roaches$roach1)
```
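One way to see the effect of the square-root transformation is to compare the sample skewness of `roach1` before and after it. The helper `sample_skewness()` below is a hypothetical, simple moment-based estimator written for this sketch, not a function from any of the loaded packages.

```r
library(rstanarm)  # provides the roaches data set
data(roaches)

# Hypothetical helper: moment-based sample skewness
sample_skewness <- function(v) {
  m <- mean(v)
  mean((v - m)^3) / mean((v - m)^2)^(3 / 2)
}

skew_raw <- sample_skewness(roaches$roach1)
skew_sqrt <- sample_skewness(sqrt(roaches$roach1))
c(raw = skew_raw, sqrt = skew_sqrt)
```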

**Fit with `stan_glm`**

```
# Poisson regression with log(exposure2) as an offset and weakly informative priors
stan_glmp <- stan_glm(y ~ sqrt_roach1 + treatment + senior, offset = log(exposure2),
                      data = roaches, family = poisson,
                      prior = normal(0, 2.5), prior_intercept = normal(0, 5),
                      chains = 4, cores = 1, seed = 170400963, refresh = 0)
```
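As an optional sanity check (an addition, not part of the original analysis), the same Poisson regression can be fit by maximum likelihood with base R's `glm()`. With weakly informative priors and a few hundred observations, one would expect the posterior medians from `stan_glmp` to be similar to these estimates. Agreement only confirms that the fit behaves as expected; it says nothing about whether the Poisson model itself is adequate, which is the question cross-validation addresses here. The name `glmp` is hypothetical.

```r
library(rstanarm)  # provides the roaches data set
data(roaches)
roaches$sqrt_roach1 <- sqrt(roaches$roach1)

# Maximum likelihood fit of the same Poisson model with the same offset
glmp <- glm(y ~ sqrt_roach1 + treatment + senior, offset = log(exposure2),
            data = roaches, family = poisson)
round(coef(glmp), 2)
```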

**Plot posterior**

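```
# Marginal posteriors: shaded area is the 50% interval (the mcmc_areas default),
# outer limits cover 99.9% of the posterior mass
mcmc_areas(as.matrix(stan_glmp), prob_outer = .999)
```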
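Since the goal is to use cross-validation to detect misspecification, a natural next step is approximate leave-one-out cross-validation with `loo()`, which for `stanreg` objects uses Pareto smoothed importance sampling (PSIS). The sketch below reuses `stan_glmp` if it is already in the workspace and otherwise refits the same model; `loo_p` is a hypothetical name. The printed estimates and Pareto \(k\) diagnostics are what to inspect; we do not claim particular values here.

```r
library(rstanarm)
library(loo)
data(roaches)
roaches$sqrt_roach1 <- sqrt(roaches$roach1)

# Reuse the fit from above if it exists; otherwise refit the same model
if (!exists("stan_glmp")) {
  stan_glmp <- stan_glm(y ~ sqrt_roach1 + treatment + senior,
                        offset = log(exposure2), data = roaches,
                        family = poisson,
                        prior = normal(0, 2.5), prior_intercept = normal(0, 5),
                        chains = 4, cores = 1, seed = 170400963, refresh = 0)
}

# PSIS-LOO estimate of the expected log pointwise predictive density
loo_p <- loo(stan_glmp, save_psis = TRUE)
print(loo_p)

# Pareto k diagnostics: many high values mean the PSIS-LOO estimate is unreliable
# and flag highly influential observations, which can indicate misspecification
pareto_k_table(loo_p)
```

Keeping the PSIS object (`save_psis = TRUE`) allows it to be reused later, for example in LOO-based predictive checks.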