library(rprojroot)
root <- has_file(".Bayesian-Workflow-root")$make_fix_file()
library(ggplot2)
## library(bayesplot)
devtools::load_all("~/proj/bayesplot")
theme_set(bayesplot::theme_default(base_family = "sans"))
library(patchwork)
library(dplyr)
library(loo)
library(brms)
options(mc.cores = 4)
dir.create(root("sleep_study","models/"), showWarnings = FALSE)
BRMS_MODEL_DIR <- root("sleep_study", "models/")
options(future.globals.maxSize = 1e9)
library(priorsense)
options(priorsense.plot_help_text = FALSE)
Prior specification for regression models: Reanalysis of a sleep study
This notebook includes the code for the Bayesian Workflow book Chapter 17, Prior specification for regression models: Reanalysis of a sleep study.
1 Introduction
Prior distributions are at the heart of Bayesian statistics and are mentioned as one of its defining features in almost all introductions. Yet, in practice, specifying priors remains a challenging and complex task that causes a lot of confusion. In this case study, we clear up some of this confusion by explaining the different purposes of priors and what should be considered when specifying them.
Load packages
2 The sleep study data
We analyze the sleepstudy data set (Belenky et al. 2003) that ships with the R package lme4 (Bates et al. 2015). The data set covers 18 people undergoing sleep deprivation (less than 3 hours of sleep per night) for 7 consecutive nights, recording their average reaction times in milliseconds on a simple task.
Reasons for choosing the sleepstudy data set:
- few variables, all of which are easy to understand
- easy yet important multilevel structure
- sensible to express with both linear and generalized linear models
- non-trivial error distributions
- independent priors are sensible(ish) due to the small number of parameters
- well known to a lot of R users
Days 0-1 were adaptation and training (T1/T2), day 2 was baseline (B); sleep deprivation started after day 2. We drop days 0-1 and shift the scale so that the baseline day becomes the new day 0.
data("sleepstudy", package = "lme4")
conditions <- make_conditions(sleepstudy, "Subject", incl_vars = FALSE)
sleepstudy <- sleepstudy |>
filter(Days >= 2) |>
mutate(Days = Days - 2)
Plot the data
sleepstudy |>
ggplot(aes(Days, Reaction)) +
geom_point() +
facet_wrap("Subject", ncol = 6) +
scale_x_continuous(breaks = 0:7) +
labs(y = "Reaction time (ms)")
3 Simple linear model
Prior base
prior_lin_base <- prior(normal(200, 100), class = b, coef = "Intercept") +
prior(normal(0, 20), class = b, coef = "Days") +
prior(exponential(0.02), class = sigma)
Model base and sample from the posterior
fit_lin_base <- brm(
Reaction ~ 0 + Intercept + Days,
data = sleepstudy,
family = gaussian(),
prior = prior_lin_base,
file = paste0(BRMS_MODEL_DIR, "fit_lin_base"),
file_refit = "on_change"
)
Posterior summary
summary(fit_lin_base, priors = TRUE)
Family: gaussian
Links: mu = identity
Formula: Reaction ~ 0 + Intercept + Days
Data: sleepstudy (Number of observations: 144)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Priors:
b_Days ~ normal(0, 20)
b_Intercept ~ normal(200, 100)
<lower=0> sigma ~ exponential(0.02)
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 267.85 7.77 253.00 283.43 1.00 1776 1927
Days 11.42 1.85 7.73 15.00 1.00 1798 1821
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 51.11 3.09 45.54 57.66 1.00 2296 2140
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Posterior conditional effects
plot(conditional_effects(fit_lin_base), points = TRUE, plot = FALSE)[[1]] +
labs(y = "Reaction time (ms)")
4 Simple linear model (centered predictors)
Points to discuss:
- priors on original or centered intercept?
- dependency of the prior on marginal moments of the data?
- different qualitative options for priors on b and sigma
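To see why the centering question matters: brms temporarily mean-centers predictors, so a prior with class = Intercept acts on the expected response at the predictor means, not at Days = 0. A small base-R illustration (268 and 11.4 are rough posterior means taken from the summaries in this notebook; mean Days is 3.5 in the trimmed data):

```r
# brms centers predictors internally, so the prior with class = Intercept
# applies to the intercept at the predictor means (here, at mean(Days))
days_mean <- mean(rep(0:7, times = 18))   # 3.5 in the trimmed sleepstudy data
b0 <- 268; b1 <- 11.4                     # rough posterior means, for illustration
c(at_day0 = b0, at_mean_days = b0 + b1 * days_mean)
```

With a slope of about 11 ms/day, the two intercepts differ by roughly 40 ms, which is why a prior sensible for one need not be sensible for the other.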
Prior 1
prior1 <- prior(normal(250, 100), class = Intercept) +
prior(normal(0, 20), class = b) +
prior(exponential(0.02), class = sigma)
Sample from the prior 1
fit1_prior <- brm(
Reaction ~ 1 + Days,
data = sleepstudy,
family = gaussian(),
prior = prior1,
sample_prior = "only",
file = paste0(BRMS_MODEL_DIR, "fit1_prior"),
file_refit = "on_change"
)
Prior predictive checking
set.seed(652312)
pp_check(fit1_prior, ndraws = 100) +
ylim(c(0, 0.02)) +
theme_sub_axis_y(line = element_blank())
Model 1: sample from the posterior
fit1 <- brm(
Reaction ~ 1 + Days,
data = sleepstudy,
family = gaussian(),
prior = prior1,
file = paste0(BRMS_MODEL_DIR, "fit1"),
file_refit = "on_change"
)
Posterior summary
summary(fit1, priors = TRUE)
Family: gaussian
Links: mu = identity
Formula: Reaction ~ 1 + Days
Data: sleepstudy (Number of observations: 144)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Priors:
b ~ normal(0, 20)
Intercept ~ normal(250, 100)
<lower=0> sigma ~ exponential(0.02)
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 268.27 7.74 253.54 283.15 1.00 3793 2868
Days 11.34 1.85 7.71 14.96 1.00 3831 2749
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 51.14 3.02 45.83 57.55 1.00 4026 2586
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Posterior conditional effects
plot(conditional_effects(fit1), points = TRUE, plot = FALSE)[[1]] +
labs(y = "Reaction time (ms)")
Posterior predictive checking
pp_check(fit1, ndraws = 50) +
theme_sub_axis_y(line = element_blank())
Prior sensitivity analysis
powerscale_plot_dens(fit1, variable = c("b_Intercept", "b_Days", "sigma"),
component = "prior")
5 Simple linear model (informative priors)
Points to discuss:
- Priors will influence the posterior if chosen to be informative enough
- For models that are simple relative to the amount of data, prior distributions are unlikely to affect the posterior strongly, unless the priors are very informative
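The mechanics can be sketched with a one-parameter normal-normal precision-weighting approximation (base R only; 11.4 and 1.85 are the Days estimate and standard error from fit1, and this rough calculation anticipates what fit2 finds below):

```r
# Posterior mean of a single coefficient under a normal(0, prior_sd) prior,
# approximating the likelihood as normal(b_hat, se^2): precision weighting
post_mean <- function(b_hat, se, prior_sd) {
  w_lik <- 1 / se^2
  w_pri <- 1 / prior_sd^2
  w_lik * b_hat / (w_lik + w_pri)
}
post_mean(11.4, 1.85, 20)  # weak normal(0, 20) prior: ~11.3, barely any shrinkage
post_mean(11.4, 1.85, 1)   # informative normal(0, 1) prior: ~2.6, strong shrinkage
```

This is only an approximation of the joint model, but it shows why the normal(0, 20) prior is essentially flat here while normal(0, 1) pulls the slope strongly toward zero.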
Prior 2
prior2 <- prior(normal(250, 100), class = Intercept) +
prior(normal(0, 1), class = b) +
prior(exponential(0.02), class = sigma)
Model 2: sample from the posterior
fit2 <- brm(
Reaction ~ 1 + Days,
data = sleepstudy,
family = gaussian(),
prior = prior2,
file = paste0(BRMS_MODEL_DIR, "fit2"),
file_refit = "on_change"
)
Posterior summary
summary(fit2)
Family: gaussian
Links: mu = identity
Formula: Reaction ~ 1 + Days
Data: sleepstudy (Number of observations: 144)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 299.79 5.59 288.80 310.64 1.00 3826 2899
Days 2.28 0.93 0.41 4.12 1.00 3596 2903
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 55.14 3.38 48.96 62.21 1.00 3390 2904
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Posterior conditional effects
plot(conditional_effects(fit2), points = TRUE, plot = FALSE)[[1]] +
labs(y = "Reaction time (ms)")
Prior sensitivity analysis
powerscale_plot_dens(fit2, variable = c("b_Intercept", "b_Days", "sigma"),
component = "prior")
6 Simple linear model (informative priors with fat tails)
Points to discuss:
- tails of the priors (normal vs. Student-t)
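The practical difference is in the tails: with the same unit scale, the Student-t prior with 7 degrees of freedom keeps noticeably more mass far from zero, so a genuinely large effect is penalized less. A base-R comparison:

```r
# Prior mass beyond +-3 scale units under the two candidate slope priors
p_normal  <- 2 * pnorm(-3)        # normal(0, 1): ~0.003
p_student <- 2 * pt(-3, df = 7)   # student_t(7, 0, 1): ~0.02
c(normal = p_normal, student_t = p_student, ratio = p_student / p_normal)
```

The Student-t prior puts roughly seven times more mass beyond three scale units than the normal prior.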
Prior 2b
prior2b <- prior(normal(250, 100), class = Intercept) +
prior(student_t(7, 0, 1), class = b) +
prior(exponential(0.02), class = sigma)
Model 2b: sample from the posterior
fit2b <- brm(
Reaction ~ 1 + Days,
data = sleepstudy,
family = gaussian(),
prior = prior2b,
file = paste0(BRMS_MODEL_DIR, "fit2b"),
file_refit = "on_change"
)
Posterior summary
summary(fit2b)
Family: gaussian
Links: mu = identity
Formula: Reaction ~ 1 + Days
Data: sleepstudy (Number of observations: 144)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 279.50 9.48 261.69 299.47 1.00 2550 2288
Days 8.10 2.42 2.91 12.52 1.00 2674 2135
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 51.77 3.26 45.95 58.62 1.00 2695 2311
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Posterior conditional effects
plot(conditional_effects(fit2b), points = TRUE, plot = FALSE)[[1]] +
labs(y = "Reaction time (ms)")
Prior sensitivity analysis
powerscale_plot_dens(fit2b, variable = c("b_Intercept", "b_Days", "sigma"),
component = "prior")
Illustrate the difference between the normal and Student-t priors:
x <- seq(-4, 4, 0.01)
d1 <- dnorm(x, 0, 1)
d2 <- dstudent_t(x, 7, 0, 1)
data.frame(
d = c(d1, d2),
x = rep(x, 2),
Prior = rep(c("normal(0, 1)", "Student-t(7, 0, 1)"), each = length(x))
) |>
ggplot(aes(x, d, color = Prior)) +
geom_line(linewidth = 0.8) +
xlab(expression(b[1])) +
ylab("Density")
Compute the 95% interval bounds for an exponential prior:
qexp(c(0.025, 0.975), 0.02)
[1] 1.26589 184.44397
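For reference (simple arithmetic, not output quoted from the text): an exponential(0.02) prior has mean 1/0.02 = 50 and median log(2)/0.02 ≈ 34.7, which together with the interval above summarizes the scale this prior considers plausible for sigma.

```r
# Summaries of the exponential(0.02) prior used for sigma
rate <- 0.02
c(mean    = 1 / rate,            # 50
  median  = qexp(0.5, rate),     # log(2) / 0.02 ~= 34.7
  lower95 = qexp(0.025, rate),   # ~1.3
  upper95 = qexp(0.975, rate))   # ~184.4
```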
7 Linear varying intercept model
Points to discuss:
- How to represent unidimensional multilevel structures via priors
- priors on hyperparameters (SDs)
- shall the prior on sigma change now that we add more terms?
Prior 3
prior3 <- prior(normal(250, 100), class = Intercept) +
prior(normal(0, 20), class = b) +
prior(exponential(0.02), class = sigma) +
prior(exponential(0.02), class = sd)
Model 3: sample from the posterior
fit3 <- brm(
Reaction ~ 1 + Days + (1 | Subject),
data = sleepstudy,
family = gaussian(),
prior = prior3,
file = paste0(BRMS_MODEL_DIR, "fit3"),
file_refit = "on_change"
)
Posterior summary
summary(fit3)
Family: gaussian
Links: mu = identity
Formula: Reaction ~ 1 + Days + (1 | Subject)
Data: sleepstudy (Number of observations: 144)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Multilevel Hyperparameters:
~Subject (Number of levels: 18)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 43.47 8.25 30.21 62.22 1.01 771 1322
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 267.93 10.78 246.92 289.29 1.01 715 1131
Days 11.40 1.13 9.18 13.63 1.00 4452 2833
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 30.47 1.96 26.94 34.64 1.00 3505 2734
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Posterior conditional effects
plot(conditional_effects(fit3), plot = FALSE)[[1]] +
labs(y = "Reaction time (ms)")
Posterior conditional effects
plot(conditional_effects(fit3, conditions = conditions, re_formula = NULL),
ncol = 6, points = TRUE, plot = FALSE)[[1]] +
labs(y = "Reaction time (ms)")
8 Linear varying intercept and slope model
Points to discuss:
- How to represent multidimensional multilevel structures via priors
- priors on hyperparameters (SDs and correlations)
- The implications of the LKJ prior for correlation matrices
Prior 4
prior4 <- prior(normal(250, 100), class = Intercept) +
prior(normal(0, 20), class = b) +
prior(exponential(0.04), class = sigma) +
prior(exponential(0.04), class = sd, group = Subject, coef = Intercept) +
prior("exponential(1.0/15)", class = sd, group = Subject, coef = Days) +
prior(lkj(1), class = cor)
Model 4: sample from the posterior
fit4 <- brm(
Reaction ~ 1 + Days + (1 + Days | Subject),
data = sleepstudy,
family = gaussian(),
prior = prior4,
save_pars = save_pars(all = TRUE)
)
Posterior summary
summary(fit4)
Family: gaussian
Links: mu = identity
Formula: Reaction ~ 1 + Days + (1 + Days | Subject)
Data: sleepstudy (Number of observations: 144)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Multilevel Hyperparameters:
~Subject (Number of levels: 18)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 31.72 7.33 19.36 47.62 1.00 1683 2180
sd(Days) 7.20 1.79 4.18 11.15 1.00 1697 2316
cor(Intercept,Days) 0.21 0.29 -0.35 0.77 1.00 1524 1638
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 267.82 8.62 250.77 285.26 1.00 1564 2352
Days 11.28 1.97 7.40 15.24 1.00 1771 2356
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 25.90 1.80 22.56 29.72 1.00 2791 2798
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Posterior conditional effects
plot(conditional_effects(fit4), plot = FALSE)[[1]] +
labs(y = "Reaction time (ms)")
Posterior predictive checking
pp_check(fit4) +
theme_sub_axis_y(line = element_blank())
Posterior conditional effects
plot(conditional_effects(fit4, conditions = conditions, re_formula = NULL),
ncol = 6, points = TRUE, plot = FALSE)[[1]] +
scale_x_continuous(breaks = 0:7) +
labs(y = "Reaction time (ms)")
Illustrate the marginal LKJ(1) prior for different dimensions.
dmLKJ <- function(x, eta, d) {
  # marginal density of one correlation of a d x d LKJ(eta) matrix;
  # the factor 1/2 is the Jacobian of mapping (x + 1) / 2 from [0, 1] to [-1, 1]
  0.5 * dbeta((x + 1) / 2, eta + (d - 2) / 2, eta + (d - 2) / 2)
}
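As a numeric sanity check of this marginal density (self-contained sketch with its own local copy of the function; the 1/2 factor is the Jacobian of the map from the Beta scale [0, 1] back to the correlation scale [-1, 1], needed for the curve to integrate to one):

```r
# Marginal density of one correlation of a d x d LKJ(eta) matrix,
# including the 1/2 Jacobian from [0, 1] to [-1, 1]
dmLKJ_check <- function(x, eta, d) {
  0.5 * dbeta((x + 1) / 2, eta + (d - 2) / 2, eta + (d - 2) / 2)
}
# each marginal should integrate to 1 over [-1, 1]
sapply(c(2, 5, 10), function(d) integrate(dmLKJ_check, -1, 1, eta = 1, d = d)$value)
```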
data.frame(x = rep(seq(-0.999, 0.999, 0.001), 3)) |>
mutate(
d = rep(c(2, 5, 10), each = n()/3),
dens = dmLKJ(x, 1, d),
d = factor(d)
) |>
ggplot(aes(x, dens, color = d)) +
geom_line(linewidth = 0.8) +
scale_color_viridis_d() +
ylab("Density") +
xlab(expression(rho))
8.1 Log-linear prior-only model
Reuse priors from the normal linear model: Prior ln1
prior_ln1 <- prior(normal(250, 100), class = Intercept) +
prior(normal(0, 20), class = b) +
prior(exponential(0.02), class = sigma)
Sample from the prior ln1
fit_ln1_prior <- brm(
Reaction ~ 1 + Days,
data = sleepstudy,
family = lognormal(),
prior = prior_ln1,
sample_prior = "only",
file = paste0(BRMS_MODEL_DIR, "fit_ln1_prior"),
file_refit = "on_change"
)
Prior predictive checking
pp_check(fit_ln1_prior) +
theme_sub_axis_y(line = element_blank())
Prior predictive checking
set.seed(652312)
prp_ln <- apply(posterior_predict(fit_ln1_prior), 2, function(x) mean(x[is.finite(x)]))
prp_ln_dat <- data.frame(y = sleepstudy$Reaction, yrep = prp_ln)
gg_ln1_prior <- ggplot(prp_ln_dat, aes(y, yrep)) +
geom_point() +
scale_x_continuous(trans = "log10") +
scale_y_continuous(trans = "log10") +
ylab("Mean yrep")
Use more sensible priors: Prior ln2
prior_ln2 <- prior(normal(5, 0.55), class = Intercept) +
prior(normal(0, 0.2), class = b) +
prior(exponential(3), class = sigma)
Sample from the prior ln2
fit_ln2_prior <- brm(
Reaction ~ 1 + Days,
data = sleepstudy,
family = lognormal(),
prior = prior_ln2,
sample_prior = "only",
file = paste0(BRMS_MODEL_DIR, "fit_ln2_prior"),
file_refit = "on_change"
)
Prior predictive checking
pp_check(fit_ln2_prior) +
theme_sub_axis_y(line = element_blank())
Prior predictive checking
set.seed(652312)
prp_ln <- apply(posterior_predict(fit_ln2_prior), 2, function(x) mean(x[is.finite(x)]))
prp_ln_dat <- data.frame(y = sleepstudy$Reaction, yrep = prp_ln)
gg_ln2_prior <- ggplot(prp_ln_dat, aes(y, yrep)) +
geom_point() +
scale_x_continuous(trans = "log10") +
scale_y_continuous(trans = "log10") +
ylab("Mean yrep")
Prior predictive checking comparing priors ln1 and ln2
gg_ln1_prior + gg_ln2_prior
Sample from the posterior
fit_ln2 <- brm(
Reaction ~ 1 + Days,
data = sleepstudy,
family = lognormal(),
prior = prior_ln2,
file = paste0(BRMS_MODEL_DIR, "fit_ln2"),
file_refit = "on_change"
)
Posterior summary
summary(fit_ln2)
Family: lognormal
Links: mu = identity
Formula: Reaction ~ 1 + Days
Data: sleepstudy (Number of observations: 144)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 5.59 0.02 5.54 5.64 1.00 4603 3347
Days 0.04 0.01 0.02 0.05 1.00 4869 3163
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 0.17 0.01 0.15 0.19 1.00 3691 2666
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Posterior conditional effects
plot(conditional_effects(fit_ln2), plot = FALSE)[[1]] +
labs(y = "Reaction time (ms)")
Prior sensitivity analysis
powerscale_plot_dens(fit_ln2, variable = c("b_Intercept", "b_Days", "sigma"),
component = "prior")
9 Log-linear varying intercept and slope model
Points to discuss:
- a positive-only family may be preferred theoretically but is not always required
- how a non-identity link (log in this case) messes with our intuition about parameters and hence with prior specification
- how Jensen's inequality makes direct translation of priors difficult
- how a Jacobian adjustment would be needed to ensure equivalence of two priors on different scales
Prior 5
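Before specifying Prior 5, the Jensen's inequality point can be made concrete with a small base-R simulation (illustrative numbers, not taken from the fits): on the log scale the linear predictor is mu, but the outcome-scale mean of a lognormal is exp(mu + sigma^2/2), not exp(mu).

```r
set.seed(1)
mu <- 5.6; sigma <- 0.5          # illustrative log-scale values
x <- rnorm(1e6, mu, sigma)
mean(exp(x))                     # simulated outcome-scale mean, ~306
exp(mean(x))                     # exp of the log-scale mean, ~270: Jensen's gap
exp(mu + sigma^2 / 2)            # analytic lognormal mean
```

So a prior that looks centered correctly on the log scale implies a predictive mean shifted upward on the original scale, and the shift grows with sigma.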
prior5 <- prior(normal(5, 0.55), class = Intercept) +
prior(normal(0, 0.2), class = b) +
prior(exponential(6), class = sigma) +
prior(exponential(6), class = sd, group = Subject, coef = Intercept) +
prior(exponential(10), class = sd, group = Subject, coef = Days) +
prior(lkj(1), class = cor)
Model 5: sample from the posterior
fit5 <- brm(
Reaction ~ 1 + Days + (1 + Days | Subject),
data = sleepstudy,
family = lognormal(),
prior = prior5,
file = paste0(BRMS_MODEL_DIR, "fit5"),
file_refit = "on_change"
)
Posterior summary
summary(fit5)
Family: lognormal
Links: mu = identity
Formula: Reaction ~ 1 + Days + (1 + Days | Subject)
Data: sleepstudy (Number of observations: 144)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Multilevel Hyperparameters:
~Subject (Number of levels: 18)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 0.13 0.03 0.09 0.19 1.00 1493 2503
sd(Days) 0.02 0.01 0.01 0.04 1.00 1375 2185
cor(Intercept,Days) 0.03 0.28 -0.48 0.59 1.00 1184 1985
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 5.59 0.03 5.52 5.65 1.00 1223 2108
Days 0.04 0.01 0.02 0.05 1.00 1762 2279
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 0.08 0.01 0.07 0.09 1.00 3039 2749
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Posterior conditional effects
plot(conditional_effects(fit5), plot = FALSE)[[1]] +
labs(y = "Reaction time (ms)")
Posterior predictive checking
pp_check(fit5) +
theme_sub_axis_y(line = element_blank())
Posterior predictive checking comparing fit4 and fit5
pp_check(fit4, type = "intervals") + labs(y = "Reaction time (ms)") +
pp_check(fit5, type = "intervals") +
plot_layout(guides = "collect")
Posterior conditional effects
plot(conditional_effects(fit5, conditions = conditions, re_formula = NULL),
ncol = 6, points = TRUE, plot = FALSE)[[1]] +
scale_x_continuous(breaks = 0:7) +
labs(y = "Reaction time (ms)")
Prior sensitivity analysis
vars5 <- c("b_Intercept", "b_Days", "sd_Subject__Intercept",
"sd_Subject__Days", "cor_Subject__Intercept__Days", "sigma")
powerscale_plot_dens(fit5, variable = vars5, component = "prior")
Compare with the linear multilevel model; the comparison indicates slightly better predictive performance for the lognormal model.
loo(fit4, fit5)
Output of model 'fit4':
Computed from 4000 by 144 log-likelihood matrix.
Estimate SE
elpd_loo -692.0 21.7
p_loo 31.3 8.6
looic 1383.9 43.3
------
MCSE of elpd_loo is NA.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.4, 1.3]).
Pareto k diagnostic values:
Count Pct. Min. ESS
(-Inf, 0.7] (good) 141 97.9% 556
(0.7, 1] (bad) 2 1.4% <NA>
(1, Inf) (very bad) 1 0.7% <NA>
See help('pareto-k-diagnostic') for details.
Output of model 'fit5':
Computed from 4000 by 144 log-likelihood matrix.
Estimate SE
elpd_loo -680.4 19.0
p_loo 31.5 7.7
looic 1360.8 38.0
------
MCSE of elpd_loo is NA.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.5, 1.4]).
Pareto k diagnostic values:
Count Pct. Min. ESS
(-Inf, 0.7] (good) 141 97.9% 218
(0.7, 1] (bad) 2 1.4% <NA>
(1, Inf) (very bad) 1 0.7% <NA>
See help('pareto-k-diagnostic') for details.
Model comparisons:
elpd_diff se_diff
fit5 0.0 0.0
fit4 -11.6 4.7
10 Log-linear distributional multilevel model
Points to discuss:
- Parameters for standard deviations on the log or log-log scale are hard to understand and hence hard to set priors on.
- This is especially true for standard deviations of standard deviations on the log or log-log scale.
- In theory, we would not need the exponential link on sigma, but then we would have to take care of the positivity of the varying intercepts on sigma and hence would have to specify, for example, a hierarchical Gamma rather than a hierarchical normal prior.
- Look at prior predictions for the correlations to demonstrate the effect of the LKJ prior for matrices larger than 2x2
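To build intuition for priors on sd parameters of log-scale intercepts (a sketch using the exponential(3) prior that appears in Prior 6 below): a subject deviation of one such sd multiplies sigma by exp(sd), so the prior is really about plausible fold-changes in residual scale between subjects.

```r
# If sd(sigma_Intercept) ~ exponential(3) on the log scale, a +1 sd subject
# multiplies sigma by exp(sd): look at the implied fold-change quantiles
q <- qexp(c(0.25, 0.5, 0.75, 0.975), rate = 3)
round(exp(q), 2)   # median fold-change ~1.26, 97.5% quantile ~3.4
```

So this prior says between-subject differences in residual scale of around 25% are typical, while differences beyond about 3.4-fold are implausible.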
Prior 6
prior6 <- prior(normal(5.5, 0.55), class = Intercept) +
prior(normal(0, 0.2), class = b) +
prior(normal(0, 0.3), class = Intercept, dpar = sigma) +
prior(exponential(3), class = sd, group = Subject, coef = Intercept) +
prior(exponential(5), class = sd, group = Subject, coef = Days) +
prior(exponential(3), class = sd, dpar = sigma, group = Subject) +
prior(lkj(1), class = cor)
Model 6: sample from the posterior
fit6 <- brm(
bf(Reaction ~ 1 + Days + (1 + Days |S| Subject),
sigma ~ 1 + (1 |S| Subject)),
data = sleepstudy,
family = lognormal(),
prior = prior6,
file = paste0(BRMS_MODEL_DIR, "fit6"),
file_refit = "on_change"
)
Posterior summary
summary(fit6)
Family: lognormal
Links: mu = identity; sigma = log
Formula: Reaction ~ 1 + Days + (1 + Days | S | Subject)
sigma ~ 1 + (1 | S | Subject)
Data: sleepstudy (Number of observations: 144)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Multilevel Hyperparameters:
~Subject (Number of levels: 18)
Estimate Est.Error l-95% CI u-95% CI Rhat
sd(Intercept) 0.16 0.04 0.10 0.26 1.00
sd(Days) 0.03 0.01 0.02 0.05 1.01
sd(sigma_Intercept) 1.88 0.40 1.18 2.78 1.01
cor(Intercept,Days) -0.01 0.28 -0.53 0.55 1.00
cor(Intercept,sigma_Intercept) 0.25 0.38 -0.51 0.84 1.01
cor(Days,sigma_Intercept) 0.26 0.41 -0.60 0.88 1.01
Bulk_ESS Tail_ESS
sd(Intercept) 1048 1386
sd(Days) 917 1389
sd(sigma_Intercept) 1200 1801
cor(Intercept,Days) 1039 1181
cor(Intercept,sigma_Intercept) 400 1049
cor(Days,sigma_Intercept) 370 675
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 5.63 0.08 5.48 5.80 1.01 394 1021
sigma_Intercept -0.94 0.36 -1.66 -0.28 1.00 1939 2114
Days 0.04 0.02 0.01 0.08 1.01 382 861
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Posterior conditional effects
plot(conditional_effects(fit6, conditions = conditions, re_formula = NULL),
ncol = 6, points = TRUE, plot = FALSE)[[1]] +
labs(y = "Reaction time (ms)")
11 Exgaussian distributional multilevel model
Points to discuss:
- All the models before are relatively robust to the choice of priors: even completely flat priors work ok and don't produce very different results, because the models are parsimonious relative to the amount and quality of the data
- This robustness is no longer the case for the upcoming models
- The exgaussian model is linear yet can handle strong right skewness, at the expense of not strictly respecting the lower boundary of zero
- It avoids the log-log awkwardness of the lognormal and related models
- The interpretation of the beta (skewness) parameter is at least somewhat intuitive
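The shape can be sketched by direct simulation under the textbook ex-Gaussian construction, a Gaussian plus an independent exponential with mean beta (illustrative values; note that brms's exgaussian() parameterization need not match this sketch in every detail):

```r
set.seed(2)
n <- 1e5
m <- 280; s <- 25; beta <- 60    # illustrative values, not fit estimates
y <- rnorm(n, m, s) + rexp(n, rate = 1 / beta)
c(mean = mean(y), sd = sd(y))    # ~ m + beta = 340 and sqrt(s^2 + beta^2) = 65
# beta governs the heavy right tail of slow responses
```

Larger beta means a longer tail of slow responses while the Gaussian bulk stays put, which is why beta is read as a skewness parameter.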
Prior 7
prior7 <- prior(normal(250, 100), class = Intercept) +
prior(normal(0, 20), class = b) +
prior(normal(0, 5), class = Intercept, dpar = sigma) +
prior(normal(0, 5), class = beta) +
prior(exponential(0.04), class = sd, group = Subject, coef = Intercept) +
prior(exponential(0.05), class = sd, group = Subject, coef = Days) +
prior(exponential(0.2), class = sd, dpar = sigma, group = Subject) +
prior(lkj(1), class = cor)
Model 7: sample from the posterior
fit7 <- brm(
bf(Reaction ~ 1 + Days + (1 + Days |S| Subject),
sigma ~ 1 + (1 |S| Subject)),
data = sleepstudy,
family = exgaussian(),
prior = prior7,
init = 0,
cores = 4,
file = paste0(BRMS_MODEL_DIR, "fit7"),
file_refit = "on_change"
)
Posterior summary for model 7
summary(fit7)
Family: exgaussian
Links: mu = identity; sigma = log
Formula: Reaction ~ 1 + Days + (1 + Days | S | Subject)
sigma ~ 1 + (1 | S | Subject)
Data: sleepstudy (Number of observations: 144)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Multilevel Hyperparameters:
~Subject (Number of levels: 18)
Estimate Est.Error l-95% CI u-95% CI Rhat
sd(Intercept) 111.77 110.74 0.16 229.32 3.17
sd(Days) 27.87 14.89 9.11 45.26 3.94
sd(sigma_Intercept) 3.97 3.33 0.81 9.58 3.89
cor(Intercept,Days) 0.34 0.47 -0.34 0.81 3.36
cor(Intercept,sigma_Intercept) -0.21 0.52 -0.67 0.68 3.24
cor(Days,sigma_Intercept) -0.23 0.22 -0.54 0.11 3.42
Bulk_ESS Tail_ESS
sd(Intercept) 4 12
sd(Days) 4 11
sd(sigma_Intercept) 4 14
cor(Intercept,Days) 4 13
cor(Intercept,sigma_Intercept) 4 14
cor(Days,sigma_Intercept) 4 11
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 2.59 3.73 -3.19 7.07 3.25 4 17
sigma_Intercept 1.69 1.93 -1.44 3.40 3.54 4 11
Days 0.01 0.88 -1.46 0.72 2.96 5 12
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
beta 33.63 32.07 1.50 69.73 3.04 5 11
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Posterior conditional effects
plot(conditional_effects(fit7, conditions = conditions, re_formula = NULL),
ncol = 6, points = TRUE, plot = FALSE)[[1]] +
labs(y = "Reaction time (ms)")
12 Exgaussian distributional multilevel model (default priors)
Points to discuss:
- some parameters are better informed by the data than others
- the choice between our custom priors and the default priors barely matters for most parameters, except for the skewness parameter, where the effect is clearly visible
- default priors are already chosen in an effort to be sensible(ish)
Model 8: sample from the posterior
fit8 <- brm(
bf(Reaction ~ 1 + Days + (1 + Days |S| Subject),
sigma ~ 1 + (1 |S| Subject)),
data = sleepstudy,
family = exgaussian(),
# no prior argument: default priors are used
init = 0,
cores = 4,
file = paste0(BRMS_MODEL_DIR, "fit8"),
file_refit = "on_change"
)
Posterior summary
summary(fit8)
Family: exgaussian
Links: mu = identity; sigma = log
Formula: Reaction ~ 1 + Days + (1 + Days | S | Subject)
sigma ~ 1 + (1 | S | Subject)
Data: sleepstudy (Number of observations: 144)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Multilevel Hyperparameters:
~Subject (Number of levels: 18)
Estimate Est.Error l-95% CI u-95% CI Rhat
sd(Intercept) 61.82 41.60 27.73 133.00 2.56
sd(Days) 15.80 14.68 5.90 41.17 2.45
sd(sigma_Intercept) 0.85 0.19 0.60 1.19 2.01
cor(Intercept,Days) 0.30 0.41 -0.26 0.94 2.42
cor(Intercept,sigma_Intercept) 0.54 0.30 -0.10 0.85 2.16
cor(Days,sigma_Intercept) 0.50 0.19 0.06 0.71 1.72
Bulk_ESS Tail_ESS
sd(Intercept) 5 28
sd(Days) 6 25
sd(sigma_Intercept) 7 30
cor(Intercept,Days) 6 21
cor(Intercept,sigma_Intercept) 5 22
cor(Days,sigma_Intercept) 7 32
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 242.75 47.68 160.72 276.84 2.64 6 26
sigma_Intercept 2.70 0.41 2.10 3.18 2.84 5 14
Days 3.90 15.34 -22.53 14.45 2.59 5 11
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
beta 2.57 1.86 1.45 7.99 2.81 5 13
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Posterior conditional effects
plot(conditional_effects(fit8, conditions = conditions, re_formula = NULL),
ncol = 6, points = TRUE, plot = FALSE)[[1]] +
labs(y = "Reaction time (ms)")
13 Exgaussian distributional multilevel model (flat priors)
Points to discuss:
- without reasonable(ish) priors, sampling may fall apart, with some really long-running chains, divergent transitions, and increased Rhats
- priors matter, but unless you specify quite informative priors or have little data relative to the model complexity, you will likely not see much difference compared to sensible default priors
Extract all default priors:
prior9 <- get_prior(
bf(Reaction ~ 1 + Days + (1 + Days |S| Subject),
sigma ~ 1 + (1 |S| Subject)),
data = sleepstudy,
family = exgaussian()
)
Make all priors flat (except for the varying coefficients' priors):
prior9$prior <- ""
Model 9: sample from the posterior
fit9 <- brm(
bf(Reaction ~ 1 + Days + (1 + Days |S| Subject),
sigma ~ 1 + (1 |S| Subject)),
data = sleepstudy,
family = exgaussian(),
prior = prior9,
init = 0,
cores = 4,
file = paste0(BRMS_MODEL_DIR, "fit9"),
file_refit = "on_change"
)
Posterior summary
summary(fit9)
Family: exgaussian
Links: mu = identity; sigma = log
Formula: Reaction ~ 1 + Days + (1 + Days | S | Subject)
sigma ~ 1 + (1 | S | Subject)
Data: sleepstudy (Number of observations: 144)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Multilevel Hyperparameters:
~Subject (Number of levels: 18)
Estimate Est.Error l-95% CI u-95% CI Rhat
sd(Intercept) 203.92 111.77 29.13 335.48 3.28
sd(Days) 9.08 5.84 4.00 19.03 3.43
sd(sigma_Intercept) 1.50 1.69 0.44 4.55 3.41
cor(Intercept,Days) 0.34 0.51 -0.25 0.92 2.61
cor(Intercept,sigma_Intercept) -0.06 0.23 -0.41 0.27 3.06
cor(Days,sigma_Intercept) 0.26 0.19 -0.03 0.56 2.71
Bulk_ESS Tail_ESS
sd(Intercept) 5 28
sd(Days) 4 11
sd(sigma_Intercept) 4 18
cor(Intercept,Days) 5 12
cor(Intercept,sigma_Intercept) 5 13
cor(Days,sigma_Intercept) 5 13
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 54.37 120.81 -55.11 258.42 3.57 4 13
sigma_Intercept 1.82 1.45 -0.70 2.92 2.95 5 20
Days 4.75 7.10 -5.27 13.43 3.34 4 12
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
beta 35.48 57.86 1.62 145.79 3.05 5 12
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Posterior conditional effects
plot(conditional_effects(fit9, conditions = conditions, re_formula = NULL),
ncol = 6, points = TRUE, plot = FALSE)[[1]] +
labs(y = "Reaction time (ms)")
14 Model comparison and checking
Compute LOO-CV for models 3, 4, and 5
fit3 <- add_criterion(fit3, criterion = "loo", save_psis = TRUE, reloo = TRUE)
fit4 <- add_criterion(fit4, criterion = "loo", save_psis = TRUE, reloo = TRUE)
fit5 <- add_criterion(fit5, criterion = "loo", save_psis = TRUE, reloo = TRUE)
Compare varying intercept and slope models with normal and log-normal data models
loo_compare(fit4, fit5)
elpd_diff se_diff
fit5 0.0 0.0
fit4 -12.2 5.0
The log-normal model is not much different from the normal model, which makes sense as all reaction times are far from 0.
Posterior predictive checking of fit4 using LOO predictive intervals
pp_check(fit4, type = "loo_intervals") +
labs(y = "Reaction time (ms)")
There are clearly some outliers.
Create varying intercept (fit3t) and varying intercept and slope (fit4t) models with Student’s t data model
fit3t <- update(fit3, family = student())
fit4t <- update(fit4, family = student())
Compute LOO-CV for models 3t and 4t
fit3t <- add_criterion(fit3t, criterion = "loo", save_psis = TRUE)
fit4t <- add_criterion(fit4t, criterion = "loo", save_psis = TRUE)
Posterior predictive checking of fit4t using LOO predictive intervals
pp_check(fit4t, type = "loo_intervals") +
labs(y = "Reaction time (ms)")
The LOO predictive intervals look better now.
theme_set(bayesplot::theme_default(base_family = "sans", base_size = 14))
pp_check(fit4, type = "loo_pit_ecdf", method = "correlated", moment_match = TRUE) +
pp_check(fit4t, type = "loo_pit_ecdf", method = "correlated")
Looking at the LOO-PIT plots, we see that the normal and log-normal models have too wide predictive distributions for most observations, which is due to a few outliers inflating the residual scale. The LOO-PIT plot for the Student's t model looks better.
Compare normal and Student's t models
loo_compare(fit4, fit4t)
elpd_diff se_diff
fit4t 0.0 0.0
fit4 -40.8 13.7
Student’s t model has much better predictive performance.
Examine how much adding varying slope improved predictive performance in case of normal data model:
loo_compare(fit3, fit4)
elpd_diff se_diff
fit4 0.0 0.0
fit3 -13.6 10.3
Examine how much adding varying slope improved predictive performance in case of Student’s t data model:
loo_compare(fit3t, fit4t)
elpd_diff se_diff
fit4t 0.0 0.0
fit3t -45.6 8.6
When using the Student's t model, the predictive performance difference between using and not using a varying slope is bigger.
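A rough way to read these tables (a common heuristic when interpreting loo output): divide elpd_diff by se_diff. The varying slope helps clearly under the Student's t model (about 5 standard errors) but much less clearly under the normal model:

```r
# elpd_diff / se_diff ratios, with values copied from the comparisons above
c(slope_normal  = -13.6 / 10.3,   # ~ -1.3 se: unclear benefit
  slope_student = -45.6 / 8.6,    # ~ -5.3 se: clear benefit
  normal_vs_t   = -40.8 / 13.7)   # ~ -3.0 se: Student's t clearly better
```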
References
Bates, D., Mächler, M., Bolker, B., and Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1-48.
Belenky, G., Wesensten, N. J., Thorne, D. R., Thomas, M. L., Sing, H. C., Redmond, D. P., Russo, M. B., and Balkin, T. J. (2003). Patterns of performance degradation and restoration during sleep restriction and subsequent recovery: A sleep dose-response study. Journal of Sleep Research, 12(1), 1-12.
Licenses
- Code © 2025, Paul Bürkner and Aki Vehtari, licensed under BSD-3.
- Text © 2025, Paul Bürkner and Aki Vehtari, licensed under CC-BY-NC 4.0.