library(rprojroot)
root <- has_file(".Bayesian-Workflow-root")$make_fix_file()
library(ggplot2)
## library(bayesplot)
devtools::load_all("~/proj/bayesplot")
theme_set(bayesplot::theme_default(base_family = "sans"))
library(patchwork)
library(dplyr)
library(loo)
library(brms)
options(mc.cores = 4)
dir.create(root("sleep_study","models/"), showWarnings = FALSE)
BRMS_MODEL_DIR <- root("sleep_study", "models/")
options(future.globals.maxSize = 1e9)
library(priorsense)
options(priorsense.plot_help_text = FALSE)
Prior specification for regression models: Reanalysis of a sleep study
This notebook includes the code for the Bayesian Workflow book Chapter 17, Prior specification for regression models: Reanalysis of a sleep study.
1 Introduction
Prior distributions are at the heart of Bayesian statistics and are mentioned as one of its defining features in almost all introductions. Yet, in practice, specifying priors remains a challenging and complex task that causes a lot of confusion. In this case study, we clear up some of this confusion by explaining the different purposes of priors and what should be considered when specifying them.
Load packages
2 The sleep study data
We analyze the sleepstudy data set (Belenky et al. 2003) that ships with the R package lme4 (Bates et al. 2015). The data set covers 18 people undergoing sleep deprivation (less than 3 hours of sleep per night) for 7 consecutive nights, recording their average reaction times in milliseconds on a simple task.
Reasons for choosing the sleepstudy data set:
- few variables, all of which are easy to understand
- easy yet important multilevel structure
- sensible to express with both linear and generalized linear models
- non-trivial error distributions
- independent priors are sensible(ish) due to the small number of parameters
- well known to a lot of R users
Days 0-1 were adaptation and training (T1/T2), day 2 was baseline (B); sleep deprivation started after day 2. We drop days 0-1 and shift the scale so that the baseline day becomes the new day 0.
data("sleepstudy", package = "lme4")
conditions <- make_conditions(sleepstudy, "Subject", incl_vars = FALSE)
sleepstudy <- sleepstudy |>
filter(Days >= 2) |>
mutate(Days = Days - 2)
Plot the data
sleepstudy |>
ggplot(aes(Days, Reaction)) +
geom_point() +
facet_wrap("Subject", ncol = 6) +
scale_x_continuous(breaks = 0:7) +
labs(y = "Reaction time (ms)")
3 Simple linear model
Prior base
prior_lin_base <- prior(normal(200, 100), class = b, coef = "Intercept") +
prior(normal(0, 20), class = b, coef = "Days") +
prior(exponential(0.02), class = sigma)
Model base and sample from the posterior
fit_lin_base <- brm(
Reaction ~ 0 + Intercept + Days,
data = sleepstudy,
family = gaussian(),
prior = prior_lin_base,
file = paste0(BRMS_MODEL_DIR, "fit_lin_base"),
file_refit = "on_change"
)
Posterior summary
summary(fit_lin_base, priors = TRUE)
Family: gaussian
Links: mu = identity
Formula: Reaction ~ 0 + Intercept + Days
Data: sleepstudy (Number of observations: 144)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Priors:
b_Days ~ normal(0, 20)
b_Intercept ~ normal(200, 100)
<lower=0> sigma ~ exponential(0.02)
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 267.85 7.77 253.00 283.43 1.00 1776 1927
Days 11.42 1.85 7.73 15.00 1.00 1798 1821
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 51.11 3.09 45.54 57.66 1.00 2296 2140
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Posterior conditional effects
plot(conditional_effects(fit_lin_base), points = TRUE, plot = FALSE)[[1]] +
labs(y = "Reaction time (ms)")
4 Simple linear model (centered predictors)
Points to discuss:
- priors on original or centered intercept?
- dependency of the prior on marginal moments of the data?
- different qualitative options for priors on b and sigma
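To see why the centering question matters: brms temporarily mean-centers predictors, so a prior with class = Intercept acts on the expected response at the predictor means, not at Days = 0. A small base-R illustration (268 and 11.4 are rough posterior means taken from the summaries in this notebook; mean Days is 3.5 in the trimmed data):

```r
# brms centers predictors internally, so the prior with class = Intercept
# applies to the intercept at the predictor means (here, at mean(Days))
days_mean <- mean(rep(0:7, times = 18))   # 3.5 in the trimmed sleepstudy data
b0 <- 268; b1 <- 11.4                     # rough posterior means, for illustration
c(at_day0 = b0, at_mean_days = b0 + b1 * days_mean)
```

With a slope of about 11 ms/day, the two intercepts differ by roughly 40 ms, which is why a prior sensible for one need not be sensible for the other.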
Prior 1
prior1 <- prior(normal(250, 100), class = Intercept) +
prior(normal(0, 20), class = b) +
prior(exponential(0.02), class = sigma)
Sample from the prior 1
fit1_prior <- brm(
Reaction ~ 1 + Days,
data = sleepstudy,
family = gaussian(),
prior = prior1,
sample_prior = "only",
file = paste0(BRMS_MODEL_DIR, "fit1_prior"),
file_refit = "on_change"
)
Prior predictive checking
set.seed(652312)
pp_check(fit1_prior, ndraws = 100) +
ylim(c(0, 0.02)) +
theme_sub_axis_y(line = element_blank())
Model 1: sample from the posterior
fit1 <- brm(
Reaction ~ 1 + Days,
data = sleepstudy,
family = gaussian(),
prior = prior1,
file = paste0(BRMS_MODEL_DIR, "fit1"),
file_refit = "on_change"
)
Posterior summary
summary(fit1, priors = TRUE)
Family: gaussian
Links: mu = identity
Formula: Reaction ~ 1 + Days
Data: sleepstudy (Number of observations: 144)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Priors:
b ~ normal(0, 20)
Intercept ~ normal(250, 100)
<lower=0> sigma ~ exponential(0.02)
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 268.27 7.74 253.54 283.15 1.00 3793 2868
Days 11.34 1.85 7.71 14.96 1.00 3831 2749
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 51.14 3.02 45.83 57.55 1.00 4026 2586
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Posterior conditional effects
plot(conditional_effects(fit1), points = TRUE, plot = FALSE)[[1]] +
labs(y = "Reaction time (ms)")
Posterior predictive checking
pp_check(fit1, ndraws = 50) +
theme_sub_axis_y(line = element_blank())
Prior sensitivity analysis
powerscale_plot_dens(fit1, variable = c("b_Intercept", "b_Days", "sigma"),
component = "prior")
5 Simple linear model (informative priors)
Points to discuss:
- Priors will influence the posterior if chosen to be informative enough
- For models that are simple relative to the amount of data, prior distributions are unlikely to affect the posterior strongly, unless the priors are very informative
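The mechanics can be sketched with a one-parameter normal-normal precision-weighting approximation (base R only; 11.4 and 1.85 are the Days estimate and standard error from fit1, and this rough calculation anticipates what fit2 finds below):

```r
# Posterior mean of a single coefficient under a normal(0, prior_sd) prior,
# approximating the likelihood as normal(b_hat, se^2): precision weighting
post_mean <- function(b_hat, se, prior_sd) {
  w_lik <- 1 / se^2
  w_pri <- 1 / prior_sd^2
  w_lik * b_hat / (w_lik + w_pri)
}
post_mean(11.4, 1.85, 20)  # weak normal(0, 20) prior: ~11.3, barely any shrinkage
post_mean(11.4, 1.85, 1)   # informative normal(0, 1) prior: ~2.6, strong shrinkage
```

This is only an approximation of the joint model, but it shows why the normal(0, 20) prior is essentially flat here while normal(0, 1) pulls the slope strongly toward zero.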
Prior 2
prior2 <- prior(normal(250, 100), class = Intercept) +
prior(normal(0, 1), class = b) +
prior(exponential(0.02), class = sigma)
Model 2: sample from the posterior
fit2 <- brm(
Reaction ~ 1 + Days,
data = sleepstudy,
family = gaussian(),
prior = prior2,
file = paste0(BRMS_MODEL_DIR, "fit2"),
file_refit = "on_change"
)
Posterior summary
summary(fit2)
Family: gaussian
Links: mu = identity
Formula: Reaction ~ 1 + Days
Data: sleepstudy (Number of observations: 144)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 299.79 5.59 288.80 310.64 1.00 3826 2899
Days 2.28 0.93 0.41 4.12 1.00 3596 2903
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 55.14 3.38 48.96 62.21 1.00 3390 2904
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Posterior conditional effects
plot(conditional_effects(fit2), points = TRUE, plot = FALSE)[[1]] +
labs(y = "Reaction time (ms)")
Prior sensitivity analysis
powerscale_plot_dens(fit2, variable = c("b_Intercept", "b_Days", "sigma"),
component = "prior")
6 Simple linear model (informative priors with fat tails)
Points to discuss:
- tails of the priors (normal vs. Student-t)
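The practical difference is in the tails: with the same unit scale, the Student-t prior with 7 degrees of freedom keeps noticeably more mass far from zero, so a genuinely large effect is penalized less. A base-R comparison:

```r
# Prior mass beyond +-3 scale units under the two candidate slope priors
p_normal  <- 2 * pnorm(-3)        # normal(0, 1): ~0.003
p_student <- 2 * pt(-3, df = 7)   # student_t(7, 0, 1): ~0.02
c(normal = p_normal, student_t = p_student, ratio = p_student / p_normal)
```

The Student-t prior puts roughly seven times more mass beyond three scale units than the normal prior.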
Prior 2b
prior2b <- prior(normal(250, 100), class = Intercept) +
prior(student_t(7, 0, 1), class = b) +
prior(exponential(0.02), class = sigma)
Model 2b: sample from the posterior
fit2b <- brm(
Reaction ~ 1 + Days,
data = sleepstudy,
family = gaussian(),
prior = prior2b,
file = paste0(BRMS_MODEL_DIR, "fit2b"),
file_refit = "on_change"
)
Posterior summary
summary(fit2b)
Family: gaussian
Links: mu = identity
Formula: Reaction ~ 1 + Days
Data: sleepstudy (Number of observations: 144)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 279.50 9.48 261.69 299.47 1.00 2550 2288
Days 8.10 2.42 2.91 12.52 1.00 2674 2135
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 51.77 3.26 45.95 58.62 1.00 2695 2311
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Posterior conditional effects
plot(conditional_effects(fit2b), points = TRUE, plot = FALSE)[[1]] +
labs(y = "Reaction time (ms)")
Prior sensitivity analysis
powerscale_plot_dens(fit2b, variable = c("b_Intercept", "b_Days", "sigma"),
component = "prior")
Illustrate the difference between the normal and Student-t priors:
x <- seq(-4, 4, 0.01)
d1 <- dnorm(x, 0, 1)
d2 <- dstudent_t(x, 7, 0, 1)
data.frame(
d = c(d1, d2),
x = rep(x, 2),
Prior = rep(c("normal(0, 1)", "Student-t(7, 0, 1)"), each = length(x))
) |>
ggplot(aes(x, d, color = Prior)) +
geom_line(linewidth = 0.8) +
xlab(expression(b[1])) +
ylab("Density")
Compute the 95% interval bounds for an exponential prior:
qexp(c(0.025, 0.975), 0.02)
[1] 1.26589 184.44397
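For reference (simple arithmetic, not output quoted from the text): an exponential(0.02) prior has mean 1/0.02 = 50 and median log(2)/0.02 ≈ 34.7, which together with the interval above summarizes the scale this prior considers plausible for sigma.

```r
# Summaries of the exponential(0.02) prior used for sigma
rate <- 0.02
c(mean    = 1 / rate,            # 50
  median  = qexp(0.5, rate),     # log(2) / 0.02 ~= 34.7
  lower95 = qexp(0.025, rate),   # ~1.3
  upper95 = qexp(0.975, rate))   # ~184.4
```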
7 Linear varying intercept model
Points to discuss:
- How to represent unidimensional multilevel structures via priors
- priors on hyperparameters (SDs)
- shall the prior on sigma change now that we add more terms?
Prior 3
prior3 <- prior(normal(250, 100), class = Intercept) +
prior(normal(0, 20), class = b) +
prior(exponential(0.02), class = sigma) +
prior(exponential(0.02), class = sd)
Model 3: sample from the posterior
fit3 <- brm(
Reaction ~ 1 + Days + (1 | Subject),
data = sleepstudy,
family = gaussian(),
prior = prior3,
file = paste0(BRMS_MODEL_DIR, "fit3"),
file_refit = "on_change"
)
Posterior summary
summary(fit3)
Family: gaussian
Links: mu = identity
Formula: Reaction ~ 1 + Days + (1 | Subject)
Data: sleepstudy (Number of observations: 144)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Multilevel Hyperparameters:
~Subject (Number of levels: 18)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 43.47 8.25 30.21 62.22 1.01 771 1322
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 267.93 10.78 246.92 289.29 1.01 715 1131
Days 11.40 1.13 9.18 13.63 1.00 4452 2833
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 30.47 1.96 26.94 34.64 1.00 3505 2734
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Posterior conditional effects
plot(conditional_effects(fit3), plot = FALSE)[[1]] +
labs(y = "Reaction time (ms)")
Posterior conditional effects
plot(conditional_effects(fit3, conditions = conditions, re_formula = NULL),
ncol = 6, points = TRUE, plot = FALSE)[[1]] +
labs(y = "Reaction time (ms)")
8 Linear varying intercept and slope model
Points to discuss:
- How to represent multidimensional multilevel structures via priors
- priors on hyperparameters (SDs and correlations)
- The implications of the LKJ prior for correlation matrices
Prior 4
prior4 <- prior(normal(250, 100), class = Intercept) +
prior(normal(0, 20), class = b) +
prior(exponential(0.04), class = sigma) +
prior(exponential(0.04), class = sd, group = Subject, coef = Intercept) +
prior("exponential(1.0/15)", class = sd, group = Subject, coef = Days) +
prior(lkj(1), class = cor)
Model 4: sample from the posterior
fit4 <- brm(
Reaction ~ 1 + Days + (1 + Days | Subject),
data = sleepstudy,
family = gaussian(),
prior = prior4,
save_pars = save_pars(all = TRUE)
)
Posterior summary
summary(fit4)
Family: gaussian
Links: mu = identity
Formula: Reaction ~ 1 + Days + (1 + Days | Subject)
Data: sleepstudy (Number of observations: 144)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Multilevel Hyperparameters:
~Subject (Number of levels: 18)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 31.72 7.33 19.36 47.62 1.00 1683 2180
sd(Days) 7.20 1.79 4.18 11.15 1.00 1697 2316
cor(Intercept,Days) 0.21 0.29 -0.35 0.77 1.00 1524 1638
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 267.82 8.62 250.77 285.26 1.00 1564 2352
Days 11.28 1.97 7.40 15.24 1.00 1771 2356
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 25.90 1.80 22.56 29.72 1.00 2791 2798
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Posterior conditional effects
plot(conditional_effects(fit4), plot = FALSE)[[1]] +
labs(y = "Reaction time (ms)")
Posterior predictive checking
pp_check(fit4) +
theme_sub_axis_y(line = element_blank())
Posterior conditional effects
plot(conditional_effects(fit4, conditions = conditions, re_formula = NULL),
ncol = 6, points = TRUE, plot = FALSE)[[1]] +
scale_x_continuous(breaks = 0:7) +
labs(y = "Reaction time (ms)")
Illustrate the marginal LKJ(1) prior for different dimensions.
dmLKJ <- function(x, eta, d) {
  # marginal density of one correlation of a d x d LKJ(eta) matrix;
  # the factor 1/2 is the Jacobian of mapping (x + 1) / 2 from [0, 1] to [-1, 1]
  0.5 * dbeta((x + 1) / 2, eta + (d - 2) / 2, eta + (d - 2) / 2)
}
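As a numeric sanity check of this marginal density (self-contained sketch with its own local copy of the function; the 1/2 factor is the Jacobian of the map from the Beta scale [0, 1] back to the correlation scale [-1, 1], needed for the curve to integrate to one):

```r
# Marginal density of one correlation of a d x d LKJ(eta) matrix,
# including the 1/2 Jacobian from [0, 1] to [-1, 1]
dmLKJ_check <- function(x, eta, d) {
  0.5 * dbeta((x + 1) / 2, eta + (d - 2) / 2, eta + (d - 2) / 2)
}
# each marginal should integrate to 1 over [-1, 1]
sapply(c(2, 5, 10), function(d) integrate(dmLKJ_check, -1, 1, eta = 1, d = d)$value)
```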
data.frame(x = rep(seq(-0.999, 0.999, 0.001), 3)) |>
mutate(
d = rep(c(2, 5, 10), each = n()/3),
dens = dmLKJ(x, 1, d),
d = factor(d)
) |>
ggplot(aes(x, dens, color = d)) +
geom_line(linewidth = 0.8) +
scale_color_viridis_d() +
ylab("Density") +
xlab(expression(rho))
8.1 Log-linear prior-only model
Reuse priors from the normal linear model: Prior ln1
prior_ln1 <- prior(normal(250, 100), class = Intercept) +
prior(normal(0, 20), class = b) +
prior(exponential(0.02), class = sigma)
Sample from the prior ln1
fit_ln1_prior <- brm(
Reaction ~ 1 + Days,
data = sleepstudy,
family = lognormal(),
prior = prior_ln1,
sample_prior = "only",
file = paste0(BRMS_MODEL_DIR, "fit_ln1_prior"),
file_refit = "on_change"
)
Prior predictive checking
pp_check(fit_ln1_prior) +
theme_sub_axis_y(line = element_blank())
Prior predictive checking
set.seed(652312)
prp_ln <- apply(posterior_predict(fit_ln1_prior), 2, function(x) mean(x[is.finite(x)]))
prp_ln_dat <- data.frame(y = sleepstudy$Reaction, yrep = prp_ln)
gg_ln1_prior <- ggplot(prp_ln_dat, aes(y, yrep)) +
geom_point() +
scale_x_continuous(trans = "log10") +
scale_y_continuous(trans = "log10") +
ylab("Mean yrep")
Use more sensible priors: Prior ln2
prior_ln2 <- prior(normal(5, 0.55), class = Intercept) +
prior(normal(0, 0.2), class = b) +
prior(exponential(3), class = sigma)
Sample from the prior ln2
fit_ln2_prior <- brm(
Reaction ~ 1 + Days,
data = sleepstudy,
family = lognormal(),
prior = prior_ln2,
sample_prior = "only",
file = paste0(BRMS_MODEL_DIR, "fit_ln2_prior"),
file_refit = "on_change"
)
Prior predictive checking
pp_check(fit_ln2_prior) +
theme_sub_axis_y(line = element_blank())
Prior predictive checking
set.seed(652312)
prp_ln <- apply(posterior_predict(fit_ln2_prior), 2, function(x) mean(x[is.finite(x)]))
prp_ln_dat <- data.frame(y = sleepstudy$Reaction, yrep = prp_ln)
gg_ln2_prior <- ggplot(prp_ln_dat, aes(y, yrep)) +
geom_point() +
scale_x_continuous(trans = "log10") +
scale_y_continuous(trans = "log10") +
ylab("Mean yrep")
Prior predictive checking comparing priors ln1 and ln2
gg_ln1_prior + gg_ln2_prior
Sample from the posterior
fit_ln2 <- brm(
Reaction ~ 1 + Days,
data = sleepstudy,
family = lognormal(),
prior = prior_ln2,
file = paste0(BRMS_MODEL_DIR, "fit_ln2"),
file_refit = "on_change"
)
Posterior summary
summary(fit_ln2)
Family: lognormal
Links: mu = identity
Formula: Reaction ~ 1 + Days
Data: sleepstudy (Number of observations: 144)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 5.59 0.02 5.54 5.64 1.00 4603 3347
Days 0.04 0.01 0.02 0.05 1.00 4869 3163
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 0.17 0.01 0.15 0.19 1.00 3691 2666
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Posterior conditional effects
plot(conditional_effects(fit_ln2), plot = FALSE)[[1]] +
labs(y = "Reaction time (ms)")
Prior sensitivity analysis
powerscale_plot_dens(fit_ln2, variable = c("b_Intercept", "b_Days", "sigma"),
component = "prior")
9 Log-linear varying intercept and slope model
Points to discuss:
- a positive-only family may be preferred theoretically but is not always required
- how a non-identity link (log in this case) messes with our intuition about parameters and hence with prior specification
- how Jensen's inequality makes direct translation of priors difficult
- how a Jacobian adjustment would be needed to ensure equivalence of two priors on different scales
Prior 5
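Before specifying Prior 5, the Jensen's inequality point can be made concrete with a small base-R simulation (illustrative numbers, not taken from the fits): on the log scale the linear predictor is mu, but the outcome-scale mean of a lognormal is exp(mu + sigma^2/2), not exp(mu).

```r
set.seed(1)
mu <- 5.6; sigma <- 0.5          # illustrative log-scale values
x <- rnorm(1e6, mu, sigma)
mean(exp(x))                     # simulated outcome-scale mean, ~306
exp(mean(x))                     # exp of the log-scale mean, ~270: Jensen's gap
exp(mu + sigma^2 / 2)            # analytic lognormal mean
```

So a prior that looks centered correctly on the log scale implies a predictive mean shifted upward on the original scale, and the shift grows with sigma.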
prior5 <- prior(normal(5, 0.55), class = Intercept) +
prior(normal(0, 0.2), class = b) +
prior(exponential(6), class = sigma) +
prior(exponential(6), class = sd, group = Subject, coef = Intercept) +
prior(exponential(10), class = sd, group = Subject, coef = Days) +
prior(lkj(1), class = cor)
Model 5: sample from the posterior
fit5 <- brm(
Reaction ~ 1 + Days + (1 + Days | Subject),
data = sleepstudy,
family = lognormal(),
prior = prior5,
file = paste0(BRMS_MODEL_DIR, "fit5"),
file_refit = "on_change"
)
Posterior summary
summary(fit5)
Family: lognormal
Links: mu = identity
Formula: Reaction ~ 1 + Days + (1 + Days | Subject)
Data: sleepstudy (Number of observations: 144)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Multilevel Hyperparameters:
~Subject (Number of levels: 18)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 0.13 0.03 0.09 0.19 1.00 1493 2503
sd(Days) 0.02 0.01 0.01 0.04 1.00 1375 2185
cor(Intercept,Days) 0.03 0.28 -0.48 0.59 1.00 1184 1985
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 5.59 0.03 5.52 5.65 1.00 1223 2108
Days 0.04 0.01 0.02 0.05 1.00 1762 2279
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 0.08 0.01 0.07 0.09 1.00 3039 2749
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Posterior conditional effects
plot(conditional_effects(fit5), plot = FALSE)[[1]] +
labs(y = "Reaction time (ms)")
Posterior predictive checking
pp_check(fit5) +
theme_sub_axis_y(line = element_blank())
Posterior predictive checking comparing fit4 and fit5
pp_check(fit4, type = "intervals") + labs(y = "Reaction time (ms)") +
pp_check(fit5, type = "intervals") +
plot_layout(guides = "collect")
Posterior conditional effects
plot(conditional_effects(fit5, conditions = conditions, re_formula = NULL),
ncol = 6, points = TRUE, plot = FALSE)[[1]] +
scale_x_continuous(breaks = 0:7) +
labs(y = "Reaction time (ms)")
Prior sensitivity analysis
vars5 <- c("b_Intercept", "b_Days", "sd_Subject__Intercept",
"sd_Subject__Days", "cor_Subject__Intercept__Days", "sigma")
powerscale_plot_dens(fit5, variable = vars5, component = "prior")
Compare with the linear multilevel model; the comparison indicates slightly better predictive performance for the lognormal model.
loo(fit4, fit5)
Output of model 'fit4':
Computed from 4000 by 144 log-likelihood matrix.
Estimate SE
elpd_loo -692.0 21.7
p_loo 31.3 8.6
looic 1383.9 43.3
------
MCSE of elpd_loo is NA.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.4, 1.3]).
Pareto k diagnostic values:
Count Pct. Min. ESS
(-Inf, 0.7] (good) 141 97.9% 556
(0.7, 1] (bad) 2 1.4% <NA>
(1, Inf) (very bad) 1 0.7% <NA>
See help('pareto-k-diagnostic') for details.
Output of model 'fit5':
Computed from 4000 by 144 log-likelihood matrix.
Estimate SE
elpd_loo -680.4 19.0
p_loo 31.5 7.7
looic 1360.8 38.0
------
MCSE of elpd_loo is NA.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.5, 1.4]).
Pareto k diagnostic values:
Count Pct. Min. ESS
(-Inf, 0.7] (good) 141 97.9% 218
(0.7, 1] (bad) 2 1.4% <NA>
(1, Inf) (very bad) 1 0.7% <NA>
See help('pareto-k-diagnostic') for details.
Model comparisons:
elpd_diff se_diff
fit5 0.0 0.0
fit4 -11.6 4.7
10 Log-linear distributional multilevel model
Points to discuss:
- Parameters for standard deviations on the log or log-log scale are hard to understand and hence hard to set priors on.
- This is especially true for standard deviations of standard deviations on the log or log-log scale.
- In theory, we would not need the exponential link on sigma, but then we would have to take care of the positivity of the varying intercepts on sigma and hence would have to specify, for example, a hierarchical Gamma rather than a hierarchical normal prior.
- Look at prior predictions for the correlations to demonstrate the effect of the LKJ prior for matrices larger than 2x2
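To build intuition for priors on sd parameters of log-scale intercepts (a sketch using the exponential(3) prior that appears in Prior 6 below): a subject deviation of one such sd multiplies sigma by exp(sd), so the prior is really about plausible fold-changes in residual scale between subjects.

```r
# If sd(sigma_Intercept) ~ exponential(3) on the log scale, a +1 sd subject
# multiplies sigma by exp(sd): look at the implied fold-change quantiles
q <- qexp(c(0.25, 0.5, 0.75, 0.975), rate = 3)
round(exp(q), 2)   # median fold-change ~1.26, 97.5% quantile ~3.4
```

So this prior says between-subject differences in residual scale of around 25% are typical, while differences beyond about 3.4-fold are implausible.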
Prior 6
prior6 <- prior(normal(5.5, 0.55), class = Intercept) +
prior(normal(0, 0.2), class = b) +
prior(normal(0, 0.3), class = Intercept, dpar = sigma) +
prior(exponential(3), class = sd, group = Subject, coef = Intercept) +
prior(exponential(5), class = sd, group = Subject, coef = Days) +
prior(exponential(3), class = sd, dpar = sigma, group = Subject) +
prior(lkj(1), class = cor)
Model 6: sample from the posterior
fit6 <- brm(
bf(Reaction ~ 1 + Days + (1 + Days |S| Subject),
sigma ~ 1 + (1 |S| Subject)),
data = sleepstudy,
family = lognormal(),
prior = prior6,
file = paste0(BRMS_MODEL_DIR, "fit6"),
file_refit = "on_change"
)
Posterior summary
summary(fit6)
Family: lognormal
Links: mu = identity; sigma = log
Formula: Reaction ~ 1 + Days + (1 + Days | S | Subject)
sigma ~ 1 + (1 | S | Subject)
Data: sleepstudy (Number of observations: 144)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Multilevel Hyperparameters:
~Subject (Number of levels: 18)
Estimate Est.Error l-95% CI u-95% CI Rhat
sd(Intercept) 0.16 0.04 0.10 0.26 1.00
sd(Days) 0.03 0.01 0.02 0.05 1.01
sd(sigma_Intercept) 1.88 0.40 1.18 2.78 1.01
cor(Intercept,Days) -0.01 0.28 -0.53 0.55 1.00
cor(Intercept,sigma_Intercept) 0.25 0.38 -0.51 0.84 1.01
cor(Days,sigma_Intercept) 0.26 0.41 -0.60 0.88 1.01
Bulk_ESS Tail_ESS
sd(Intercept) 1048 1386
sd(Days) 917 1389
sd(sigma_Intercept) 1200 1801
cor(Intercept,Days) 1039 1181
cor(Intercept,sigma_Intercept) 400 1049
cor(Days,sigma_Intercept) 370 675
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 5.63 0.08 5.48 5.80 1.01 394 1021
sigma_Intercept -0.94 0.36 -1.66 -0.28 1.00 1939 2114
Days 0.04 0.02 0.01 0.08 1.01 382 861
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Posterior conditional effects
plot(conditional_effects(fit6, conditions = conditions, re_formula = NULL),
ncol = 6, points = TRUE, plot = FALSE)[[1]] +
labs(y = "Reaction time (ms)")
11 Exgaussian distributional multilevel model
Points to discuss:
- All the models before are relatively robust to the choice of priors: even completely flat priors work ok and don't produce very different results, because the models are parsimonious relative to the amount and quality of the data
- This robustness is no longer the case for the upcoming models
- The exgaussian model is linear yet can handle strong right skewness, at the expense of not strictly respecting the lower boundary of zero
- It avoids the log-log awkwardness of the lognormal and related models
- The interpretation of the beta (skewness) parameter is at least somewhat intuitive
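The shape can be sketched by direct simulation under the textbook ex-Gaussian construction, a Gaussian plus an independent exponential with mean beta (illustrative values; note that brms's exgaussian() parameterization need not match this sketch in every detail):

```r
set.seed(2)
n <- 1e5
m <- 280; s <- 25; beta <- 60    # illustrative values, not fit estimates
y <- rnorm(n, m, s) + rexp(n, rate = 1 / beta)
c(mean = mean(y), sd = sd(y))    # ~ m + beta = 340 and sqrt(s^2 + beta^2) = 65
# beta governs the heavy right tail of slow responses
```

Larger beta means a longer tail of slow responses while the Gaussian bulk stays put, which is why beta is read as a skewness parameter.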
Prior 7
prior7 <- prior(normal(250, 100), class = Intercept) +
prior(normal(0, 20), class = b) +
prior(normal(0, 5), class = Intercept, dpar = sigma) +
prior(normal(0, 5), class = beta) +
prior(exponential(0.04), class = sd, group = Subject, coef = Intercept) +
prior(exponential(0.05), class = sd, group = Subject, coef = Days) +
prior(exponential(0.2), class = sd, dpar = sigma, group = Subject) +
prior(lkj(1), class = cor)
Model 7: sample from the posterior
fit7 <- brm(
bf(Reaction ~ 1 + Days + (1 + Days |S| Subject),
sigma ~ 1 + (1 |S| Subject)),
data = sleepstudy,
family = exgaussian(),
prior = prior7,
init = 0,
cores = 4,
file = paste0(BRMS_MODEL_DIR, "fit7"),
file_refit = "on_change"
)
Posterior summary for model 7
summary(fit7)
Family: exgaussian
Links: mu = identity; sigma = log
Formula: Reaction ~ 1 + Days + (1 + Days | S | Subject)
sigma ~ 1 + (1 | S | Subject)
Data: sleepstudy (Number of observations: 144)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Multilevel Hyperparameters:
~Subject (Number of levels: 18)
Estimate Est.Error l-95% CI u-95% CI Rhat
sd(Intercept) 111.77 110.74 0.16 229.32 3.17
sd(Days) 27.87 14.89 9.11 45.26 3.94
sd(sigma_Intercept) 3.97 3.33 0.81 9.58 3.89
cor(Intercept,Days) 0.34 0.47 -0.34 0.81 3.36
cor(Intercept,sigma_Intercept) -0.21 0.52 -0.67 0.68 3.24
cor(Days,sigma_Intercept) -0.23 0.22 -0.54 0.11 3.42
Bulk_ESS Tail_ESS
sd(Intercept) 4 12
sd(Days) 4 11
sd(sigma_Intercept) 4 14
cor(Intercept,Days) 4 13
cor(Intercept,sigma_Intercept) 4 14
cor(Days,sigma_Intercept) 4 11
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 2.59 3.73 -3.19 7.07 3.25 4 17
sigma_Intercept 1.69 1.93 -1.44 3.40 3.54 4 11
Days 0.01 0.88 -1.46 0.72 2.96 5 12
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
beta 33.63 32.07 1.50 69.73 3.04 5 11
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Posterior conditional effects
plot(conditional_effects(fit7, conditions = conditions, re_formula = NULL),
ncol = 6, points = TRUE, plot = FALSE)[[1]] +
labs(y = "Reaction time (ms)")
12 Exgaussian distributional multilevel model (default priors)
Points to discuss:
- some parameters are better informed by the data than others
- the choice between our custom priors and the default priors barely matters for most parameters, except for the skewness parameter, where the effect is clearly visible
- default priors are already chosen in an effort to be sensible(ish)
Model 8: sample from the posterior
fit8 <- brm(
bf(Reaction ~ 1 + Days + (1 + Days |S| Subject),
sigma ~ 1 + (1 |S| Subject)),
data = sleepstudy,
family = exgaussian(),
# no prior argument: default priors are used
init = 0,
cores = 4,
file = paste0(BRMS_MODEL_DIR, "fit8"),
file_refit = "on_change"
)
Posterior summary
summary(fit8)
Family: exgaussian
Links: mu = identity; sigma = log
Formula: Reaction ~ 1 + Days + (1 + Days | S | Subject)
sigma ~ 1 + (1 | S | Subject)
Data: sleepstudy (Number of observations: 144)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Multilevel Hyperparameters:
~Subject (Number of levels: 18)
Estimate Est.Error l-95% CI u-95% CI Rhat
sd(Intercept) 61.82 41.60 27.73 133.00 2.56
sd(Days) 15.80 14.68 5.90 41.17 2.45
sd(sigma_Intercept) 0.85 0.19 0.60 1.19 2.01
cor(Intercept,Days) 0.30 0.41 -0.26 0.94 2.42
cor(Intercept,sigma_Intercept) 0.54 0.30 -0.10 0.85 2.16
cor(Days,sigma_Intercept) 0.50 0.19 0.06 0.71 1.72
Bulk_ESS Tail_ESS
sd(Intercept) 5 28
sd(Days) 6 25
sd(sigma_Intercept) 7 30
cor(Intercept,Days) 6 21
cor(Intercept,sigma_Intercept) 5 22
cor(Days,sigma_Intercept) 7 32
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 242.75 47.68 160.72 276.84 2.64 6 26
sigma_Intercept 2.70 0.41 2.10 3.18 2.84 5 14
Days 3.90 15.34 -22.53 14.45 2.59 5 11
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
beta 2.57 1.86 1.45 7.99 2.81 5 13
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Posterior conditional effects
plot(conditional_effects(fit8, conditions = conditions, re_formula = NULL),
ncol = 6, points = TRUE, plot = FALSE)[[1]] +
labs(y = "Reaction time (ms)")
13 Exgaussian distributional multilevel model (flat priors)
Points to discuss:
- without reasonable(ish) priors, sampling may fall apart, with some really long-running chains, divergent transitions, and increased Rhats
- priors matter, but unless you specify quite informative priors or have little data relative to the model complexity, you will likely not see much difference compared to sensible default priors
Extract all default priors:
prior9 <- get_prior(
bf(Reaction ~ 1 + Days + (1 + Days |S| Subject),
sigma ~ 1 + (1 |S| Subject)),
data = sleepstudy,
family = exgaussian()
)
Make all priors flat (except for the varying coefficients' priors):
prior9$prior <- ""
Model 9: sample from the posterior
fit9 <- brm(
bf(Reaction ~ 1 + Days + (1 + Days |S| Subject),
sigma ~ 1 + (1 |S| Subject)),
data = sleepstudy,
family = exgaussian(),
prior = prior9,
init = 0,
cores = 4,
file = paste0(BRMS_MODEL_DIR, "fit9"),
file_refit = "on_change"
)
Posterior summary
summary(fit9)
Family: exgaussian
Links: mu = identity; sigma = log
Formula: Reaction ~ 1 + Days + (1 + Days | S | Subject)
sigma ~ 1 + (1 | S | Subject)
Data: sleepstudy (Number of observations: 144)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Multilevel Hyperparameters:
~Subject (Number of levels: 18)
Estimate Est.Error l-95% CI u-95% CI Rhat
sd(Intercept) 203.92 111.77 29.13 335.48 3.28
sd(Days) 9.08 5.84 4.00 19.03 3.43
sd(sigma_Intercept) 1.50 1.69 0.44 4.55 3.41
cor(Intercept,Days) 0.34 0.51 -0.25 0.92 2.61
cor(Intercept,sigma_Intercept) -0.06 0.23 -0.41 0.27 3.06
cor(Days,sigma_Intercept) 0.26 0.19 -0.03 0.56 2.71
Bulk_ESS Tail_ESS
sd(Intercept) 5 28
sd(Days) 4 11
sd(sigma_Intercept) 4 18
cor(Intercept,Days) 5 12
cor(Intercept,sigma_Intercept) 5 13
cor(Days,sigma_Intercept) 5 13
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 54.37 120.81 -55.11 258.42 3.57 4 13
sigma_Intercept 1.82 1.45 -0.70 2.92 2.95 5 20
Days 4.75 7.10 -5.27 13.43 3.34 4 12
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
beta 35.48 57.86 1.62 145.79 3.05 5 12
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Posterior conditional effects
plot(conditional_effects(fit9, conditions = conditions, re_formula = NULL),
ncol = 6, points = TRUE, plot = FALSE)[[1]] +
labs(y = "Reaction time (ms)")
14 Model comparison and checking
Compute LOO-CV for models 3, 4, and 5
fit3 <- add_criterion(fit3, criterion = "loo", save_psis = TRUE, reloo = TRUE)
fit4 <- add_criterion(fit4, criterion = "loo", save_psis = TRUE, reloo = TRUE)
fit5 <- add_criterion(fit5, criterion = "loo", save_psis = TRUE, reloo = TRUE)
Compare varying intercept and slope models with normal and log-normal data models
loo_compare(fit4, fit5)
elpd_diff se_diff
fit5 0.0 0.0
fit4 -12.2 5.0
The log-normal model is not much different from the normal model, which makes sense as all reaction times are far from 0.
Posterior predictive checking of fit4 using LOO predictive intervals
pp_check(fit4, type = "loo_intervals") +
labs(y = "Reaction time (ms)")
There are clearly some outliers.
Create varying intercept (fit3t) and varying intercept and slope (fit4t) models with Student’s t data model
fit3t <- update(fit3, family = student())
fit4t <- update(fit4, family = student())
Compute LOO-CV for models 3t and 4t
fit3t <- add_criterion(fit3t, criterion = "loo", save_psis = TRUE)
fit4t <- add_criterion(fit4t, criterion = "loo", save_psis = TRUE)
Posterior predictive checking of fit4t using LOO predictive intervals
pp_check(fit4t, type = "loo_intervals") +
labs(y = "Reaction time (ms)")
The LOO predictive intervals look better now.
theme_set(bayesplot::theme_default(base_family = "sans", base_size = 14))
pp_check(fit4, type = "loo_pit_ecdf", method = "correlated", moment_match = TRUE) +
pp_check(fit4t, type = "loo_pit_ecdf", method = "correlated")
Looking at the LOO-PIT plots, we see that the normal and log-normal models have too wide predictive distributions for most observations, which is due to a few outliers inflating the residual scale. The LOO-PIT plot for the Student's t model looks better.
Compare normal and Student's t models
loo_compare(fit4, fit4t)
elpd_diff se_diff
fit4t 0.0 0.0
fit4 -40.8 13.7
Student’s t model has much better predictive performance.
Examine how much adding varying slope improved predictive performance in case of normal data model:
loo_compare(fit3, fit4)
elpd_diff se_diff
fit4 0.0 0.0
fit3 -13.6 10.3
Examine how much adding varying slope improved predictive performance in case of Student’s t data model:
loo_compare(fit3t, fit4t)
elpd_diff se_diff
fit4t 0.0 0.0
fit3t -45.6 8.6
When using the Student's t model, the predictive performance difference between using and not using a varying slope is bigger.
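A rough way to read these tables (a common heuristic when interpreting loo output): divide elpd_diff by se_diff. The varying slope helps clearly under the Student's t model (about 5 standard errors) but much less clearly under the normal model:

```r
# elpd_diff / se_diff ratios, with values copied from the comparisons above
c(slope_normal  = -13.6 / 10.3,   # ~ -1.3 se: unclear benefit
  slope_student = -45.6 / 8.6,    # ~ -5.3 se: clear benefit
  normal_vs_t   = -40.8 / 13.7)   # ~ -3.0 se: Student's t clearly better
```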
References
Bates, D., Mächler, M., Bolker, B., and Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1-48.
Belenky, G., Wesensten, N. J., Thorne, D. R., Thomas, M. L., Sing, H. C., Redmond, D. P., Russo, M. B., and Balkin, T. J. (2003). Patterns of performance degradation and restoration during sleep restriction and subsequent recovery: A sleep dose-response study. Journal of Sleep Research, 12(1), 1-12.
Licenses
- Code © 2025, Paul Bürkner and Aki Vehtari, licensed under BSD-3.
- Text © 2025, Paul Bürkner and Aki Vehtari, licensed under CC-BY-NC 4.0.