Bayesian data analysis - BRMS demos

Author: Aki Vehtari
Published: 2023-12-05
Modified: 2024-10-10

1 Introduction

This notebook contains several examples of how to use Stan in R with the brms package. It assumes basic knowledge of Bayesian inference and MCMC. The examples are related to the Bayesian data analysis course.

Load packages

library(tidyr)
library(dplyr)
library(tibble)
library(pillar)
library(stringr)
library(brms)
options(brms.backend = "cmdstanr", mc.cores = 2)
library(posterior)
options(posterior.num_args=list(digits=2))
options(pillar.negative = FALSE)
library(loo)
library(priorsense)
#options(priorsense.use_plot_theme=FALSE)
library(ggplot2)
library(bayesplot)
theme_set(bayesplot::theme_default(base_family = "sans", base_size=14))
library(tidybayes)
library(ggdist)
library(patchwork)
library(RColorBrewer)
library(tinytable)
options(tinytable_format_num_fmt = "significant_cell", tinytable_format_digits = 2, tinytable_tt_digits=2)
SEED <- 48927 # set random seed for reproducibility

2 Bernoulli model

Toy data with a sequence of failures (0) and successes (1). We would like to learn about the unknown probability of success. brms wants the data in data frame format (a tibble, which is a variant of a data frame, can be used, too).

data_bern <- data.frame(y = c(1, 1, 1, 0, 1, 1, 1, 0, 1, 0))

As usual in the case of generalized linear models (GLMs), brms defines the priors on the latent model parameters. With the Bernoulli family the default link function is logit, and thus the prior is set on logit(theta). As there are no covariates, logit(theta) = Intercept. The brms default prior for the Intercept is student_t(3, 0, 2.5), but we use student_t(7, 0, 1.5), which is close to the logistic distribution and thus makes the prior near-uniform for theta. We check the implied prior on theta by simulating from these priors, and also compare the result to using a normal(0, 1.5) prior on the logit probability.
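The default priors for this model can be inspected with brms::get_prior() (a quick check; output not shown here):

get_prior(y ~ 1, family = bernoulli(), data = data_bern)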

plogis is the cumulative distribution function of the logistic distribution, which is equal to the inverse-logit transformation. Base R's Student's t-distribution functions do not have mu and sigma arguments, and thus we use a function from the ggdist package.
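For reference, ggdist::rstudent_t(n, df, mu, sigma) is a location-scale transform of base R's standard t draws, so the same prior simulation could be written with rt() (a minimal sketch):

# location-scale transform: rstudent_t(n, df, mu, sigma) = mu + sigma * rt(n, df)
df <- 3; mu <- 0; sigma <- 2.5
theta <- plogis(mu + sigma * rt(20000, df = df))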

data.frame(theta = plogis(ggdist::rstudent_t(n=20000, df=3, mu=0, sigma=2.5))) |>
  mcmc_hist() +
  xlim(c(0,1)) +
  labs(title='Default brms student_t(3, 0, 2.5) for Intercept')

data.frame(theta = plogis(ggdist::rstudent_t(n=20000, df=7, mu=0, sigma=1.5))) |>
  mcmc_hist() +
  xlim(c(0,1)) +
  labs(title='student_t(7, 0, 1.5) for Intercept')

An almost uniform prior on theta could also be obtained with normal(0, 1.5).

data.frame(theta = plogis(rnorm(n=20000, mean=0, sd=1.5))) |>
  mcmc_hist() +
  xlim(c(0,1)) +
  labs(title='normal(0, 1.5) for Intercept')

Formula y ~ 1 corresponds to the model \mathrm{logit}(\theta) = \alpha\times 1 = \alpha. brms denotes \alpha as Intercept.

fit_bern <- brm(y ~ 1, family = bernoulli(), data = data_bern,
                prior = prior(student_t(7, 0, 1.5), class='Intercept'),
                seed = SEED, refresh = 0)

Check the summary of the posterior and inference diagnostics.

fit_bern
 Family: bernoulli 
  Links: mu = logit 
Formula: y ~ 1 
   Data: data_bern (Number of observations: 10) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept     0.76      0.64    -0.45     2.05 1.00     1602     1574

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
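The printed Rhat, Bulk_ESS, and Tail_ESS can also be computed directly with the posterior package (a quick check using default_convergence_measures()):

as_draws_df(fit_bern) |>
  subset_draws(variable='b_Intercept') |>
  summarise_draws(default_convergence_measures())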

Extract the posterior draws in data frame (df) format

draws <- as_draws_df(fit_bern)

We can select a subset of the stored variables with subset_draws() and get summary information using summarise_draws().

draws |>
  subset_draws(variable='b_Intercept') |>
  summarise_draws()
# A tibble: 1 × 10
  variable     mean median    sd   mad    q5   q95  rhat ess_bulk ess_tail
  <chr>       <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>    <dbl>
1 b_Intercept  0.76   0.73  0.64  0.62 -0.27  1.85  1.00  1601.73  1573.87

We get a prettier table using tinytable::tt(). Earlier we had defined options

options(tinytable_format_num_fmt = "significant_cell",
        tinytable_format_digits = 2,
        tinytable_tt_digits=2)

which makes the table show only 2 significant digits for each value; this is sufficient accuracy, and having fewer digits makes the table easier to read.

draws |>
  subset_draws(variable='b_Intercept') |>
  summarise_draws() |>
  tt()
variable     mean  median  sd    mad   q5     q95  rhat  ess_bulk  ess_tail
b_Intercept  0.76  0.73    0.64  0.62  -0.27  1.9  1     1602      1574

We can compute the probability of success by using plogis, which is equal to the inverse-logit function.

draws <- draws |>
  mutate_variables(theta=plogis(b_Intercept))
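As a quick sanity check, plogis(x) is indeed equal to the inverse-logit function 1 / (1 + exp(-x)):

x <- c(-1, 0, 0.76)
all.equal(plogis(x), 1 / (1 + exp(-x)))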

Summary of theta

draws |>
  subset_draws(variable='theta') |>
  summarise_draws() |>
  tt()
variable  mean  median  sd    mad   q5    q95   rhat  ess_bulk  ess_tail
theta     0.67  0.67    0.13  0.13  0.43  0.86  1     1602      1574

Histogram of theta using bayesplot::mcmc_hist()

mcmc_hist(draws, pars='theta') +
  xlab('theta') +
  xlim(c(0,1))

The prior and likelihood sensitivity plot shows the posterior density estimate depending on the amount of power-scaling. Overlapping lines indicate low sensitivity, while wider gaps between lines indicate greater sensitivity.

powerscale_plot_dens(draws, fit=fit_bern, variable='theta',
                     help_text=FALSE) +
  xlim(c(0,1))

We can summarise the prior and likelihood sensitivity using the cumulative Jensen-Shannon distance.

powerscale_sensitivity(draws, fit=fit_bern, variable="theta") |>
  tt()
variable  prior  likelihood  diagnosis
theta     0.041  0.11        -

3 Binomial model

Instead of a sequence of 0's and 1's, we can summarize the data with the number of trials and the number of successes, and use a binomial model. The prior is specified in the 'latent space'. The actual probability of success, theta = plogis(alpha), where plogis is the inverse-logit function.
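For this toy example, the binomial counts can also be computed by aggregating the Bernoulli data from above (a minimal sketch):

# aggregate the Bernoulli sequence into binomial counts (gives N = 10, y = 7)
with(data_bern, data.frame(N = length(y), y = sum(y)))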

Binomial model with the same data and prior

data_bin <- data.frame(N = c(10), y = c(7))

Formula y | trials(N) ~ 1 corresponds to a model \mathrm{logit}(\theta) = \alpha, and the number of trials for each observation is provided by | trials(N)

fit_bin <- brm(y | trials(N) ~ 1, family = binomial(), data = data_bin,
               prior = prior(student_t(7, 0, 1.5), class='Intercept'),
               seed = SEED, refresh = 0)

Check the summary of the posterior and inference diagnostics.

fit_bin
 Family: binomial 
  Links: mu = logit 
Formula: y | trials(N) ~ 1 
   Data: data_bin (Number of observations: 1) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept     0.75      0.63    -0.44     2.05 1.00     1330     1429

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Extract the posterior draws in data frame format

draws <- as_draws_df(fit_bin)

Summary of latent intercept

draws |>
  subset_draws(variable='b_Intercept') |>
  summarise_draws() |>
  tt()
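Since both models use the same data and prior, the posterior of the Intercept should agree between the Bernoulli and binomial fits up to Monte Carlo error. A quick side-by-side comparison (a sketch using dplyr::bind_rows()):

bind_rows(bern = as_draws_df(fit_bern) |>
            subset_draws(variable='b_Intercept') |> summarise_draws(),
          bin = as_draws_df(fit_bin) |>
            subset_draws(variable='b_Intercept') |> summarise_draws(),
          .id = 'model')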