Postscript Appendix

This notebook is a postscript appendix to the paper

  • Aki Vehtari, Andrew Gelman, Daniel Simpson, Bob Carpenter, and Paul-Christian Bürkner (2021): Rank-normalization, folding, and localization: An improved \(\widehat{R}\) for assessing convergence of MCMC. Bayesian Analysis, 16(2):667-718. doi:10.1214/20-BA1221.

The code used to make this notebook is available at https://github.com/avehtari/rhat_ess.

Introduction

The MCMC effective sample size (ESS) and Monte Carlo standard error (MCSE) estimated from a single chain require estimating the correlation between iterations, for example via the autocorrelation time (or, equivalently, the spectral density at frequency zero). Because only a finite number of MCMC draws is available, the autocorrelation estimates at larger lags are noisier. In the paper, we wrote about autocorrelation estimators:

In our experiments, Geyer’s (1992) truncation had superior stability compared to flat-top (Doss et al., 2014) and lugsail (Vats and Knudson, 2018) lag window approaches.

(Vats and Knudson (2018) has since been published as Vats and Knudson (2021).) As the main point of the paper was not the comparison of autocorrelation estimators, these experimental results were not included in the paper.

This notebook includes additional experiments comparing Geyer’s (1992) truncation approach to three other methods for estimating the effective sample size (ESS) and the corresponding Monte Carlo standard error (MCSE). We also compare the new quantile MCSE method proposed in the paper (Vehtari et al., 2021) to the batch means quantile MCSE method of Gong and Flegal (2015). Finally, we compare the behavior of the ESS implementations in four R packages in the case of multiple chains and a well-separated multimodal distribution.

The methods and R packages compared

  • stableGR (n.eff) computes a weighted sum of autocorrelations using the lugsail lag window (Vats and Knudson, 2018)
  • mcmcse (ess and mcse.q) uses a batch means approach by default (Gong and Flegal, 2015)
  • coda (effectiveSize) computes the spectral density at frequency zero by fitting an AR model to the chain (Heidelberger and Welch, 1981)
  • posterior (ess_basic and mcse_quantile): ess_basic computes a truncated sum of autocorrelations using Geyer’s truncation rule (Geyer, 1992), and mcse_quantile additionally uses an inverse approach for quantile MCSE. The posterior package also takes into account the possibility that the chains are not mixing by incorporating the multi-chain \(\widehat{R}\) into the ESS and MCSE computations. A usage sketch for all four packages follows this list.
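The sketch below shows how each implementation can be called on a single short chain. Exact argument conventions vary across package versions, so treat this as an illustrative example rather than a reference; in particular, wrapping the chain into a one-column matrix for mcmcse and stableGR reflects our assumption about the expected input layout.

# one short AR(1) chain as a toy example
x = as.numeric(arima.sim(n = 1000, list(ar = 0.5), sd = sqrt(1 - 0.5^2)))
xm = matrix(x, ncol = 1)  # draws-by-variables matrix for packages expecting one

posterior::ess_basic(x)                   # Geyer's truncated sum of autocorrelations
posterior::mcse_quantile(x, probs = 0.5)  # quantile MCSE via the inverse approach
coda::effectiveSize(x)                    # spectral density at zero from an AR fit
mcmcse::ess(xm)                           # batch means ESS
mcmcse::mcse.q(x, q = 0.5)                # batch means quantile MCSE
stableGR::n.eff(xm)                       # lugsail lag window ESS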

tl;dr

In the case of well-mixing chains, ess_basic and mcse_quantile in the posterior package are the most accurate methods. In the case of well-separated modes, the posterior package also reports an appropriately small ESS, indicating that the chains are not mixing.

Comparison of ESS estimators with AR(1) simulation

Mean of x

We simulate \(M=4\) chains from a known AR(1) process with varying parameter \(\phi\) and a normal(0, 1) marginal distribution.

# four independent AR(1) chains; sd is chosen so the marginal variance is 1
x1 = arima.sim(n = N, list(ar = phi), sd = sqrt(1 - phi^2))
x2 = arima.sim(n = N, list(ar = phi), sd = sqrt(1 - phi^2))
x3 = arima.sim(n = N, list(ar = phi), sd = sqrt(1 - phi^2))
x4 = arima.sim(n = N, list(ar = phi), sd = sqrt(1 - phi^2))
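For estimators that use multiple chains, the four simulated chains can be combined into an iterations-by-chains matrix; this is, for example, the layout accepted by ess_basic in the posterior package (a small sketch, not part of the original simulation code):

# iterations x chains matrix, the layout posterior expects
x = cbind(as.numeric(x1), as.numeric(x2), as.numeric(x3), as.numeric(x4))
posterior::ess_basic(x)  # ESS from all four chains jointly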

We vary the length of the chains, \(N\), and the AR process parameter \(\phi\):

Ns = c(100, 1000, 10000)
phis = c(-0.3, -0.1, 0.1, 0.3, 0.5, 0.7, 0.9)

\(N=1000\) is the default in Stan, and \(N=100\) and \(N=10000\) are used to illustrate the behavior with fewer or more MCMC iterations. The total number of draws is \(S=MN\).

The values of \(\phi\) have been chosen to reflect typical ESS/\(S\) values seen with Stan; they correspond to ESS/\(S\) values of \((1.9, 1.2, 0.8, 0.5, 0.3, 0.2, 0.1)\). The first two \(\phi\) values are negative and produce antithetic chains with ESS > \(S\), which is not rare when using Stan’s dynamic Hamiltonian Monte Carlo.
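These values follow from the closed-form relative efficiency of the mean for an AR(1) process, \(\mathrm{ESS}/S = (1-\phi)/(1+\phi)\), rounded to one decimal:

round((1 - phis) / (1 + phis), 1)
## [1] 1.9 1.2 0.8 0.5 0.3 0.2 0.1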

The simulation is repeated \(100\,000\) times for each combination of \(N\) and \(\phi\). For each simulation we compute the empirical mean. The variance of the empirical means and the known true variance are used to compute the asymptotic efficiency, that is, the true ESS/\(S\) as \(S\rightarrow \infty\). For each simulation we also compute the ESS estimate from each method and use it to compute the Monte Carlo standard error (MCSE), which is compared to the actual error, that is, the difference between the empirical mean \(\hat{\mu}\) and the known true mean \(\mu=0\).
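As a minimal sketch of one replication for a single \((N, \phi)\) cell, using ess_basic as the example estimator (the helper name one_rep is ours, not from the original code, and far fewer repetitions are used than the \(100\,000\) in the actual experiments):

one_rep = function(N, phi, M = 4) {
  # M independent AR(1) chains as an iterations-by-chains matrix
  x = replicate(M, as.numeric(arima.sim(n = N, list(ar = phi), sd = sqrt(1 - phi^2))))
  mu_hat = mean(x)                                          # empirical mean; true mean is 0
  mcse = sd(as.vector(x)) / sqrt(posterior::ess_basic(x))   # MCSE = sd / sqrt(ESS)
  mu_hat / mcse                                             # standardized error
}
z = replicate(1000, one_rep(1000, 0.5))
sqrt(mean(z^2))  # close to 1 if ESS and MCSE are well calibrated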

The following figure summarizes the results. When estimating a posterior expectation \(\mathrm{E}[g(x)]\), the ESS estimators considered here assume finite variance, and the MCSE is simply \(\mathrm{sd}[g(x)]/\sqrt{\mathrm{ESS}}\). Thus the accuracy of ESS translates directly into the accuracy of MCSE. We report the root mean square of the standardized errors \(|\mu - \hat{\mu}|/\widehat{\mathrm{MCSE}}\), which for well-calibrated ESS and MCSE should be close to 1. Values less than 1 indicate underestimated ESS and overestimated MCSE, and values larger than 1 indicate overestimated ESS and underestimated MCSE.
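Explicitly, writing \(\hat{\mu}_k\) and \(\widehat{\mathrm{MCSE}}_k\) for the estimates from repetition \(k\), the reported quantity for each \((N, \phi)\) combination is

\[ \sqrt{\frac{1}{K} \sum_{k=1}^{K} \left( \frac{\mu - \hat{\mu}_k}{\widehat{\mathrm{MCSE}}_k} \right)^2}, \qquad K = 100\,000. \]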