Loading required package: markmyassignment
Assignment set:
assignment4: Bayesian Data Analysis: Assignment 4
The assignment contain the following (4) tasks:
- log_importance_weights
- normalized_importance_weights
- S_eff
- posterior_mean
This assignment is related to Lecture 4 and BDA3 Chapters 3 and 10.
Reading instructions:
The following will set-up markmyassignment
to check your functions at the end of the notebook:
Loading required package: markmyassignment
Assignment set:
assignment4: Bayesian Data Analysis: Assignment 4
The assignment contain the following (4) tasks:
- log_importance_weights
- normalized_importance_weights
- S_eff
- posterior_mean
The following installs and loads the aaltobda
package:
Loading required package: aaltobda
The following installs and loads the latex2exp
package, which allows us to use LaTeX in plots:
posterior
and ggdist
)The following installs and loads the posterior
package, which allows us to use its rvar
Random Variable Datatype:
Loading required package: posterior
This is posterior version 1.6.1
Attaching package: 'posterior'
The following object is masked from 'package:aaltobda':
mcse_quantile
The following objects are masked from 'package:stats':
mad, sd, var
The following objects are masked from 'package:base':
%in%, match
The following installs and loads the ggdist
package for advanced plotting functions:
In this exercise, we familiarise ourselves with the concept of the MCSE. As you have learned from the lecture, this allows us to determine how many decimal digits should be reported. In the below, we give you code to get started on quiz question 1.5 on MyCourses. See also The digits case study for further illustration.
set.seed(4711)
NUMBER_OF_SAMPLES <- 1
NUMBER_OF_DRAWS <- 1
# create a vector to store the sample means
sample_means <- rep(NA, times = NUMBER_OF_SAMPLES)
# loop over the number of samples
for (i in 1:NUMBER_OF_SAMPLES) {
# generate a sample from the gamma distribution of size 400
#sample <-
# calculate the sample mean and save it to the vector
#sample_means[i] <- mean(sample)
}
# calculate the standard deviation of the sample means
sd(sample_means)
[1] NA
In this exercise, you will use a dose-response relation model that is used in BDA3 Section 3.7 and in the chapter reading instructions. The used likelihood is the same, but instead of uniform priors, we will use a bivariate normal distribution as the joint prior distribution of the parameters \(\alpha\) and \(\beta\).
In the prior distribution for \((\alpha,\beta)\), the marginal distributions are \(\alpha \sim N(0,2^2)\) and \(\beta \sim N(10,10^2)\), and the correlation between them is \(\mathrm{corr}(\alpha, \beta)=0.6\).
First, implement a function for computing the log importance ratios (log importance weights) when the importance sampling target distribution is the posterior distribution, and the proposal distribution is the prior distribution.
Non-log importance ratios are given by equation (10.3) in the course book. The fact that our proposal distribution is the same as the prior distribution makes this task easier. The logarithm of the likelihood can be computed with the bioassaylp
function from the aaltobda
package. The data required for the likelihood can be loaded with data("bioassay")
.
# Useful functions: bioassaylp (from aaltobda)
alpha_test <- c(1.896, -3.6, 0.374, 0.964, -3.123, -1.581)
beta_test <- c(24.76, 20.04, 6.15, 18.65, 8.16, 17.4)
log_importance_weights<- function(alpha, beta){
# Do computation here, and return as below.
# This is the correct return value for the test data provided above.
c(-8.95, -23.47, -6.02, -8.13, -16.61, -14.57)
}
Then, implement a function for computing normalized importance ratios from the unnormalized log ratios. In other words, exponentiate the log ratios and scale them such that they sum to one.
Then, Sample 4000 draws of \(\alpha\) and \(\beta\) from the prior distribution. Compute and plot a histogram of the 4000 normalized importance ratios (plotting not required for MyCourses quiz, but important for building intution and checking).
Use the function rmvnorm
from the aaltobda
package for sampling and the functions you implemented before.
Then, using the normalised importance weights, calculate the posterior mean.
Then, using the importance ratios, compute the importance sampling effective sample size \(S_{\text{eff}}\) and report it.
Equation (10.4) in the course book.
BDA3 1st (2013) and 2nd (2014) printing have an error for \(\tilde{w}(\theta^s)\) used in the effective sample size equation (10.4). The normalized weights equation should not have the multiplier S (the normalized weights should sum to one). The later printings, the online version, and the slides have the correct equation.
Finally, let’s compute the posterior mean MCSE and report the number of digits for the means based on the MCSEs.
The values below are only a test case, you need to use 4000 draws for \(\alpha\) and \(\beta\) in the final report.
Use the same equation for the MCSE of \(\text{E}[\theta]\) as earlier (\(\sqrt{\text{Var} [\theta]/S}\)), but now replace \(S\) with \(S_{\text{eff}}\). To compute \(\text{Var} [\theta]\) with importance sampling, use the identity \(\text{Var}[\theta] = \text{E}[\theta^2] - \text{E}[\theta]^2\).
You are given 4000 independent draws from the posterior distribution of the model in the dataset bioassay_posterior
in the aaltobda
package.
The number of significant digits can be different for the mean and quantile estimates. In some other cases, the number of digits reported can be less than MCSE allows for practical reasons as discussed in the lecture.
Hint:
Quantiles can be computed with the quantile
function. With \(S\) draws, the MCSE for \(\text{E}[\theta]\) is \(\sqrt{\text{Var} [\theta]/S}\). MCSE for the quantile estimates can be computed with the mcse_quantile
function from the aaltobda
package.
If you are unsure how to proceed, find the error(s) in the code below:
# A tibble: 4 × 3
variable mean mcse_mean
<chr> <dbl> <dbl>
1 alpha 1.85 0.0375
2 beta 135. 1.92
3 LD50 0.0000832 0.0000290
4 P 1.000 0.000353