Notebook for Assignment 4

Author

Aki Vehtari et al.

1 General information

This assignment is related to Lecture 4 and BDA3 Chapters 3 and 10.

General Instructions for Answering the Assignment Questions
  • Questions below are exact copies of the text found in the MyCourses quiz and should serve as a notebook where you can store notes and code.
  • We recommend opening these notebooks in the Aalto JupyterHub, see how to use R and RStudio remotely.
  • For inspiration for code, have a look at the BDA R Demos and the specific Assignment code notebooks
  • Recommended additional self study exercises for each chapter in BDA3 are listed in the course web page. These will help to gain deeper understanding of the topic.
  • Common questions and answers regarding installation and technical problems can be found in Frequently Asked Questions (FAQ).
  • Deadlines for all assignments can be found on the course web page and in MyCourses.
  • You are allowed to discuss assignments with your friends, but it is not allowed to copy solutions directly from other students or from internet.
  • Do not share your answers publicly.
  • Do not copy answers from the internet or from previous years. We compare the answers to the answers from previous years and to the answers from other students this year.
  • Use of AI is allowed on the course, but the most of the work needs to by the student, and you need to report whether you used AI and in which way you used them (See points 5 and 6 in Aalto guidelines for use of AI in teaching).
  • All suspected plagiarism will be reported and investigated. See more about the Aalto University Code of Academic Integrity and Handling Violations Thereof.
  • If you have any suggestions or improvements to the course material, please post in the course chat feedback channel, create an issue, or submit a pull request to the public repository!
Setup

The following will set-up markmyassignment to check your functions at the end of the notebook:

if(!require(markmyassignment)){
    install.packages("markmyassignment")
    library(markmyassignment)
}
Loading required package: markmyassignment
assignment_path = paste("https://github.com/avehtari/BDA_course_Aalto/blob/master/tests/assignment4.yml", sep="")
set_assignment(assignment_path)    
Assignment set:
assignment4: Bayesian Data Analysis: Assignment 4
The assignment contain the following (4) tasks:
- log_importance_weights
- normalized_importance_weights
- S_eff
- posterior_mean

The following installs and loads the aaltobda package:

if(!require(aaltobda)){
    install.packages("remotes")
    remotes::install_github("avehtari/BDA_course_Aalto", subdir = "rpackage", upgrade="never")
    library(aaltobda)
}
Loading required package: aaltobda

The following installs and loads the latex2exp package, which allows us to use LaTeX in plots:

if(!require(latex2exp)){
    install.packages("latex2exp")
    library(latex2exp)
}
Loading required package: latex2exp

1.1 Setting up advanced packages (posterior and ggdist)

The following installs and loads the posterior package, which allows us to use its rvar Random Variable Datatype:

if(!require(posterior)){
    install.packages("posterior")
    library(posterior)
}
Loading required package: posterior
This is posterior version 1.6.0

Attaching package: 'posterior'
The following object is masked from 'package:aaltobda':

    mcse_quantile
The following objects are masked from 'package:stats':

    mad, sd, var
The following objects are masked from 'package:base':

    %in%, match

The following installs and loads the ggdist package for advanced plotting functions:

if(!require(ggplot2)){
    install.packages("ggplot2")
    library(ggplot2)
}
Loading required package: ggplot2
ggplot2::theme_set(theme_minimal(base_size = 14))
if(!require(ggdist)){
    install.packages("ggdist")
    library(ggdist)
}
Loading required package: ggdist

2 Mean estimates and the Monte Carlo standard error (MCSE)

In this exercise, we familiarise ourselves with the concept of the MCSE. As you have learned from the lecture, this allows us to determine how many decimal digits should be reported. In the below, we give you code to get started on quiz question 1.5 on MyCourses. See also The digits case study for further illustration.

set.seed(4711)

NUMBER_OF_SAMPLES <- 1
NUMBER_OF_DRAWS <- 1

# create a vector to store the sample means
sample_means <- rep(NA, times = NUMBER_OF_SAMPLES)

# loop over the number of samples
for (i in 1:NUMBER_OF_SAMPLES) {
  # generate a sample from the gamma distribution of size 400
    #sample <- 
  # calculate the sample mean and save it to the vector
    #sample_means[i] <- mean(sample)
}

# calculate the standard deviation of the sample means
 sd(sample_means)
[1] NA

3 Bioassay model and importance sampling

In this exercise, you will use a dose-response relation model that is used in BDA3 Section 3.7 and in the chapter reading instructions. The used likelihood is the same, but instead of uniform priors, we will use a bivariate normal distribution as the joint prior distribution of the parameters \(\alpha\) and \(\beta\).

In the prior distribution for \((\alpha,\beta)\), the marginal distributions are \(\alpha \sim N(0,2^2)\) and \(\beta \sim N(10,10^2)\), and the correlation between them is \(\mathrm{corr}(\alpha, \beta)=0.6\).

First, implement a function for computing the log importance ratios (log importance weights) when the importance sampling target distribution is the posterior distribution, and the proposal distribution is the prior distribution.

Non-log importance ratios are given by equation (10.3) in the course book. The fact that our proposal distribution is the same as the prior distribution makes this task easier. The logarithm of the likelihood can be computed with the bioassaylp function from the aaltobda package. The data required for the likelihood can be loaded with data("bioassay").

# Useful functions: bioassaylp (from aaltobda)
alpha_test <- c(1.896, -3.6, 0.374, 0.964, -3.123, -1.581) 
beta_test <- c(24.76, 20.04, 6.15, 18.65, 8.16, 17.4)

log_importance_weights<- function(alpha, beta){ 
  # Do computation here, and return as below.
  # This is the correct return value for the test data provided above.
  c(-8.95, -23.47, -6.02, -8.13, -16.61, -14.57)
}

Then, implement a function for computing normalized importance ratios from the unnormalized log ratios. In other words, exponentiate the log ratios and scale them such that they sum to one.

normalized_importance_weights <- function(alpha, beta) {
  # Do computation here, and return as below.
  # This is the correct return value for the test data provided above. 
  c(0.045, 0.000, 0.852, 0.103, 0.000, 0.000)
  }

Then, Sample 4000 draws of \(\alpha\) and \(\beta\) from the prior distribution. Compute and plot a histogram of the 4000 normalized importance ratios (plotting not required for MyCourses quiz, but important for building intution and checking).

Use the function rmvnorm from the aaltobda package for sampling and the functions you implemented before.

Then, using the normalised importance weights, calculate the posterior mean.

posterior_mean<- function(alpha, beta){
  # Do computation here, and return as below.
  # This is the correct return value for the test data provided above. 
  c(0.50326, 8.27495) 
  }

Then, using the importance ratios, compute the importance sampling effective sample size \(S_{\text{eff}}\) and report it.

S_eff <- function(alpha, beta) { 
  # Do computation here, and return as below. 
  # This is the correct return value for the test data provided above. 
  1.354 }

Equation (10.4) in the course book.

BDA3 1st (2013) and 2nd (2014) printing have an error for \(\tilde{w}(\theta^s)\) used in the effective sample size equation (10.4). The normalized weights equation should not have the multiplier S (the normalized weights should sum to one). The later printings, the online version, and the slides have the correct equation.

Finally, let’s compute the posterior mean MCSE and report the number of digits for the means based on the MCSEs.

The values below are only a test case, you need to use 4000 draws for \(\alpha\) and \(\beta\) in the final report.

Use the same equation for the MCSE of \(\text{E}[\theta]\) as earlier (\(\sqrt{\text{Var} [\theta]/S}\)), but now replace \(S\) with \(S_{\text{eff}}\). To compute \(\text{Var} [\theta]\) with importance sampling, use the identity \(\text{Var}[\theta] = \text{E}[\theta^2] - \text{E}[\theta]^2\).

posterior_mean_MCSE <- function(alpha,beta) {
  # Do computation here, and return as below.
  # This is the correct return value for the test data provided above. 
  c(0.3031995,4.4797743)
  }

4 Bioassay model

You are given 4000 independent draws from the posterior distribution of the model in the dataset bioassay_posterior in the aaltobda package.

The number of significant digits can be different for the mean and quantile estimates. In some other cases, the number of digits reported can be less than MCSE allows for practical reasons as discussed in the lecture.

Hint:

Quantiles can be computed with the quantile function. With \(S\) draws, the MCSE for \(\text{E}[\theta]\) is \(\sqrt{\text{Var} [\theta]/S}\). MCSE for the quantile estimates can be computed with the mcse_quantile function from the aaltobda package.

If you are unsure how to proceed, find the error(s) in the code below:

bioassay_posterior^2 |> as_draws() |> mutate_variables(LD50 = exp(-beta/alpha)) |> mutate_variables(P = as.numeric(LD50 < 0.05)) |> summarise_draws(mean, mcse_mean)
# A tibble: 4 × 3
  variable        mean mcse_mean
  <chr>          <dbl>     <dbl>
1 alpha      1.85      0.0375   
2 beta     135.        1.92     
3 LD50       0.0000832 0.0000290
4 P          1.00      0.000353