Assignment 5
1 General information
The exercises here refer to the lecture 5/BDA chapter 11 content. The main topics for this assignment are the MCSE and importance sampling.
The exercises constitute 96% of the Quiz 5 grade.
We prepared a quarto notebook specific to this assignment to help you get started. You still need to fill in your answers on Mycourses! You can inspect this and future templates
- as a rendered html file (to access the qmd file click the “</> Code” button at the top right hand corner of the template)
- Questions below are exact copies of the text found in the MyCourses quiz and should serve as a notebook where you can store notes and code.
- We recommend opening these notebooks in the Aalto JupyterHub, see how to use R and RStudio remotely.
- For inspiration for code, have a look at the BDA R Demos and the specific Assignment code notebooks
- Recommended additional self study exercises for each chapter in BDA3 are listed in the course web page. These will help to gain deeper understanding of the topic.
- Common questions and answers regarding installation and technical problems can be found in Frequently Asked Questions (FAQ).
- Deadlines for all assignments can be found on the course web page and in MyCourses.
- You are allowed to discuss assignments with your friends, but it is not allowed to copy solutions directly from other students or from internet.
- Do not share your answers publicly.
- Do not copy answers from the internet or from previous years. We compare the answers to the answers from previous years and to the answers from other students this year.
- Use of AI is allowed on the course, but the most of the work needs to by the student, and you need to report whether you used AI and in which way you used them (See points 5 and 6 in Aalto guidelines for use of AI in teaching).
- All suspected plagiarism will be reported and investigated. See more about the Aalto University Code of Academic Integrity and Handling Violations Thereof.
- If you have any suggestions or improvements to the course material, please post in the course chat feedback channel, create an issue, or submit a pull request to the public repository!
1.1 Assignment questions
For convenience the assignment questions are copied below. Answer the questions in MyCourses.
Lecture 5/Chapter 11 of BDA Quiz (96% of grade)
1. Monte Carlo Methods
As you've learned in the previous assignments, in some cases we have closed form solution for useful posterior summaries for parameter vector
1.1 Why are Monte Carlo and other sampling based methods for posterior evaluation of functions
1.2 What is the main issue of grid sampling:
We may try to apply more elaborated methods such as rejection sampling (page 264 of BDA3). Rejection sampling is based on defining a positive function
1.3 : What is a problem with this rejection sampling:
2. Markov Chain Monte Carlo (MCMC)
The most popular way to obtain posterior draws nowadays is MCMC.2.1 What is a Markov chain?
2.2 Which of the following is not a Markov chain?
2.3 In question 2.2 we assumed discrete sequences. For the continuous case, we can use conditional transition distribution
2.4 What are the necessary and sufficient conditions to prove that a Markov chain has a unique stationary distribution?
2.5 Why is it often necessary to discard the first draws of the MCMC chain?
2.6 Why might MCMC be preferable to grid and rejection sampling?
3. Metropolis algorithm
The Metropolis-Hastings algorithm is a relatively simple MCMC algorithm, yet it is foundational for more advanced algorithms, such as the Hamiltonian Monte Carlo algorithm you will encounter later in the course. Later in this assignment, you will implement the Metropolis algorithm (page 278 of BDA3). Denote the proposal distribution at time
3.1: What is a main idea behind the Metropolis algorithm:
3.2: Denote the current proposal by
3.3: Which one is the correct rule for accepting the proposal? Denote the acceptance ratio you've determined in 3.2 by
3.4 Assume the posterior surface has a unique mode. What would eventually happen to the MCMC chain if adopting the rule
3.5 Retain the assumption made in question 3.4. Why is the posterior mode alone not generally useful for posterior expectations?
4. Trace plot diagnostics
Suppose you've successfully obtained multiple chains of MCMC draws from the Metropolis algorithm. Remember that the goal for the MCMC chains is to firstly converge to some unique distribution, and that all chains converge to the same distribution (with the hope of that being
One way is to investigate the trace plot: check visually whether the chains converged to the same distribution. If they have, we say the chains have mixed. To do this, we plot them in the same figure and observe if something went wrong. Below you are given an example with two chains. Look at them carefully and answer questions below.
Figure 1:4.1: Can you say that your chains converged to the same distribution based on Figure 1:
Figure 2:
4.2: Can you say that your chains are mixing well based on Figure 2:
5. Rhat diagnostics
Trace plot inspection can be useful when parameter space is small, however, when dealing with large spaces, it becomes inconvenient to analyse trace plots of each parameter separately, therefore we would like to also have some quantitative metrics for investigating convergence. One of them is called Rhat. There are different versions of it, for instance, version from BDA3 book(page 285). The much improved version you will use throughout the course is based on Vehtari et al. (2021). Let's look at the book version to understand why it is preferred to use the modified version.5.1: What is the definition of Rhat from BDA3 book:
5.2: Which range of values for Rhat would are a good indicator for convergence (see lecture 5)?
Imagine a situation where one chain converged to
Figure 3:
5.3: What do you observe:
5.4: However, despite of this issue, the Rhat value from the book equals 0.999859, indicating good convergence of the chains. What can be the explanation for this:
On the contrary, Rhat value from Vehtari et al paper equals 1.39025, much bigger than recommended 1.01, resolving the above problem. You can find more examples about new Rhat in the paper but also this demo.
6. ESS diagnostics
Another useful quantity is the effective sample size.
6.1: Why can we not rely on total sample size?
6.2: Suppose you have obtained S draws from your posterior with MCMC. We define the effective sample size as
6.3: Look at the trace plots below. For which sequences do you expect HIGHER effective sample size:
6.4: Select the autocorrelation function below corresponding to the chains in Figure 4 of the previous question.
7. Generalised linear model: Bioassay model with Metropolis algorithm
In this exercise, you will implement the Metropolis algorithm. Here you are provided code to get you started fro this part of the assignment. You are provided with the code for a Metropolis algorithm, however, it contains three errors. Your task is to find these errors, correct them and answer the following questions below.
7.1: Lets denote "bioassaylp(alpha_propose, beta_propose, x, y, n) - bioassaylp(alpha_previous, beta_previous, x, y, n) + dmvnorm(c(alpha_propose, beta_propose), prior_mean, prior_sigma, TRUE) - dmvnorm(c(alpha_previous, beta_previous), prior_mean, prior_sigma, TRUE)" as expression 1. How should we correct error 1:
7.2: Lets denote "if(runif(1) > density_ratio(alpha_propose, alpha_previous, beta_propose, beta_previous, x, y, n))" as expression 2. How should we correct error 2:
7.3: How should we correct error 3:
After you corrected the errors, you may run sampling. Investigate the code in the notebook and answer the questions below.
7.4: How many chains did we run: .
7.5: How many draws including warm-up have we obtained for each individual chain: .
7.6: What is the warm-up length: .
7.7: After performing sampling, let's check the convergence of the chains. The next code chunk will produce trace plots. Which one is the correct one:
After looking at trace plots, compute Rhat values for both alpha and beta:
7.8: Rhat for alpha=, Rhat for beta=
In addition to Rhat, compute ESS for for the mean and 0.25 quantile of alpha and beta:
7.9: ESS mean for alpha=, ESS mean for beta=.
7.10: ESS q0.25 for alpha=, ESS q0.25 for beta=.
Let us compare ESS of independent draws and MCMC draws. Use 4000 independent draws from the posterior distribution of the model in the dataset bioassay_posterior in the aaltobda package from Assignment 4.
7.11: ESS mean of alpha for bioassay_posterior =, ESS mean of beta for bioassay_posterior =
7.12: ESS q0.25 of alpha for bioassay_posterior =, ESS q0.25 of beta for bioassay_posterior =
7.13: Visualise scatter plot of posterior draws. Which one is the correct figure: