Assignment 4
1 General information
The exercises here refer to the lecture 4/BDA chapters 3 and 10 content. The main topics for this assignment are the MCSE and importance sampling.
The exercises constitute 96% of the Quiz 4 grade.
We prepared a quarto notebook specific to this assignment to help you get started. You still need to fill in your answers on Mycourses! You can inspect this and future templates
- as a rendered html file (to access the qmd file click the “</> Code” button at the top right hand corner of the template)
- Questions below are exact copies of the text found in the MyCourses quiz and should serve as a notebook where you can store notes and code.
- We recommend opening these notebooks in the Aalto JupyterHub, see how to use R and RStudio remotely.
- For inspiration for code, have a look at the BDA R Demos and the specific Assignment code notebooks
- Recommended additional self study exercises for each chapter in BDA3 are listed in the course web page. These will help to gain deeper understanding of the topic.
- Common questions and answers regarding installation and technical problems can be found in Frequently Asked Questions (FAQ).
- Deadlines for all assignments can be found on the course web page and in MyCourses.
- You are allowed to discuss assignments with your friends, but it is not allowed to copy solutions directly from other students or from internet.
- Do not share your answers publicly.
- Do not copy answers from the internet or from previous years. We compare the answers to the answers from previous years and to the answers from other students this year.
- Use of AI is allowed on the course, but the most of the work needs to by the student, and you need to report whether you used AI and in which way you used them (See points 5 and 6 in Aalto guidelines for use of AI in teaching).
- All suspected plagiarism will be reported and investigated. See more about the Aalto University Code of Academic Integrity and Handling Violations Thereof.
- If you have any suggestions or improvements to the course material, please post in the course chat feedback channel, create an issue, or submit a pull request to the public repository!
1.1 Assignment questions
For convenience the assignment questions are copied below. Answer the questions in MyCourses.
Lecture 4/Chapters 3 and 10 of BDA Quiz (96% of grade)
As, always, please see the provided code notebook from the website to help you get started on the coding tasks.
1. Mean estimates, Monte Carlo standard error (MCSE) and Pareto-k diagnostic
This task is about understanding and calculating Monte Carlo standard error.
1.1 Which of these correctly describes MCSE:
We will explore mean estimates and MCSE estimates of two distributions.
First is the gamma distribution, parameterised with shape (alpha) and rate (beta).
1.2 Look up the gamma distribution in Appendix A in BDA (or wikipedia). What is the mean of the gamma(alpha = 3, beta = 3) distribution?
Use the appropriate R function to draw a sample of size 1000 from a gamma(3, 3) distribution.
1.3 Which of these commands correctly does this:
One way to calculate the Monte Carlo standard error of the mean is to repeatedly take samples of draws from the distribution, calculate the mean of each sample and then the standard deviation of the means. Let's re-calculate the mean with a lower number of draws.
1.4 Which of these best describes the relationship between the empirical mean of the sample of size 400 and the analytical mean of the gamma(3, 3) distribution?
Simulate 2000 samples, each of 400 draws from the Gamma(3, 3), calculate and save the mean of each sample. Then calculate the standard deviation of the sample of means.
Adjust the following R code to do this:
# create a vector to store the sample means
sample_means <- rep(NA, times = NUMBER_OF_SAMPLES)
# loop over the number of samples
for (i in 1:NUMBER_OF_SAMPLES) {
# generate a sample from the gamma distribution of size 400
sample <- ...
# calculate the sample mean and save it to the vector
sample_means[i] <- ...
}
# calculate the standard deviation of the sample means
sd(...)
1.5 Enter the MCSE that is calculated from the code:
Now, explore the Cauchy(location = 0, scale = 1) distribution. The corresponding functions are named similarly to the gamma functions above.
1.6 What is the mean of the Cauchy(0, 1) distribution?
Change the simulation to take samples of the Cauchy(0, 1) distribution and calculate the MCSE of the mean. Try this a few times.
1.7 What do you notice about the MCSE for the mean of the Cauchy and gamma distributions?
The Central Limit Theorem (CLT) gives us the conditions under which it is also possible to estimate the MCSE from a single sample (see lecture slides and for excellent visual intuition, check out "But what is the Central Limit Theorem" by 3Blue1Brown ) for means and other functions of the posterior. The function mcse_mean from the posterior package provides this method (more on this is also mentioned in the lecture).
1.8 Let’s put the claims of the CLT to test. Assume you’ve taken \(N\) draws of a random variable and stored them in a vector \(\theta\). Assume also that the random variable follows a distribution with finite mean and variance, what is the correct formula for the MCSE of the mean? \(SD\) denotes the standard deviation and \(var\) the variance of \(\theta\).
1.9 Based on what you learned about the Gamma(3, 3) and the Cauchy(0, 1) distributions, which of these statements is correct?
The Pareto-k diagnostic can be used to diagnose whether the mean (or other moments) exists. The function posterior::pareto_khat() calculates this for a sample of draws. Calculate the Pareto-k for samples of 400 draws from the gamma(3, 3) and the Cauchy(0, 1) distributions.
1.10 Which of these is true:
MCSE can be calculated for any estimate, including quantiles and probability and can be used to guide how many digits to report for a given estimate. Suppose you calculate the following summary of a parameter:
mean = 0.483834 (MCSE = 0.003)
5% quantile = 0.234536 (MCSE = 0.01)
95% quantile = 1.34823 (MCSE = 0.2)
1.11 Based on recommendations from the lecture how should you report for the mean?
Lecture 4/Chapters 3 and 10 of BDA Quiz (96% of grade)
As, always, please see the provided code notebook from the website to help you get started on the coding tasks.
1. Mean estimates, Monte Carlo standard error (MCSE) and Pareto-k diagnostic
This task is about understanding and calculating Monte Carlo standard error.
1.1 Which of these correctly describes MCSE:
We will explore mean estimates and MCSE estimates of two distributions.
First is the gamma distribution, parameterised with shape (alpha) and rate (beta).
1.2 Look up the gamma distribution in Appendix A in BDA (or wikipedia). What is the mean of the gamma(alpha = 3, beta = 3) distribution?
Use the appropriate R function to draw a sample of size 1000 from a gamma(3, 3) distribution.
One way to calculate the Monte Carlo standard error of the mean is to repeatedly take samples of draws from the distribution, calculate the mean of each sample and then the standard deviation of the means. Let's re-calculate the mean with a lower number of draws.
1.4 Which of these best describes the relationship between the empirical mean of the sample of size 400 and the analytical mean of the gamma(3, 3) distribution?
Simulate 2000 samples, each of 400 draws from the Gamma(3, 3), calculate and save the mean of each sample. Then calculate the standard deviation of the sample of means.
Adjust the following R code to do this:
# create a vector to store the sample means
sample_means <- rep(NA, times = NUMBER_OF_SAMPLES)
# loop over the number of samples
for (i in 1:NUMBER_OF_SAMPLES) {
# generate a sample from the gamma distribution of size 400
sample <- ...
# calculate the sample mean and save it to the vector
sample_means[i] <- ...
}
# calculate the standard deviation of the sample means
sd(...)
1.5 Enter the MCSE that is calculated from the code:
Now, explore the Cauchy(location = 0, scale = 1) distribution. The corresponding functions are named similarly to the gamma functions above.
1.6 What is the mean of the Cauchy(0, 1) distribution?
Change the simulation to take samples of the Cauchy(0, 1) distribution and calculate the MCSE of the mean. Try this a few times.
1.7 What do you notice about the MCSE for the mean of the Cauchy and gamma distributions?
The Central Limit Theorem (CLT) gives us the conditions under which it is also possible to estimate the MCSE from a single sample (see lecture slides and for excellent visual intuition, check out "But what is the Central Limit Theorem" by 3Blue1Brown ) for means and other functions of the posterior. The function mcse_mean from the posterior package provides this method (more on this is also mentioned in the lecture).
1.8 Let’s put the claims of the CLT to test. Assume you’ve taken \(N\) draws of a random variable and stored them in a vector \(\theta\). Assume also that the random variable follows a distribution with finite mean and variance, what is the correct formula for the MCSE of the mean? \(SD\) denotes the standard deviation and \(var\) the variance of \(\theta\).
1.9 Based on what you learned about the Gamma(3, 3) and the Cauchy(0, 1) distributions, which of these statements is correct?
The Pareto-k diagnostic can be used to diagnose whether the mean (or other moments) exists. The function posterior::pareto_khat() calculates this for a sample of draws. Calculate the Pareto-k for samples of 400 draws from the gamma(3, 3) and the Cauchy(0, 1) distributions.
1.10 Which of these is true:
1.12 Based on recommendations from the lecture how should you report the 5% quantile?
1.13 Based on recommendations from the lecture how should you report the 95% quantile?
2. Bioassay model: Prior
In this exercise, you will use a dose-response relation model that is used in BDA3 Section 3.7 and in the chapter reading notes.
The likelihood is the same as in the book, but instead of uniform priors, we will use a bivariate normal distribution as the joint prior distribution of the parameters \(\alpha\) and \(\beta\).
In the prior distribution for \((\alpha,\beta)\), the marginal distributions are \(\alpha \sim N(0,2^2)\) and \(\beta \sim N(10,10^2)\), and the correlation between them is \(\mathrm{corr}(\alpha, \beta)=0.6\).
2.1 The mean of the prior distribution for \((\alpha, \beta)\) is:
2.2 The covariance of the prior distribution (two by two matrix) is:
3. Bioassay model: Importance sampling
4. Bioassay model: Independent posterior draws
You are given 4000 independent draws from the posterior distribution
of the model in the dataset `bioassay_posterior` in the `aaltobda` package.
4.1 The posterior can be summarised by the means and 90% posterior interval.
What is the mean, 5% and 95% quantiles of the posterior draws for the parameters?
mean of alpha: , 5% quantile of alpha: , 95% quantile of alpha:
mean of beta: , 5% quantile of beta: , 95% quantile of beta:
For alpha: MCSE for mean: , MCSE for 5% quantile: , MCSE for 95% quantile: