For inspiration for code, have a look at the BDA R Demos and the specific Assignment code notebooks
Recommended additional self study exercises for each chapter in BDA3 are listed in the course web page. These will help to gain deeper understanding of the topic.
Deadlines for all assignments can be found on the course web page and in MyCourses.
You are allowed to discuss assignments with your friends, but it is not allowed to copy solutions directly from other students or from internet.
Do not share your answers publicly.
Do not copy answers from the internet or from previous years. We compare the answers to the answers from previous years and to the answers from other students this year.
If you have any suggestions or improvements to the course material, please post in the course chat feedback channel, create an issue, or submit a pull request to the public repository!
Setup
The following will set-up markmyassignment to check your functions at the end of the notebook:
2 Mean estimates and the Monte Carlo standard error (MCSE)
In this exercise, we familiarise ourselves with the concept of the MCSE. As you have learned from the lecture, this allows us to determine how many decimal digits should be reported. In the below, we give you code to get started on quiz question 1.5 on MyCourses. See also The digits case study for further illustration.
set.seed(4711)NUMBER_OF_SAMPLES <-1NUMBER_OF_DRAWS <-1# create a vector to store the sample meanssample_means <-rep(NA, times = NUMBER_OF_SAMPLES)# loop over the number of samplesfor (i in1:NUMBER_OF_SAMPLES) {# generate a sample from the gamma distribution of size 400#sample <- # calculate the sample mean and save it to the vector#sample_means[i] <- mean(sample)}# calculate the standard deviation of the sample meanssd(sample_means)
[1] NA
3 Bioassay model and importance sampling
In this exercise, you will use a dose-response relation model that is used in BDA3 Section 3.7 and in the chapter reading instructions. The used likelihood is the same, but instead of uniform priors, we will use a bivariate normal distribution as the joint prior distribution of the parameters \(\alpha\) and \(\beta\).
In the prior distribution for \((\alpha,\beta)\), the marginal distributions are \(\alpha \sim N(0,2^2)\) and \(\beta \sim N(10,10^2)\), and the correlation between them is \(\mathrm{corr}(\alpha, \beta)=0.6\).
First, implement a function for computing the log importance ratios (log importance weights) when the importance sampling target distribution is the posterior distribution, and the proposal distribution is the prior distribution.
Non-log importance ratios are given by equation (10.3) in the course book. The fact that our proposal distribution is the same as the prior distribution makes this task easier. The logarithm of the likelihood can be computed with the bioassaylp function from the aaltobda package. The data required for the likelihood can be loaded with data("bioassay").
# Useful functions: bioassaylp (from aaltobda)alpha_test <-c(1.896, -3.6, 0.374, 0.964, -3.123, -1.581) beta_test <-c(24.76, 20.04, 6.15, 18.65, 8.16, 17.4)log_importance_weights<-function(alpha, beta){ # Do computation here, and return as below.# This is the correct return value for the test data provided above.c(-8.95, -23.47, -6.02, -8.13, -16.61, -14.57)}
Then, implement a function for computing normalized importance ratios from the unnormalized log ratios. In other words, exponentiate the log ratios and scale them such that they sum to one.
normalized_importance_weights <-function(alpha, beta) {# Do computation here, and return as below.# This is the correct return value for the test data provided above. c(0.045, 0.000, 0.852, 0.103, 0.000, 0.000) }
Then, Sample 4000 draws of \(\alpha\) and \(\beta\) from the prior distribution. Compute and plot a histogram of the 4000 normalized importance ratios (plotting not required for MyCourses quiz, but important for building intution and checking).
Use the function rmvnorm from the aaltobda package for sampling and the functions you implemented before.
Then, using the normalised importance weights, calculate the posterior mean.
posterior_mean<-function(alpha, beta){# Do computation here, and return as below.# This is the correct return value for the test data provided above. c(0.50326, 8.27495) }
Then, using the importance ratios, compute the importance sampling effective sample size \(S_{\text{eff}}\) and report it.
S_eff <-function(alpha, beta) { # Do computation here, and return as below. # This is the correct return value for the test data provided above. 1.354 }
Equation (10.4) in the course book.
BDA3 1st (2013) and 2nd (2014) printing have an error for\(\tilde{w}(\theta^s)\) used in the effective sample size equation (10.4). The normalized weights equation should not have the multiplier S (the normalized weights should sum to one). The later printings, the online version, and the slides have the correct equation.
Finally, let’s compute the posterior mean MCSE and report the number of digits for the means based on the MCSEs.
The values below are only a test case, you need to use 4000 draws for \(\alpha\) and \(\beta\) in the final report.
Use the same equation for the MCSE of \(\text{E}[\theta]\) as earlier (\(\sqrt{\text{Var} [\theta]/S}\)), but now replace \(S\) with \(S_{\text{eff}}\). To compute \(\text{Var} [\theta]\) with importance sampling, use the identity \(\text{Var}[\theta] = \text{E}[\theta^2] - \text{E}[\theta]^2\).
posterior_mean_MCSE <-function(alpha,beta) {# Do computation here, and return as below.# This is the correct return value for the test data provided above. c(0.3031995,4.4797743) }
4 Bioassay model
You are given 4000 independent draws from the posterior distribution of the model in the dataset bioassay_posterior in the aaltobda package.
The number of significant digits can be different for the mean and quantile estimates. In some other cases, the number of digits reported can be less than MCSE allows for practical reasons as discussed in the lecture.
Hint:
Quantiles can be computed with the quantile function. With \(S\) draws, the MCSE for \(\text{E}[\theta]\) is \(\sqrt{\text{Var} [\theta]/S}\). MCSE for the quantile estimates can be computed with the mcse_quantile function from the aaltobda package.
If you are unsure how to proceed, find the error(s) in the code below:
---title: "Notebook for Assignment 4"author: "Aki Vehtari et al."format: html: toc: true code-tools: true code-line-numbers: true number-sections: true mainfont: Georgia, serifeditor: source ---# General informationThis assignment is related to Lecture 4 and BDA3 Chapters 3 and 10.::: hint**Reading instructions:**- [**The reading instructions for BDA3 Chapter 3**](../BDA3_notes.html#ch3).- [**The reading instructions for BDA3 Chapter 10**](../BDA3_notes.html#ch10).:::{{< include includes/_general_cloze_instructions.md >}}::: {.callout-warning collapse="false"}## SetupThe following will set-up [`markmyassignment`](https://github.com/MansMeg/markmyassignment) to check your functions at the end of the notebook:```{r}if(!require(markmyassignment)){install.packages("markmyassignment")library(markmyassignment)}assignment_path =paste("https://github.com/avehtari/BDA_course_Aalto/blob/master/tests/assignment4.yml", sep="")set_assignment(assignment_path) ```The following installs and loads the `aaltobda` package:```{r}if(!require(aaltobda)){install.packages("remotes") remotes::install_github("avehtari/BDA_course_Aalto", subdir ="rpackage", upgrade="never")library(aaltobda)}```The following installs and loads the [`latex2exp` package](https://github.com/stefano-meschiari/latex2exp), which allows us to use LaTeX in plots:```{r}if(!require(latex2exp)){install.packages("latex2exp")library(latex2exp)}```:::::: showcase## Setting up advanced packages (`posterior` and `ggdist`)The following installs and loads the [`posterior` package](https://mc-stan.org/posterior/index.html), which allows us to use its [`rvar` Random Variable Datatype](https://mc-stan.org/posterior/articles/rvar.html):```{r}if(!require(posterior)){install.packages("posterior")library(posterior)}```The following installs and loads the [`ggdist` package](https://mjskay.github.io/ggdist/) for advanced plotting functions:```{r}if(!require(ggplot2)){install.packages("ggplot2")library(ggplot2)}ggplot2::theme_set(theme_minimal(base_size =14))if(!require(ggdist)){install.packages("ggdist")library(ggdist)}```:::# Mean estimates and the Monte Carlo standard error (MCSE)In this exercise, we familiarise ourselves with the concept of the MCSE. As you have learned from the lecture, this allows us to determine how many decimal digits should be reported. In the below, we give you code to get started on quiz question 1.5 on MyCourses. See also [**The digits case study**](https://users.aalto.fi/~ave/casestudies/Digits/digits.html) for further illustration.```{r}set.seed(4711)NUMBER_OF_SAMPLES <-1NUMBER_OF_DRAWS <-1# create a vector to store the sample meanssample_means <-rep(NA, times = NUMBER_OF_SAMPLES)# loop over the number of samplesfor (i in1:NUMBER_OF_SAMPLES) {# generate a sample from the gamma distribution of size 400#sample <- # calculate the sample mean and save it to the vector#sample_means[i] <- mean(sample)}# calculate the standard deviation of the sample meanssd(sample_means)```# Bioassay model and importance samplingIn this exercise, you will use a dose-response relation model that is used in BDA3 Section 3.7 and in [**the chapter reading instructions**](../BDA3_notes.html#ch3). The used likelihood is the same, but instead of uniform priors, we will use a bivariate normal distribution as the joint prior distribution of the parameters $\alpha$ and $\beta$.In the prior distribution for $(\alpha,\beta)$, the marginal distributions are $\alpha \sim N(0,2^2)$ and $\beta \sim N(10,10^2)$, and the correlation between them is $\mathrm{corr}(\alpha, \beta)=0.6$.First, implement a function for computing the log importance ratios (log importance weights) when the importance sampling **target distribution** is the posterior distribution, and the **proposal distribution** is the prior distribution.::: hintNon-log importance ratios are given by equation (10.3) in the course book. The fact that our proposal distribution is the same as the prior distribution makes this task easier. The **logarithm** of the likelihood can be computed with the `bioassaylp` function from the `aaltobda` package. The data required for the likelihood can be loaded with `data("bioassay")`.:::```{r}# Useful functions: bioassaylp (from aaltobda)alpha_test <-c(1.896, -3.6, 0.374, 0.964, -3.123, -1.581) beta_test <-c(24.76, 20.04, 6.15, 18.65, 8.16, 17.4)log_importance_weights<-function(alpha, beta){ # Do computation here, and return as below.# This is the correct return value for the test data provided above.c(-8.95, -23.47, -6.02, -8.13, -16.61, -14.57)}```Then, implement a function for computing normalized importance ratios from the unnormalized log ratios. In other words, exponentiate the log ratios and scale them such that they sum to one.```{r}normalized_importance_weights <-function(alpha, beta) {# Do computation here, and return as below.# This is the correct return value for the test data provided above. c(0.045, 0.000, 0.852, 0.103, 0.000, 0.000) }```Then, Sample 4000 draws of $\alpha$ and $\beta$ from the prior distribution. Compute and plot a histogram of the 4000 normalized importance ratios (plotting not required for MyCourses quiz, but important for building intution and checking).::: hintUse the function `rmvnorm` from the `aaltobda` package for sampling and the functions you implemented before.:::Then, using the normalised importance weights, calculate the posterior mean.```{r}posterior_mean<-function(alpha, beta){# Do computation here, and return as below.# This is the correct return value for the test data provided above. c(0.50326, 8.27495) }```Then, using the importance ratios, compute the importance sampling effective sample size $S_{\text{eff}}$ and report it.```{r}S_eff <-function(alpha, beta) { # Do computation here, and return as below. # This is the correct return value for the test data provided above. 1.354 }```::: hintEquation (10.4) in the course book.*BDA3 1st (2013) and 2nd (2014) printing have an error for* $\tilde{w}(\theta^s)$ used in the effective sample size equation (10.4). The normalized weights equation should not have the multiplier S (the normalized weights should sum to one). The later printings, the online version, and the slides have the correct equation.:::Finally, let's compute the posterior mean MCSE and report the number of digits for the means based on the MCSEs.::: hintThe values below are *only* a test case, you need to use 4000 draws for $\alpha$ and $\beta$ in the final report.Use the same equation for the MCSE of $\text{E}[\theta]$ as earlier ($\sqrt{\text{Var} [\theta]/S}$), but now replace $S$ with $S_{\text{eff}}$. To compute $\text{Var} [\theta]$ with importance sampling, use the identity $\text{Var}[\theta] = \text{E}[\theta^2] - \text{E}[\theta]^2$.:::```{r}posterior_mean_MCSE <-function(alpha,beta) {# Do computation here, and return as below.# This is the correct return value for the test data provided above. c(0.3031995,4.4797743) }```# Bioassay modelYou are given 4000 independent draws from the posterior distribution of the model in the dataset `bioassay_posterior` in the `aaltobda` package.::: hintThe number of significant digits can be different for the mean and quantile estimates. In some other cases, the number of digits reported can be less than MCSE allows for practical reasons as discussed in the lecture.**Hint**:Quantiles can be computed with the `quantile` function. With $S$ draws, the MCSE for $\text{E}[\theta]$ is $\sqrt{\text{Var} [\theta]/S}$. MCSE for the quantile estimates can be computed with the `mcse_quantile` function from the `aaltobda` package.:::If you are unsure how to proceed, find the error(s) in the code below:```{r}bioassay_posterior^2|>as_draws() |>mutate_variables(LD50 =exp(-beta/alpha)) |>mutate_variables(P =as.numeric(LD50 <0.05)) |>summarise_draws(mean, mcse_mean)```