This is the template for assignment 6. You can download the broken stan-file and the qmd-file or copy the code from this rendered document after clicking on </> Code in the top right corner.
Please replace the instructions in this template by your own text, explaining what you are doing in each exercise.
Setup
This block will only be visible in your HTML output, but will be hidden when rendering to PDF with quarto for the submission.Make sure that this does not get displayed in the PDF!
JupyterHub has all the needed packages pre-installed.
The following installs and loads the aaltobda package:
# Some additional set-up to make plots legibleggplot2::theme_set(theme_minimal(base_size =14))bayesplot::bayesplot_theme_set(theme_minimal(base_size =14))# register_knitr_engine()
The following installs and loads the cmdstanr package and tries to install cmdstan.
- CmdStanR documentation and vignettes: mc-stan.org/cmdstanr
- CmdStan path: /root/.cmdstan/cmdstan-2.31.0
- CmdStan version: 2.31.0
A newer version of CmdStan is available. See ?install_cmdstan() to install it.
To disable this check set option or environment variable CMDSTANR_NO_VER_CHECK=TRUE.
cmdstan_installed <-function(){ res <-try(out <- cmdstanr::cmdstan_path(), silent =TRUE)!inherits(res, "try-error")}if(!cmdstan_installed()){install_cmdstan()}
2 Stan warm-up: linear model of BDA retention with Stan (2 points)
2.1 (b)
Write your answers/code here!
Data preparation and sampling from the posterior
This block will only be visible in your HTML output, but will be hidden when rendering to PDF with quarto for the submission.Make sure that this does not get displayed in the PDF!
Data assembly happens here:
# These are our observations y: the proportion of students handing in each assignment (1-8),# sorted by year (row-wise) and assignment (column-wise).# While the code suggest a matrix structure,# the result will actually be a vector of length N = no_years * no_assignmentspropstudents<-c(c(176, 174, 158, 135, 138, 129, 126, 123)/176,c(242, 212, 184, 177, 174, 172, 163, 156)/242,c(332, 310, 278, 258, 243, 242, 226, 224)/332,c(301, 269, 231, 232, 217, 208, 193, 191)/301,c(245, 240, 228, 217, 206, 199, 191, 182)/245)# These are our predictors x: for each observation, the corresponding assignment number.assignment <-rep(1:8, 5)# These are in some sense our test data: the proportion of students handing in the last assignment (9),# sorted by year.# Usually, we would not want to split our data like that and instead# use e.g. Leave-One-Out Cross-Validation (LOO-CV, see e.g. http://mc-stan.org/loo/index.html)# to evaluate model performance.propstudents9 =c(121/176, 153/242, 218/332, 190/301, 175/245)# The total number of assignmentsno_assignments =9# The assignment numbers for which we want to generate predictionsx_predictions =1:no_assignments# (Cmd)Stan(R) expects the data to be passed in the below format:model_data =list(N=length(assignment),x=assignment,y=propstudents,no_predictions=no_assignments,x_predictions=x_predictions)
Sampling from the posterior distribution happens here:
# This reads the file at the specified path and tries to compile it.# If it fails, an error is thrown.retention_model =cmdstan_model("./additional_files/assignment6_linear_model.stan")
Error in initialize(...): Assertion on 'stan_file' failed: File does not exist: './additional_files/assignment6_linear_model.stan'.
# This "out <- capture.output(...)" construction suppresses output from cmdstanr# See also https://github.com/stan-dev/cmdstanr/issues/646out <-capture.output(# Sampling from the posterior distribution happens here: fit <- retention_model$sample(data=model_data, refresh=0, show_messages=FALSE))
Error in withVisible(...elt(i)): object 'retention_model' not found
Draws postprocessing happens here:
# This extracts the draws from the sampling result as a data.frame.draws_df = fit$draws(format="draws_df")
Error in eval(expr, envir, enclos): object 'fit' not found
# This does some data/draws wrangling to compute the 5, 50 and 95 percentiles of# the mean at the specified covariate values (x_predictions).# It can be instructive to play around with each of the data processing steps# to find out what each step does, e.g. by removing parts from the back like "|> gather(pct,y,-x)"# and printing the resulting data.frame.mu_quantiles_df = draws_df |>subset_draws(variable =c("mu_pred")) |>summarise_draws(~quantile2(.x, probs =c(0.05, .5, 0.95))) |>mutate(x =1:9) |>pivot_longer(c(q5, q50, q95), names_to =c("pct"))
Error in UseMethod("subset_draws"): no applicable method for 'subset_draws' applied to an object of class "function"
# Same as above, but for the predictions.y_quantiles_df = draws_df |>subset_draws(variable =c("y_pred")) |>summarise_draws(~quantile2(.x, probs =c(0.05, .5, 0.95))) |>mutate(x =1:9) |>pivot_longer(c(q5, q50, q95), names_to =c("pct"))
Error in UseMethod("subset_draws"): no applicable method for 'subset_draws' applied to an object of class "function"
Plotting happens here:
ggplot() +# scatter plot of the training data:geom_point(aes(x, y, color=assignment),data=data.frame(x=assignment, y=propstudents, assignment="1-8")) +# scatter plot of the test data:geom_point(aes(x, y, color=assignment),data=data.frame(x=no_assignments, y=propstudents9, assignment="9")) +# you have to tell us what this plots:geom_line(aes(x,y=value,linetype=pct), data=mu_quantiles_df, color='grey', linewidth=1.5) +# you have to tell us what this plots:geom_line(aes(x,y=value,linetype=pct), data=y_quantiles_df, color='red') +# adding xticks for each assignment:scale_x_continuous(breaks=1:no_assignments) +# adding labels to the plot:labs(y="assignment submission %", x="assignment number") +# specifying that line types repeat:scale_linetype_manual(values=c(2,1,2)) +# Specify colours of the observations:scale_colour_manual(values =c("1-8"="black", "9"="blue")) +# remove the legend for the linetypes:guides(linetype="none")
Error in fortify(data): object 'mu_quantiles_df' not found
Quick check for sampling convergence
This block will only be visible in your HTML output, but will be hidden when rendering to PDF with quarto for the submission.Make sure that this does not get displayed in the PDF!
If your model is correctly implemented, sampling from the posterior distribution should have been successful. You can check whether Stan thinks that sampling succeeded by inspecting the output of the below command, which you should be able to interpret with a little help from the CmdStan User’s Guide.
fit$cmdstan_diagnose()
Error in eval(expr, envir, enclos): object 'fit' not found
2.2 (c)
Write your answers/code here!
3 Generalized linear model: Bioassay with Stan (4 points)
---title: "Assignment 6"author: anonymous # <-- hand in anonymouslyformat: html: toc: true code-tools: true code-line-numbers: true number-sections: true mainfont: Georgia, serif page-layout: article pdf: geometry: - left=1cm,top=1cm,bottom=1cm,right=7cm number-sections: true code-annotations: noneeditor: source---# General informationThis is the template for [assignment 6](assignment6.html). You can download the [broken stan-file](./additional_files/assignment6_linear_model.stan) and the [qmd-file](https://avehtari.github.io/BDA_course_Aalto/assignments/template6.qmd) or copy the code from this rendered document after clicking on `</> Code` in the top right corner.**Please replace the instructions in this template by your own text, explaining what you are doing in each exercise.**:::: {.content-hidden when-format="pdf"}::: {.callout-warning collapse=false}## Setup*This block will only be visible in your HTML output, but will be hidden when rendering to PDF with quarto for the submission.***Make sure that this does not get displayed in the PDF!**JupyterHub has all the needed packages pre-installed.The following installs and loads the `aaltobda` package:```{r}if(!require(aaltobda)){install.packages("aaltobda", repos =c("https://avehtari.github.io/BDA_course_Aalto/", getOption("repos")))library(aaltobda)}```The following installs and loads the [`latex2exp` package](https://github.com/stefano-meschiari/latex2exp), which allows us to use LaTeX in plots:```{r}if(!require(latex2exp)){install.packages("latex2exp")library(latex2exp)}```The following installs and loads the [`posterior` package](https://github.com/stan-dev/posterior) which imports the `rhat_basic()` function:```{r}if(!require(posterior)){install.packages("posterior")library(posterior)}```The following installs and loads the [`ggplot2` package](https://ggplot2.tidyverse.org/), the [`bayesplot` package](https://mc-stan.org/bayesplot/index.html) and the [`dplyr` package](https://dplyr.tidyverse.org/)```{r}if(!require(ggplot2)){install.packages("ggplot2")library(ggplot2)}if(!require(bayesplot)){install.packages("bayesplot")library(bayesplot)}if(!require(dplyr)){install.packages("dplyr")library(dplyr)}if(!require(tidyr)){install.packages("tidyr")library(tidyr)}# Some additional set-up to make plots legibleggplot2::theme_set(theme_minimal(base_size =14))bayesplot::bayesplot_theme_set(theme_minimal(base_size =14))# register_knitr_engine()```The following installs and loads the [`cmdstanr` package](https://mc-stan.org/cmdstanr/) and tries to install `cmdstan`.```{r}if(!require(cmdstanr)){install.packages("cmdstanr", repos =c("https://mc-stan.org/r-packages/", getOption("repos")))library(cmdstanr)}cmdstan_installed <-function(){ res <-try(out <- cmdstanr::cmdstan_path(), silent =TRUE)!inherits(res, "try-error")}if(!cmdstan_installed()){install_cmdstan()}```:::::::# Stan warm-up: linear model of BDA retention with Stan (2 points)## (b)Write your answers/code here!:::: {.content-hidden when-format="pdf"}::: {.callout-warning collapse=false}## Data preparation and sampling from the posterior*This block will only be visible in your HTML output, but will be hidden when rendering to PDF with quarto for the submission.***Make sure that this does not get displayed in the PDF!****Data assembly happens here**:```{r}#| warning: false# These are our observations y: the proportion of students handing in each assignment (1-8),# sorted by year (row-wise) and assignment (column-wise).# While the code suggest a matrix structure,# the result will actually be a vector of length N = no_years * no_assignmentspropstudents<-c(c(176, 174, 158, 135, 138, 129, 126, 123)/176,c(242, 212, 184, 177, 174, 172, 163, 156)/242,c(332, 310, 278, 258, 243, 242, 226, 224)/332,c(301, 269, 231, 232, 217, 208, 193, 191)/301,c(245, 240, 228, 217, 206, 199, 191, 182)/245)# These are our predictors x: for each observation, the corresponding assignment number.assignment <-rep(1:8, 5)# These are in some sense our test data: the proportion of students handing in the last assignment (9),# sorted by year.# Usually, we would not want to split our data like that and instead# use e.g. Leave-One-Out Cross-Validation (LOO-CV, see e.g. http://mc-stan.org/loo/index.html)# to evaluate model performance.propstudents9 =c(121/176, 153/242, 218/332, 190/301, 175/245)# The total number of assignmentsno_assignments =9# The assignment numbers for which we want to generate predictionsx_predictions =1:no_assignments# (Cmd)Stan(R) expects the data to be passed in the below format:model_data =list(N=length(assignment),x=assignment,y=propstudents,no_predictions=no_assignments,x_predictions=x_predictions)```**Sampling from the posterior distribution happens here**:```{r}#| warning: false# This reads the file at the specified path and tries to compile it.# If it fails, an error is thrown.retention_model =cmdstan_model("./additional_files/assignment6_linear_model.stan")# This "out <- capture.output(...)" construction suppresses output from cmdstanr# See also https://github.com/stan-dev/cmdstanr/issues/646out <-capture.output(# Sampling from the posterior distribution happens here: fit <- retention_model$sample(data=model_data, refresh=0, show_messages=FALSE))```**Draws postprocessing happens here**:```{r}# This extracts the draws from the sampling result as a data.frame.draws_df = fit$draws(format="draws_df")# This does some data/draws wrangling to compute the 5, 50 and 95 percentiles of# the mean at the specified covariate values (x_predictions).# It can be instructive to play around with each of the data processing steps# to find out what each step does, e.g. by removing parts from the back like "|> gather(pct,y,-x)"# and printing the resulting data.frame.mu_quantiles_df = draws_df |>subset_draws(variable =c("mu_pred")) |>summarise_draws(~quantile2(.x, probs =c(0.05, .5, 0.95))) |>mutate(x =1:9) |>pivot_longer(c(q5, q50, q95), names_to =c("pct"))# Same as above, but for the predictions.y_quantiles_df = draws_df |>subset_draws(variable =c("y_pred")) |>summarise_draws(~quantile2(.x, probs =c(0.05, .5, 0.95))) |>mutate(x =1:9) |>pivot_longer(c(q5, q50, q95), names_to =c("pct"))```:::::::::: {.both}**Plotting happens here**:```{r}#| label: fig-posterior#| fig-cap: Describe me in your submission!ggplot() +# scatter plot of the training data:geom_point(aes(x, y, color=assignment),data=data.frame(x=assignment, y=propstudents, assignment="1-8")) +# scatter plot of the test data:geom_point(aes(x, y, color=assignment),data=data.frame(x=no_assignments, y=propstudents9, assignment="9")) +# you have to tell us what this plots:geom_line(aes(x,y=value,linetype=pct), data=mu_quantiles_df, color='grey', linewidth=1.5) +# you have to tell us what this plots:geom_line(aes(x,y=value,linetype=pct), data=y_quantiles_df, color='red') +# adding xticks for each assignment:scale_x_continuous(breaks=1:no_assignments) +# adding labels to the plot:labs(y="assignment submission %", x="assignment number") +# specifying that line types repeat:scale_linetype_manual(values=c(2,1,2)) +# Specify colours of the observations:scale_colour_manual(values =c("1-8"="black", "9"="blue")) +# remove the legend for the linetypes:guides(linetype="none")```::::::: {.content-hidden when-format="pdf"}::: {.callout-warning collapse=false}## Quick check for sampling convergence*This block will only be visible in your HTML output, but will be hidden when rendering to PDF with quarto for the submission.***Make sure that this does not get displayed in the PDF!**If your model is correctly implemented, sampling from the posterior distribution should have been successful.You can check whether Stan thinks that sampling succeeded by inspecting the output of the below command,which you should be able to interpret with a little help from the [CmdStan User's Guide](https://mc-stan.org/docs/cmdstan-guide/diagnose.html).```{r}fit$cmdstan_diagnose()```:::::::## (c)Write your answers/code here!# Generalized linear model: Bioassay with Stan (4 points)## (d)Write your answers/code here!```{r}data("bioassay")```## (e)Write your answers/code here!## (f)Write your answers/code here!## (g)Write your answers/code here!