First rows of the `ChickWeight` dataset:

```
   weight Time Chick Diet
1      42    0     1    1
2      51    2     1    1
3      59    4     1    1
4      64    6     1    1
5      76    8     1    1
6      93   10     1    1
7     106   12     1    1
8     125   14     1    1
9     149   16     1    1
10    171   18     1    1
```

# Assignment 8

LOO-CV model comparison

# 1 General information

**The maximum amount of points from this assignment is 6.**

We have prepared a **quarto template specific to this assignment (html, qmd, pdf)** to help you get started.

We recommend you use jupyter.cs.aalto.fi or the docker container.

# 2 A hierarchical model for chicken weight time series

## 2.1 Exploratory data analysis

In the first part of this assignment, you will explore the dataset `ChickWeight`. In particular, you will see what information is recorded in the dataset, and how you can use visualisation to learn more about it. More information can be found on the corresponding page of the R documentation.
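A minimal exploration might look like the following sketch (the specific plots are up to you; `ggplot2` is assumed to be available):

```r
library(ggplot2)

# Load the built-in dataset and inspect its structure:
# 578 rows with columns weight (g), Time (days), Chick, Diet
data("ChickWeight")
str(ChickWeight)
summary(ChickWeight)

# Plot each chick's weight trajectory over time, coloured by diet
ggplot(ChickWeight, aes(x = Time, y = weight, group = Chick, colour = Diet)) +
  geom_line(alpha = 0.5) +
  labs(x = "Time (days)", y = "Weight (g)")
```

Plotting the individual trajectories grouped by diet is a quick way to spot the increasing spread of weights over time.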

## 2.2 Linear regression

In this section, you will build a model that predicts the weight of a chicken over time and depending on the diet. After sampling from the posteriors, you will use posterior predictive checks to see how well the predictions match the observations. Then you will adjust the model by adding more complexity, and check again.
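A first model along these lines might be sketched as follows (the prior shown is a weakly informative placeholder, not necessarily the one in the template; the model name `fit_lin` is an illustration):

```r
library(brms)

# Population-level linear model: weight as a function of time and diet
fit_lin <- brm(
  formula = weight ~ 1 + Time + Diet,
  data    = ChickWeight,
  family  = gaussian(),
  # placeholder prior on the regression coefficients;
  # use the priors specified in the assignment template
  prior   = prior(normal(0, 10), class = "b"),
  seed    = 1
)
summary(fit_lin)
```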

Next, you can use the `bayesplot` package to check the posterior predictions in relation to the observed data using the `pp_check` function. The function plots the \(y\) values, which are the observed data, and the \(y_\text{rep}\) values, which are replicated data sets from the posterior predictive distribution.

The default density plot is not always informative, but `bayesplot` has different settings that can be used to create plots more appropriate for specific data.
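For example, grouping the check by diet or plotting predictive intervals against time can be more informative than the default density overlay. A sketch, assuming a fitted model named `fit_lin` from the previous step:

```r
library(brms)

# Default: density of observed y overlaid with replicated y_rep draws
pp_check(fit_lin)

# Grouped density overlay, one panel per diet
pp_check(fit_lin, type = "dens_overlay_grouped", group = "Diet")

# Predictive intervals against Time, one panel per diet
pp_check(fit_lin, type = "intervals_grouped", group = "Diet", x = "Time")
```

The `type` argument selects among the `bayesplot` PPC plot families; extra arguments such as `group` and `x` are passed through to the underlying plotting function.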

## 2.3 Log-normal linear regression

Based on the identified issues from the posterior predictive check, the model can be improved. It is advisable to change only one or a few things about a model at once. At this stage, focus on changing the observation model family to better account for the observed data.

One option is to use the log-normal observation model, which only allows positive values. In `brms` you can change the observation model family by setting the argument `family = "lognormal"`. Note that when using the log-normal observation model, the regression coefficients represent the change in the log weight of a chicken. The priors have been adjusted accordingly in the template.
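The change amounts to swapping the family argument (a sketch, reusing the same formula as before; the model name `fit_log` and the seed are illustrative, and priors should follow the template since the coefficients are now on the log scale):

```r
library(brms)

fit_log <- brm(
  formula = weight ~ 1 + Time + Diet,
  data    = ChickWeight,
  family  = "lognormal",  # observation model restricted to positive weights
  seed    = 1
)
```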

## 2.4 Hierarchical log-normal linear regression

The model can be further improved by directly considering potential differences in the growth rates of individual chickens. Some chickens may innately grow faster than others, and this difference can be captured by including both population-level and group-level effects in the model.

To include a group-level effect in `brms`, the term `+ (predictor|group)` can be added to the model formula. In this case, the predictor is `Time` and the group is `Chick`.
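Putting this together, the hierarchical model might be sketched as (the model name `fit_hier` is an illustration; in `brms`, `(Time | Chick)` also implies a chick-specific intercept):

```r
library(brms)

# Population-level effects for Time and Diet, plus correlated
# group-level intercepts and Time slopes for each individual chick
fit_hier <- brm(
  formula = weight ~ 1 + Time + Diet + (Time | Chick),
  data    = ChickWeight,
  family  = "lognormal",
  seed    = 1
)
```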

## 2.5 Model comparison using the ELPD

There are many ways of comparing models^{1}. Commonly, we evaluate point predictions, such as the mean of the predictive distribution^{2}, or the accuracy of the whole posterior predictive distribution. Whether we prioritise point or density predictive accuracy may serve different purposes and lead to different outcomes for model choice^{3}. It is common, however, to report density predictive accuracy via log-scores and point-predictive accuracy via the root mean squared error (RMSE) based on the empirical average of the predictive distribution. To cross-validate both metrics on left-out observations without needing to sample from each leave-one-out posterior, we use Pareto-smoothed importance sampling, as discussed in the course materials (see Lecture 9).

We start by comparing models based on the log-score. Use `loo::loo()` and `loo::loo_compare()` to quantify the differences in predictive performance.
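A sketch of the comparison, assuming the three fitted models from the previous sections are named `fit_lin`, `fit_log`, and `fit_hier`:

```r
library(loo)

# PSIS-LOO estimate of the ELPD for each model;
# also check the Pareto k diagnostics reported by loo()
loo_lin  <- loo(fit_lin)
loo_log  <- loo(fit_log)
loo_hier <- loo(fit_hier)

# Models ordered by ELPD: elpd_diff is relative to the best model,
# and se_diff is the standard error of that difference
loo_compare(loo_lin, loo_log, loo_hier)
```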

## 2.6 Model comparison using the RMSE
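The point-predictive RMSE can be computed from a matrix of posterior predictive draws (rows = draws, columns = observations), e.g. from `brms::posterior_predict()`. A minimal base-R sketch, where the helper name `rmse` is an illustration and may differ from the one provided in the template:

```r
# RMSE based on the empirical average of the predictive distribution.
# yrep: draws-by-observations matrix of posterior predictive draws
rmse <- function(y, yrep) {
  yrep_mean <- colMeans(yrep)       # average the draws per observation first
  sqrt(mean((y - yrep_mean)^2))
}

# Toy check: 2 observations, 3 predictive draws each
y    <- c(1, 2)
yrep <- rbind(c(1, 2), c(1, 2), c(4, 2))
rmse(y, yrep)   # column means are (2, 2), so RMSE = sqrt(0.5)
```

Note that this averages the draws first and then computes the error, matching footnote 2: the point prediction is the mean of the predictive distribution, not a prediction made at the posterior mean of the parameters.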

# 3 Overall quality of the report

## Footnotes

1. In principle, when comparing models based on accuracy in predictions or parameter estimation (if true parameter values are available to you, as e.g. in simulation studies), we want to use so-called strictly proper scoring rules, which always indicate when a “better” model is better, and whose score reaches its uniquely defined best value at the “true” model, if that is well defined. See **Gneiting and Raftery (2007)** for an in-depth treatment of this topic.↩︎

2. NOT predictions based on the mean of the posterior parameters, but first generating the predictive distribution and then computing its average.↩︎

3. For instance, a unimodal and a bimodal predictive density may have the same expected value, but very different areas of high posterior density and therefore very different log-scores.↩︎