Notebook for Assignment 2

Author

Aki Vehtari et al.

1 General information

This assignment is related to Lecture 2 and BDA3 Chapters 1 and 2. You may find an additional discussion about choosing priors in a blog post by Andrew Gelman.

General Instructions for Answering the Assignment Questions
  • Questions below are exact copies of the text found in the MyCourses quiz and should serve as a notebook where you can store notes and code.
  • We recommend opening these notebooks in the Aalto JupyterHub, see how to use R and RStudio remotely.
  • For inspiration for code, have a look at the BDA R Demos and the specific Assignment code notebooks
  • Recommended additional self study exercises for each chapter in BDA3 are listed in the course web page. These will help to gain deeper understanding of the topic.
  • Common questions and answers regarding installation and technical problems can be found in Frequently Asked Questions (FAQ).
  • Deadlines for all assignments can be found on the course web page and in MyCourses.
  • You are allowed to discuss assignments with your friends, but it is not allowed to copy solutions directly from other students or from internet.
  • Do not share your answers publicly.
  • Do not copy answers from the internet or from previous years. We compare the answers to the answers from previous years and to the answers from other students this year.
  • Use of AI is allowed on the course, but the most of the work needs to by the student, and you need to report whether you used AI and in which way you used them (See points 5 and 6 in Aalto guidelines for use of AI in teaching).
  • All suspected plagiarism will be reported and investigated. See more about the Aalto University Code of Academic Integrity and Handling Violations Thereof.
  • If you have any suggestions or improvements to the course material, please post in the course chat feedback channel, create an issue, or submit a pull request to the public repository!
Setup

The following installs the aaltobda package:

#| cache: true
# Caching should be fine here
install.packages("aaltobda", repos = c("https://avehtari.github.io/BDA_course_Aalto/", getOption("repos")))

2 Inference for binomial proportion

Algae status is monitored in 274 sites at Finnish lakes and rivers. The observations for the 2008 algae status at each site are presented in the dataset algae in the aaltobda package (‘0’: no algae, ‘1’: algae present).

Loading the library and the data.

library(aaltobda)
library(ggplot2)
library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
library(latex2exp)
data("algae")
# The data are now stored in the variable `algae`.
# These are the values for the prior required in the assignment
prior_alpha = 2
prior_beta = 10

Let \(\pi\) be the probability of a monitoring site having detectable blue-green algae levels and \(y\) the observations in algae. Use a binomial model for the observations \(y\) and a \(Beta(2,10)\) prior for binomial model parameter \(\pi\) to formulate a Bayesian model. Here it is not necessary to derive the posterior distribution for \(\pi\) as it has already been done in the book and it suffices to refer to that derivation. Also, it is not necessary to write out the distributions; it is sufficient to use label-parameter format, e.g. \(Beta(\alpha,\beta)\).

Your task is to perform Bayesian inference for a binomial model and answer questions based on it.

3 Formulating probabilities

4 Summary of the posterior distribution of \(\theta\)

Keep the below name and format for the functions to work with markmyassignment:

# Useful function: qbeta()

beta_point_est <- function(prior_alpha, prior_beta, data) {
    # Do computation here, and return as below.
    
}
beta_interval <- function(prior_alpha, prior_beta, data, prob=0.9) {
    # Do computation here, and return as below.


}

5 Comparison to historical records

Keep the below name and format for the function to work with markmyassignment:

# Useful function: pbeta()

beta_low <- function(prior_alpha, prior_beta, data, pi_0=0.2) {
    # Do computation here, and return as below.

}

6 Prior sensitivity analysis

Make prior sensitivity analysis by testing a couple of different reasonable priors and plot the different posteriors. Try to summarise the results by one or two sentences.

# Useful function: dbeta()

Amend the code below to help you.

# Beta distribution density function
beta_density <- function(x, shape1, shape2) {
  dbeta(x, shape1, shape2)
}

# Binomial likelihood density function
binomial_likelihood <- function(x, n, k) {
  choose(n, k) * x^k * (1 - x)^(n - k)
}

# Parameters for the Beta prior distributions
priors <- list(c(#alpha, #beta), c(#alpha, #beta), c(#alpha, #beta)) # Define three different priors here priors, (alpha, beta)

# Parameters for the binomial likelihood (number of trials and successes)
n <-  # number of trials
k <-  # number of successes

# Create a sequence of x values from 0 to 1
x <- seq(0, 1, length.out = 100)


# Create a data frame with densities
densities_list <- lapply(priors, function(prior) {
  alpha_prior <- prior[1]
  beta_prior <- prior[2]
  alpha_posterior <- alpha_prior + k
  beta_posterior <- beta_prior + n - k
  
  data.frame(
    x = rep(x, 2),
    density = c(beta_density(x, alpha_prior, beta_prior),
                beta_density(x, alpha_posterior, beta_posterior)),
    distribution = factor(rep(c("Prior", "Posterior"), each = length(x))),
    prior = paste("Prior: Alpha =", alpha_prior, "Beta =", beta_prior)
  )
})

df <- bind_rows(densities_list)

# Plot the densities using ggplot2
ggplot(df, aes(x = x, y = density, color = distribution)) +
  geom_line(size = 1) +
  facet_wrap(~ prior, scales = "fixed", nrow = 3) +
  labs(title = "Prior and Posterior Densities for Different Priors", 
       x = TeX("$\\theta$"), 
       y = "Density", 
       color = "Distribution") +
  theme_minimal()