Bayesian Data Analysis Global South (GSU) 2021

Lecturer: Aki Vehtari

- Max 300 students with priority for global south and other underrepresented groups (GSU).
- From 4th March (first assignment deadline 12th March) to 28th May.
- All the material (textbook, videos, assignments, extra reading material) is freely available (see below), so you can also self-study at your own pace.
- The course is free (no cost), and organizing it is possible with the help of volunteer TAs.
- This BDA course instance aims to support learning through peer support. By following the videos and doing the assignments at the same time as others, you can discuss the material and assignments in the course slack, get feedback on your assignment solutions through a peer-grading platform, and ask questions that volunteer TAs help answer. As everything is volunteer-based, we can’t guarantee quick responses, but you will get more support than when studying only by yourself.
- This course is not the easiest Bayesian course available on the internet, but it can be your first Bayesian course if your mathematical and programming skills are sufficient. See the prerequisites below. For easier material to start with, see the end of the Prerequisites section below.
- You will not get a formal certificate for passing the course from Aalto University.
- All communication happens in the course slack; please don’t email the lecturer or TAs. The slack link has been emailed to the accepted students.

All the course material is available in a git repo; these pages are for easier navigation. All the material can be used in other courses. Text and videos are licensed under CC-BY-NC 4.0. Code is licensed under BSD-3.

There have been some dropouts, and the registration has been re-opened here.

We have some volunteer TAs already, but a few more would be great. All TAs will get a personal certificate from the lecturer Aki Vehtari if they actively participate in helping students, answering questions, and possibly organizing some TA sessions. We expect to have enough TAs that no one needs to take part every week of the course, and you can drop out if other obligations require it. The lecturer will support the TAs.

To register as a volunteer TA, fill in your information (email, country, prerequisites check, and a brief justification of why you think you can be a TA) here.

The electronic version of the course book Bayesian Data Analysis, 3rd ed, by Andrew Gelman, John Carlin, Hal Stern, David Dunson, Aki Vehtari, and Donald Rubin is available for non-commercial purposes. Hard copies are available from the publisher and many book stores. See also the home page for the book, errata for the book, and chapter notes.

- Basic terms of probability theory
- probability, probability density, distribution
- sum, product rule, and Bayes’ rule
- expectation, mean, variance, median
- in English, see e.g. Wikipedia and Introduction to probability and statistics
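
As a quick self-check on the sum rule, product rule, and Bayes’ rule, here is a minimal Python sketch. The scenario and all numbers (a diagnostic test with 1% prevalence, 95% sensitivity, and 90% specificity) are invented for illustration and do not come from the course material.

```python
def posterior_positive(prior, sensitivity, specificity):
    """P(condition | positive test) computed with Bayes' rule."""
    # Product rule: joint probability of (condition, positive test)
    joint_true = prior * sensitivity
    # Product rule: joint probability of (no condition, positive test)
    joint_false = (1.0 - prior) * (1.0 - specificity)
    # Sum rule: marginal probability of a positive test
    evidence = joint_true + joint_false
    # Bayes' rule: joint divided by the marginal
    return joint_true / evidence


# Hypothetical numbers, chosen only to illustrate the calculation
p = posterior_positive(prior=0.01, sensitivity=0.95, specificity=0.90)
print(round(p, 4))
```

If you can explain why the result is far below the sensitivity of the test, your probability background is probably sufficient for Assignment 1.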

- Some algebra and calculus
- Basic visualisation techniques (R or Python)
- histogram, density plot, scatter plot
- see e.g. BDA R demos
- see e.g. BDA Python demos

This course has been designed so that there is a strong emphasis on the computational aspects of Bayesian data analysis and on using the latest computational tools.

If you find BDA3 too difficult to start with, I recommend

- For regression models, their connection to statistical testing and causal analysis see Gelman, Hill and Vehtari, “Regression and Other Stories”.
- Richard McElreath’s Statistical Rethinking, 2nd ed book is easier than BDA3 and the 2nd ed is excellent. Statistical Rethinking doesn’t go as deep in some details, math, algorithms and programming as BDA course. Richard’s lecture videos of Statistical Rethinking: A Bayesian Course Using R and Stan are highly recommended even if you are following BDA3.
- For background prerequisites some students have found chapters 2, 4 and 5 in Kruschke, “Doing Bayesian Data Analysis” useful.

- The primary communication channel is the course slack. The slack link has been emailed to the accepted students.
- Don’t ask via email or direct messages. By asking via common channels in the course chat, more eyes will see your question, it will get answered faster and it’s likely that other students benefit from the answer.
- In the chat system, there are separate channels for each assignment and the project.
- Channel **#general** can be used for any kind of general discussions and questions related to the course.
- All important announcements will be posted to **#announcements** (no discussion on this channel).
- Any kind of feedback is welcome on channel **#feedback**.
- We also have channels **#r**, **#python**, and **#stan** for questions that are not specific to assignments or the project.
- The lecturer and teaching assistants have “(staff)” or “(TA)” at the end of their names.

- Each week the lecturer will answer the best questions about the material, either in text, by recording additional video clips, or in a live Q&A session (in this course instance there is no guarantee of a weekly live session).
- In this course instance all the TAs are volunteers from different time zones, but there may be live “TA sessions” with TAs, depending on the volunteers.
- If you find errors in the material, post in the **#feedback** channel or submit an issue in github.
- Peergrade alerts: If you are worried that you will forget the deadlines, you can set Peergrade to send you an email when an assignment opens for submission, 24 hours before an assignment closes for submission, when an assignment is open for reviewing, 24 hours before an assignment closes for reviewing if you haven’t started yet, and when someone likes your feedback (once a day). Click your name -> User Settings to choose which alerts you want.

Assignments (67%) and a project work with presentation (33%). Minimum of 50% of points must be obtained from both the assignments and project work. But as in this course there is no formal certificate, the assignment scores are just for your own self-evaluation. The biggest benefit from the course is the support and feedback from other students and volunteer TAs.

We use peergrade.io for providing peer feedback for the assignments and the project work. See more information at Assignments.

The course consists of 12 blocks from March to May 2021. The blocks don’t match specific weeks exactly. For example, it’s good to start reading the material for the next block while working on the assignment for the current block. There are 9 assignments and a project work with presentation, and thus the assignments are not in one-to-one correspondence with the blocks. The schedule below lists the blocks and how they connect to the topics, book chapters and assignments.

Here is an overview of the schedule. Scroll down the page to see detailed instructions for each block. Remember that blocks are overlapping so that when you are working on assignment for one block, you should start watching videos and reading text for the next block.

Course practicalities, material, assignments, project work, peergrading, TA sessions, prerequisites, chat, etc.

- Log in to the course chat (link will be emailed to the registered students and volunteer TAs)
- Sign in to Peergrade with the class code shared in email
- Watch videos
- Computational probabilistic modeling (15min)
- Introduction to uncertainty and modelling (18min)
- Introduction to the course contents (8min)
- Slides
- Read BDA3 Chapter 1
- start with reading instructions for Chapter 1 and afterwards read the additional comments in the same document

- There are no R/Python demos for Chapter 1
- Make and submit Assignment 1. **Deadline Friday 12 March 23:59 UTC+2**
- this assignment checks that you have sufficient prerequisite skills (basic probability calculus, and R or Python)
- General information about assignments
- R markdown template for assignments
- FAQ for the assignments has solutions to commonly asked questions related to RStudio setup, errors during package installations, etc.

- Optional: Make BDA3 exercises 1.1-1.4, 1.6-1.8 (model solutions available for 1.1-1.6)
- Start reading Chapters 1+2, see instructions below

BDA3 Chapters 1+2, basics of Bayesian inference, observation model, likelihood, posterior and binomial model, predictive distribution and benefit of integration, priors and prior information, and one parameter normal model.

- Read BDA3 Chapter 2
- Watch videos Lecture 2.1 and Lecture 2.2 on basics of Bayesian inference, observation model, likelihood, posterior and binomial model, predictive distribution and benefit of integration, priors and prior information, and one parameter normal model. BDA3 Ch 1+2.
- Optional summary videos:
- Read the additional comments for Chapter 2
- Check R demos or Python demos for Chapter 2
- Make and submit Assignment 2. **Deadline Friday 19 March 23:59 UTC+2**
- Rubric questions used in peergrading for Assignment 2
- Review Assignment 1 done by your peers before 23:59 UTC+2 17 March
- Reflect on your feedback

- Optional: Make BDA3 exercises 2.1-2.5, 2.8, 2.9, 2.14, 2.17, 2.22 (model solutions available for 2.1-2.5, 2.7-2.13, 2.16, 2.17, 2.20, and 2.14 is in course slides)
- Start reading Chapter 3, see instructions below
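
The binomial model in BDA3 Chapter 2 has a conjugate Beta prior, so the posterior is available in closed form. The following minimal Python sketch shows the update; the data (7 successes in 20 trials) and the uniform Beta(1, 1) prior are hypothetical choices made only for this illustration.

```python
def beta_binomial_update(a, b, successes, trials):
    """Posterior Beta parameters for a Beta(a, b) prior and binomial data."""
    # Successes are added to a, failures are added to b
    return a + successes, b + (trials - successes)


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Hypothetical data: 7 successes in 20 trials, uniform Beta(1, 1) prior
a_post, b_post = beta_binomial_update(1, 1, successes=7, trials=20)

# For this model the posterior mean is also the posterior predictive
# probability that the next trial is a success (theta is integrated out).
pred_next_success = beta_mean(a_post, b_post)
print(a_post, b_post, round(pred_next_success, 4))
```

Note that the predictive probability 8/22 differs from the observed proportion 7/20, because the prior contributes the equivalent of one prior success and one prior failure.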

Multiparameter models, joint, marginal and conditional distribution, normal model, bioassay example, grid sampling and grid evaluation. BDA3 Ch 3.

- Read BDA3 Chapter 3
- Watch Lecture 3 on multiparameter models, joint, marginal and conditional distribution, normal model, bioassay example, grid sampling and grid evaluation. BDA3 Ch 3.
- Read the additional comments for Chapter 3
- Check R demos or Python demos for Chapter 3
- Make and submit Assignment 3. **Deadline Friday 26 March 23:59 UTC+2**
- Rubric questions used in peergrading for Assignment 3
- Review Assignment 2 done by your peers before 23:59 UTC+2 24 March, and reflect on your feedback

- Optional: Make BDA3 exercises 3.2, 3.3, 3.9 (model solutions available for 3.1-3.3, 3.5, 3.9, 3.10)
- Start reading Chapter 10, see instructions below
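
Grid evaluation and grid sampling from this block can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the course demo code: the observations `y` are invented, the model is a normal model with unknown mean and standard deviation, and the prior density is assumed proportional to 1/sigma on the (mu, sigma) scale.

```python
import math
import random

# Hypothetical observations, invented for this illustration
y = [4.9, 5.3, 5.1, 4.7, 5.6, 5.0]


def log_posterior(mu, sigma):
    """Unnormalized log posterior for a normal model.

    Assumes a prior density proportional to 1/sigma on the (mu, sigma) scale.
    """
    n = len(y)
    sum_sq = sum((yi - mu) ** 2 for yi in y)
    return -(n + 1) * math.log(sigma) - sum_sq / (2.0 * sigma ** 2)


# Grid evaluation: compute the log posterior at every grid point
mu_grid = [4.0 + 2.0 * i / 100 for i in range(101)]    # 4.00 ... 6.00
sigma_grid = [0.1 + 1.9 * j / 95 for j in range(96)]   # 0.10 ... 2.00
cells = [(mu, sigma) for mu in mu_grid for sigma in sigma_grid]
log_p = [log_posterior(mu, sigma) for mu, sigma in cells]

# Subtract the maximum before exponentiating to avoid numerical underflow
max_log_p = max(log_p)
weights = [math.exp(lp - max_log_p) for lp in log_p]
total = sum(weights)
probs = [w / total for w in weights]

# Posterior mean of mu computed directly on the grid
mu_mean_grid = sum(p * mu for p, (mu, _) in zip(probs, cells))

# Grid sampling: draw cells in proportion to their posterior probability
rng = random.Random(0)
draws = rng.choices(cells, weights=probs, k=2000)
mu_mean_draws = sum(mu for mu, _ in draws) / len(draws)

print(round(mu_mean_grid, 3), round(mu_mean_draws, 3))
```

The grid limits must cover essentially all of the posterior mass; here they were chosen by hand for the invented data. The approach scales poorly with the number of parameters, which motivates the Monte Carlo methods in the following blocks.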

Numerical issues, Monte Carlo, how many simulation draws are needed, how many digits to report, direct simulation, curse of dimensionality, rejection sampling, and importance sampling. BDA3 Ch 10.

- Read BDA3 Chapter 10
- Watch Lecture 4.1 on numerical issues, Monte Carlo, how many simulation draws are needed, how many digits to report, and Lecture 4.2 on direct simulation, curse of dimensionality, rejection sampling, and importance sampling. BDA3 Ch 10.
- Read the additional comments for Chapter 10
- Check R demos or Python demos for Chapter 10
- Make and submit Assignment 4. **Deadline Friday 9 April 23:59 UTC+3**
- Rubric questions used in peergrading for Assignment 4
- Review Assignment 3 done by your peers before 23:59 UTC+3 7 April, and reflect on your feedback

- Optional: Make BDA3 exercises 10.1, 10.2 (model solution available for 10.4)
- Start reading Chapter 11, see instructions below
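
The questions “how many simulation draws are needed” and “how many digits to report” both come down to the Monte Carlo standard error (MCSE). The sketch below estimates an expectation from independent draws and computes its MCSE; the target (the expectation of theta squared for a standard normal theta, whose true value is 1) is a toy example chosen for this illustration.

```python
import math
import random
import statistics

rng = random.Random(1)
S = 4000  # number of independent simulation draws

# Draws of theta^2 where theta ~ Normal(0, 1); the true expectation is 1
draws = [rng.gauss(0.0, 1.0) ** 2 for _ in range(S)]

estimate = statistics.fmean(draws)
# For independent draws, MCSE = sample standard deviation / sqrt(S)
mcse = statistics.stdev(draws) / math.sqrt(S)

print(f"estimate = {estimate:.3f}, MCSE = {mcse:.3f}")
```

With an MCSE of roughly 0.02, only the first decimal of the estimate is reliable, so reporting more digits would suggest precision that is not there. Halving the MCSE requires four times as many draws. For dependent draws, such as those from MCMC, S is replaced by the effective sample size covered in the next block.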

Markov chain Monte Carlo, Gibbs sampling, Metropolis algorithm, warm-up, convergence diagnostics, R-hat, and effective sample size. BDA3 Ch 11.

- Read BDA3 Chapter 11
- Watch Lecture 5.1 on Markov chain Monte Carlo, Gibbs sampling, Metropolis algorithm, and Lecture 5.2 on warm-up, convergence diagnostics, R-hat, and effective sample size. BDA3 Ch 11.
- Read the additional comments for Chapter 11
- Check R demos or Python demos for Chapter 11
- Make and submit Assignment 5. **Deadline Friday 16 April 23:59 UTC+3**
- Rubric questions used in peergrading for Assignment 5
- Review Assignment 4 done by your peers before 23:59 UTC+3 14 April, and reflect on your feedback

- Optional: Make BDA3 exercise 11.1 (model solution available for 11.1)
- Start reading Chapter 12 + Stan material, see instructions below
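
To make the Metropolis algorithm and warm-up concrete, here is a minimal random-walk Metropolis sketch in standard-library Python. The target (a standard normal, given only through its unnormalized log density), the starting point, the proposal scale, and the warm-up length are all assumptions for this toy example, not values from the course.

```python
import math
import random


def log_target(theta):
    """Unnormalized log density of a standard normal (toy target)."""
    return -0.5 * theta * theta


def metropolis(log_density, init, n_iter, step, rng):
    """Random-walk Metropolis with a symmetric normal proposal."""
    theta = init
    log_p = log_density(theta)
    chain = []
    n_accepted = 0
    for _ in range(n_iter):
        proposal = theta + rng.gauss(0.0, step)
        log_p_prop = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(theta))
        accept_prob = math.exp(min(0.0, log_p_prop - log_p))
        if rng.random() < accept_prob:
            theta, log_p = proposal, log_p_prop
            n_accepted += 1
        # The current value is stored even when the proposal is rejected
        chain.append(theta)
    return chain, n_accepted / n_iter


rng = random.Random(42)
# Deliberately poor starting point, to show why warm-up draws are discarded
chain, accept_rate = metropolis(log_target, init=5.0, n_iter=20000, step=2.4, rng=rng)

warmup = 1000
kept = chain[warmup:]
post_mean = sum(kept) / len(kept)
post_var = sum((t - post_mean) ** 2 for t in kept) / (len(kept) - 1)
print(round(post_mean, 2), round(post_var, 2), round(accept_rate, 2))
```

A single chain like this cannot show convergence on its own. In practice you would run several chains from different starting points and compare them with R-hat and the effective sample size, as discussed in Lecture 5.2.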

HMC, NUTS, dynamic HMC and HMC specific convergence diagnostics, probabilistic programming and Stan. BDA3 Ch 12 + extra material

- Read BDA3 Chapter 12
- Watch Lecture 6.1 on HMC, NUTS, dynamic HMC and HMC specific convergence diagnostics, and Lecture 6.2 on probabilistic programming and Stan. BDA3 Ch 12 + extra material.
- Slides
- Optional: Stan Extra introduction (recorded in 2020): golf putting example, main features of Stan, benefits of probabilistic programming, and comparison to some other software.

- Read the additional comments for Chapter 12
- Read Stan introduction article
- Check R demos for RStan or Python demos for PyStan
- Additional material for Stan:
- Documentation
- RStan installation
- PyStan installation
- Basics of Bayesian inference and Stan, Jonah Gabry & Lauren Kennedy Part 1 and Part 2

- Make and submit Assignment 6. **Deadline Friday 23 April 23:59 UTC+3**
- Rubric questions used in peergrading for Assignment 6
- Review Assignment 5 done by your peers before 23:59 UTC+3 21 April, and reflect on your feedback

- Start reading Chapter 5 + Stan material, see instructions below

Hierarchical models and exchangeability. BDA3 Ch 5.

- Read BDA3 Chapter 5
- Watch Lecture 7.1 on hierarchical models, and Lecture 7.2 on exchangeability. BDA3 Ch 5.
- Read the additional comments for Chapter 5
- Check R demos or Python demos for Chapter 5
- Make and submit Assignment 7. **Deadline Friday 30 April 23:59 UTC+3**
- Rubric questions used in peergrading for Assignment 7
- Review Assignment 6 done by your peers before 23:59 UTC+3 28 April, and reflect on your feedback

- Optional: Make BDA3 exercise 5.1 (model solution available for 5.3-5.5, 5.7-5.12)
- Start reading Chapters 6-7 and additional material, see instructions below.

Model checking and cross-validation.

- Read BDA3 Chapters 6 and 7 (skip 7.2 and 7.3)
- Read Visualization in Bayesian workflow
- more about workflow and examples of prior predictive checking and LOO-CV probability integral transformations

- Read Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC (Journal link)
- replaces BDA3 Sections 7.2 and 7.3 on cross-validation

- Watch Lecture 8.1 on model checking, and Lecture 8.2 on cross-validation part 1. BDA3 Ch 6-7 + extra material.
- Read the additional comments for Chapter 6 and Chapter 7
- Check R demos or Python demos for Chapter 6
- Additional reading material
- No new assignment in this block
- Start the project work
- Optional: Make BDA3 exercise 6.1 (model solution available for 5.3-5.5, 5.7-5.12)

PSIS-LOO, K-fold-CV, model comparison and selection. Extra lecture on variable selection with projection predictive variable selection.

- Read Chapter 7 (no 7.2 and 7.3)
- Read Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC (Journal link)
- replaces BDA3 Sections 7.2 and 7.3 on cross-validation

- Watch Lecture 9.1 PSIS-LOO and K-fold-CV.
- Optional: Lecture 9.2 model comparison and selection, and Lecture 9.3 extra lecture on variable selection with projection predictive variable selection. Extra material.
- Additional reading material
- Make and submit Assignment 8. **Deadline Friday 7 May 23:59 UTC+3**
- Rubric questions used in peergrading for Assignment 8
- Review Assignment 7 done by your peers before 23:59 UTC+3 5 May, and reflect on your feedback

- Start reading Chapter 9, see instructions below.

Decision analysis. BDA3 Ch 9.

- Read Chapter 9
- Watch Lecture 10.1 on decision analysis. BDA3 Ch 9.
- Project presentation info will be updated soon.
- Make and submit Assignment 9. **Deadline Friday 14 May 23:59 UTC+3**
- Rubric questions used in peergrading for Assignment 9
- Review Assignment 8 done by your peers before 23:59 UTC+3 12 May, and reflect on your feedback

- Start reading Chapter 4, see instructions below.

Normal approximation (Laplace approximation), and large sample theory and counter examples. BDA3 Ch 4.

- Read Chapter 4
- Watch Lecture 11.1 on normal approximation (Laplace approximation) and Lecture 11.2 on large sample theory and counter examples. BDA3 Ch 4.
- No new assignment. Work on project.
- Review Assignment 9 done by your peers before 23:59 UTC+3 19 May, and reflect on your feedback
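
The normal (Laplace) approximation replaces the posterior with a normal distribution centered at the posterior mode, with variance equal to the inverse of the negative second derivative of the log posterior at the mode. The sketch below applies this to a Beta posterior, where both the approximation and the exact moments have closed forms. The Beta(8, 14) posterior is a hypothetical example chosen for this illustration.

```python
import math


def laplace_beta(a, b):
    """Normal (Laplace) approximation to a Beta(a, b) density, for a, b > 1."""
    # Mode: where the derivative of the log density is zero
    mode = (a - 1.0) / (a + b - 2.0)
    # Observed information: negative second derivative of the log density at the mode
    info = (a - 1.0) / mode ** 2 + (b - 1.0) / (1.0 - mode) ** 2
    sd = math.sqrt(1.0 / info)
    return mode, sd


# Hypothetical posterior Beta(8, 14)
a, b = 8, 14
mode, sd = laplace_beta(a, b)

# Exact mean and standard deviation of Beta(a, b), for comparison
exact_mean = a / (a + b)
exact_sd = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))

print(round(mode, 3), round(sd, 3), round(exact_mean, 3), round(exact_sd, 3))
```

The approximation is centered at the mode (0.35) rather than the mean (about 0.364) and, unlike the Beta density, puts some mass outside the interval from 0 to 1. The mismatch shrinks as the amount of data grows, which is the large sample theory covered in Lecture 11.2.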

Frequency evaluation of Bayesian methods, hypothesis testing and variable selection. Overview of modeling data collection, BDA3 Ch 8, linear models, BDA3 Ch 14-18, lasso, horseshoe and Gaussian processes, BDA3 Ch 21.

- These lectures are optional, but especially the lecture on hypothesis testing and variable selection is useful for project work.
- Watch Lecture 12.1 on frequency evaluation, hypothesis testing and variable selection, and Lecture 12.2 on overview of modeling data collection, BDA3 Ch 8, linear models, BDA3 Ch 14-18, lasso, horseshoe and Gaussian processes, BDA3 Ch 21.
- Work on project. TAs help with projects.
**Project deadline 21 May 23:59 UTC+3**

- Project report deadline 21 May 23:59 UTC+3 (submit to peergrade).
- Review project reports done by your peers before 26 May 23:59 UTC+3, and reflect on your feedback

We strongly recommend using R in the course, as there are more packages for Stan and statistical analysis in R. If you are already fluent in Python but not in R, then using Python may be easier, but it can still be useful to also learn R. Unless you are already experienced and have figured out your preferred way to work with R, we recommend

- installing RStudio Desktop,

See FAQ for frequently asked questions about R problems in this course. The demo codes provide useful starting points for all the assignments.

- For learning R programming basics
- For learning basic and advanced plotting using R