Bayesian Data Analysis course - Aalto 2024
The Aalto 2024 course can be taken online except for the final project presentation. The lectures are given on campus, and the recordings are made available online after each lecture. If you are unable to register for the course in Sisu at the moment, there is no need to email the lecturer. You can start taking the course and register before the end of the course. Sisu shows rooms on campus for the computer exercises, and you can come to ask questions on campus, but you can also ask in Zulip during the same times. You can choose which TA session to join separately each week, without registering for those sessions.
- MyCourses is used for initial announcements, linking to Zulip, assignment quizzes, peergrading, and some questionnaires.
- Most of the communication happens in the course chat (see below)
- Aalto University Code of Academic Integrity and Handling Violations Thereof
- Use of AI is allowed on the course, but most of the work needs to be done by the student, and you need to report whether you used AI and in which way you used it (see points 5 and 6 in Aalto guidelines for use of AI in teaching). We have tested some AI tools on the course topics and assignments, and the output can be a copy of existing text without attribution (i.e. plagiarism), vague, or contain mistakes, so you need to be careful when using such outputs.
All the course material is available in a git repo and in Panopto (these pages are for easier navigation). All the material can be used in other courses. Text and videos are licensed under CC-BY-NC 4.0. Code is licensed under BSD-3.
The material will be updated during the course. Exercise instructions and slides will be updated by Monday of the corresponding week at the latest. The updated material will appear on the web pages, or you can clone the repo and pull before checking new material. If you don’t want to learn git, you can download the latest zip file.
Book: BDA3
The electronic version of the course book Bayesian Data Analysis, 3rd ed, by Andrew Gelman, John Carlin, Hal Stern, David Dunson, Aki Vehtari, and Donald Rubin is available for non-commercial purposes. Hard copies are available from the publisher and many book stores. The Aalto library also has copies. See also the home page for the book, errata for the book, and chapter notes.
Prerequisites
- Basic terms of probability theory
- probability, probability density, distribution
- sum rule, product rule, and Bayes’ rule
- expectation, mean, variance, median
- in Finnish, see e.g. Stokastiikka ja tilastollinen ajattelu
- in English, see e.g. Wikipedia and Introduction to probability and statistics
- Some algebra and calculus
- Basic visualisation techniques (R or Python)
- histogram, density plot, scatter plot
- see e.g. BDA R demos
- see e.g. BDA Python demos
This course has been designed so that there is a strong emphasis on the computational aspects of Bayesian data analysis and on using the latest computational tools. The project brings together the overall Bayesian workflow aspects.
If you find BDA3 too difficult to start with, I recommend
- For regression models and their connection to statistical testing and causal analysis, see Gelman, Hill and Vehtari: “Regression and Other Stories”.
- Richard McElreath: Statistical Rethinking, 2nd ed. The book is easier than BDA3, and the 2nd edition is excellent. Statistical Rethinking doesn’t go as deep into some details, math, algorithms, and programming as the BDA course. Richard’s lecture videos, Statistical Rethinking: A Bayesian Course Using R and Stan, are highly recommended even if you are following BDA3.
- Johnson, Ott, and Dogucu: Bayes Rules! An Introduction to Applied Bayesian Modeling
- For background prerequisites some students have found chapters 2, 4 and 5 in Kruschke, “Doing Bayesian Data Analysis” useful.
Communication channels
- MyCourses is used for some initial announcements, linking to Zulip and Peergrade, and some questionnaires.
- The primary communication channel is the Zulip course chat (link in MyCourses, login with Aalto account)
- Don’t ask via email or direct messages. By asking via common streams in the course chat, more eyes will see your question, it will get answered faster and it’s likely that other students benefit from the answer.
- Login with Aalto account to the Zulip course chat. You can adjust the notifications in the settings.
- If you have any questions, please ask in the public streams and get answers from course staff or other students (active students helping others will get bonus points).
- In the chat system, we will have separate streams for each assignment and the project.
- Stream #general can be used for any kind of general discussions and questions related to the course.
- All important announcements will be posted to #announcements (no discussion on this stream).
- Any kind of feedback is welcome on stream #feedback.
- We have also streams #r, #python, and #stan for questions that are not specific to assignments or the project.
- Stream #queue is used as a queue for getting help during TA sessions.
- The lecturer and teaching assistants have “(staff)” or “(TA)” at the end of their names.
- A weekly lecture time on campus includes times for questions and answers
- If you need one-to-one help, please take part in the TA sessions and ask there.
- If you find errors in the material, post in the #feedback stream or submit an issue in GitHub.
Assessment
Assignments (40%), e-exam (10%), and a project work with presentation (50%). A minimum of 50% of the points must be obtained from each. You can get bonus points from chat activity (e.g. helping other students and reporting typos in the material) and from answering time usage questionnaires.
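As a rough illustration of the weighting above, the following Python sketch combines the three components into a weighted total and checks the 50% minimum on each. The function name, the use of fractions in [0, 1], and the example scores are assumptions made for illustration, and bonus points are left out because the text does not say how they are added. The actual grading procedure may differ in its details.

```python
# Weights and the 50% minimum come from the text above.
# Everything else (names, input format, example scores) is hypothetical.
WEIGHTS = {"assignments": 0.40, "exam": 0.10, "project": 0.50}
MINIMUM = 0.50  # at least 50% of the points is required from each component


def course_total(fractions):
    """Return (weighted_total, minimums_met) for per-component fractions in [0, 1].

    Bonus points are deliberately not handled here.
    """
    total = sum(WEIGHTS[name] * fractions[name] for name in WEIGHTS)
    minimums_met = all(fractions[name] >= MINIMUM for name in WEIGHTS)
    return total, minimums_met


# Made-up example scores, given as fractions of the maximum points.
example = {"assignments": 0.80, "exam": 0.60, "project": 0.70}
total, minimums_met = course_total(example)
print(f"weighted total: {total:.2f}, minimums met: {minimums_met}")
```

A high weighted total alone is not enough: a student with 90% from the assignments and the project but 40% from the e-exam would not meet the minimum requirement in this sketch.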
Schedule 2024
The course consists of 12 lectures, 9 assignments, a project work, and a project presentation in periods I and II. It’s good to start reading the material for the next lecture and assignment while working on the assignment related to the previous lecture. Because there are 9 assignments and a project work with presentation, the assignments are not in one-to-one correspondence with the lectures. The schedule below lists the lectures and how they connect to the topics, book chapters, and assignments.
Schedule overview
Here is an overview of the schedule. Scroll down the page to see detailed instructions for each block. When you are working on the assignment related to the previous lecture, it is good to start reading the book chapters related to the next lecture and assignment. The schedule links to the 2023 lecture videos until a couple of hours after the 2024 lecture has been recorded.
1) Course introduction, BDA3 Ch 1, prerequisites assignment
Course practicalities, material, assignments, project work, peergrading, QA sessions, TA sessions, prerequisites, chat, etc.
- Login with Aalto account to the Zulip course chat with link in MyCourses
- Introduction/practicalities lecture Monday 2024-09-02 14:15-16, hall C, Otakaari 1
- Read BDA3 Chapter 1
- start with reading instructions for Chapter 1 and afterwards read the additional comments in the same document
- There are no R demos for Chapter 1
- Make and submit 2024 Assignment 1. Deadline Sunday 2024-09-15 23:59
- We highly recommend submitting all assignments before 3pm on Friday so that you can get TA help before submission. As the course has students who work on weekdays (e.g. FiTech students), late submission until Sunday night is allowed, but we can’t provide support during the weekends.
- this assignment checks that you have sufficient prerequisite skills (basic probability calculus, and R or Python)
- General information about assignments
- R markdown template for assignments
- FAQ for the assignments has solutions to commonly asked questions related to RStudio setup, errors during package installations, etc.
- Get help in TA sessions: 2024-09-04 14-16, Y342a, Otakaari 1, and 2024-09-05 12-14, Y429c-d, Otakaari 1
- in Sisu these are marked as exercise sessions, but we call them TA sessions
- these are optional and you can choose which one to join
- see more info about TA sessions
- Highly recommended, but optional: Make BDA3 exercises 1.1-1.4, 1.6-1.8 (model solutions available for 1.1-1.6)
- Start reading Chapters 1+2, see instructions below
2) BDA3 Ch 1+2, basics of Bayesian inference
BDA3 Chapters 1+2, basics of Bayesian inference, observation model, likelihood, posterior and binomial model, predictive distribution and benefit of integration, priors and prior information, and one parameter normal model.
- Read BDA3 Chapter 2
- Lecture Monday 2024-09-16 14:15-16, hall T1, CS building
- Slides 2
- Videos: 2023 Lecture 2.1 (2024 recording failed), 2024 Lecture 2.2 on basics of Bayesian inference, observation model, likelihood, posterior and binomial model, predictive distribution and benefit of integration, priors and prior information, and one parameter normal model. BDA3 Ch 1+2.
- Read the additional comments for Chapter 2
- Check R demos or Python demos for Chapter 2
- Make and submit Assignment 2. Deadline Sunday 2024-09-22 23:59
- TA sessions: 2024-09-18 14-16, Y342a, Otakaari 1, and 2024-09-19 12-14, Y429c-d, Otakaari 1
- Highly recommended, but optional: Make BDA3 exercises 2.1-2.5, 2.8, 2.9, 2.14, 2.17, 2.22 (model solutions available for 2.1-2.5, 2.7-2.13, 2.16, 2.17, 2.20, and 2.14 is in course slides)
- Start reading Chapter 3, see instructions below
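As a small illustration of the binomial model and predictive distribution covered in this block, the sketch below uses the conjugate Beta prior: with a Beta(α, β) prior and y successes in n trials, the posterior is Beta(α + y, β + n − y). The data values and the uniform Beta(1, 1) prior are made up for illustration and are not taken from the course assignments.

```python
def beta_binomial_posterior(y, n, alpha=1.0, beta=1.0):
    """Posterior Beta parameters for theta after y successes in n trials.

    Prior: theta ~ Beta(alpha, beta); observation model: y ~ Binomial(n, theta).
    Conjugacy gives theta | y ~ Beta(alpha + y, beta + n - y).
    """
    return alpha + y, beta + n - y


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Illustrative data: 7 successes in 20 trials, uniform Beta(1, 1) prior.
a_post, b_post = beta_binomial_posterior(y=7, n=20)
posterior_mean = beta_mean(a_post, b_post)
# The posterior mean is also the posterior predictive probability
# that the next trial is a success.
print(f"posterior: Beta({a_post:g}, {b_post:g}), mean = {posterior_mean:.3f}")
```

Note that the posterior mean (8/22) lies between the prior mean (1/2) and the observed proportion (7/20), which is the usual compromise between prior and data.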
3) BDA3 Ch 3, multidimensional posterior
Multiparameter models, joint, marginal and conditional distribution, normal model, bioassay example, grid sampling and grid evaluation. BDA3 Ch 3.
- Read BDA3 Chapter 3
- Lecture Monday 2024-09-23 14:15-16, hall T1, CS building
- Slides 3
- Videos: 2023 Lecture 3.1 and 2023 Lecture 3.2 on multiparameter models, joint, marginal and conditional distribution, normal model, bioassay example, grid sampling and grid evaluation. BDA3 Ch 3.
- Read the additional comments for Chapter 3
- Check R demos or Python demos for Chapter 3
- Make and submit Assignment 3. Deadline Sunday 2024-09-29 23:59
- TA sessions 2024-09-25 14-16, 2024-09-26 12-14
- Highly recommended, but optional: Make BDA3 exercises 3.2, 3.3, 3.9 (model solutions available for 3.1-3.3, 3.5, 3.9, 3.10)
- Start reading Chapter 10, see instructions below
4) BDA3 Ch 10, Monte Carlo
Numerical issues, Monte Carlo, how many simulation draws are needed, how many digits to report, direct simulation, curse of dimensionality, rejection sampling, and importance sampling. BDA3 Ch 10.
- Read BDA3 Chapter 10
- Lecture Monday 2024-09-30 14:15-16, hall T1, CS building
- Slides 4
- Videos: 2024 Lecture 4.1 on numerical issues, Monte Carlo, how many simulation draws are needed, how many digits to report, and 2024 Lecture 4.2 on Pareto-\(\hat{k}\) diagnostic, direct simulation, rejection sampling, and importance sampling. BDA3 Ch 10.
- Read the additional comments for Chapter 10
- Check R demos or Python demos for Chapter 10
- Make and submit Assignment 4. Deadline Sunday 2024-10-06 23:59
- TA sessions 2024-10-02 14-16, 2024-10-03 12-14
- Highly recommended, but optional: Make BDA3 exercises 10.1, 10.2 (model solution available for 10.4)
- Start reading Chapter 11, see instructions below
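The question of how many simulation draws are needed can be illustrated with a small sketch: for independent draws, the Monte Carlo standard error (MCSE) of an estimated expectation is roughly the standard deviation of the draws divided by the square root of the number of draws S. The target (a standard normal), the estimated quantity E[θ²] = 1, and the seed below are arbitrary choices made for illustration.

```python
import math
import random
import statistics


def mc_estimate(draws):
    """Return (mean, Monte Carlo standard error) for independent draws."""
    s = len(draws)
    mean = statistics.fmean(draws)
    mcse = statistics.stdev(draws) / math.sqrt(s)
    return mean, mcse


rng = random.Random(2024)  # fixed seed so that the run is reproducible
for s in (100, 10_000):
    # Estimate E[theta^2] for theta ~ N(0, 1); the true value is 1.
    draws = [rng.gauss(0.0, 1.0) ** 2 for _ in range(s)]
    mean, mcse = mc_estimate(draws)
    print(f"S = {s:>6}: estimate = {mean:.3f}, MCSE = {mcse:.3f}")
```

Increasing S by a factor of 100 reduces the MCSE by about a factor of 10, which is why the MCSE is a natural guide for how many digits of an estimate are worth reporting.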
5) BDA3 Ch 11, Markov chain Monte Carlo
Markov chain Monte Carlo, Gibbs sampling, Metropolis algorithm, warm-up, convergence diagnostics, R-hat, and effective sample size. BDA3 Ch 11.
- Read BDA3 Chapter 11
- Lecture Monday 2024-10-07 14:15-16, hall T1, CS building
- Slides 5
- Videos: 2024 Lecture 5.1 on Markov chain Monte Carlo, Gibbs sampling, Metropolis algorithm, and 2024 Lecture 5.2 on warm-up, convergence diagnostics, R-hat, and effective sample size.
- Read the additional comments for Chapter 11
- Check R demos or Python demos for Chapter 11
- Make and submit Assignment 5. Deadline Sunday 2024-10-13 23:59
- TA sessions 2024-10-09 14-16, 2024-10-10 12-14
- Highly recommended, but optional: Make BDA3 exercise 11.1 (model solution available for 11.1)
- Start reading Chapter 12 + Stan material, see instructions below
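The Metropolis algorithm and warm-up from this block can be sketched in a few lines. This is a minimal random-walk Metropolis example under stated assumptions, not the course's demo code: the target (a standard normal), the proposal scale, the deliberately poor starting point, the chain length, and the warm-up length are all arbitrary illustration choices.

```python
import math
import random
import statistics


def log_target(theta):
    """Unnormalized log density of the illustrative target, N(0, 1)."""
    return -0.5 * theta * theta


def metropolis(n_iter, init, proposal_sd, rng):
    """Random-walk Metropolis with a symmetric normal proposal."""
    theta = init
    chain = []
    for _ in range(n_iter):
        proposal = theta + rng.gauss(0.0, proposal_sd)
        # Accept with probability min(1, p(proposal) / p(theta)).
        log_ratio = log_target(proposal) - log_target(theta)
        if math.log(1.0 - rng.random()) < log_ratio:
            theta = proposal
        chain.append(theta)  # a rejected proposal repeats the current value
    return chain


rng = random.Random(1)  # fixed seed so that the run is reproducible
chain = metropolis(n_iter=20_000, init=5.0, proposal_sd=2.0, rng=rng)
kept = chain[2_000:]  # discard warm-up draws affected by the starting point
chain_mean = statistics.fmean(kept)
chain_sd = statistics.stdev(kept)
print(f"mean = {chain_mean:.2f}, sd = {chain_sd:.2f}")
```

The draws in a chain are dependent, so the effective sample size is smaller than the number of draws kept. Convergence diagnostics such as R-hat compare several chains started from different points, which this single-chain sketch does not do.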
6) BDA3 Ch 12 + Stan, HMC, PPL, Stan
HMC, NUTS, dynamic HMC and HMC specific convergence diagnostics, probabilistic programming and Stan. BDA3 Ch 12 + extra material
- Read BDA3 Chapter 12
- Lecture Monday 2024-10-14 14:15-16, hall T1, CS building
- Slides 6
- Videos: 2023 Lecture 6.1 on HMC, NUTS, dynamic HMC and HMC specific convergence diagnostics, and 2024 Lecture 6.2 on probabilistic programming and Stan. BDA3 Ch 12 + extra material.
- Optional: Stan Extra introduction (recorded 2020): golf putting example, main features of Stan, benefits of probabilistic programming, and comparison to some other software.
- Read the additional comments for Chapter 12
- Read Stan introduction article
- Check R demos for RStan or Python demos for PyStan
- Additional material for Stan:
- Documentation
- RStan installation
- PyStan installation
- Basics of Bayesian inference and Stan, Jonah Gabry & Lauren Kennedy Part 1 and Part 2
- Make and submit Assignment 6. Deadline Sunday 2024-10-27 23:59 (two weeks for this assignment)
- TA sessions 2024-10-16 14-16, 2024-10-17 12-14
- Start reading Chapter 5 + Stan material, see instructions below
7) BDA3 Ch 5, hierarchical models
Hierarchical models and exchangeability. BDA3 Ch 5.
- Read BDA3 Chapter 5
- Lecture Monday 2024-10-28 14:15-16, hall T2, CS building
- Slides 7
- Videos: 2023 Lecture 7 on hierarchical models and exchangeability.
- Read the additional comments for Chapter 5
- Check R demos or Python demos for Chapter 5
- Make and submit Assignment 7. Deadline Sunday 2024-11-10 23:59 (two weeks for this assignment)
- TA sessions 2024-10-30 14-16, 2024-10-31 12-14
- Highly recommended, but optional: Make BDA3 exercises 5.1 and 5.2 (model solutions available for 5.3-5.5, 5.7-5.12)
- Start reading Chapters 6-7 and additional material, see instructions below.
8) BDA3 Ch 6+7 + extra material, model checking, cross-validation
Model checking and cross-validation.
- Read BDA3 Chapters 6 and 7 (skip 7.2 and 7.3)
- Read Visualization in Bayesian workflow
- more about workflow and examples of prior predictive checking and LOO-CV probability integral transformations
- Read Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC (Journal link)
- replaces BDA3 Sections 7.2 and 7.3 on cross-validation
- Lecture Monday 2024-11-04 14:15-16, hall T2, CS building
- Slides 8a, Slides 8b
- Videos: 2024 Lecture 8.1 on model checking, and 2024 Lecture 8.2 on cross-validation part 1. BDA3 Ch 6-7 + extra material.
- Read the additional comments for Chapter 6 and Chapter 7
- Check R demos or Python demos for Chapter 6
- Additional reading material
- No new assignment in this block
- Start the project work
- TA sessions 2024-11-06 14-16, 2024-11-07 12-14
- Highly recommended, but optional: Make BDA3 exercise 6.1 (model solution available for 6.1, 6.5-6.7)
9) BDA3 Ch 7, extra material, model comparison and selection
PSIS-LOO, K-fold-CV, model comparison and selection. Extra lecture on variable selection with projection predictive variable selection.
- Read Chapter 7 (no 7.2 and 7.3)
- Read Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC (Journal link)
- replaces BDA3 Sections 7.2 and 7.3 on cross-validation
- Lecture Monday 2024-11-11 14:15-16, hall T2, CS building
- Slides 9
- Videos: Lecture 9.1 and Lecture 9.2 on model comparison, selection, and hypothesis testing.
- Additional reading material
- Make and submit Assignment 8. Deadline Sunday 2024-11-17 23:59
- TA sessions 2024-11-13 14-16, 2024-11-14 12-14
- Start reading Chapter 9, see instructions below.
10) BDA3 Ch 9, decision analysis + BDA3 Ch 4 Laplace approximation and asymptotics
Decision analysis (BDA3 Ch 9), and Laplace approximation and asymptotics (BDA3 Ch 4).
- Read Chapters 9 and 4
- Lecture Monday 2024-11-18 14:15-16, hall T2, CS building
- Slides 10a, Slides 10b
- Videos: 2023 Lecture 10.1 on decision analysis, BDA3 Ch 9 (the 2024 recording failed, but the content is almost the same), and 2024 Lecture 10.2 on Laplace approximation and asymptotics, BDA3 Ch 4.
- Make and submit Assignment 9. Deadline Sunday 2024-11-24 23:59
- TA sessions 2024-11-20 14-16, 2024-11-21 12-14
- Start reading Chapter 4, see instructions below.
11) Variable selection with projpred, project presentation example, extra
- Lecture Monday 2024-11-25 14:15-16, hall T2, CS building
- Slides 11a, Slides Project Presentation, Slides 11 extra
- Videos: 2024 Lecture 11.1 on variable selection with projpred, 2024 Project presentation advice.
- No new assignment. Work on project. TAs help with projects.
- TA sessions 2024-11-27 14-16, 2024-11-28 12-14
12) TBA
- Lecture Monday 2024-12-02 14:15-16, hall T2, CS building
- TBA
- Work on project. TAs help with projects. Project deadline 1.12. 23:59
- TA sessions 2024-12-04 14-16, 2024-12-05 12-14
13) Project evaluation
- Project report deadline 1.12. 23:59 (submit to peergrade).
- Review project reports done by your peers before 5.12. 23:59, and reflect on your feedback.
- Project presentations 9.-13.12. (evaluation week)
R
R is used in the course as there are more packages for Stan and statistical analysis. Unless you are already experienced and have figured out your preferred way to work with R, we recommend
See FAQ for frequently asked questions about R problems in this course. The demo codes provide useful starting points for all the assignments.
- For learning R programming basics
- R Bootcamp: a very short interactive tutorial for ggplot2 and handling data frames.
- Garrett Grolemund, Hands-On Programming with R
- For learning basic and advanced plotting using R
English-Finnish-English statistics dictionary
Excellent online English-Finnish-English statistics dictionary:
Shorter English-Finnish dictionary for the terms specific for this course
A few different spellings of the word “bayesilainen” (Bayesian) are in use in Finland. I have used the form “bayesilainen”, which is formed according to the general inflection rules for foreign-language names: “If a name is back-vowel as written but front-vowel as pronounced, a back vowel is usually written in the ending instead of a front vowel, e.g. Birminghamissa, Thamesilla.” Terho Itkonen, Kieliopas, 6th edition, Kirjayhtymä, 1997.
The Finnish Statistical Society (Suomen tilastoseura), on the other hand, recommends the form “bayesiläinen”. Their reasoning can be found in Tilastotieteen sanasto, the statistics glossary (see the link above). The online version of the glossary has a search function, and the PDF version includes justifications for the translations as well as a little of the early history of statistics in Finland.