# Assignment 1

# 1 General information

The exercises of this assignment are meant to test whether you have sufficient knowledge to participate in the course. The first exercise checks that you remember basic terms of probability calculus. The second exercise checks your basic computer skills and guides you to learn some R functions. In the next three exercises you will first write the math for solving the problems (you can, for example, write the equations in markdown or include a photo of handwritten answers) and then implement the final equations in R (after which you can use `markmyassignment` to check your results). The last question checks that you have found the course book.

**The maximum amount of points from this assignment is 3.**

We have prepared a Quarto template specific to this assignment to help you get started. You can inspect this and future templates

- as a `qmd` file,
- as a rendered HTML file,
- or as a rendered PDF file,

or you can download all template `qmd` files and some additional files at templates.zip (also available on Aalto JupyterHub under `/coursedata`).

# 2 Basic probability theory notation and terms

This may be trivial, or you may need to refresh your memory on these concepts (see, e.g., the Aalto course *First Course in Probability and Statistics*). Explain each of the following terms with one sentence:

- probability
- probability mass (function)
- probability density (function)
- probability distribution
- discrete probability distribution
- continuous probability distribution
- cumulative distribution function (cdf)
- likelihood

# 3 Basic computer skills

This task deals with elementary plotting and computing skills needed during the rest of the course. You can use either R or Python, although R is the recommended language in this course and we will only guarantee support in R. For documentation in R, just type `?{function name here}`.

# 4 Bayes’ theorem 1

A group of researchers has designed a new inexpensive and painless test for detecting lung cancer. The test is intended to be an initial screening test for the general population. A positive result (indicating the presence of lung cancer) would be followed up immediately with medication, surgery, or more extensive and expensive tests. The researchers know the following facts from their studies:

- The test gives a positive result \(98\%\) of the time when the test subject has lung cancer.
- The test gives a negative result \(96\%\) of the time when the test subject does not have lung cancer.
- In the general population, approximately one person in 1000 has lung cancer.

Here are some probability values that can help you figure out if you copied the right conditional probabilities from the question.

- P(Test gives positive | Subject does not have lung cancer) = \(4\%\)
- P(Test gives positive **and** Subject has lung cancer) = \(0.098\%\); this is also referred to as the **joint probability** of *test being positive* and the *subject having lung cancer*.
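As a sanity check on the values above, here is a minimal Python sketch of how the three facts combine through Bayes' rule. It is an illustration under stated assumptions, not the expected solution. All variable names are hypothetical, and computing the probability of cancer given a positive result is an assumed quantity of interest, since the question itself is not stated in this section.

```python
# Illustrative sketch only; variable names are hypothetical.
p_cancer = 0.001             # prevalence: one person in 1000
p_pos_given_cancer = 0.98    # test is positive when the subject has cancer
p_neg_given_healthy = 0.96   # test is negative when the subject has no cancer

# Complement rule: false positive rate.
p_pos_given_healthy = 1 - p_neg_given_healthy

# Joint probability of a positive test and cancer.
p_pos_and_cancer = p_pos_given_cancer * p_cancer

# Law of total probability: marginal probability of a positive test.
p_pos = p_pos_and_cancer + p_pos_given_healthy * (1 - p_cancer)

# Bayes' rule: probability of cancer given a positive test.
p_cancer_given_pos = p_pos_and_cancer / p_pos

print(f"P(positive | no cancer) = {p_pos_given_healthy:.2%}")
print(f"P(positive and cancer)  = {p_pos_and_cancer:.3%}")
print(f"P(cancer | positive)    = {p_cancer_given_pos:.2%}")
```

The first two printed values reproduce the check values listed above (\(4\%\) and \(0.098\%\)).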

# 5 Bayes’ theorem 2

We have three boxes, A, B, and C. There are

- 2 red balls and 5 white balls in the box A,
- 4 red balls and 1 white ball in the box B, and
- 1 red ball and 3 white balls in the box C.

Consider a random experiment in which one of the boxes is randomly selected and one ball is randomly picked from that box. After observing the color of the ball, it is returned to the box it came from. Suppose also that on average box A is selected \(40\%\) of the time and box B \(10\%\) of the time (i.e., \(P(A) = 0.4\) and \(P(B) = 0.1\)).
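The setup above can be sketched in Python with the law of total probability and Bayes' rule. This is a hedged illustration: the function names `p_red` and `p_box_given_red` are hypothetical, the quantities computed are assumed to be the ones of interest, and \(P(C) = 0.5\) is an assumption obtained as the remainder after \(P(A)\) and \(P(B)\).

```python
# Illustrative sketch only; names are hypothetical.
# (red, white) ball counts per box, taken from the text.
boxes = {
    "A": (2, 5),
    "B": (4, 1),
    "C": (1, 3),
}

# Selection probabilities. P(C) = 0.5 is an assumption: it is what remains
# after P(A) = 0.4 and P(B) = 0.1 so that the probabilities sum to one.
p_box = {"A": 0.4, "B": 0.1, "C": 0.5}


def p_red_given_box(box):
    """Probability of drawing a red ball from the given box."""
    red, white = boxes[box]
    return red / (red + white)


def p_red():
    """Marginal probability of a red ball (law of total probability)."""
    return sum(p_box[b] * p_red_given_box(b) for b in boxes)


def p_box_given_red():
    """Posterior probability of each box given a red ball (Bayes' rule)."""
    total = p_red()
    return {b: p_box[b] * p_red_given_box(b) / total for b in boxes}


print(p_red())
print(p_box_given_red())
```

Keeping the ball counts and the selection probabilities in separate dictionaries makes it easy to rerun the calculation with different box contents.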

# 6 Bayes’ theorem 3

Assume that on average fraternal twins (two fertilized eggs, which can therefore be of different sexes) occur once in 150 births and identical twins (a single egg divides into two separate embryos, so both have the same sex) once in 400 births (**Note!** These are not the true values, see Exercise 1.6, page 28, in BDA3). American male singer-actor Elvis Presley (1935 – 1977) had a twin brother who died at birth. Assume that an equal number of boys and girls are born on average.
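One way to organize the calculation is sketched below in Python. It assumes, following the referenced BDA3 exercise, that the quantity of interest is the probability that Elvis was an identical twin given that he had a twin brother; the question itself is not stated in this section. The function name and arguments are hypothetical. The sketch also assumes that the sexes of fraternal twins are independent, so both are boys with probability \(1/4\), while identical twins are both boys with probability \(1/2\).

```python
# Illustrative sketch only; the function name and arguments are hypothetical.
def p_identical_given_twin_brother(p_fraternal, p_identical):
    """P(identical twin | twin brother), assuming boys and girls are equally likely."""
    # Identical twins share the same sex: both are boys with probability 1/2.
    p_identical_and_boys = p_identical * 0.5
    # Fraternal twins are assumed independent: both are boys with probability 1/4.
    p_fraternal_and_boys = p_fraternal * 0.25
    # Bayes' rule, conditioning on "a twin pair of two boys".
    return p_identical_and_boys / (p_identical_and_boys + p_fraternal_and_boys)


posterior = p_identical_given_twin_brother(p_fraternal=1 / 150, p_identical=1 / 400)
print(posterior)
```

Passing the two rates as arguments keeps the function reusable with other values, such as the ones given in the book.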