6 Bayesian Methods
6.1 Reminder example: conditional probabilities
Related readings: Hoff Chs. 1, 3
6.2 Bayesian methods: Introduction via simple example
Suppose you want to estimate the fraction of a population that is infected with some disease.
\(\theta \in [0,1]\) : true value
Test a random sample of \(20\) from the population.
\(Y \in \{0,1,\ldots,20\}\) : # of positive results.
Question: What does realized value of \(Y\) tell us about the true value of \(\theta\)?
6.2.1 Sampling model
\(Y \mid \theta \sim \text{binomial}(20,\theta)\), i.e. the \(20\) individual test results are i.i.d. Bernoulli\((\theta)\). For \(y = 0, 1, \ldots, 20\),
\[l(y|\theta) = \Pr(Y=y | \theta) = {20 \choose y} \theta^y (1-\theta)^{(20-y)}\]
where \(\binom{n}{k} = \frac{n!}{k!(n-k)!}\)
\(l(y|\theta)\) is called the likelihood function.
Idea: For any \(0< \theta < 1\), all values of \(Y\) are possible, but some are more likely than others.
The likelihood function tells us how likely each possible observation is, for a given \(\theta\).
If, say, \(Y = 15\), that provides evidence that \(\theta\) is not small.
Core of Bayesian reasoning: work out all the different combinations of \(Y, \theta\) that could have generated the observed sample data.
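To make this concrete, here is a minimal sketch (using scipy's binomial pmf; the grid of \(\theta\) values and the hypothetical observation \(y = 15\) are choices made for illustration) that evaluates \(l(y|\theta)\) at a few candidate values of \(\theta\):

```python
import numpy as np
from scipy.stats import binom

n, y = 20, 15  # sample size and a hypothetical observed count of positives

# Evaluate the binomial likelihood l(y | theta) at a few candidate theta values.
thetas = np.array([0.05, 0.25, 0.50, 0.75, 0.95])
likelihoods = binom.pmf(y, n, thetas)

for t, l in zip(thetas, likelihoods):
    print(f"theta = {t:.2f}: l(15 | theta) = {l:.3g}")
```

Small values of \(\theta\) assign a vanishingly small probability to observing \(y = 15\), which is the sense in which \(Y = 15\) is evidence against small \(\theta\).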
6.2.2 Prior information
Suppose we have some background knowledge about the likely values of \(\theta\).
Represent this knowledge by means of a prior distribution \(\pi(\theta)\) over \([0,1]\).
Obviously, there are many (infinitely many) possible such distributions.
For convenience, we typically model the prior as a member of a parametrized family of distributions.
6.2.3 The Beta distribution
\[\theta \sim \text{beta}(a,b)\]
Then
\[E[\theta] = \frac{a}{a+b}\]
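For reference (the density itself is not stated above, but it is standard), the beta\((a,b)\) distribution has density on \([0,1]\)
\[\pi(\theta) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\, \theta^{a-1}(1-\theta)^{b-1}, \quad 0 \le \theta \le 1,\]
from which the mean above follows.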
6.2.4
For our case, let’s suppose our prior beliefs correspond to:
\[\theta \sim \text{beta}(2,20)\]
6.2.5
\[\theta \sim \text{beta}(2,20)\]
implies
\[E[\theta] = \frac{2}{2+20} \approx 0.09,\]
i.e. the prior puts most of its mass on small values of \(\theta\).
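A small sketch (using scipy's beta distribution; the particular summaries printed are my own choice) of what this prior encodes:

```python
from scipy.stats import beta

a, b = 2, 20  # prior hyperparameters for the running example

prior = beta(a, b)
print(f"Prior mean E[theta]      : {prior.mean():.3f}")    # a / (a + b) = 2/22
print(f"Prior standard deviation : {prior.std():.3f}")
print(f"Pr(theta < 0.10)         : {prior.cdf(0.10):.3f}")  # most prior mass on small theta
```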
6.3 Bayes Theorem
Let \(\pi(\theta | y)\) denote our posterior distribution over values of \(\theta\).
This means: our updated beliefs about how probable the various values of \(\theta\) are, after we’ve received our test results.
Bayes Theorem says:
\[\pi(\theta | y) = \frac{l(y|\theta)\, \pi(\theta)}{\Pr(Y = y)} = \frac{l(y|\theta)\, \pi(\theta)}{\int_\Theta l(y|\tilde{\theta})\,\pi(\tilde{\theta})\, d\tilde{\theta}}\]
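As a rough sketch of what this formula computes, the integral in the denominator can be approximated numerically on a grid of \(\theta\) values (the observed count \(y = 0\) matches the example in the next subsection; the grid size is arbitrary):

```python
import numpy as np
from scipy.stats import binom, beta

n, y = 20, 0   # observed data: 0 positives out of 20 (as in the example below)
a, b = 2, 20   # beta prior hyperparameters

# Grid approximation: evaluate prior x likelihood at each theta, then normalize.
thetas = np.linspace(0.0005, 0.9995, 2000)
dtheta = thetas[1] - thetas[0]

unnormalized = binom.pmf(y, n, thetas) * beta.pdf(thetas, a, b)
posterior = unnormalized / (unnormalized.sum() * dtheta)  # approximates the denominator integral

post_mean = (thetas * posterior).sum() * dtheta
print(f"Approximate posterior mean: {post_mean:.3f}")  # close to 2/42, the beta(2,40) mean below
```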
6.3.1
Can be shown:
If \(\theta \sim \text{beta}(2,20)\) and \(Y = 0\), then \(\theta | y \sim \text{beta}(2,40)\).
More generally:
If \(\theta \sim \text{beta}(a,b)\) and \(Y = y\), then \(\theta | y \sim \text{beta}(a+y,b+20-y)\).
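A sketch of why this holds (dropping factors that do not depend on \(\theta\)):
\[\pi(\theta|y) \propto l(y|\theta)\,\pi(\theta) \propto \theta^{y}(1-\theta)^{20-y}\cdot\theta^{a-1}(1-\theta)^{b-1} = \theta^{(a+y)-1}(1-\theta)^{(b+20-y)-1},\]
which is the kernel of a beta\((a+y,\, b+20-y)\) density, so normalizing gives the stated posterior.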