Advanced Statistical Inference
EURECOM
\[ \require{physics} \definecolor{input}{rgb}{0.42, 0.55, 0.74} \definecolor{params}{rgb}{0.51,0.70,0.40} \definecolor{output}{rgb}{0.843, 0.608, 0} \definecolor{vparams}{rgb}{0.58, 0, 0.83} \definecolor{noise}{rgb}{0.0, 0.48, 0.65} \definecolor{latent}{rgb}{0.8, 0.0, 0.8} \definecolor{function}{rgb}{0.75, 0.75, 0.12} \]
Setup: I toss the coin \(n\) times and observe \(\textcolor{output}{y}\) heads
Question: Is the coin fair? What is the probability of heads?
Steps: state the modelling assumptions, choose a prior, compute the posterior, and use it to predict the next toss.
Assumptions:
We model the number of heads \(\textcolor{output}{y}\) with a binomial distribution and probability \(\textcolor{params}{\theta}\):
\[ p(\textcolor{output}{y}\mid \textcolor{params}{\theta}) = \binom{n}{\textcolor{output}{y}} \textcolor{params}{\theta}^{\textcolor{output}{y}} (1-\textcolor{params}{\theta})^{n-\textcolor{output}{y}} \]
where \(\textcolor{params}{\theta}\) is the probability of heads, \(n\) is the number of tosses, \(\textcolor{output}{y}\) is the number of heads, and
\[ \binom{n}{\textcolor{output}{y}}= \frac{n!}{\textcolor{output}{y}!(n-\textcolor{output}{y})!} \]
is the binomial coefficient.
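As a concrete check, this likelihood can be evaluated with Python's standard library alone (the counts below are illustrative, not from the notes):

```python
from math import comb

def binomial_pmf(y: int, n: int, theta: float) -> float:
    """Probability of observing y heads in n tosses when P(heads) = theta."""
    return comb(n, y) * theta**y * (1 - theta)**(n - y)

# Hypothetical data: 7 heads in 10 tosses of a fair coin.
print(binomial_pmf(7, 10, 0.5))  # comb(10, 7) / 2**10 = 120/1024 ≈ 0.117
```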
We need a prior distribution for \(\textcolor{params}{\theta}\).
How to choose it? Recall what \(\textcolor{params}{\theta}\) represents: the probability of heads, so the prior must place all its mass on \([0, 1]\).
Beta distribution:
\[ p(\textcolor{params}{\theta}) = \frac{1}{B(\alpha, \beta)} \textcolor{params}{\theta}^{\alpha-1} (1-\textcolor{params}{\theta})^{\beta-1} \]
where
\[ B(\alpha, \beta) = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)} \]
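A minimal sketch of this density using the \(\Gamma\)-function form of \(B(\alpha,\beta)\) above (the hyperparameter values are illustrative):

```python
from math import gamma

def beta_pdf(theta: float, alpha: float, beta: float) -> float:
    """Beta(alpha, beta) density at theta, with B(alpha, beta) from Gamma functions."""
    B = gamma(alpha) * gamma(beta) / gamma(alpha + beta)
    return theta**(alpha - 1) * (1 - theta)**(beta - 1) / B

print(beta_pdf(0.5, 1, 1))  # Beta(1, 1) is uniform on [0, 1]: density 1.0
print(beta_pdf(0.5, 2, 2))  # symmetric prior peaked at 0.5: density 1.5
```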
Interpretation: the beta prior acts like pseudo-observations, with \(\alpha\) counting previously seen heads and \(\beta\) previously seen tails.
By Bayes’ rule:
\[ p(\textcolor{params}{\theta}\mid \textcolor{output}{y}) = \frac{p(\textcolor{output}{y}\mid \textcolor{params}{\theta}) p(\textcolor{params}{\theta})}{p(\textcolor{output}{y})} \]
Because beta is conjugate to binomial, the posterior is also beta:
\[ p(\textcolor{params}{\theta}\mid \textcolor{output}{y}) = \frac{1}{B(\alpha', \beta')} \textcolor{params}{\theta}^{\alpha'-1} (1-\textcolor{params}{\theta})^{\beta'-1} \]
So we only need \(\alpha'\) and \(\beta'\).
From conjugacy:
\[ p(\textcolor{params}{\theta}\mid \textcolor{output}{y}) = \frac{1}{B(\alpha', \beta')} \textcolor{params}{\theta}^{\alpha'-1} (1-\textcolor{params}{\theta})^{\beta'-1} \]
From Bayes’ rule:
\[ p(\textcolor{params}{\theta}\mid \textcolor{output}{y}) \propto \textcolor{params}{\theta}^{\textcolor{output}{y}} (1-\textcolor{params}{\theta})^{n-\textcolor{output}{y}} \textcolor{params}{\theta}^{\alpha-1} (1-\textcolor{params}{\theta})^{\beta-1} \]
Matching powers gives:
\[ \alpha' = \alpha + \textcolor{output}{y}, \qquad \beta' = \beta + n - \textcolor{output}{y} \]
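The update itself is just two additions; a minimal sketch (the prior and counts are hypothetical):

```python
def beta_binomial_update(alpha: float, beta: float, y: int, n: int):
    """Conjugate update: a Beta(alpha, beta) prior plus y heads in n tosses
    gives a Beta(alpha + y, beta + n - y) posterior."""
    return alpha + y, beta + n - y

# Uniform prior Beta(1, 1), then observe 7 heads in 10 tosses.
print(beta_binomial_update(1, 1, 7, 10))  # (8, 4)
```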
If \(\textcolor{params}{\theta}\mid \textcolor{output}{y}\sim \mathrm{Beta}(\alpha',\beta')\), then
\[ {\mathbb{E}}[\textcolor{params}{\theta}\mid \textcolor{output}{y}] = \frac{\alpha'}{\alpha'+\beta'} \]
\[ \mathrm{Var}(\textcolor{params}{\theta}\mid \textcolor{output}{y}) = \frac{\alpha'\beta'}{(\alpha'+\beta')^2(\alpha'+\beta'+1)} \]
If \(\alpha',\beta' > 1\), the MAP is
\[ \textcolor{params}{\theta}_{\mathrm{MAP}} = \frac{\alpha'-1}{\alpha'+\beta'-2} \]
As \(n\) grows, the posterior mean and the MAP estimate converge: the data overwhelm the prior.
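These summaries, and the shrinking gap between the posterior mean and the MAP, can be checked numerically (the prior and data values below are illustrative):

```python
def posterior_summaries(a: float, b: float):
    """Mean, variance, and (for a, b > 1) mode of a Beta(a, b) posterior."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    mode = (a - 1) / (a + b - 2)  # MAP estimate; valid only for a, b > 1
    return mean, var, mode

# Uniform prior Beta(1, 1); 70% heads observed, at two sample sizes.
for y, n in [(7, 10), (700, 1000)]:
    mean, var, mode = posterior_summaries(1 + y, 1 + n - y)
    print(n, round(mean, 4), round(mode, 4))  # mean and MAP converge as n grows
```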
Prediction: what is the probability that the next toss is a head? A first, plug-in approach: assume the coin is fair and set \(\widehat{\textcolor{params}{\theta}}=0.5\).
Use the same likelihood as before, but for one trial (\(n=1\)):
\[ p(y_{\mathrm{new}} \mid \textcolor{params}{\theta}, n=1)=\binom{1}{y_{\mathrm{new}}}\textcolor{params}{\theta}^{y_{\mathrm{new}}}(1-\textcolor{params}{\theta})^{1-y_{\mathrm{new}}} \]
So for a head (\(y_{\mathrm{new}}=1\)):
\[ p(y_{\mathrm{new}}=1 \mid \textcolor{params}{\theta}, n=1)=\binom{1}{1}\textcolor{params}{\theta}^1(1-\textcolor{params}{\theta})^0=\textcolor{params}{\theta} \]
Since \(\textcolor{params}{\theta}\) is unknown, we integrate over its posterior:
\[ p(y_{\mathrm{new}}=1 \mid \textcolor{output}{y}) = \int p(y_{\mathrm{new}}=1 \mid \textcolor{params}{\theta}, n=1) p(\textcolor{params}{\theta}\mid \textcolor{output}{y}) \, \mathrm{d}\textcolor{params}{\theta}= \int \textcolor{params}{\theta}\, p(\textcolor{params}{\theta}\mid \textcolor{output}{y}) \, \mathrm{d}\textcolor{params}{\theta}= {\mathbb{E}}[\textcolor{params}{\theta}\mid \textcolor{output}{y}] \]
With beta-binomial conjugacy:
\[ {\mathbb{E}}[\textcolor{params}{\theta}\mid \textcolor{output}{y}] = \frac{\alpha+\textcolor{output}{y}}{\alpha+\beta+n} \]
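So the predictive probability of a head is just the posterior mean; a sketch (with a uniform prior and hypothetical counts, this reproduces Laplace's rule of succession):

```python
def prob_next_head(alpha: float, beta: float, y: int, n: int) -> float:
    """Posterior predictive P(next toss = head | y) = E[theta | y]."""
    return (alpha + y) / (alpha + beta + n)

# Uniform prior Beta(1, 1), 7 heads in 10 tosses.
print(prob_next_head(1, 1, 7, 10))  # (1 + 7) / (2 + 10) = 8/12 ≈ 0.667
```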
Useful diagnostics: if a posterior credible interval is narrow and centred on \(0.5\), the data support a fair coin; an interval that excludes \(0.5\) is evidence of bias, and a wide interval means more tosses are needed.
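One way to obtain such an interval with only the standard library is to build the Beta CDF numerically and read off the tail quantiles. This is a rough sketch assuming \(\alpha', \beta' > 1\); in practice a library quantile function would be used, and the counts below (50 heads in 100 tosses under a uniform prior) are hypothetical:

```python
from math import gamma

def beta_credible_interval(a: float, b: float, level: float = 0.95,
                           grid: int = 200_000):
    """Equal-tailed credible interval for Beta(a, b), via a trapezoid-rule CDF
    on a uniform grid. A numerical sketch, valid for a, b > 1."""
    B = gamma(a) * gamma(b) / gamma(a + b)
    step = 1.0 / grid
    tail = (1.0 - level) / 2.0
    cdf, prev, lo, hi = 0.0, 0.0, 0.0, 1.0
    found_lo = False
    for i in range(1, grid):
        t = i * step
        pdf = t ** (a - 1) * (1 - t) ** (b - 1) / B
        cdf += (prev + pdf) * step / 2.0  # trapezoid rule
        prev = pdf
        if not found_lo and cdf >= tail:
            lo, found_lo = t, True
        if cdf >= 1.0 - tail:
            hi = t
            break
    return lo, hi

# Beta(1, 1) prior + 50 heads in 100 tosses -> posterior Beta(51, 51).
lo, hi = beta_credible_interval(51, 51)
print(round(lo, 3), round(hi, 3))  # an interval tightly bracketing 0.5
```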