\documentclass[fleqn]{article}

\usepackage{haldefs}
\usepackage{notes}
\usepackage{url}

\begin{document}
\lecture{Machine Learning}{HW14: Bayesian Learning}{CS5350, Fall 2009}

% IF YOU ARE USING THIS .TEX FILE AS A TEMPLATE, PLEASE REPLACE
% "CS5350, Fall 2009" WITH YOUR NAME AND UID.

Hand in at: \url{http://www.cs.utah.edu/~hal/handin.pl?course=cs5350}.
Remember that only PDF submissions are accepted.  We encourage using
\LaTeX\ to produce your writeups.  See \verb+hw01.tex+ for an example
of how to do so.  You can make a \verb+.pdf+ out of the \verb+.tex+ by
running ``\verb+pdflatex hw14.tex+''.

\section{Additional Exercises}

\bee
\i We've seen an example of conjugate priors: the Beta/Binomial case.
Here, the binomial distribution (for a single flip, $x \in \{0,1\}$)
has the form $p(x \| \pi) = \pi^x (1-\pi)^{1-x}$ and the Beta prior
has the form $p(\pi \| a,b) = \frac {\Ga(a+b)} {\Ga(a)\Ga(b)}
\pi^{a-1} (1-\pi)^{b-1}$.  The binomial is like flipping a coin; the
multinomial is like rolling a $K$-sided die.  The multinomial has
distribution $p(\vec x \| \vec \th) = \prod_k \th_k^{x_k}$, where we
have used the encoding $\vec x = \langle 0,0,1,0,0, \dots, 0,0\rangle$
to denote that the die came up on the side labeled ``three'' (i.e.,
$\vec x$ is an indicator vector).  The conjugate prior for the
multinomial is called the Dirichlet.  It has a \emph{vector} of
hyperparameters $\vec \al = \langle \al_1, \al_2, \dots, \al_K
\rangle$, which must all be positive, and its density over probability
vectors $\vec \th$ (with $\th_k \ge 0$ and $\sum_k \th_k = 1$) is
$p(\vec \th \| \vec \al) = \frac {\Ga(\sum_k \al_k)} {\prod_k
  \Ga(\al_k)} \prod_k \th_k^{\al_k-1}$.

\bee
\i Show that the Dirichlet is a generalization of the Beta, in the
sense that when $K=2$ they are equivalent.

\i Show that the Dirichlet is actually the conjugate prior for the
multinomial.  To do this, show that the product of a Dirichlet prior
and a multinomial likelihood is, as a function of $\vec \th$,
proportional to a Dirichlet.  What is the update rule?  That is, if
you start with a Dirichlet prior with parameter $\vec \al$ and observe
$N$ rolls of the $K$-sided die, what is the resulting posterior on
$\vec \th$?
\ene

\i (6350 only) Write two sentences briefly describing each of the
following approximation methods: MCMC, the Laplace approximation, and
variational EM.  You should state whether each is deterministic or
stochastic, whether it is guaranteed to give you the right answer (and
under what conditions), and how you might decide which to use.
\ene
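
As a concrete illustration of the notation in the first exercise (the
numbers here are chosen arbitrarily and are not part of the
assignment), take a three-sided die ($K=3$) with $\vec \th = \langle
0.2, 0.5, 0.3 \rangle$.  A roll that comes up ``two'' is encoded as
$\vec x = \langle 0,1,0 \rangle$, so the indicator picks out exactly
one factor of the product.  Under a Dirichlet with $\vec \al = \langle
2,2,2 \rangle$, the same $\vec \th$ has the density computed below
(a density value above $1$ is fine, since this is a density over
probability vectors, not a probability):
\begin{eqnarray*}
p(\vec x \| \vec \th) &=& 0.2^0 \cdot 0.5^1 \cdot 0.3^0 = 0.5, \\
p(\vec \th \| \vec \al) &=& \frac{\Ga(6)}{\Ga(2)^3}\,
  0.2^{1} \cdot 0.5^{1} \cdot 0.3^{1} = 120 \cdot 0.03 = 3.6.
\end{eqnarray*}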

\end{document}

