\documentclass[fleqn]{article}

\usepackage{haldefs}
\usepackage{notes}
\usepackage{url}

\begin{document}
\lecture{Machine Learning}{HW04: Generative/Discriminative}{CS5350, Fall 2009}

% IF YOU ARE USING THIS .TEX FILE AS A TEMPLATE, PLEASE REPLACE
% "CS5350, Fall 2009" WITH YOUR NAME AND UID.

Hand in at: \url{http://www.cs.utah.edu/~hal/handin.pl?course=cs5350}.
Remember that only PDF submissions are accepted.  We encourage using
\LaTeX\ to produce your writeups.  See \verb+hw01.tex+ for an example
of how to do so.  You can make a \verb+.pdf+ out of the \verb+.tex+ by
running ``\verb+pdflatex hw04.tex+''.

\section{PRML Exercises}

\bee
\i *4.9
\i *4.10
\i  4.11 (6350 only)
\ene

\section{Additional Exercises}

\bee 

\i * Suppose that we train a linear classifier (e.g., logistic
regression or hinge regression) with an $\ell_2$ regularizer on the
weights.  That is, we minimize $\sum_n \log(1 + \exp[-y_n \vec w\T\vec
x_n]) + \frac \la 2 \norm{\vec w}^2$.  (I've left off the bias because
it's irrelevant for this question.)  I end up with some optimal weight
vector which I'll call $\hat {\vec w}$.  Now, suppose that I change
all of my inputs by \emph{duplicating} the first feature.  That is, if
I used to have $100$ features, I now have $101$ features where feature
$1$ and feature $2$ are \emph{identical}.  Now, I relearn weights and
compute an optimal set $\hat {\vec w}'$.  How do $\hat {\vec w}$ and
$\hat {\vec w}'$ relate?  In particular, how does $\hat w_1$ relate to
$\hat w_1'$ and $\hat w_2'$?  What does this tell us about using
weight magnitudes as a measure of feature relevance?

\i (6350 only) How does your answer to the previous question change if
I use an $\ell_1$ regularizer instead of an $\ell_2$ regularizer?
\ene
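
As a notational sketch of the setup in the first additional exercise
(the symbols $\tilde{\vec x}_n$ and $x_{n,d}$ are assumptions
introduced here for illustration, not part of the problem statement),
write $x_{n,d}$ for feature $d$ of example $n$.  Duplicating the first
feature then turns each $100$-dimensional input into the
$101$-dimensional input
\[
\tilde{\vec x}_n = (x_{n,1},\; x_{n,1},\; x_{n,2},\; \dots,\; x_{n,100}) ,
\]
and the relearned weights $\hat {\vec w}'$ minimize the same objective
over the new inputs:
\[
\sum_n \log(1 + \exp[-y_n \vec w'\T \tilde{\vec x}_n])
  + \frac \la 2 \norm{\vec w'}^2 .
\]
This only restates the problem in symbols; relating $\hat {\vec w}'$
to $\hat {\vec w}$ is still the exercise.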

\end{document}

