\documentclass[fleqn]{article}

\usepackage{haldefs}
\usepackage{notes}
\usepackage{url}

\begin{document}
\lecture{Machine Learning}{HW03: Linear classification}{CS5350, Fall 2009}

% IF YOU ARE USING THIS .TEX FILE AS A TEMPLATE, PLEASE REPLACE
% "CS5350, Fall 2009" WITH YOUR NAME AND UID.

Hand in at: \url{http://www.cs.utah.edu/~hal/handin.pl?course=cs5350}.
Remember that only PDF submissions are accepted.  We encourage using
\LaTeX\ to produce your writeups.  See \verb+hw01.tex+ for an example
of how to do so.  You can make a \verb+.pdf+ out of the \verb+.tex+ by
running ``\verb+pdflatex hw03.tex+''.

\section{PRML Exercises}

\bee
\i  3.4 (6350 only)
\i  4.7
\i *4.14
\ene

\section{Additional Exercises}

\bee
\i The standard perceptron update is $\vec w \leftarrow \vec w + y_n
\vec x_n$ when a mistake is made.  Suppose we were to add a learning
rate $\eta > 0$ so that the update became $\vec w \leftarrow \vec w +
\eta y_n \vec x_n$.  Does this change the final learned classifier?
If so, how?  If not, why not?

\i *Suppose that instead of regularizing with an $\ell_2$ norm, we were
to regularize with an $\ell_1$ norm.  As we discussed in class, we can
no longer use gradient descent, because the objective is no longer
differentiable everywhere.  Suggest a solution (you don't have to
work through the math: just tell me how you would approach this problem).

\i (6350 only) Work through the math for the previous exercise.
\ene
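
\noindent For reference in the first exercise, the update with a
learning rate can be written with its mistake condition made explicit.
This is only a sketch under assumptions not stated above: labels
$y_n \in \{-1,+1\}$, and a ``mistake'' meaning that the current $\vec w$
does not classify $\vec x_n$ with positive margin.
\[
  \mbox{if } y_n \, (\vec w \cdot \vec x_n) \leq 0 : \qquad
  \vec w \leftarrow \vec w + \eta \, y_n \vec x_n ,
\]
where setting $\eta = 1$ recovers the standard update.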

\end{document}
