\documentclass[fleqn]{article}

\usepackage{haldefs}
\usepackage{notes}
\usepackage{url}
\usepackage{graphicx}

\begin{document}
\lecture{CS5350: Machine Learning}{HW1: Decision trees and generalization}{Due 9 Sep 2008}


\section{Written Exercises}

Answer the following questions in 25--100 words each:

\bee 

\i Consider a data set consisting of $400$ data points from class $0$
and $400$ data points from class $1$.  Suppose that a tree model $\cA$
splits these into $(300,100)$ at the first leaf node and $(100,300)$
at the second leaf node.  (Here, $(n,m)$ denotes that $n$ points from
class $0$ and $m$ points from class $1$ reach that leaf.)
Similarly, suppose that a second tree model $\cB$ splits them into
$(200,400)$ and $(200,0)$.  (See the figure below.)  Evaluate the
misclassification rates for the two trees: are they equal or not?
Similarly, evaluate the information gain for the two trees and use
it to compare them.  Do the two criteria lead to different
conclusions?  Does this make sense?

\includegraphics[width=0.7\textwidth]{hw1-trees.pdf}
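
As a reminder (a sketch assuming the standard definitions; the symbols
$N$, $n_\ell$, $m_\ell$ and $p_\ell$ are introduced here only for
illustration), suppose leaf $\ell$ holds counts $(n_\ell, m_\ell)$,
predicts its majority class, and $N = \sum_\ell (n_\ell + m_\ell)$.
Then the misclassification rate of a tree is
\[
  \mathrm{err} = \frac{1}{N} \sum_{\ell} \min(n_\ell, m_\ell),
\]
and, writing $p_\ell = n_\ell / (n_\ell + m_\ell)$ and
$H(p) = -p \log_2 p - (1-p) \log_2 (1-p)$ with the convention
$0 \log_2 0 = 0$, the information gain of the split is
\[
  \mathrm{IG} = H(p_{\mathrm{root}})
    - \sum_{\ell} \frac{n_\ell + m_\ell}{N} \, H(p_\ell),
\]
where $p_{\mathrm{root}}$ is the fraction of class-$0$ points in the
whole data set.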


\i A cousin of the decision tree is the \emph{decision list}.  In a
decision list, the ``left'' child of any branch must be a leaf.  In
programming terms, a decision list has the form ``if $X_1$ then return
$C_1$ else if $X_2$ then return $C_2$ else if $\dots$'', where the
$X_i$s are features and the $C_i$s are classes.  Do you imagine that
entropy or the Gini index would be a good criterion for building
decision lists?  Why or why not?  Can you think of something that
might do better?
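
For reference (assuming the usual definitions; the symbol $p_k$ is
introduced here only for illustration), if $p_k$ is the fraction of
points at a node that belong to class $k$, then
\[
  \mathrm{Entropy} = -\sum_{k} p_k \log_2 p_k,
  \qquad
  \mathrm{Gini} = 1 - \sum_{k} p_k^2 .
\]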

\i What purpose does the idea of ``development'' (or ``validation'')
data serve?  What about cross-validation?  Why is it important to do
these things?

\i Why can we not just estimate hyperparameters on the training data?
\ene

\end{document}
