Machine Learning
CS 5350/CS 6350
Fall 2009
Machine learning is all about finding patterns in data. The whole
idea is to replace the "human writing code" with a "human supplying
data" and then let the system figure out what it is that the person
wants to do by looking at the examples. The most central concept in
machine learning is generalization: how to generalize beyond
the examples that have been provided at "training time" to new
examples that you see at "test time." A very large fraction of what
we'll talk about has to do with figuring out what generalization
means. We'll look at it from lots of different perspectives and
hopefully gain some understanding of what's going on.
This class will showcase machine learning technology in the context of
recommender systems, à la what you see on Amazon or Netflix (or
eHarmony). The data we'll be working with is recommendations for CS
courses at the U! For part of the final project, however, you are
encouraged to work on the Netflix Prize, from which you could
(probably no longer :() win $1,000,000.
There are a few cool things about machine learning that I hope to get
across in class. The first is that it's broadly applicable. These
techniques have led to significant advances in many fields, including
stock trading, robotics, machine translation, computer vision,
medicine, etc. The second is that there is a very close connection
between theory and practice. While this course is more on the
"practical" side of things, almost everything we will talk about has a
huge amount of accompanying theory. The third is that once you
understand the basics of machine learning technology, it's a very open
field in which lots of progress can be made quickly, essentially by
figuring out ways to formalize whatever we know about the
world.
Prerequisites: I take prerequisites seriously. There will be a
lot of math in this class and if you do not come prepared, life
will be rough. You should be able to take derivatives by hand
(preferably of multivariate functions), you should know what dot
products are and how they are related to projections onto subspaces,
you should know what Bayes' rule is and you should know that it's okay
for the density of a Gaussian probability distribution to be greater
than one. I've provided some reading
material to refresh these issues in your head, but if you haven't
at least seen these things before, you should beef up your math
background before class begins. On the
programming side, projects will be in Python; you should understand basic
computer science concepts (like recursion), basic data structures
(trees, graphs), and basic algorithms (search, sorting, etc.). (If
you know MATLAB, here's a nice cheat sheet.)
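
As a quick self-check on two of the math prerequisites above, here is a minimal Python sketch. It is not part of the course materials: the helper names `gaussian_pdf` and `project`, and the choice of a standard deviation of 0.1, are assumptions made purely for illustration. It shows that a narrow Gaussian can have a density above one while still integrating to (approximately) one, and how a dot product gives the projection of one vector onto another.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a univariate Gaussian N(mean, std^2) evaluated at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def project(u, v):
    """Orthogonal projection of vector u onto the line spanned by v."""
    # (u . v) / (v . v) is the coefficient of v in the projection.
    scale = sum(a * b for a, b in zip(u, v)) / sum(b * b for b in v)
    return [scale * b for b in v]


# A narrow Gaussian (std = 0.1, an illustrative choice) peaks well above one.
peak = gaussian_pdf(0.0, mean=0.0, std=0.1)

# A Riemann sum over [-1, 1] (ten standard deviations each side) shows the
# density still integrates to roughly one: density is not probability.
step = 0.001
area = sum(gaussian_pdf(-1.0 + i * step, 0.0, 0.1) * step for i in range(2001))

# Projecting (3, 4) onto the x-axis direction keeps only the x component.
proj = project([3.0, 4.0], [1.0, 0.0])

print("peak density:", peak)
print("approximate area:", area)
print("projection:", proj)
```

If the peak density above one looks surprising, that is a sign the reading material on probability is worth another pass before class begins.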
The purpose of grading (in my mind) is to provide extra incentive for
you to keep up with the material and to ensure that you exit the class
as a machine learning genius. If everyone gets an A, that would make
me happy (sadly, it hasn't happened yet). The components of grading
are:

| Weight | Component | Details |
| --- | --- | --- |
| 40% | Programming projects | There are four programming projects, each worth 10% of your final grade. You will be graded on both code correctness and your analysis of the results. These may be completed in teams of at most three students. |
| 35% | Written homeworks | There are fourteen written homeworks (one per week), each worth 2.5% of your final grade. They will be graded on a high-pass (100%), low-pass (50%) or fail (0%) basis. These are to be completed individually. |
| 20% | Final project | Everyone is to complete a final project, in teams of up to three. We will have a canned final project that we encourage everyone to do. However, you may also choose from a list of problem ideas that I think are interesting, or propose something specific to me. |
| 5% | Class participation | You will be graded on your in-class presentations of homework questions and other general participation. This is mostly subjective. |
The textbook is the new-ish book by Chris Bishop, Pattern
Recognition and Machine Learning (ISBN 0387310738).
Other recommended (but not required) books:

| Date | Topics | Readings | Due | Notes |
| --- | --- | --- | --- | --- |
| 25 Aug | Welcome to Machine Learning<br>What is ML, what is this class and some history | - | - | |
| 27 Aug | Decision trees, parameterized models<br>Fitting parameters to data, probabilistic view of learning | PRML 1-1.1, 1.3-1.4 | HW01 | |
| 01 Sep | Math you've forgotten<br>Refresher on linear algebra and probability | PRML 1.2-1.2.4, App C | P0 | |
| 03 Sep | Linear regression<br>Least squares and probabilistic interpretation | PRML 3-3.1.4, 1.2.5 | HW02 | py |
| 08 Sep | Linear classification<br>Logistic and hinge regression, gradient descent | PRML 4-4.1.2, 4.3-4.3.2 | - | |
| 10 Sep | Linear classification<br>Perceptron algorithm | PRML 4.1.7, Perceptron | HW03 | |
| 15 Sep | Generative classification<br>Naive Bayes and generative vs discriminative | PRML 4.2 (optional: ng) | - | |
| 17 Sep | Practical issues<br>Features and evaluation | eval | HW04 | - |
| 22 Sep | Neural networks<br>Multiple layers and back propagation | PRML 5-5.2 | - | |
| 24 Sep | Neural networks II<br>Back propagation and invariances | PRML 5.3-5.3.2, 5.5-5.5.3 (optional: 5.5.5) | P1, HW05 | |
| 29 Sep | Instance-Based Learning<br>Nearest neighbors and locally weighted regression | mitchell-instance 8-8.4 | - | |
| 01 Oct | Kernel methods<br>Optimizing in the dual | PRML 6-6.2 | HW06 | |
| 06 Oct | Support vector machines<br>Support vectors and hinge loss | PRML 7-7.1.2 | - | |
| 08 Oct | Learning theory<br>PAC learning and VC dimension | pac 7-7.4 | HW07 | |
| 20 Oct | Hierarchical clustering<br>Algorithms and measures | hier (skip 14.1.3) | P2 | |
| 22 Oct | K-means clustering<br>Clustering as optimization | PRML 9-9.1.1 | HW08 | |
| 27 Oct | Mixture of Gaussians<br>Soft clustering and pseudo-counts | PRML 9.2-9.3.2 | - | |
| 29 Oct | Expectation maximization<br>Jensen's inequality and majorization | PRML 9.4 | HW09 | |
| 03 Nov | Linear dimensionality reduction<br>Principal components analysis and factor analysis | PRML 12-12.1.4, 12.2.4 | - | |
| 05 Nov | Non-linear dimensionality reduction<br>ISOMAP, LLE, kernel PCA and Laplacian Eigenmaps | PRML 12.3, 12.4.3, manifold | HW10 | |
| 10 Nov | Topic modeling for text<br>Latent semantic analysis and pLSA | LSA, pLSA | - | - |
| 12 Nov | Sequence modeling<br>Hidden Markov models and Viterbi | PRML 13-13.2 (stop before 13.2.1), viterbi | HW11 | |
| 17 Nov | Sequence modeling II<br>Forward-backward algorithm | PRML 13.2.1, 13.2.2 | P3, PP | |
| 19 Nov | Statistical relational learning<br>Conditional random fields | crf (esp. 1-1.3) | HW12 | |
| 24 Nov | Collaborative filtering<br>Matrix factorization | factorize | - | - |
| 01 Dec | Semi-supervised learning<br>Expectation maximization and generative/discriminative | nigam | P4 | - |
| 03 Dec | Bayesian inference I<br>Thinking Bayesian, linear regression revisited | PRML 1.2.3, 1.2.6, 3.3 (optional: 3.5) | HW13 | - |
| 08 Dec | Multimodal learning<br>Matching words and pictures | wordspics | - | - |
| 10 Dec | Bayesian inference II<br>Approximation methods: variational EM and sampling | PRML 10.1-10.1.1, 11-11.1.2, 11.2 | HW14 | - |
| 15 Dec | Final exam party (3:30 pm - 5:30 pm) | - | PW | - |

| Topic | File | Due |
| --- | --- | --- |
| BT0: L0 and L1 regularization | tex | 09 Oct |
| BT1: Generative vs discriminative | tex | 09 Oct |
| BT2: Deep belief networks | tex | 29 Oct |
| BT3: Learning the kernel | tex | 06 Nov |
| BT4: String and subsequence kernels | tex | 06 Nov |
| BT5: Infinite mixture models | tex | 27 Nov |
| BT6: Latent Dirichlet allocation | tex | 27 Nov |
| BT7: SVMs for structured outputs | tex | 11 Dec |
| BT8: Max-margin matrix factorization | tex | 11 Dec |
| BT9: Error correcting tournaments | tex | 11 Dec |