Machine Learning
CS 5350/CS 6350
Fall 2009
Schedule: Tue/Thr 3:40-5:00pm
Location: MEB 1208
Instructor: Hal Daume III: me AT hal3 DOT name
Office Hours: MEB 3126; Thr 9-10:30 or by appointment
Mailing list: cs5350@list.eng.utah.edu -- PLEASE subscribe (but don't post)!
teach-cs5350@list.eng.utah.edu -- PLEASE post (but don't subscribe)!
TA: Seth Juarez (office hours: Thr 1:30-3:30 and by appt)

Jump to: [Background] [Structure] [Grading] [Textbooks] [Schedule] [Homework] [Links] [Policies]

 Background and Description

Machine learning is all about finding patterns in data. The whole idea is to replace the "human writing code" with a "human supplying data" and then let the system figure out what it is that the person wants to do by looking at the examples. The most central concept in machine learning is generalization: how to generalize beyond the examples that have been provided at "training time" to new examples that you see at "test time." A very large fraction of what we'll talk about has to do with figuring out what generalization means. We'll look at it from lots of different perspectives and hopefully gain some understanding of what's going on.
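To make the "training time" versus "test time" distinction concrete, here is a minimal sketch in Python. The data, the two toy learners (`fit_memorizer`, `fit_threshold`) and the `accuracy` helper are all invented for illustration and are not part of the course materials. Both learners fit the training examples perfectly, but only one of them does reasonably well on inputs it has never seen.

```python
# Toy illustration of generalization; every name and number here is hypothetical.
train = [(1, 0), (2, 0), (4, 0), (6, 1), (8, 1), (9, 1)]  # (x, label) pairs
test = [(0, 0), (3, 0), (5, 1), (7, 1), (10, 1)]          # x values never seen in training


def fit_memorizer(examples):
    """Memorize the training set; answer 0 for any x not seen before."""
    table = dict(examples)
    return lambda x: table.get(x, 0)


def fit_threshold(examples):
    """Learn a rule 'predict 1 iff x >= t', choosing t to minimize training errors."""
    def training_errors(t):
        return sum((x >= t) != bool(y) for x, y in examples)

    candidates = sorted(x for x, _ in examples)
    best_t = min(candidates, key=training_errors)
    return lambda x: int(x >= best_t)


def accuracy(predict, examples):
    """Fraction of examples on which the predictor matches the label."""
    return sum(predict(x) == y for x, y in examples) / len(examples)


memorizer = fit_memorizer(train)
threshold = fit_threshold(train)

for name, model in [("memorizer", memorizer), ("threshold", threshold)]:
    print(name, "train:", accuracy(model, train), "test:", accuracy(model, test))
```

On this made-up data both learners score 1.0 on the training set, while on the test set the memorizer scores 0.4 and the threshold rule scores 0.8. The gap between those two test numbers is what "generalization" refers to.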

This class will showcase machine learning technology in the context of recommender systems, à la what you see on Amazon or Netflix (or eHarmony). The data we'll be working with is recommendations for CS courses at the U! But, for part of the final project, you are encouraged to work on the Netflix Prize, from which you could (probably no longer :() win $1,000,000.

There are a few cool things about machine learning that I hope to get across in class. The first is that it's broadly applicable. These techniques have led to significant advances in many fields, including stock trading, robotics, machine translation, computer vision, medicine, etc. The second is that there is a very close connection between theory and practice. While this course is more on the "practical" side of things, almost everything we will talk about has a huge amount of accompanying theory. The third is that once you understand the basics of machine learning technology, it's a very open field in which lots of progress can be made quickly, essentially by finding ways to formalize whatever we can figure out about the world.

Prerequisites: I take prerequisites seriously. There will be a lot of math in this class and if you do not come prepared, life will be rough. You should be able to take derivatives by hand (preferably of multivariate functions), you should know what dot products are and how they are related to projections onto subspaces, you should know what Bayes' rule is and you should know that it's okay for the density of a Gaussian probability distribution to be greater than one. I've provided some reading material to refresh these topics in your head, but if you haven't at least seen these things before, you should beef up your math background before class begins. On the programming side, projects will be in Python; you should understand basic computer science concepts (like recursion), basic data structures (trees, graphs), and basic algorithms (search, sorting, etc.). (If you know MATLAB, here's a nice cheat sheet.)
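As a quick self-check on three of the math items above, here is a small sketch using only the Python standard library. The helper names (`gaussian_pdf`, `dot`, `project`, `bayes_posterior`) and all the numbers are made up for illustration. It shows a Gaussian density exceeding one while still integrating to roughly one, a projection computed from dot products, and Bayes' rule for a binary hypothesis.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a univariate Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def project(u, v):
    """Orthogonal projection of u onto the line spanned by v (v must be nonzero)."""
    scale = dot(u, v) / dot(v, v)
    return [scale * b for b in v]


def bayes_posterior(prior, likelihood, likelihood_if_not):
    """P(H | E) from P(H), P(E | H) and P(E | not H)."""
    evidence = likelihood * prior + likelihood_if_not * (1.0 - prior)
    return likelihood * prior / evidence


# A narrow Gaussian has a density well above one at its mean (about 3.99 here)...
peak = gaussian_pdf(0.0, mean=0.0, std=0.1)

# ...yet the area under the curve is still (numerically) about one.
dx = 0.001
area = sum(gaussian_pdf(-1.0 + i * dx, 0.0, 0.1) * dx for i in range(2001))

# Projecting u onto v leaves a residual that is orthogonal to v.
u, v = [3.0, 4.0], [1.0, 0.0]
p = project(u, v)
residual = [a - b for a, b in zip(u, p)]

# Hypothetical numbers: 1% prior, 90% true-positive rate, 5% false-positive rate.
posterior = bayes_posterior(prior=0.01, likelihood=0.9, likelihood_if_not=0.05)

print(peak, area, p, dot(residual, v), posterior)
```

A density is not a probability: only its integral over a region is, which is why the peak value can exceed one without contradiction.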

 Structure of Class

I will take a slightly non-standard approach to class time. I will not spend 3 hours per week going over material that was in the readings. As a result, you should read. And you should do the short written assignments. My responsibility will be to help you understand things that are hard, and to give you an insider's view of the field. Class time will be interactive. You should come to class with a short (but non-empty) list of questions related to the reading and homework. We will spend class time discussing the difficulties. Certain homework problems will be marked for in-class presentation, where I'll draw names (without replacement) to present solutions (with the help of your classmates and me if necessary). The rest of class time will be spent talking about issues that arise, things that I think are particularly interesting, doing activities and/or demos.

Your responsibilities are as follows:

Given that this is a three-credit class, I expect you to spend nine hours per week working on machine learning stuff. Three of those hours will be in class. Of the remaining six, I expect about two to be spent reading (one hour per assignment), two to be spent on written homeworks and two to be spent on projects. If things are taking significantly more time than this, you should talk to us.


 Grading

The purpose of grading (in my mind) is to provide extra incentive for you to keep up with the material and to ensure that you exit the class as a machine learning genius. If everyone gets an A, that would make me happy (sadly, it hasn't happened yet). The components of grading are:
40% Programming projects
There are four programming projects, each worth 10% of your final grade. You will be graded on both code correctness and your analysis of the results. These may be completed in teams of at most three students.
35% Written homeworks
There are fourteen written homeworks (one per week), each worth 2.5% of your final grade. They will be graded on a high-pass (100%), low-pass (50%) or fail (0%) basis. These are to be completed individually.
20% Final project
Everyone is to complete a final project, in teams of size up to three. We will have a canned final project that we encourage everyone to do. However, you may also choose from a list of problem ideas that I think are interesting, or propose something specific to me.
5% Class participation
You will be graded on your in-class presentations of homework questions and other general participation. This is mostly subjective.

I have also provided a list of ten "brain teasers." These are written assignments related to topics we discuss in class, but significantly more advanced than we'll have time for. Most will come with some additional associated reading. Students in 5350 may do up to four of these for extra credit (each is worth 5%). Students in 6350 are required to do two of these and may do two more for extra credit. For 6350, these will account for 10% of your grade, with the above percentages uniformly decreased to "make room." You may work in teams of up to three. I'm more than happy to meet with groups to discuss these topics (actually, I strongly encourage this, since I'll grade these relatively strictly: you'd really better understand them), but you'll have to do a writeup. The listed due dates are the due dates for the writeup, so if you'd like to meet with me, please schedule a slot well in advance.
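The arithmetic behind these weights can be sketched as follows. This is only an illustration, not the official grade calculation: the function names are hypothetical, and it assumes that "uniformly decreased" means each of the four base weights is scaled by the same factor (0.9) so that the brain teasers can take 10%. Other readings of that phrase are possible.

```python
# Base weights from the grading breakdown (they sum to 1.0).
BASE_WEIGHTS = {
    "projects": 0.40,
    "homeworks": 0.35,
    "final_project": 0.20,
    "participation": 0.05,
}

# Written homeworks are marked high-pass, low-pass or fail.
HOMEWORK_MARKS = {"high": 1.0, "low": 0.5, "fail": 0.0}


def weights_6350(base=BASE_WEIGHTS, teaser_share=0.10):
    """ASSUMPTION: every base weight is scaled by (1 - teaser_share)."""
    scaled = {name: share * (1.0 - teaser_share) for name, share in base.items()}
    scaled["brain_teasers"] = teaser_share
    return scaled


def homework_fraction(marks):
    """Average credit over a list of marks such as ['high', 'low', 'fail']."""
    return sum(HOMEWORK_MARKS[m] for m in marks) / len(marks)


def final_grade(weights, fractions):
    """Weighted sum of per-component fractions, each in [0, 1]."""
    return sum(weights[name] * fractions[name] for name in weights)


w = weights_6350()
for name, share in w.items():
    print(f"{name}: {share:.3f}")  # projects 0.360, homeworks 0.315, and so on

# Hypothetical 5350 student: 12 high-passes and 2 low-passes on the homeworks.
fractions = {
    "projects": 0.9,
    "homeworks": homework_fraction(["high"] * 12 + ["low"] * 2),
    "final_project": 0.85,
    "participation": 1.0,
}
print(round(final_grade(BASE_WEIGHTS, fractions), 3))
```

Under this scaling assumption the 6350 weights come out to 36% projects, 31.5% homeworks, 18% final project, 4.5% participation and 10% brain teasers.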

Late homeworks are not allowed (without prior approval). This is because I need to put solutions up on the web page. You may hand any project in up to 48 hours late; however, once it is late by one minute, your final score will be halved.

We will post solutions to homeworks and projects quickly after the due dates. We will also email you your scores once grading has been completed. If you handed something in and do not get a score for an assignment, you have a one-week moratorium on complaints.

You can view the current grades (indexed by the key you provided us in HW01).


 Textbooks

The textbook is the new-ish book by Chris Bishop, Pattern Recognition and Machine Learning (ISBN 0387310738).

Other recommended (but not required) books:


 Schedule (tentative)

The following schedule is subject to change, but likely not by very much. The readings listed are readings that you should have finished by that date. Everything is due by 3:00pm on the date listed on the schedule. Programming assignments are to be completed in Python. Written assignments are to be handed in in PDF format.

One thing that students have pointed out in the past that I'll point out to you is that Wikipedia has a bunch of good articles related to machine learning and statistics. Basic statistics material in particular (various distributions, rules of probability, etc.) is very well explained there. I highly recommend it as an alternative source of information.

Date Topics Readings Due Notes
25 Aug Welcome to Machine Learning
What is ML, what is this class and some history
- -
27 Aug Decision trees, parameterized models
Fitting parameters to data, probabilistic view of learning
PRML 1-1.1,1.3-1.4 HW01
Linear Models
01 Sep Math you've forgotten
Refresher on linear algebra and probability
PRML 1.2-1.2.4, App C P0
03 Sep Linear regression
Least squares and probabilistic interpretation
PRML 3-3.1.4, 1.2.5 HW02 py
08 Sep Linear classification
Logistic and hinge regression, gradient descent
PRML 4-4.1.2, 4.3-4.3.2 -
10 Sep Linear classification
Perceptron algorithm
PRML 4.1.7, Perceptron HW03
15 Sep Generative classification
Naive Bayes and generative vs discriminative
PRML 4.2 (optional: ng) -
17 Sep Practical issues
Features and evaluation
eval HW04 -
Non-linear Models
22 Sep Neural networks
Multiple layers and back propagation
PRML 5-5.2 -
24 Sep Neural networks II
Back propagation and invariances
PRML 5.3-5.3.2, 5.5-5.5.3 (optional: 5.5.5) P1,HW05
29 Sep Instance-Based Learning
Nearest neighbors and locally weighted regression
mitchell-instance 8-8.4 -
01 Oct Kernel methods
Optimizing in the dual
PRML 6-6.2 HW06
06 Oct Support vector machines
Support vectors and hinge loss
PRML 7-7.1.2 -
08 Oct Learning theory
PAC learning and VC dimension
pac 7-7.4 HW07
Unsupervised Learning
20 Oct Hierarchical clustering
Algorithms and measures
hier (skip 14.1.3) P2
22 Oct K-means clustering
Clustering as optimization
PRML 9-9.1.1 HW08
27 Oct Mixture of Gaussians
Soft clustering and pseudo-counts
PRML 9.2-9.3.2 -
29 Oct Expectation maximization
Jensen's inequality and majorization
PRML 9.4 HW09
03 Nov Linear dimensionality reduction
Principal component analysis and factor analysis
PRML 12-12.1.4, 12.2.4 -
05 Nov Non-linear dimensionality reduction
ISOMAP, LLE, kernel PCA and Laplacian Eigenmaps
PRML 12.3, 12.4.3, manifold HW10
10 Nov Topic modeling for text
Latent semantic analysis and pLSA
LSA pLSA - -
Learning in Complex Settings
12 Nov Sequence modeling
Hidden Markov models and Viterbi
PRML 13-13.2 (stop before 13.2.1) viterbi HW11
17 Nov Sequence modeling II
Forward-backward algorithm
PRML 13.2.1, 13.2.2 P3, PP
19 Nov Statistical relational learning
Conditional random fields
crf (esp. 1-1.3) HW12
24 Nov Collaborative filtering
Matrix factorization
factorize - -
01 Dec Semi-supervised learning
Expectation maximization and generative/discriminative
nigam P4 -
03 Dec Bayesian inference I
Thinking Bayesian, linear regression revisited
PRML 1.2.3, 1.2.6, 3.3, (optional: 3.5) HW13 -
08 Dec Multimodal learning
Matching words and pictures
wordspics - -
10 Dec Bayesian inference II
Approximation methods: variational EM and sampling
PRML 10.1-10.1.1, 11-11.1.2, 11.2 HW14 -
15 Dec Final exam party (3:30 pm - 5:30 pm) - PW -

 Homework Assignments

All written homeworks are due on Thursday. See the schedule above for due dates. You may hand in your homework/projects here. You're free to use the LaTeX source in any way you want, but you'll need haldefs.sty and notes.sty to build them.

Written Homeworks

Programming Projects

Brain Teasers

BT0: L0 and L1 regularization tex  (due 09 Oct)
BT1: Generative vs discriminative tex  (due 09 Oct)
BT2: Deep belief networks tex  (due 29 Oct)
BT3: Learning the kernel tex  (due 6 Nov)
BT4: String and subsequence kernels tex  (due 6 Nov)
BT5: Infinite mixture models tex  (due 27 Nov)
BT6: Latent Dirichlet allocation tex  (due 27 Nov)
BT7: SVMs for structured outputs tex  (due 11 Dec)
BT8: Max-margin matrix factorization tex  (due 11 Dec)
BT9: Error correcting tournaments tex  (due 11 Dec)

 Final Project

The final project is worth a substantial portion of your grade. You should use it as an opportunity to show off to me what you've learned and how creative you can be. You have three options for a final project:

There are three key requirements for the projects:

 Useful Links and Software

This course has been taught (by me!) in the past: Fall 2008, Spring 2008 and Spring 2007.

This course is similar to several other machine learning courses taught at other universities: CMU (Tom Mitchell and Andrew Moore), Stanford (Andrew Ng), Cornell (Thorsten Joachims) and Edinburgh (Sethu Vijayakumar). There has also been a series of summer schools on machine learning, some of which have videos up.

Although you won't need to use any of this software for your homeworks/projects, there are a large number of open-source machine learning toolkits out there. (Some of these may be useful for the competition.) A small sample:


 Course Policies

Cheating: Any assignment or exam that is handed in must be your own work. However, talking with one another to understand the material better is strongly encouraged. Recognizing the distinction between cheating and cooperation is very important. If you copy someone else's solution, you are cheating. If you let someone else copy your solution, you are cheating. If someone dictates a solution to you, you are cheating. Everything you hand in must be in your own words, and based on your own understanding of the solution. If someone helps you understand the problem during a high-level discussion, you are not cheating. We strongly encourage students to help one another understand the material presented in class, in the book, and general issues relevant to the assignments. When taking an exam, you must work independently. Any collaboration during an exam will be considered cheating. Any student who is caught cheating will be given an E in the course and referred to the University Student Behavior Committee. Please don't take that chance - if you're having trouble understanding the material, please let us know and we will be more than happy to help.

ADA: The University of Utah conforms to all standards of the Americans with Disabilities Act (ADA). If you wish to qualify for exemptions under this act, notify the Center for Disabled Students Services, 160 Union.

College guidelines: Document concerning adding, dropping, etc. here.