Machine Learning
CS 5350/CS 6350
Spring 2008
Instructor: Hal Daume III: me AT hal3 DOT name
Office Hours: MEB 3126; by appointment
Schedule: Tuesday/Thursday, 9:10 - 10:30am
Location: WEB 110
Mailing list: Cs5350 -- PLEASE subscribe (but don't post)!
teach-cs5350 (post questions here)
TA: Mina Jeong (office hours MW 11-12, MEB 3157)

 Background and Description

Machine learning is all about finding patterns in data. The whole idea is to replace the "human writing code" with a "human supplying data" and then let the system figure out what it is that the person wants to do by looking at the examples. The most central concept in machine learning is generalization: how to generalize beyond the examples that have been provided at "training time" to new examples that you see at "test time." A very large fraction of what we'll talk about has to do with figuring out what generalization means. We'll look at it from lots of different perspectives and hopefully gain some understanding of what's going on.

There are a few cool things about machine learning that I hope to get across in class. The first is that it's broadly applicable. These techniques have led to significant advances in many fields, including stock trading, robotics, machine translation, computer vision and medicine. The second is that there is a very close connection between theory and practice. While this course is more on the "practical" side of things, almost everything we will talk about has a huge amount of accompanying theory. The third is that once you understand the basics of machine learning technology, it's a very open field in which lots of progress can be made quickly, essentially by figuring out ways to formalize whatever we can figure out about the world.

This course covers the basics of machine learning (supervised and unsupervised learning: essentially, learning with and without a teacher), plus reinforcement learning. A good, brief overview of the field is available here.

The catalog lists CS 3510 as a prerequisite; this can be waived if you have or can quickly acquire reasonable programming skills (in Matlab) or if you are a graduate student. There will be a fair amount of math in this class, but all I'll really expect you to know coming in is how to take derivatives. Some assignments will probably go more quickly if you have some background in continuous math (probability and/or linear algebra), but we'll cover in class all you need to know there.

 Topics Covered


 Grading

Your grade will be based on: homework assignments (there are 12 of them, but fear not: they're pretty short), programming projects (there are four of them), and either a final exam or a course project. Students in 6350 will also have to read and summarize five papers throughout the semester. If you're in 5350, you may either take a final exam or do a course project. If you're in 6350, you must do the course project. (More details on the project are below.)

Homeworks may be turned in up to 24 hours late, at a 50% penalty; after that, late turnins will not be accepted (because I will need to put solutions up on the web page). Your lowest homework score over the semester will be dropped before your final grade is computed.

Each "part" of a homework assignment is worth one point (your worst homework score will be dropped at the end of the semester). Each programming project is worth four points. Each paper summary (for those in 6350) is also worth one point. The final/course project is worth eight points. Thus, scores for students in 5350 will be out of 11+4*4+8=35 points. Scores for students in 6350 will be out of 11+4*4+5+8=40 points.
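
As a quick sanity check on the arithmetic above, here is a minimal Python sketch of the point totals. The constant and function names are made up for illustration and are not part of any course tooling; the point values are the ones stated in the paragraph above.

```python
# Illustrative only: these names are hypothetical, not course software.
HOMEWORK_POINTS = 11    # homework points after the lowest score is dropped
PROJECT_POINTS = 4 * 4  # four programming projects, four points each
FINAL_POINTS = 8        # final exam or course project
SUMMARY_POINTS = 5      # five one-point paper summaries (6350 only)


def total_points(course: str) -> int:
    """Return the maximum possible score for 'CS5350' or 'CS6350'."""
    if course not in ("CS5350", "CS6350"):
        raise ValueError(f"unknown course: {course}")
    total = HOMEWORK_POINTS + PROJECT_POINTS + FINAL_POINTS
    if course == "CS6350":
        # Only 6350 students are graded on paper summaries.
        total += SUMMARY_POINTS
    return total


print(total_points("CS5350"))  # 11 + 16 + 8
print(total_points("CS6350"))  # 11 + 16 + 5 + 8
```

The only difference between the two sections is the five paper-summary points, which is why the 6350 total is five higher.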


 Textbooks

The official course textbook is Elements of Statistical Learning by Trevor Hastie, Robert Tibshirani and Jerome Friedman (ISBN 0387952845). We will also make use of the online textbook, Machine Learning: A probabilistic approach, by David Barber. Other recommended (but not required) books:

Pattern Recognition and Machine Learning by Chris Bishop (ISBN 0387310738)

Information Theory, Inference and Learning Algorithms by David MacKay (ISBN 0521642981)

Machine Learning by Tom Mitchell (ISBN 0070428077)

An Introduction to Computational Learning Theory by Michael Kearns and Umesh Vazirani (ISBN 0262111934)


 Syllabus (tentative)

The following syllabus is subject to change, but likely not by very much. The readings listed are readings that you should have finished by that date. Readings listed as "ESL" are from Elements of Statistical Learning. Readings prefixed by "PA" are from Machine Learning: A probabilistic approach, an online book of notes by David Barber.

There are some readings that are marked for 6350 students only. In fact, there are (at least) 12 of these. Students in 6350 are expected to select a subset of (at least) five of these readings. They will read the papers (all of which are quite recent) and write a half-page summary of each. 5350 students are obviously encouraged to read these as well, and they will receive extra credit for writing summaries. Summaries are due by the end of the semester; of course, you'll get more useful feedback on a summary if you hand it in in a timely fashion.

Homework assignments are due before class (by 9:10am) on the date listed on the syllabus. Programming assignments are to be completed in Matlab.

Date | Topics | Readings | HW | Notes
8 Jan | Course overview, general classes of problems -- Data, features, curse of dimensionality | - | - | -
10 Jan | Learning: It's all about generalization -- Overfitting, experimental design, probability and optimization | PA 1 | - | -
Supervised Learning
15 Jan | Learning to play 20 questions -- Decision trees (entropy and overfitting) | ESL 9.2, [Q86] | HW0 due | -
17 Jan | Matlab tutorial and Q/A session (attendance optional) (Piyush Rai will present; Hal is out of town) | - | - | -
22 Jan | I am what I look most like -- Nearest neighbors (geometry of data, feature vectors) | ESL 13.3-13.5; 6350: [WBS06] | HW1A due | -
24 Jan | Splitting down the middle -- Hyperplane classifiers and margins, hardness results | ESL 4.5, [N06] (ignore 3.6-3.13, 4.2-4.6) | - | -
29 Jan | Splitting down the middle, almost optimally -- Support vector machines | ESL 12.1-12.3.2, [B98] sec 1-3 | - | -
31 Jan | Making infinite hyperplanes -- Kernels | ESL 12.3.3-12.3.4, [B98] remainder; 6350: [KSD06] | HW1B due | -
5 Feb | Catch up | - | - | -
7 Feb | Catch up | - | HW1C due | -
12 Feb | How much is that car worth? -- Linear regression, support vector regression. Also: feature selection | ESL 3.2, 12.3.5-12.3.7 | - | -
Learning Theory
14 Feb | I might kind of do a good job -- PAC learning and VC dimension | ESL 10-10.7, [BDHLZ05]; 6350: [BBBCL07] | - | -
19 Feb | Continued... | - | P1 due | -
21 Feb | Getting mileage out of old algorithms -- Boosting and reductions | ESL 7.4-7.6, 7.9; 6350: [M03] | - | -
26 Feb | Continued... | 6350: [L05] | HW2A due | -
28 Feb | Learning with someone staring over your shoulder -- Online learning: perceptron and winnow | TBA; 6350: [S99], [MPKWJW05] | - | -
Unsupervised Learning
4 Mar | Put me with my neighbors -- K-means clustering | ESL 14.3-14.3.6, 14.3.10 | - | -
6 Mar | Finding out my ancestry -- Agglomerative clustering | ESL 14.3.12; 6350: [TDR07] | HW2B due | -
11 Mar | Catch up | - | P2 due | -
13 Mar | Getting rid of excess dimensions -- Principal component analysis | ESL 14.5 (tutorial) | - | -
25 Mar | Catch up | - | HW3A due | -
27 Mar | Warping dimensions -- MDS, ISOMAP, LLE and kernel PCA | ESL 14.7, [SWHSL06]; 6350: [WSZS07] | - | -
Probabilistic Methods
1 Apr | Paint pictures, not math -- Graphical models and naive Bayes | PA 10,11,18 [J04] | HW3B due | -
3 Apr | What to do when stuff is missing -- Latent variables and Gaussian mixture models | PA 19.2 | - | -
8 Apr | Missing stuff, in the general case -- Expectation maximization for PCA and factor analysis | PA 14-14.3, [D04]; 6350: [L03] | HW4A due | -
10 Apr | Catch up | - | P3 due | -
15 Apr | Being uncertain about everything -- Hierarchical Bayesian modeling | PA 15,16, [GS04]; 6350: [BNJ03], [AFDJ03], [T08] | - | -
Reinforcement Learning
17 Apr | Controlling robots in block world -- Markov decision processes, Bellman equations | [SB98] ch. 1,3; 6350: [SB98] ch 2 | HW4B due | -
22 Apr | Learning to control over time -- Value iteration, policy iteration | [SB98] ch 4; 6350: [KKJ03] | HW4C due, P4 due | -

 Extra Readings

Several of the topics we will discuss are not covered in sufficient depth in Elements. These are provided instead in tutorial and research papers, listed here. These readings are required.

[AFDJ03] An introduction to MCMC for machine learning
by C. Andrieu, N. de Freitas, A. Doucet and M. I. Jordan.
Machine Learning, 2003.
[B98] A Tutorial on Support Vector Machines for Pattern Recognition
by Chris Burges.
KDDM, 1998.
[BBBCL07] Robust Reductions from Ranking to Classification
by Nina Balcan, Nikhil Bansal, Alina Beygelzimer, Don Coppersmith, John Langford, and Greg Sorkin.
COLT 2007.
[BDHLZ05] Reductions Between Classification Tasks
by Alina Beygelzimer, Varsha Dani, Tom Hayes, John Langford and Bianca Zadrozny.
ICML, 2005.
[BKNS04] Policy search by dynamic programming
by J. Andrew Bagnell, Sham Kakade, Andrew Y. Ng and Jeff Schneider.
NIPS 2004.
[BNJ03] Latent Dirichlet allocation
by Dave Blei, Andrew Ng and Michael Jordan.
JMLR, 2003. (You can ignore Section 5 (Inference and Parameter Estimation)).
[D04] Using EM to Estimate a Probability Density with a Mixture of Gaussians
by Aaron D'Souza.
Note, 2004.
[GE03] An Introduction to Variable and Feature Selection
by Isabelle Guyon and Andre Elisseeff.
JMLR 2003.
[GS04] Finding scientific topics
by Tom Griffiths and Mark Steyvers.
PNAS, 2004.
[J04] Graphical models
by Michael I. Jordan.
Statistical Science 2004.
[KKJ03] Exploration in Metric State Spaces
by Sham Kakade, Michael Kearns, and John Langford.
ICML 2003.
[KSD06] Learning Low-Rank Kernel Matrices
by Brian Kulis, Matyas Sustik, Inderjit Dhillon.
ICML 2006.
[L03] Gaussian Process Latent Variable Models for Visualisation of High Dimensional Data
by Neil Lawrence.
NIPS 2003.
[L05] Tutorial on Practical Prediction Theory for Classification
by John Langford.
JMLR 2005.
[M03] Simplified PAC-Bayesian Margin Bounds
by David McAllester.
COLT 2003.
[MPKWJW05] Simple Algorithms for Complex Relation Extraction with Applications to Biomedical IE
by R. McDonald, F. Pereira, S. Kulick, S. Winters, Y. Jin, and P. White.
ACL 2005.
[N06] Linear algebra review and reference
by Andrew Ng.
Draft tutorial, 2006.
[NG00] PEGASUS: A policy search method for large MDPs and POMDPs
by Andrew Y. Ng and Michael I. Jordan.
UAI 2000.
[NMM06] Semi-supervised Text Classification Using EM
by Kamal Nigam, Andrew McCallum and Tom Mitchell.
In Semi-supervised Learning, 2006.
[PS07] Policy Gradient Methods for Robotics
by Jan Peters and Stefan Schaal.
IROS 2006.
[Q86] Induction of Decision Trees
by J.R. Quinlan.
MLJ, 1986.
[S99] Perceptron, Winnow, and PAC Learning
by R. Servedio.
COLT 1999.
[SB98] Reinforcement Learning: An Introduction
by Rich Sutton and Andrew Barto.
MIT Press, 1998.
[SM06] An Introduction to Conditional Random Fields for Relational Learning
by Charles Sutton and Andrew McCallum.
Book Chapter in Introduction to Statistical Relational Learning, 2006.
[SWHSL06] Spectral methods for dimensionality reduction
by L. Saul, K. Weinberger, J. Ham, F. Sha and D. Lee.
In "Semisupervised learning" 2006.
[T08] Dirichlet Processes
by Yee Whye Teh.
Draft tutorial, 2008.
[TDR07] Bayesian Agglomerative Clustering with Coalescents
by Yee Whye Teh, Hal Daumé III and Daniel Roy.
NIPS 2007.
[WBS06] Distance Metric Learning for Large Margin Nearest Neighbor Classification
by Kilian Weinberger, John Blitzer and Lawrence Saul.
NIPS 2006.
[WSZS07] Graph Laplacian methods for large-scale semidefinite programming, with an application to sensor localization
by Kilian Weinberger, Fei Sha, Qihui Zhu and Lawrence Saul.
NIPS 2007.
[ZGL03] Semi-supervised learning using Gaussian fields and harmonic functions
by Xiaojin Zhu, Zoubin Ghahramani, and John Lafferty.
ICML 2003.

Here are some reading suggestions for students in 6350 based on interests. Note that these don't necessarily cover all five papers that you'll need.


 Homework Assignments

See the syllabus above for due dates. Please see the handin instructions. Homeworks are all equally weighted.

 Course Project

If you are in 5350, you have the choice of (A) taking a final exam or (B) doing a course project. You must decide by the end of February which option you would like to take. If you are in 6350, you must do a project; teams of two are allowed, but not required.

For those who are doing a project, you must (1) meet with me for 20-30 minutes by the first week of March to discuss project topics (schedule an appointment with me at least one week in advance); (2) after meeting with me, turn in a one-page project proposal (PP) that describes your project topic, methods and evaluation; and (3) write up the results of the project in the final project write-up (PW), which is due by the end of finals week. There is more detail about the project and some canned ideas here. For some example projects from similar courses in previous years, see here, here, here or here.

 Useful Links and Software

This course is similar to several other machine learning courses taught at other universities: CMU (Tom Mitchell and Andrew Moore), Stanford (Andrew Ng), Cornell (Thorsten Joachims) and Edinburgh (Sethu Vijayakumar). There has also been a series of summer schools on machine learning, some of which have videos up.


 Course Policies

Cheating: Any assignment or exam that is handed in must be your own work. However, talking with one another to understand the material better is strongly encouraged. Recognizing the distinction between cheating and cooperation is very important. If you copy someone else's solution, you are cheating. If you let someone else copy your solution, you are cheating. If someone dictates a solution to you, you are cheating. Everything you hand in must be in your own words, and based on your own understanding of the solution. If someone helps you understand the problem during a high-level discussion, you are not cheating. We strongly encourage students to help one another understand the material presented in class, in the book, and general issues relevant to the assignments. When taking an exam, you must work independently. Any collaboration during an exam will be considered cheating. Any student who is caught cheating will be given an E in the course and referred to the University Student Behavior Committee. Please don't take that chance - if you're having trouble understanding the material, please let us know and we will be more than happy to help.

ADA: The University of Utah conforms to all standards of the Americans with Disabilities Act (ADA). If you wish to qualify for exemptions under this act, notify the Center for Disabled Students Services, 160 Union.

College guidelines: Document concerning adding, dropping, etc. here.