Machine Learning
CS 5350/CS 6350
Spring 2007
Instructor: Hal Daume III: me AT hal3 DOT name
Office Hours: MEB 3126; Tue 10:30-noon (or by appointment)
Schedule: Tuesday/Thursday, 9:10 - 10:30am
Location: NS 207 (NEW ROOM!)
Mailing list: Cs5350 -- PLEASE subscribe (but don't post)!
TA: Eun Yong Kang (office hours Wed 2-5pm, MEB 4158)

 Background and Description

The field of machine learning attempts to build algorithms that are able to discover and exploit patterns in data. These techniques have led to significant advances in many fields, including stock trading, robotics, machine translation, computer vision, medicine, etc. This course covers the basics of machine learning (supervised and unsupervised learning: essentially, learning with and without a teacher) as well as some more advanced, recent research topics. A good, brief overview of the field is available here.

The catalog lists CS 3510 as a prerequisite; this can be waived if you have or can quickly acquire reasonable programming skills (in C, Matlab, Java, R, Perl, whatever; talk to me) or if you are a graduate student. There will be a fair amount of math in this class, but all I'll really expect you to know coming in is how to take multivariate derivatives. Some assignments will probably go quicker if you have some background in continuous math (probability and/or linear algebra), but we'll cover in class all you need to know there. See also Reading 0 for some more background.

 Topics Covered


 Grading

Your grade will be determined by your performance in the following areas: homework, midterm exam, course project and final exam. There are five assignments, which frequently involve both programming and written aspects. There is one midterm exam, just preceding spring break, and a final exam. There is a large course project, the topic of which is largely of your choosing. Teams of two to three are allowed for the project, but everything else will be done individually.

Each homework is worth 1 point (for a total of 5 points). The midterm is worth 4 points. The project is worth 4 points if you're enrolled in 5350 and 6 points if you're enrolled in 6350. The final exam is worth 6 points if you're enrolled in 5350 and 4 points if you're enrolled in 6350. One additional point is withheld for class participation, etc. There will be limited opportunity for extra credit on some homework assignments, midterm and final. I do not curve, but do adjust cutoffs based on overall performance (though the cutoffs for 5350 and 6350 will be determined separately).
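
As a quick check of the arithmetic above, here is a minimal sketch that tallies the maximum points for each course number. The names `POINTS` and `total_points` are purely illustrative (they are not part of any course software), and extra credit is ignored.

```python
# Point values copied from the grading paragraph above.
# POINTS and total_points are hypothetical names used only for illustration.
POINTS = {
    "5350": {"homework": 5 * 1, "midterm": 4, "project": 4, "final": 6, "participation": 1},
    "6350": {"homework": 5 * 1, "midterm": 4, "project": 6, "final": 4, "participation": 1},
}


def total_points(course: str) -> int:
    """Return the maximum number of points available (extra credit excluded)."""
    return sum(POINTS[course].values())


for course in ("5350", "6350"):
    print(course, total_points(course))
```

Under this reading, both 5350 and 6350 total 20 points; only the split between the project and the final exam differs.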

 Textbooks

The textbook will be the new book by Chris Bishop, Pattern Recognition and Machine Learning (ISBN 0387310738). Other recommended (but not required) books:

Information Theory, Inference and Learning Algorithms by David MacKay (ISBN 0521642981)

Machine Learning by Tom Mitchell (ISBN 0070428077)

An Introduction to Computational Learning Theory by Michael Kearns and Umesh Vazirani (ISBN 0262111934)

Elements of Statistical Learning by Trevor Hastie, Robert Tibshirani and Jerome Friedman (ISBN 0387952845)


 Syllabus (tentative)

The following syllabus is subject to change, but likely not by very much. The readings listed are those you should have finished by that date. That is, you should have read 1.1-1.4 in PRML and additional reading 0 by the time you come to class on Jan 11. Homework assignments are due before class (by 9:10am) on the date listed on the syllabus. I highly recommend that you use Matlab for the programming aspects of the assignments: the upfront cost should be low and it will make your assignments much easier (I will also sometimes provide shells in Matlab for the assignments). One other language (TBD) will be allowed.

Date | Topics | Readings | HW | Notes
09 Jan | What is machine learning? Learning theory, Bias, Overfitting | - | 0 out |
11 Jan | Math refresher: Statistics, linear algebra and calculus | 1.1-1.4, [0] | - |
SUPERVISED LEARNING
16 Jan | Linear models for regression: Least-squares techniques; Bias/variance trade-off | 3.1-3.2 | 0 due |
18 Jan | Linear models for classification: Logistic regression, naive Bayes | 4.1-4.3, [1] | 1 out |
23 Jan | Linear models for classification (cont'd): Perceptron algorithm, linear SVMs | 7.1.1-7.1.3, [2] | - |
25 Jan | Linear models (cont'd): Linear SVMs, VC dimension | 6.1-6.2, [2] | - | (same)
30 Jan | Non-linear models: Nearest neighbors, decision trees | [3] | - |
01 Feb | Non-linear models: Non-linear SVMs, Kernels | [4] | 1 due, 2 out |
06 Feb | Combining models: Boosting, bagging and reductions | 14.2, 14.3, [6], [7] | - |
08 Feb | Feature selection; Discussion of projects | [8] | - |
UNSUPERVISED LEARNING
13 Feb | Clustering: K-means, agglomerative | 9.1, [9] | - |
15 Feb | Clustering (cont'd): Mixtures of Gaussians | 9.2, [10] | 2 due, 3 out |
20 Feb | Expectation maximization | 9.4, [11] | - |
22 Feb | Expectation maximization (cont'd); Semi-supervised learning | [12] | - |
27 Feb | Low-dimensional representations: Principal component analysis; Locally linear embedding | 12.1, [13] | - |
STRUCTURED PREDICTION
01 Mar | Sequence labeling: Hidden Markov Models | 13.1-13.2 | - |
06 Mar | Sequence labeling and beyond: Maximum Entropy Markov Models; Conditional Random Fields | [14], [15] | - |
08 Mar | Sequence labeling and beyond (cont'd): Structured Perceptron; Search-based Structured Prediction | [16], [18] | 3 due, 4 out |
13 Mar | Catch-up | - | PP due | -
15 Mar | MIDTERM | - | - | -
20 Mar | Spring break | - | - | -
22 Mar | Spring break | - | - | -
BAYESIAN LEARNING
27 Mar | Introduction to Bayesian learning: Probability distributions, graphical models | 1.5, 2.1-2.4, 8.1 | - |
29 Mar | Cancelled (Hal out of town) | - | - | -
03 Apr | Inference: Exact inference, basic sampling | 8.2, 8.4.1-8.4.2 | 4 due, 5 out |
05 Apr | Inference (cont'd): Markov Chain Monte Carlo; Latent Dirichlet allocation | 11.1-11.3, [19], [20] | - |
10 Apr | Classification and regression: Bayesian linear/logistic regression; Laplace approximation | 3.3, 4.4-4.5 | - |
12 Apr | Catch-up | - | - | -
PROJECT PRESENTATIONS
17 Apr | Project presentations: Virost; Hansen/Oh/Zhu; Kim/Rai; Lanka | - | - | -
19 Apr | Project presentations: Gilbert/Bresee; Tandon; Valentine/Alfeld; Ha/Santos/Wang | - | - | -
24 Apr | Project presentations: Gerber; Abbasi/Quist; Hetrick/Milyavskaya; Andrade | - | 5 due, PW due | -

 Extra Readings

Several of the topics we will discuss are not covered in sufficient depth in PRML. These are provided instead in tutorial and research papers, listed here. These readings are required.

  0. Linear Algebra Review and Reference by Andrew Ng.
    Don't be afraid if not everything is 100% familiar. You can ignore Sections 3.6-3.13 and 4.2-4.6.
  1. A Brief Maxent Tutorial by Adam Berger.
  2. A Tutorial on Support Vector Machines for Pattern Recognition by Chris Burges. KDDM, 1998. (Also as PDF)
    You only need to read sections 1-3 for now.
  3. Induction of Decision Trees by J.R. Quinlan. MLJ, 1986.
  4. A Tutorial on Support Vector Machines for Pattern Recognition by Chris Burges. KDDM, 1998.
    Now, read sections 4-10 (the rest).
  5. Computational Learning Theory by Sally Goldman.
    You only need to read sections 1-3 (though you are encouraged to read the whole thing).
  6. Reductions Between Classification Tasks by Alina Beygelzimer, Varsha Dani, Tom Hayes, John Langford and Bianca Zadrozny. ICML, 2005.
  7. A brief introduction to boosting by Robert Schapire. IJCAI, 1999.
  8. An Introduction to Variable and Feature Selection by Isabelle Guyon and Andre Elisseeff. JMLR, 2003.
  9. Handout: Chapter from Hastie et al.
  10. Using EM To Estimate A Probablity Density With A Mixture Of Gaussians by Aaron D'Souza.
  11. The Expectation Maximization Algorithm: A short tutorial by Sean Borman.
  12. Semi-supervised Text Classification Using EM by Kamal Nigam, Andrew McCallum and Tom Mitchell. In Semi-supervised Learning, 2006.
  13. An Introduction to Locally Linear Embedding by Lawrence Saul and Sam Roweis. 2001.
  14. Maximum Entropy Markov Models for Information Extraction and Segmentation by Andrew McCallum, Dayne Freitag and Fernando Pereira. ICML, 2000.
  15. An Introduction to Conditional Random Fields for Relational Learning by Charles Sutton and Andrew McCallum. In Intro to SRL, 2006.
  16. Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms by Michael Collins. EMNLP, 2002.
  17. Max-Margin Markov Networks by Ben Taskar, Carlos Guestrin and Daphne Koller. NIPS, 2003.
  18. Search-based Structured Prediction by Hal Daume III, John Langford and Daniel Marcu. MLJ, under review.
  19. Latent Dirichlet allocation by Dave Blei, Andrew Ng and Michael Jordan. JMLR, 2003.
    You can ignore Section 5 (Inference and Parameter Estimation).
  20. Finding scientific topics by Tom Griffiths and Mark Steyvers. PNAS, 2004.

 Homework Assignments

See the syllabus above for due dates. There are five assignments (1-5) plus an assignment zero. HW0 does not affect your score, but you must turn it in (if you do not, you will automatically receive a zero on all other assignments). Please see the handin instructions.

 Course Project

You are required to carry out a substantial project in the area of machine learning. The exact expectations differ slightly depending on whether you are enrolled in 5350 or 6350 (the project for those in 6350 should contain more novel developments than simply applying known techniques to some data). There are four components to the project. (1) You will meet with me for about 20-30 minutes sometime between 14 Feb and 23 Feb to discuss project topics (you must contact me before 16 Feb to schedule an appointment). (2) You will turn in a formal project proposal (PP) that describes your project. (3) You will present the results of your project to the class during the last two weeks. (4) You will turn in a final project write-up (PW).

There is more detail about the project and some canned ideas here. For some example projects from similar courses in previous years, see here, here, here or here.

 Useful Links and Software

This course is similar to several other machine learning courses taught at other universities: CMU (Tom Mitchell and Andrew Moore), Stanford (Andrew Ng), Cornell (Thorsten Joachims) and Edinburgh (Sethu Vijayakumar). The major recent and upcoming conferences in this area are: ICML (2004, 2005, 2006), UAI (2004, 2005, 2006) and NIPS. The major journals are: Journal of Machine Learning Research and Machine Learning Journal. There has also been a series of summer schools on machine learning, some of which have videos up.

The following software may be of interest:


 Cheating

Any assignment or exam that is handed in must be your own work. However, talking with one another to understand the material better is strongly encouraged. Recognizing the distinction between cheating and cooperation is very important. If you copy someone else's solution, you are cheating. If you let someone else copy your solution, you are cheating. If someone dictates a solution to you, you are cheating. Everything you hand in must be in your own words, and based on your own understanding of the solution. If someone helps you understand the problem during a high-level discussion, you are not cheating. We strongly encourage students to help one another understand the material presented in class, in the book, and general issues relevant to the assignments. When taking an exam, you must work independently. Any collaboration during an exam will be considered cheating. Any student who is caught cheating will be given an E in the course and referred to the University Student Behavior Committee. Please don't take that chance - if you're having trouble understanding the material, please let us know and we will be more than happy to help.


 ADA

The University of Utah conforms to all standards of the Americans with Disabilities Act (ADA). If you wish to qualify for exemptions under this act, notify the Center for Disabled Students Services, 160 Union.


 College Guidelines

The document concerning adding, dropping, etc. is available here.