Machine Learning
CS 5350/CS 6350
Spring 2008
Machine learning is all about finding patterns in data. The whole
idea is to replace the "human writing code" with a "human supplying
data" and then let the system figure out what it is that the person
wants to do by looking at the examples. The most central concept in
machine learning is generalization: how to generalize beyond
the examples that have been provided at "training time" to new
examples that you see at "test time." A very large fraction of what
we'll talk about has to do with figuring out what generalization
means. We'll look at it from lots of different perspectives and
hopefully gain some understanding of what's going on.
There are a few cool things about machine learning that I hope to get
across in class. The first is that it's broadly applicable. These
techniques have led to significant advances in many fields, including
stock trading, robotics, machine translation, computer vision,
medicine, etc. The second is that there is a very close connection
between theory and practice. While this course is more on the
"practical" side of things, almost everything we will talk about has a
huge amount of accompanying theory. The third is that once you
understand the basics of machine learning technology, it's a very open
field and lots of progress can be made quickly, essentially by
figuring out ways to formalize whatever we know about the world.
This course covers the basics of machine learning (supervised and
unsupervised learning: essentially, learning with and without a
teacher), plus reinforcement learning. A good, brief overview of the
field is available here.
The catalog lists CS 3510 as a prerequisite; this can be waived if you
have or can quickly acquire reasonable programming skills (in Matlab)
or if you are a graduate student. There will be a fair amount of math
in this class, but all I'll really expect you to know coming in is how
to take derivatives. Some assignments will probably go more quickly if you
have some background in continuous math (probability and/or linear
algebra), but we'll cover everything you need to know in class.
Your grade will be based on: homework assignments (there are 12 of
them, but fear not: they're pretty short), programming projects (there
are four of them), and either a final exam or a course project.
Students in 6350 will also have to read and summarize five papers
throughout the semester. If you're in 5350, you may either take a
final exam or do a course project. If you're in 6350, you must do the
course project. (More details on the project are below.)
Homeworks may be turned in up to 24 hours late, at a 50% penalty;
after that, late submissions will not be accepted (because I will need to
put solutions up on the web page). Your lowest homework score over
the semester will be dropped before your final grade is computed.
Each "part" of a homework assignment is worth one point. Each
programming project is worth four points. Each paper summary (for
those in 6350) is also worth one point. The final/course project is
worth eight points. Thus, scores for students in 5350 will be out of
11+4*4+8=35 points. Scores for students in 6350 will be out of
11+4*4+5+8=40 points.
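The point totals above can be checked with a small sketch. The function names (`max_points`, `late_credit`, `total_score`) are hypothetical, and the sketch assumes that the 50% late penalty halves the points earned and that a submission exactly 24 hours late still falls inside the late window; neither detail is spelled out in the policy itself.

```python
def max_points(section):
    """Points available in the course for section 5350 or 6350."""
    homework = 12 - 1                         # twelve one-point homeworks, lowest dropped
    projects = 4 * 4                          # four programming projects, four points each
    summaries = 5 if section == 6350 else 0   # paper summaries, 6350 only
    final = 8                                 # final exam or course project
    return homework + projects + summaries + final


def late_credit(points, hours_late):
    """Homework credit after the late policy (assumed: half credit up to 24 hours)."""
    if hours_late <= 0:
        return points
    if hours_late <= 24:
        return 0.5 * points
    return 0.0                                # not accepted after 24 hours


def total_score(homework, projects, final, summaries=()):
    """Sum a student's points, dropping the single lowest homework score."""
    kept = sorted(homework)[1:]               # drop the worst homework
    return sum(kept) + sum(projects) + sum(summaries) + final


print(max_points(5350), max_points(6350))
```

For example, a 5350 student with full marks everywhere except one missed homework still reaches the full total, because the missed homework is the one that gets dropped.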
The official course textbook is Elements of Statistical Learning by Trevor Hastie, Robert Tibshirani and Jerome Friedman (ISBN 0387952845). We will also make use of the online textbook, Machine Learning: A probabilistic approach, by David Barber. Other recommended (but not required) books:
Pattern Recognition and Machine Learning by Chris Bishop (ISBN 0387310738)
Information Theory, Inference and Learning Algorithms by David MacKay (ISBN 0521642981)
Machine Learning by Tom Mitchell (ISBN 0070428077)
An Introduction to Computational Learning Theory by Michael Kearns and Umesh Vazirani (ISBN 0262111934)
| Date | Topics | Readings | HW |
|---|---|---|---|
| 8 Jan | Course overview, general classes of problems. Data, features, curse of dimensionality | - | - |
| 10 Jan | Learning: It's all about generalization. Overfitting, experimental design, probability and optimization | PA 1 | - |
| 15 Jan | Learning to play 20 questions. Decision trees (entropy and overfitting) | ESL 9.2, [Q86] | HW0 due |
| 17 Jan | Matlab tutorial and Q/A session (attendance optional). Piyush Rai will present; Hal is out of town | - | - |
| 22 Jan | I am what I look most like. Nearest neighbors (geometry of data, feature vectors) | ESL 13.3-13.5; 6350: [WBS06] | HW1A due |
| 24 Jan | Splitting down the middle. Hyperplane classifiers and margins, hardness results | ESL 4.5, [N06] (ignore 3.6-3.13, 4.2-4.6) | - |
| 29 Jan | Splitting down the middle, almost optimally. Support vector machines | ESL 12.1-12.3.2, [B98] sec 1-3 | - |
| 31 Jan | Making infinite hyperplanes. Kernels | ESL 12.3.3-12.3.4, [B98] remainder; 6350: [KSD06] | HW1B due |
| 5 Feb | Catch up | - | - |
| 7 Feb | Catch up | - | HW1C due |
| 12 Feb | How much is that car worth? Linear regression, support vector regression. Also: feature selection | ESL 3.2, 12.3.5-12.3.7 | - |
| 14 Feb | I might kind of do a good job. PAC learning and VC dimension | ESL 10-10.7, [BDHLZ05]; 6350: [BBBCL07] | - |
| 19 Feb | Continued... | - | P1 due |
| 21 Feb | Getting mileage out of old algorithms. Boosting and reductions | ESL 7.4-7.6, 7.9; 6350: [M03] | - |
| 26 Feb | Continued... | 6350: [L05] | HW2A due |
| 28 Feb | Learning with someone staring over your shoulder. Online learning: perceptron and winnow | TBA; 6350: [S99], [MPKWJW05] | - |
| 4 Mar | Put me with my neighbors. K-means clustering | ESL 14.3-14.3.6, 14.3.10 | - |
| 6 Mar | Finding out my ancestry. Agglomerative clustering | ESL 14.3.12; 6350: [TDR07] | HW2B due |
| 11 Mar | Catch up | - | P2 due |
| 13 Mar | Getting rid of excess dimensions. Principal component analysis | ESL 14.5 (tutorial) | - |
| 25 Mar | Catch up | - | HW3A due |
| 27 Mar | Warping dimensions. MDS, ISOMAP, LLE and kernel PCA | ESL 14.7, [SWHSL06]; 6350: [WSZS07] | - |
| 1 Apr | Paint pictures, not math. Graphical models and naive Bayes | PA 10, 11, 18, [J04] | HW3B due |
| 3 Apr | What to do when stuff is missing. Latent variables and Gaussian mixture models | PA 19.2 | - |
| 8 Apr | Missing stuff, in the general case. Expectation maximization for PCA and factor analysis | PA 14-14.3, [D04]; 6350: [L03] | HW4A due |
| 10 Apr | Catch up | - | P3 due |
| 15 Apr | Being uncertain about everything. Hierarchical Bayesian modeling | PA 15, 16, [GS04]; 6350: [BNJ03], [AFDJ03], [T08] | - |
| 17 Apr | Controlling robots in block world. Markov decision processes, Bellman equations | [SB98] ch. 1, 3; 6350: [SB98] ch. 2 | HW4B due |
| 22 Apr | Learning to control over time. Value iteration, policy iteration | [SB98] ch. 4; 6350: [KKJ03] | HW4C due, P4 due |
Several of the topics we will discuss are not covered in sufficient
depth in Elements. These are provided instead in tutorial and
research papers, listed here. These readings are required.
| Key | Reading |
|---|---|
| [AFDJ03] | An introduction to MCMC for machine learning, by C. Andrieu, N. de Freitas, A. Doucet and M. I. Jordan. Machine Learning, 2003. |
| [B98] | A Tutorial on Support Vector Machines for Pattern Recognition, by Chris Burges. KDDM, 1998. |
| [BBBCL07] | Robust Reductions from Ranking to Classification, by Nina Balcan, Nikhil Bansal, Alina Beygelzimer, Don Coppersmith, John Langford, and Greg Sorkin. COLT 2007. |
| [BDHLZ05] | Reductions Between Classification Tasks, by Alina Beygelzimer, Varsha Dani, Tom Hayes, John Langford and Bianca Zadrozny. ICML, 2005. |
| [BKNS04] | Policy search by dynamic programming, by J. Andrew Bagnell, Sham Kakade, Andrew Y. Ng and Jeff Schneider. NIPS 2004. |
| [BNJ03] | Latent Dirichlet allocation, by Dave Blei, Andrew Ng and Michael Jordan. JMLR, 2003. (You can ignore Section 5, Inference and Parameter Estimation.) |
| [D04] | Using EM to Estimate a Probability Density with a Mixture of Gaussians, by Aaron D'Souza. Note, 2004. |
| [GE03] | An Introduction to Variable and Feature Selection, by Isabelle Guyon and Andre Elisseeff. JMLR 2003. |
| [GS04] | Finding scientific topics, by Tom Griffiths and Mark Steyvers. PNAS, 2004. |
| [J04] | Graphical models, by Michael I. Jordan. Statistical Science 2004. |
| [KKJ03] | Exploration in Metric State Spaces, by Sham Kakade, Michael Kearns, and John Langford. ICML 2003. |
| [KSD06] | Learning Low-Rank Kernel Matrices, by Brian Kulis, Matyas Sustik, Inderjit Dhillon. ICML 2006. |
| [L03] | Gaussian Process Latent Variable Models for Visualisation of High Dimensional Data, by Neil Lawrence. NIPS 2003. |
| [L05] | Tutorial on Practical Prediction Theory for Classification, by John Langford. JMLR 2005. |
| [M03] | Simplified PAC-Bayesian Margin Bounds, by David McAllester. COLT 2003. |
| [MPKWJW05] | Simple Algorithms for Complex Relation Extraction with Applications to Biomedical IE, by R. McDonald, F. Pereira, S. Kulick, S. Winters, Y. Jin, and P. White. ACL 2005. |
| [N06] | Linear algebra review and reference, by Andrew Ng. Draft tutorial, 2006. |
| [NG00] | PEGASUS: A policy search method for large MDPs and POMDPs, by Andrew Y. Ng and Michael I. Jordan. UAI 2000. |
| [NMM06] | Semi-supervised Text Classification Using EM, by Kamal Nigam, Andrew McCallum and Tom Mitchell. In Semi-supervised Learning, 2006. |
| [PS07] | Policy Gradient Methods for Robotics, by Jan Peters and Stefan Schaal. IROS 2006. |
| [Q86] | Induction of Decision Trees, by J.R. Quinlan. MLJ, 1986. |
| [S99] | Perceptron, Winnow, and PAC Learning, by R. Servedio. COLT 1999. |
| [SB98] | Reinforcement Learning: An Introduction, by Rich Sutton and Andrew Barto. MIT Press, 1998. |
| [SM06] | An Introduction to Conditional Random Fields for Relational Learning, by Charles Sutton and Andrew McCallum. Book chapter in Introduction to Statistical Relational Learning, 2006. |
| [SWHSL06] | Spectral methods for dimensionality reduction, by L. Saul, K. Weinberger, J. Ham, F. Sha and D. Lee. In Semi-supervised Learning, 2006. |
| [T08] | Dirichlet Processes, by Yee Whye Teh. Draft tutorial, 2008. |
| [TDR07] | Bayesian Agglomerative Clustering with Coalescents, by Yee Whye Teh, Hal Daumé III and Daniel Roy. NIPS 2007. |
| [WBS06] | Distance Metric Learning for Large Margin Nearest Neighbor Classification, by Kilian Weinberger, John Blitzer and Lawrence Saul. NIPS 2006. |
| [WSZS07] | Graph Laplacian methods for large-scale semidefinite programming, with an application to sensor localization, by Kilian Weinberger, Fei Sha, Qihui Zhu and Lawrence Saul. NIPS 2007. |
| [ZGL03] | Semi-supervised learning using Gaussian fields and harmonic functions, by Xiaojin Zhu, Zoubin Ghahramani, and John Lafferty. ICML 2003. |