Machine Learning
CS 5350/CS 6350
Fall 2008
Machine learning is all about finding patterns in data. The whole
idea is to replace the "human writing code" with a "human supplying
data" and then let the system figure out what it is that the person
wants to do by looking at the examples. The most central concept in
machine learning is generalization: how to generalize beyond
the examples that have been provided at "training time" to new
examples that you see at "test time." A very large fraction of what
we'll talk about has to do with figuring out what generalization
means. We'll look at it from lots of different perspectives and
hopefully gain some understanding of what's going on.
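To make the training-time versus test-time distinction concrete, here is a minimal sketch in Python. It is not part of the course materials: the toy data, the labeling rule, and the helper names (`make_data`, `nearest_neighbor_predict`, `accuracy`) are all assumptions made up for illustration. A 1-nearest-neighbor classifier memorizes its training examples, so the number that tells us about generalization is the accuracy on fresh examples it has never seen.

```python
import random

def nearest_neighbor_predict(train, x):
    """Return the label of the training example closest to x (1-nearest-neighbor)."""
    closest = min(train, key=lambda ex: sum((a - b) ** 2 for a, b in zip(ex[0], x)))
    return closest[1]

def accuracy(train, examples):
    """Fraction of examples whose predicted label matches the true label."""
    correct = sum(nearest_neighbor_predict(train, x) == y for x, y in examples)
    return correct / len(examples)

def make_data(n, rng):
    """Toy data (an assumption for illustration): label is 1 when x0 + x1 > 1."""
    data = []
    for _ in range(n):
        x = (rng.random(), rng.random())
        data.append((x, int(x[0] + x[1] > 1)))
    return data

rng = random.Random(0)          # fixed seed so the run is repeatable
train = make_data(200, rng)     # examples supplied at "training time"
test = make_data(200, rng)      # new examples seen only at "test time"

train_acc = accuracy(train, train)  # every training point is its own nearest neighbor
test_acc = accuracy(train, test)    # this is the number that measures generalization

print(f"training accuracy: {train_acc:.2f}")
print(f"test accuracy:     {test_acc:.2f}")
```

Training accuracy is perfect by construction here, which is exactly why it says nothing about generalization; only the test accuracy does.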
There are a few cool things about machine learning that I hope to get
across in class. The first is that it's broadly applicable. These
techniques have led to significant advances in many fields, including
stock trading, robotics, machine translation, computer vision, and
medicine. The second is that there is a very close connection
between theory and practice. While this course is more on the
"practical" side of things, almost everything we will talk about has a
huge amount of accompanying theory. The third is that once you
understand the basics of machine learning technology, it's a very open
field in which lots of progress can be made quickly, essentially by
finding ways to formalize whatever we can figure out about the
world.
The catalog lists CS 3510 as a prerequisite; this can be waived if you
have or can quickly acquire reasonable programming skills (in Matlab)
or if you are a graduate student. There will be a fair amount of math
in this class, but all I'll really expect you to know coming in is how
to take derivatives. Some assignments will probably go more quickly if you
have some background in continuous math (probability and/or linear
algebra), but we'll cover everything you need to know in class.
Grading differs between 5350 and 6350. The components are:
| Date | Topics | Readings | HW | Notes |
|---|---|---|---|---|
| 26 Aug | Course overview, general classes of problems. Supervised, unsupervised, online and reinforcement learning; background in discrete probability | - | - | |
| 28 Aug | Decision trees. Entropy and information gain; overfitting and pruning | dtrees.pdf (p1-10) | - | |
| 2 Sep | Learning: It's all about generalization. Evaluating learned hypotheses; cross validation and statistical significance | eval.pdf | HW0 due | - |
| 4 Sep | Nearest neighbors. Geometry of data, feature vectors | knn.pdf | - | |
| 9 Sep | Neural networks. Network structure and perceptron | nnet.pdf (p1-5) | HW1 due | |
| 11 Sep | Matlab tutorial and Q/A session (in CADE Lab 2) | - | - | |
| 16 Sep | Fitting parametric functions. Gradient descent and back-propagation | nnet.pdf (p6-14) | HW2 due | - |
| 18 Sep | Support vector machines. Optimizing the mistake bound: margins | Burges98.pdf (sec 3-3.4) | - | |
| 23 Sep | Catch-up | - | HW3 due, P1 due | - |
| 25 Sep | Loss functions and regularization. 0/1 loss is NP-hard; hinge loss, log loss, exponential loss | Burges98.pdf (sec 3.5-3.7) | - | - |
| 30 Sep | Loss functions and regularization II. L0, L1, L2 and L-infinity penalties | - | HW4 due | - |
| 2 Oct | Kernels. Dual formulation of SVMs | Burges98.pdf (sec 4) | - | - |
| 7 Oct | Kernels II. From linear to non-linear learning | - | - | |
| 9 Oct | Catch-up and Midterm review | - | - | - |
| 21 Oct | PAC learning. Definition of PAC learning | pac.pdf (through 1.3) | HW5 due, P2 due | |
| 23 Oct | MIDTERM | - | - | - |
| 28 Oct | PAC II: VC dimension | pac.pdf (rest) | - | |
| 30 Oct | Boosting. Weak learners | schapire99boosting.pdf | - | |
| 4 Nov | Reductions. Multiclass to binary: OVA, AVA | langford05wap.pdf | - | |
| 6 Nov | Reductions II. Cost-sensitive classification, ranking | - | HW7 due | - |
| 11 Nov | Conditional models. Logistic and linear regression, revisited | prob.pdf, linreg.pdf (esp. p1-5) | P3 due | |
| 13 Nov | Inference in Bayesian models. Maximum a posteriori and clustering | - | - | - |
| 18 Nov | Inference in Bayesian models II. (Markov Chain) Monte Carlo | - | HW8 due | - |
| 20 Nov | Inference in Bayesian models III. Expectation maximization | mix_gauss.pdf | - | |
| 25 Nov | Latent variable models II. Matching words and pictures | slides-bayes.pdf | P4 due | - |
| 2 Dec | Structured prediction. Conditional random fields and max-margin Markov networks | - | - | |
| 4 Dec | Collaborative Filtering. The Netflix Challenge | slides-cf.pdf | HW9 due | - |
| 9 Dec | Natural language processing (Riloff) | - | - | - |
| 11 Dec | Statistical image processing (Fletcher) | - | P5 due, HW10 due | - |
Written Homeworks