Machine Learning
CS5350/6350
Fall 2011

Instructor: Piyush Rai (email: piyush AT cs DOT utah DOT edu)
Office Hours: MEB 3424; Wed/Fri (11:30am - 12:30pm), or by appointment
Piazza: class#fall2011/cs5350/
Class Schedule: Tue/Thu 3:40-5:00pm
Location: WEB 2250
Mailing List: cs5350@list.eng.utah.edu (SUBSCRIBE but don't post), teach-cs5350@list.eng.utah.edu (POST but don't subscribe).
TA Office Hours: Zhan Wang (Tue/Thu, 1:20pm-3:20pm, MEB 4150), Preethi Kotari (Mon/Wed, 3pm-4pm, Room: MEB 3423), Bigyan Mukherjee (Mon/Wed, 4pm-5pm, MEB 3115)

Background and Course Description

As more and more application domains (e.g., web, bioinformatics, computer vision, robotics, computer systems, finance, social sciences, etc.) are beginning to witness large amounts of complex data, there is a pressing need to come up with effective ways of automatically mining useful information from such data. The field of Machine Learning (now considered a relatively mature sub-discipline of AI) offers several techniques to (automatically) infer useful patterns in the data and make predictions from it. The primary focus of this course will be on Supervised Learning (learning with labeled training data to predict labels of future data) and Unsupervised Learning (learning with unlabeled data to find useful patterns and structures in it).

In addition, we will explore some more advanced topics such as Semi-supervised Learning (leveraging information from unlabeled data when you have very little labeled data for the supervised learning task), Active Learning (a variant of supervised learning where you have a budget on labeled examples, and the learner can *ask* which labeled examples it wants to learn from), Structured Prediction (a variant of supervised learning where the output you want to predict has a structure), Transfer Learning (learning to solve a problem 'B' when you have already learned to solve a related problem 'A'), Bayesian Learning (quantifying uncertainty in the learned models), etc. The goal of the course is to equip students with the basic machine learning techniques to solve problems in the application domain(s) they care about, and also to familiarize them with the state of the art among the more recent/advanced methods that can deal with problems the traditional methods are unable to handle.
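
To make the supervised learning setting above concrete, here is a minimal, purely illustrative MATLAB sketch of a 1-nearest-neighbor classifier (one of the first methods covered in the course); the toy data and variable names are made up for this example and are not taken from any assignment.

    % Toy 1-nearest-neighbor classifier (illustrative sketch only; all data is made up)
    Xtrain = [1 1; 1 2; 5 5; 6 5];   % labeled training inputs (one row per example)
    ytrain = [1; 1; 2; 2];           % corresponding training labels
    Xtest  = [0 1; 5 6];             % unlabeled test inputs to classify

    ypred = zeros(size(Xtest, 1), 1);
    for i = 1:size(Xtest, 1)
        % squared Euclidean distance from the i-th test point to every training point
        diffs = Xtrain - repmat(Xtest(i, :), size(Xtrain, 1), 1);
        dists = sum(diffs .^ 2, 2);
        [mindist, nearest] = min(dists);   % index of the closest training example
        ypred(i) = ytrain(nearest);        % predict the label of that example
    end
    disp(ypred)                            % prints 1 and 2 for the two test points

Homework 1 (K-Nearest Neighbors and Decision Trees) builds on exactly this kind of method.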

Prerequisites: The course assumes some familiarity with the fundamentals of linear algebra, statistics, and probability. Admittedly, grasping the material will be easier if you are already fairly comfortable with these topics, but just so everyone is on the same page, we will introduce the relevant concepts as and when they are needed.

Syllabus

- Supervised Learning: Decision Trees and K-Nearest-Neighbors, Linear and Ridge Regression, Perceptron, Support Vector Machines (SVM), Kernels and nonlinear SVMs, Probabilistic Models (Linear Regression and Logistic Regression), Model Selection (AIC/BIC/Cross-validation, etc.), Feature Selection, Learning Theory
- Unsupervised Learning: Hierarchical and Flat Clustering, Gaussian Mixture Models (via Expectation Maximization), Linear Dimensionality Reduction and Matrix Factorization, Nonlinear Dimensionality Reduction and Manifold Learning
- Assorted Topics: Boosting, Reductions, Structured Prediction, Ranking, Semi-supervised Learning, Active Learning, Reinforcement Learning, Bayesian Learning, Topic Models for Text

Books

There will not be any dedicated textbook for this class. In lieu of that, we will have lecture slides, online notes, tutorials, and papers for the topics that will be covered in this course. In some cases, photocopies of book chapters will also be made available. Some recommended (although not required) books are:
- Pattern Recognition and Machine Learning by Chris Bishop (ISBN 0387310738)
- Elements of Statistical Learning (2nd Edition) by Trevor Hastie, Robert Tibshirani and Jerome Friedman (ISBN 0387952845). Also available online as PDF.
- Bayesian Reasoning and Machine Learning by David Barber. Also available online as PDF.
- Information Theory, Inference, and Learning Algorithms by David MacKay (ISBN 0521642981). Also available online as PDF.

Grading

The grading will take into account the following (note: CS5350 and CS6350 will have different grading curves):
Homework Assignments: There will be a total of 5 homeworks, each of which will consist of a number of written problems and some programming assignments (worth 60%). Homework assignments are due by 11:59pm on the date listed on the class schedule. Programming assignments are to be completed in MATLAB (available on CADE lab machines). An open source (and free) alternative to MATLAB is Octave, which supports most of MATLAB's functionality (with a few subtle differences).
Final Exam: There will be a final exam, required for both 5350 and 6350 (worth 20%).
Final Project: Every student must take part in a final class project, which can be done in teams of up to three. You can propose your own project, or talk to me about possible project ideas (worth 20%).

Schedule (Tentative)

Date | Topic | Readings/Further References | Deadlines | Slides/Notes
Aug 23 | Introduction and Class Logistics | Background material crib-sheet, MATLAB tutorial, [M06] | HW0 out | slides (print-version)
Supervised Learning
Aug 25 | K-Nearest-Neighbors and Decision Trees | knn.pdf, dtrees.pdf, [Q86], [I04], [WBS06] | - | slides (print-version), info-theory notes
Aug 30 | Decision Trees (Contd.) and Data Representation | - | HW0 due | slides (print-version)
Sep 6 | Learning Models by Fitting Parameters: Linear and Ridge Regression (+Maths Refresher) | lin-reg.pdf, some notes, [KD08], [MD10], [PP08] | HW1 out | slides (print-version)
Sep 8 | Learning Hyperplane Separators: Perceptron and (Intro to) Support Vector Machines | perceptron.pdf, svm.pdf, [B98] | - | slides (print-version)
Sep 13 | Support Vector Machines (Contd.), Loss Functions and Regularization | svm.pdf, [B98], [BL07], [C07], [YHL11], [SFR09] | - | slides (print-version)
Sep 15 | Kernel Methods and Nonlinear Classification | svm.pdf (section 7), Learning with Kernels, [HSS08], Learning Kernels | - | slides (print-version)
Sep 20 | Learning Probabilistic Models: Linear Regression (revisited) and Logistic Regression | parameter-estimation.pdf, [M03] | HW1 due | slides (print-version)
Sep 22 | Model Selection and Feature Selection | evaluation.pdf, An Introduction to Variable and Feature Selection, [K95] | HW2 out | slides (print-version)
Sep 27 | Learning Theory | computational-learning-theory.pdf | - | slides (print-version)
Sep 29 | Supervised Learning: Odds and Ends | - | - | no slides
Unsupervised Learning
Oct 4 | Clustering: K-means and hierarchical clustering | clustering.pdf, [JMF99], [J08] | - | slides (print-version)
Oct 6 | Probabilistic approaches to clustering (Gaussian Mixture Models via Expectation Maximization) | mixture-models-em.pdf (Sections 9.2-9.3.2), EM for Mixture of Gaussians | HW2 due | Gaussian Mixture Models notes
Oct 11 | Fall Break
Oct 13 | Fall Break
Oct 18 | GMM Recap and the general Expectation Maximization algorithm | EM for Mixture of Gaussians, Expectation Maximization tutorial | Project Proposal | The Expectation Maximization algorithm
Oct 20 | Linear Dimensionality Reduction | pca.pdf, PCA Tutorial (good intuitive explanations and different perspectives), [BCR04] (advanced reading) | HW3 out | slides (print-version)
Oct 25 | Nonlinear Dimensionality Reduction: Kernel PCA, Manifold Learning (LLE, ISOMAP) | LLE, Isomap, Spectral Methods for Dimensionality Reduction | - | slides (print-version)
Assorted Topics
Oct 27 | Ensemble Methods: Bagging and Boosting | Bagging, Boosting, and C4.5, Ensemble Methods (SDM'10 Tutorial) | - | AdaBoost Introduction, slides
Nov 1 | Imbalanced Data and Multiclass Classification | The Class Imbalance Problem, Survey on Multiclass Classification Methods, Label Embedding Trees for Large Multi-Class Tasks (optional, but read section 4 on related work) | - | Imbalanced and Multiclass Classification (up to section 5.2), Perceptrons for Imbalanced and Multiclass Classification
Nov 3 | Ranking and Collective Classification | Ranking Tutorial (optional reading) | HW3 due (now due on 5/11) | Ranking and Collective Classification (required reading)
Nov 8 | Semi-supervised Learning | Semi-Supervised Learning Literature Survey (required reading: sections 1-4, 5, 5.1, 6-7; other sections optional) | - | slides (print-version)
Nov 10 | Active Learning | Active Learning Literature Survey (sections 1-3; other sections optional) | - | slides (print-version)
Nov 15 | Naïve Bayes Classification; Generative vs Discriminative Models | Principled Hybrids of Generative and Discriminative Models (optional reading) | - | Book Chapter (required reading: sections 1, 2, 4, 5)
Nov 17 | Structured Prediction (1): Hidden Markov Models | - | HW4 due | HMM notes
Nov 22 | Structured Prediction (2) | (Optional readings) MEMM, CRF, other references can be found on this seminar webpage | - | Structured Prediction
Nov 24 | Thanksgiving Break
Nov 29 | Reinforcement Learning (1): Discrete MDPs, Value Iteration, Policy Iteration | (Optional reading) Reinforcement Learning: An Introduction (chapters 1, 3, 4) | - | slides (print-version)
Dec 1 | Reinforcement Learning (2): Continuous MDPs | Andrew Ng's notes (section 4 onwards) | - | -
Dec 6 | Intro to Bayesian Learning, and class wrap-up | (Optional readings) Bayesian Modelling in Machine Learning: A Tutorial Review, MCMC tutorial, Conjugate Prior Relationships | - | Bayesian Learning intro (up to section 3)
Dec 8 | Final Exam

Homework Assignments

See the syllabus above for due dates. Please see the handin instructions for how to submit your homeworks. Homeworks are all equally weighted. We encourage using LaTeX to produce your writeups. The LaTeX source for each assignment will be provided, so you can just edit the source to prepare your writeups (you will also need mydefs.sty and notes.sty to build the LaTeX source).

  • Homework 0: Basic concepts and course survey (latex source)
  • Homework 1: K-Nearest Neighbors and Decision Trees (latex source, code and data, solutions)
  • Homework 2: Linear Regression and Classification, Kernel Methods (latex source, code and data, solutions)
  • Homework 3: Probabilistic Models, Feature Selection, Learning Theory, Clustering (latex source, code and data, solutions)
  • Homework 4: Dimensionality Reduction, Ensemble Methods, Multiclass/Ranking (latex source, code and data, solutions)
  • Homework 5: Assorted Topics (latex source)
  • Bonus Assignment (latex source)

    Suggested/Further Readings

    [KD08] Linear Algebra Tutorial
    [MD10] Probability Tutorial
    [PP08] Matrix Cookbook (a very handy desktop reference for matrices)
    [M06] The discipline of machine learning, by Tom Mitchell
    [WBS06] Distance Metric Learning for Large Margin Nearest Neighbor Classification, by Weinberger et al. NIPS, 2006
    [I04] Nearest Neighbors In High-Dimensional Spaces, by Piotr Indyk.
    [Q86] Induction of Decision Trees, by J.R. Quinlan. MLJ, 1986.
    [B98] A tutorial on support vector machines for pattern recognition, by Chris Burges. KDDM, 1998.
    [BL07] Support Vector Machine Solvers, Leon Bottou, Chih-Jen Lin, Book Chapter from Large Scale Kernel Machines (2007)
    [C07] Training a Support Vector Machine in the Primal, Olivier Chapelle (2007)
    [YHL11] Recent Advances of Large-scale Linear Classification, by Guo-Xun Yuan, Chia-Hua Ho, and Chih-Jen Lin, 2011
    [SFR09] Optimization Methods for L1 Regularization, Mark Schmidt, Glenn Fung, Romer Rosales, Technical Report, 2009
    [HSS08] Kernel Methods for Machine Learning, Thomas Hofmann, Bernhard Scholkopf, and Alexander J. Smola, The Annals of Statistics, 2008
    [M03] A comparison of numerical optimizers for logistic regression, Thomas Minka, Technical Report, 2003
    [K95] A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection, Ron Kohavi, IJCAI 1995
    [JMF99] Data Clustering: A Review, A.K. Jain, M.N. Murty, and P.J. Flynn, ACM Computing Surveys, 1999
    [J08] Data Clustering: 50 Years Beyond K-Means, A.K. Jain, 2008
    [S03] A Tutorial on Principal Component Analysis, Jon Shlens
    [BCR04] Eigenproblems in Pattern Recognition, Tijl De Bie, Nello Cristianini, and Roman Rosipal
    [SR00] An Introduction to Locally Linear Embedding, Lawrence Saul and Sam Roweis (2000)
    [TSL00] A Global Geometric Framework for Nonlinear Dimensionality Reduction, Joshua B. Tenenbaum, Vin de Silva, John C. Langford (2000)
    [S05] Spectral Methods for Dimensionality Reduction, Lawrence Saul
    [BWG10] Label Embedding Trees for Large Multi-Class Tasks, Samy Bengio, Jason Weston, and David Grangier (2010)
    [Z08] Semi-Supervised Learning Literature Survey, Jerry Zhu
    [S10] Active Learning Literature Survey, Burr Settles
    [LBM06] Principled Hybrids of Generative and Discriminative Models, Julia A. Lasserre, Christopher M. Bishop, and Thomas P. Minka
    More to come..

    Useful Links and Software

    - Related courses from some other universities: CMU, Stanford, UPenn, MIT, Berkeley; Fall 2009 offering from Utah

    There are a number of machine learning libraries and software packages publicly available which you may find useful for your class projects (or otherwise):

    - scikit-learn: Python based. Integrates several machine learning algorithms into Python's scientific packages such as SciPy, NumPy, etc.
    - Torch: C++ based machine learning library
    - Weka: Java based machine learning and data mining library
    - Spider: MATLAB based (with object oriented support) machine learning library
    - Statistical Pattern Recognition Toolbox: Collection of machine learning algorithms implemented in MATLAB
    - MATLABArsenal: MATLAB based toolbox implementing several classification algorithms
    - libSVM: A very efficient library for SVMs
    - SVM-light: Another very efficient library for SVMs
    - MegaM: Optimization software for maximum entropy models, uses conjugate gradient for binary/binomial problems and LM-BFGS for multiclass problems
    - FastDT: Very fast decision tree learner that implements bagging and boosting
    - Summer schools in machine learning

    Course Policies

    Cheating: Any assignment or exam that is handed in must be your own work. However, talking with one another to understand the material better is strongly encouraged. Recognizing the distinction between cheating and cooperation is very important. If you copy someone else's solution, you are cheating. If you let someone else copy your solution, you are cheating. If someone dictates a solution to you, you are cheating. Everything you hand in must be in your own words, and based on your own understanding of the solution. If someone helps you understand the problem during a high-level discussion, you are not cheating. We strongly encourage students to help one another understand the material presented in class, in the book, and general issues relevant to the assignments. When taking an exam, you must work independently. Any collaboration during an exam will be considered cheating. Any student who is caught cheating will be given an E in the course and referred to the University Student Behavior Committee. Please don't take that chance - if you're having trouble understanding the material, please let us know and we will be more than happy to help.

    ADA: The University of Utah conforms to all standards of the Americans with Disabilities Act (ADA). If you wish to qualify for exemptions under this act, notify the Center for Disabled Students Services, 160 Union.

    College guidelines: Document concerning adding, dropping, etc. here.