MLRG/fall10
Semisupervised and Active Learning
Fri 2:00-3:20pm
MEB 3105
Synopsis
Supervised learning algorithms usually require a substantial amount of labeled data in order to learn a reliable model. Since large quantities of labeled data can be expensive and/or difficult to obtain, much effort in machine learning has been devoted to learning with a limited amount of labeled data. There are many ways of doing this. Two very important paradigms we will look at in this seminar are (1) semi-supervised learning, which augments a small amount of available labeled data with a large amount of additional unlabeled data (which is usually very easy to obtain), and (2) active learning, which judiciously selects the most informative examples to be labeled and given to a supervised learning algorithm. In this seminar, we will look at representative papers from both paradigms. As it will not be possible to cover all the important papers in a single seminar, additional papers will be listed under the suggested readings for those interested.
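To make the active-learning idea concrete, here is a minimal, purely illustrative sketch of pool-based uncertainty sampling: query the label of the unlabeled point the current model is least sure about, then refit. The 1-D threshold classifier, the sigmoid confidence, and all data below are toy assumptions for illustration, not taken from any of the seminar papers.

```python
import math

def fit_threshold(labeled):
    """Toy 1-D classifier: threshold at the midpoint of the class means."""
    pos = [x for x, y in labeled if y == 1]
    neg = [x for x, y in labeled if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2.0

def predict_prob(x, threshold, scale=1.0):
    """Sigmoid of signed distance to the threshold: P(y=1 | x)."""
    return 1.0 / (1.0 + math.exp(-(x - threshold) / scale))

def uncertainty_sample(pool, threshold):
    """Pick the unlabeled point whose prediction is closest to 0.5."""
    return min(pool, key=lambda x: abs(predict_prob(x, threshold) - 0.5))

# Toy oracle: the true (unknown) rule is y = 1 for x > 5; each query
# to it simulates paying for one label.
oracle = lambda x: int(x > 5)
pool = [1.0, 2.0, 4.0, 4.9, 5.1, 6.0, 8.0, 9.0]
labeled = [(0.0, 0), (10.0, 1)]           # two seed labels

for _ in range(3):                        # query budget of 3 labels
    t = fit_threshold(labeled)
    x = uncertainty_sample(pool, t)       # most uncertain point
    labeled.append((x, oracle(x)))        # pay for its label
    pool.remove(x)

# The queried points cluster near the true decision boundary at 5,
# rather than being spread uniformly over the pool.
print(sorted(x for x, _ in labeled))
```

With the same query budget, random sampling would mostly spend labels on points far from the boundary; uncertainty sampling concentrates them where the model is undecided, which is the intuition behind the query-by-uncertainty papers later in the schedule.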
Participants
- Piyush Rai, PhD Student, School of Computing
- Suresh Venkat, Asst. Prof, School of Computing
- Ruihong Huang, PhD Student, School of Computing
Schedule
(subject to change; * indicates a session that will probably need rescheduling)
| Date | Topic | Outline and Paper(s) | Presenter |
|---|---|---|---|
| Sep 3 | Outline, Motivation | Seminar logistics. Brief introduction to semi-supervised learning (section 1, the FAQ, of this survey) and active learning (section 1 of this survey) | Piyush |
| Semi-supervised Learning | | | |
| Sep 10 | Bootstrapping/weak-supervision | Combining Labeled and Unlabeled Data with Co-Training | |
| Sep 17 | Low density regions and the cluster assumption for SSL | Semi-Supervised Classification by Low Density Separation, (also see section 5 of the SSL survey for other methods and further references) | |
| Sep 24 | Imposing function smoothness: Graph based SSL | A geometric framework for learning from labeled and unlabeled examples, (also see section 6 of the SSL survey for other methods and further references) | |
| Oct 1 | Probabilistic approaches: Expectation Maximization for SSL | Semi-Supervised Text Classification Using EM | |
| Oct 8 | Using unlabeled data to learn predictive functional structures | A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data | |
| Oct 22 | Harnessing unlabeled test data: Application to ranking | Learning to Rank with Partially-Labeled Data | |
| Oct 29 | Semi-supervised Learning Theory | An Augmented PAC Model for Semi-Supervised Learning | |
| Nov 5 | Semi-unsupervised Learning (Clustering/Dimensionality Reduction) | Integrating constraints and metric learning in semi-supervised clustering, Semi-Supervised Dimensionality Reduction | |
| Active Learning | | | |
| Nov 12 | Pool-based active learning, Query by committee, Query by uncertainty | Support Vector Machine Active Learning with Applications to Text Classification, Active Learning survey (sections 3.1 and 3.2) | |
| Nov 19 | Stream-based active learning | Worst-Case Analysis of Selective Sampling for Linear Classification | |
| *Nov 26 | Dealing with sampling bias and using cluster-structure for active learning | Hierarchical Sampling for Active Learning | |
| Dec 3 | Semi-supervised learning and active learning | Combining active learning and semi-supervised learning using Gaussian fields and harmonic functions | |
| Dec 10 | Multiview active learning | Active Learning with Multiple Views | |
Suggested Readings
Will be updated with more papers.