Data Mining Seminar: Sampling
Instructor: Jeff Phillips
Fall 2012 | Wednesday 1:25 pm - 2:45 pm
Location: ??? (hopefully LCR)
Catalog number: CS 7941 01 ???


Description:
One of the most obvious ways to deal with the "big data" problem is to simply sample the "big" data set to create a smaller one. We can then run our algorithms and analyses on the smaller data set, which will complete much more quickly. This raises three obvious questions:

This seminar will answer these questions, covering classic results as well as fascinating recent approaches.
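To make the basic idea concrete, here is a minimal Python (3.8+) sketch, assuming the "analysis" is just computing a mean and the smaller data set is a uniform random sample drawn without replacement. The data set and the function name `sample_mean_estimate` are hypothetical illustrations, not material from the seminar.

```python
import random
import statistics


def sample_mean_estimate(data, k, seed=None):
    """Estimate the mean of `data` from a uniform random sample of size k.

    The sample is drawn without replacement, so every k-subset of the data
    is equally likely to be chosen.
    """
    rng = random.Random(seed)
    sample = rng.sample(data, k)  # uniform sample without replacement
    return statistics.fmean(sample)


# Hypothetical "big" data set: 200,000 values drawn uniformly between 0 and 100.
gen = random.Random(0)
big = [gen.uniform(0, 100) for _ in range(200_000)]

exact = statistics.fmean(big)                       # uses all 200,000 values
approx = sample_mean_estimate(big, 5_000, seed=1)   # uses only 5,000 values
print(f"exact mean = {exact:.3f}, sampled estimate = {approx:.3f}")
```

For a uniform sample of size k, the standard error of the sample mean is roughly σ/√k (ignoring the finite-population correction), independent of how large the full data set is; this is why a much smaller sample can still give a usable estimate of a simple statistic like the mean.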

This 1-credit seminar will meet once a week and be student-driven; each student will be responsible for giving one lecture in class (depending on class size). Topics will include those in the schedule below (this list is not exhaustive and is subject to change).
Schedule:
Date      | Topic | Speaker
Wed 8.22  | Overview | Jeff Phillips
Wed 8.29  | (Travel Day) |
Wed 9.05  | Accuracy: Summaries (e.g., eps-samples and eps-nets) |
Wed 9.12  | Accuracy: Importance (WP) and Rejection (WP) Sampling (notes) |
Wed 9.19  | Markov Chains: Definitions and Introduction (WP, Chapter 1) |
Wed 9.26  | Markov Chains: Rapid Mixing and Convergence Analysis (Chapter 4) |
Wed 10.03 | Markov Chains: Metropolis (WP, for ML, Chapter 10) and Gibbs (WP, Chapter 6) Sampling (notes, BUGS) |
Wed 10.10 | (Fall Break - No Class) |
Wed 10.17 | Markov Chains: Advanced Sampling (WP, tempered analysis) |
Wed 10.24 | Markov Chains: Coupling from the Past (WP, Java App, Chapter 22, more) |
Wed 10.31 | Efficiency: Reservoir Sampling and Beyond (WP, Vitter, blog) |
Wed 11.07 | Efficiency: Variance-Optimal Sampling (Priority Sampling, VarOpt Sampling) |
Wed 11.14 | Efficiency: Precision Sampling (in stream) | Suresh Venkatasubramanian
Wed 11.21 | Beyond Random: Discrepancy (Chazelle, Matousek) |
Wed 11.28 | Beyond Random: Hyperbolic Cosine and other discrepancy-based heuristics |
Wed 12.05 | Beyond Random: Structure-aware VarOpt Sampling (range, stream) |
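Reservoir sampling (the Wed 10.31 topic) maintains a uniform random sample of fixed size k from a stream whose length is not known in advance, using one pass and O(k) memory. The Python sketch below is the simple one-pass scheme often called Algorithm R; it is not one of the faster skip-based variants in Vitter's paper, and the function name `reservoir_sample` is hypothetical.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Return k items chosen uniformly at random from a stream of unknown length.

    Invariant: after n >= k items have been seen, each of them is in the
    reservoir with probability k / n. If the stream has fewer than k items,
    all of them are returned.
    """
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)  # fill the reservoir with the first k items
        else:
            j = rng.randint(0, i)   # uniform over 0..i inclusive (i + 1 choices)
            if j < k:
                # Keep item i with probability k / (i + 1), evicting a
                # uniformly chosen current member of the reservoir.
                reservoir[j] = item
    return reservoir


# Usage: sample 5 items from a stream of one million integers.
print(reservoir_sample(range(1_000_000), 5, seed=42))
```

The stream is consumed exactly once and never stored, which is what makes this approach suitable when the data arrives as a stream that is too large to hold in memory.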



Resources:
Most resources will be linked directly from each week's row in the schedule above. However, some great references are available on the more general topic of Markov chains:
  • A First Course in Bayesian Statistical Methods by Peter Hoff.
    Provides a nice view of practical MCMC sampling for Bayesian statistics.
  • Markov Chains and Mixing Times by David A. Levin, Yuval Peres, and Elizabeth L. Wilmer.
    Provides a nice overview of the theory behind Markov chains.