MLRG/fall08
CS7941: Inference in Graphical Models
Time: Fri 2:00-3:20pm (see schedule for each week)
Location: MEB 3105
Abstract
Graphical models (both Bayesian networks and Markov random fields) are a convenient method for encoding and reasoning about probability distributions. This seminar will focus initially on representational issues and then move to computational methods for inference in graphical models. Our interest will primarily be on approximate inference, since exact inference in most models is intractable. The focus will be on non-Bayesian approaches, though many things we discuss are applicable to Bayesian problems as well.
Students with a working knowledge of probability are well suited for this seminar. Each student should expect to present at least one week during the semester.
The rough plan is to begin with classical approaches to graphical models, including exact inference via the junction tree algorithm. We will quickly move on to approximate methods, focusing mostly on message passing and linear programming approaches.
The structure of the semester will be roughly as follows:
- Introduction to Bayes' nets and graphical models: semantics, variable elimination, sum/max product algorithm, and factor graphs
- Computational complexity: junction trees, triangulations and tree-width, belief propagation
- Exponential families: maximum entropy principle, cumulant generating functions, conjugate duality
- Variational inference: exact inference on trees, mean-field and structured mean-field, inner bounds of the marginal polytope
- Statistical physics: Bethe free energy, log-determinant relaxation and outer bounds of the marginal polytope
- Linear programming methods: tree-reweighted message passing, more outer bounds
This is a lot of material. Things will be dropped or skimmed depending on time and interest levels. Students should only come if they are prepared to do the required reading before the seminar.
Expected Student Involvement
Presentations will be given by pairs of students. The logistics of this will depend somewhat on the make-up of the participants, but the goal would be to pair CS students with non-CS students in most cases. Sign-ups will happen quickly, so please make sure you're signed up for at least one week, and probably two. The expectations, with timing, are:
- At least 3 days before a presentation, the presenters should meet with Hal for one hour to discuss the material and decide what to present.
- By class time, everyone should have submitted at least 2 questions to the Wiki for discussion. (Click on the "discussion" tab along the top of the wiki to enter your questions.)
- Everyone should actively participate in the discussions.
If you're interested in participating in the programming parts, please visit the programming subpage.
Participants
- Hal Daumé III, Assistant Professor, School of Computing
- Piyush Rai, PhD Student, School of Computing
- Nathan Gilbert, PhD Student, School of Computing
- Suresh Venkatasubramanian, Assistant Professor, School of Computing
- Siddharth Patwardhan, PhD Student, School of Computing
- Ruihong Huang, PhD Student, School of Computing
- John Moeller, Non-Matriculated Student, School of Computing
- Zhan Wang, PhD Student, School of Computing
- Amit Goyal, PhD Student, School of Computing
- Zheng Cai, PhD Student, Department of Biomedical Informatics
- Stephen Piccolo, PhD Student, Department of Biomedical Informatics
- Avishek Saha, PhD Student, School of Computing
- Jiarong Jiang, PhD Student, School of Computing
- Arvind Agarwal, PhD Student, School of Computing
- Jagadeesh J, PhD Student, School of Computing
- Anthony Wong, PhD Student, Department of Biomedical Informatics
- Seth Juarez, PhD Student, School of Computing
- Shuying Shen, PhD Student, Department of Biomedical Informatics
- Jeffrey Ferraro, PhD Student, Department of Biomedical Informatics
Schedule and Readings
Most of our initial readings will be from the technical report Graphical models, exponential families, and variational inference by Martin Wainwright and Mike Jordan. I know this tome is a bit daunting, but it's not as bad as it looks at first glance! We refer to this in the readings as WJ.
| Date | Papers | Presenters |
|---|---|---|
| Introduction to Bayes' nets and graphical models | | |
| F 29 Aug | Introduction, semantics and applications (WJ, 1-2.4) | Hal |
| F 5 Sep | Factor graphs, sum-product and max-product (WJ, 2.5-2.5.1; KFL01, 1, 2, 5) | Piyush (CS), Zheng (BI) |
| F 12 Sep | Junction tree algorithm and triangulations (WJ, 2.5.2; HD96, 4) | Nathan (CS), Stephen (BI) |
| Variational Inference | | |
| F 19 Sep | Math camp: exponential families and conjugate duality (WJ, 3) | Amit |
| F 26 Sep | Variational principle, exact inference on trees (WJ, 4) | Arvind |
| F 3 Oct | Mean-field and structured mean-field (WJ, 5) | Amit |
| Statistical Physics | | |
| F 10 Oct | Bethe free energy and sum-product (WJ, 6) | Anthony |
| F 24 Oct | Kikuchi free energy and generalized sum-product (WJ, 7) | |
| Convex Relaxations | | |
| F 31 Oct | Convex relaxations (WJ, 8) | Avishek (CS) |
| F 7 Nov | Semi-definite relaxations (WJ, 9) | John M. (CS) |
| F 14 Nov | Mode computations (WJ, 10) | |
| F 21 Nov | Exact max-product and outer bounds (GJ07) | Ruihong (CS) |
| F 5 Dec | Outer bounds on the marginal polytope (SJ07 and SMGJW08) | |
Paper Summaries
29 Aug: Introduction, semantics and applications (Hal)
High-level summary: graphical models = graph theory + probability theory.
Applications:
- Hierarchical Bayesian Models. Almost all hierarchical Bayesian models can be (and often are!) represented by directed graphs, with nodes as r.v.s and edges denoting conditional dependence. The standard problems in Bayesian modeling (computing marginal likelihoods, computing expectations, etc.) are graphical model inference problems. (Cautionary note: most hierarchical Bayesian models involve rvs with continuous densities; the material we'll cover this semester doesn't really address those. Our focus is almost entirely on discrete rvs.)
- Bioinformatics and Language: Biology problems and language problems can often be seen as inference problems over relatively simple structures (like sequences or trees). These are naturally encoded as, say, HMMs or PCFGs, which are amenable to graphical model analysis.
- Image Processing: A standard image representation is as a 2D lattice (undirected!) model, with the nodes representing pixels.
- Error-correcting Coding: We won't talk about this, but if you're interested, see David MacKay's Information Theory, Inference and Learning Algorithms book.
Background:
In most of the applications listed above, the underlying graph structures are relatively "dense." This is not promising, because non-probabilistic algorithms on graphs typically have trouble with dense graphs. It turns out (as we shall see soon) that the same is true in graphical model inference: the denser the graph, the harder (computationally!) the inference problem. For most real-world problems, we're talking NP-hard.
Since these problems are NP-hard (and usually a really bad variety of NP-hard), we're going to go for approximations. One standard approximation method is to construct an MCMC algorithm. We won't discuss MCMC here (perhaps some other semester; alternatively, take one of the Stochastics classes in the Math department).
The approach we take instead is variational: that is, we attempt to convert the inference problems into optimization problems, and solve the optimization problems. The optimization problem will, of course, also be computationally intractable. However, by relaxing the optimization problem, we will obtain an (efficient) approximate solution, which will lead to an (efficient) approximate solution to the inference problem.
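As a preview of the mean-field material (WJ, 5), here is a minimal sketch of what the "inference as optimization" idea looks like in practice: naive mean-field for a small Ising model. The model, parameter values, and function names below are all made up for illustration; the point is only that computing (approximate) marginals has been replaced by iterating the fixed-point update of an optimization problem.

```python
import numpy as np

# A minimal sketch of naive mean-field for a toy Ising model
#   p(x) ∝ exp( sum_s theta_s x_s + sum_{(s,t) in E} theta_st x_s x_t ),
# with x_s in {-1, +1}. All parameter values below are made up.
def mean_field_ising(theta_node, edge_list, theta_edge, n_iters=100):
    """Coordinate ascent on the mean-field objective; mu[s] approximates E[X_s]."""
    n = len(theta_node)
    nbrs = {s: [] for s in range(n)}
    for (s, t), w in zip(edge_list, theta_edge):
        nbrs[s].append((t, w))
        nbrs[t].append((s, w))
    mu = np.zeros(n)  # mean parameters of the fully factorized approximation
    for _ in range(n_iters):
        for s in range(n):
            # Mean-field fixed point: mu_s = tanh(theta_s + sum_t theta_st * mu_t)
            mu[s] = np.tanh(theta_node[s] + sum(w * mu[t] for t, w in nbrs[s]))
    return mu

# Three-node chain x1 - x2 - x3.
print(mean_field_ising(np.array([0.5, 0.0, -0.5]), [(0, 1), (1, 2)], [1.0, 1.0]))
```

Because the approximating family is fully factorized, the answer is only approximate; the later weeks of the seminar are about richer approximations and relaxations.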
Graphical Models:
Let $G = (V, E)$ be a graph. We will associate each vertex $s \in V$ with a random variable $X_s$. The graphical model consists of a collection of probability distributions over the random variables defined by the vertices of the graph that factorize according to the edges. (Notation: if $A \subseteq V$, define $x_A = \{ x_s : s \in A \}$.) There are two types of graphical models, just as there are two types of graphs:
- Directed graphical models: here, let $\pi(s)$ be the set of parents of node $s$; then $p(x) = \prod_{s \in V} p(x_s \mid x_{\pi(s)})$ (a numeric sketch of this factorization follows below).
- Undirected graphical models: here, the distribution factorizes according to functions defined on the cliques of the graph (aka fully-connected subsets of $V$). For every clique $C$, let $\psi_C$ be some function that maps the values of the random variables in $C$ (i.e., $x_C$) to a positive real value. Then $p(x) = \frac{1}{Z} \prod_C \psi_C(x_C)$, where $Z$ is the normalization constant (also called the partition function).
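To make the directed factorization concrete, here is a minimal numeric sketch: a three-node chain A → B → C over binary variables, with hypothetical CPT values chosen purely for illustration. The joint probability is just the product of the local conditionals, and summing over all configurations confirms it normalizes to 1. An undirected model would look the same except the local tables would be arbitrary positive potentials ψ_C and we would divide by Z.

```python
import itertools

# A minimal sketch of the directed factorization p(x) = prod_s p(x_s | x_pi(s))
# on a 3-node chain A -> B -> C with binary variables. The CPT values here
# are made up purely for illustration.
p_a = {0: 0.6, 1: 0.4}                    # p(A)
p_b_given_a = {(0, 0): 0.7, (1, 0): 0.3,  # p(B | A), keyed by (b, a)
               (0, 1): 0.2, (1, 1): 0.8}
p_c_given_b = {(0, 0): 0.9, (1, 0): 0.1,  # p(C | B), keyed by (c, b)
               (0, 1): 0.5, (1, 1): 0.5}

def joint(a, b, c):
    """p(a, b, c) = p(a) * p(b | a) * p(c | b)."""
    return p_a[a] * p_b_given_a[(b, a)] * p_c_given_b[(c, b)]

# Sanity check: a directed factorization is automatically normalized.
total = sum(joint(*x) for x in itertools.product([0, 1], repeat=3))
print(total)  # 1.0 (up to floating-point error)
```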
Inference Problems:
Given a probability distribution $p(x)$ defined by a graphical model, we might want to:
- compute the likelihood of some observed data (observed data = fixed values of a subset of the rvs)
- compute the marginal distribution $p(x_A)$ for some fixed $A \subsetneq V$ (test question: why not $A \subseteq V$?)
- compute the conditional distribution $p(x_A \mid x_B)$ for disjoint $A, B$ with $A \cup B \subsetneq V$ (same question!)
- compute the mode of the density; i.e., solve $\hat{x} = \operatorname{argmax}_x \, p(x)$
Note that the first three are essentially the same: they involve one or more marginalizations. The last one is fundamentally different.
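As a concrete (if brute-force) illustration, the sketch below computes a marginal, a conditional, and the mode for a toy pairwise model over three binary variables; the coupling parameters are hypothetical. Every computation enumerates all $2^{|V|}$ configurations, which is exactly the exponential blow-up the rest of the seminar is about avoiding.

```python
import itertools
import math

# Brute-force versions of the inference problems above, on a toy pairwise
# model over three binary variables (x1, x2, x3). The parameters in score()
# are made up. Everything here enumerates all 2^3 states; real algorithms
# (sum-product, etc.) exploit the graph structure instead.
states = list(itertools.product([0, 1], repeat=3))

def score(x):
    # Unnormalized log-linear model: exp(theta_1 x1 + theta_12 x1 x2 + theta_23 x2 x3)
    return math.exp(0.5 * x[0] + 1.0 * x[0] * x[1] - 0.7 * x[1] * x[2])

Z = sum(score(x) for x in states)      # partition function
p = {x: score(x) / Z for x in states}  # the joint p(x)

# Marginal p(x_A) for A = {1}: sum out x2 and x3.
p_x1 = {v: sum(p[x] for x in states if x[0] == v) for v in (0, 1)}

# Conditional p(x1 | x3 = 1): restrict, then renormalize.
num = {v: sum(p[x] for x in states if x[0] == v and x[2] == 1) for v in (0, 1)}
p_x1_given_x3 = {v: num[v] / sum(num.values()) for v in (0, 1)}

# Mode: argmax_x p(x).
x_hat = max(states, key=lambda x: p[x])

print(p_x1, p_x1_given_x3, x_hat)
```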