Talk:MLRG/fall08
29 Aug
Hal
- This is just an example of a question?
- And another example?
- Hal's alter-ego: The answer is blah.
Seth
- What is the general difference between the MCMC methods and the variational methods presented in this paper?
- (Piyush): At a high level, both MCMC and variational methods are approximate inference techniques whose goal is to approximate some target probability distribution. The difference lies in the kind of approximation they provide. MCMC gives a sampling-based stochastic approximation (i.e., one that can vary across runs), whereas variational inference is an optimization-based deterministic approximation (i.e., the same across runs). The important thing to remember, however, is that in the limit of infinite computational resources, MCMC can produce exact samples (i.e., if the chain is run long enough), so the approximation arises only from the finite limit on resources. Variational methods, on the other hand, solve an optimization problem: minimize the difference between the true distribution and a candidate distribution. Exact optimization is computationally intractable, so we usually solve a relaxed problem (thereby yielding an approximate solution).
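- A toy sketch of Piyush's distinction (not from the paper; the bimodal target, the single-Gaussian variational family, and the grid-search optimizer are all my own illustrative choices):
<pre>
import numpy as np

rng = np.random.default_rng(0)

def log_target(x):
    # Unnormalized log-density of a toy two-component Gaussian mixture.
    return np.logaddexp(-0.5 * (x + 2.0) ** 2, -0.5 * (x - 2.0) ** 2)

# --- MCMC: stochastic approximation via Metropolis sampling. ---
def metropolis(n_steps, step=1.0):
    x, samples = 0.0, []
    for _ in range(n_steps):
        prop = x + step * rng.normal()
        # Symmetric proposal, so the acceptance ratio is just p(prop)/p(x).
        if np.log(rng.uniform()) < log_target(prop) - log_target(x):
            x = prop
        samples.append(x)
    return np.array(samples)

samples = metropolis(50_000)
print("MCMC estimate of E[x^2] (varies with seed/steps):", np.mean(samples ** 2))

# --- Variational: deterministic approximation via optimization. ---
# Fit q = N(mu, sigma^2) by minimizing KL(q || p) computed on a grid,
# so repeated runs give exactly the same answer.
grid = np.linspace(-8.0, 8.0, 4001)
log_p = log_target(grid)
log_p -= np.log(np.trapz(np.exp(log_p), grid))  # normalize p on the grid

best = (np.inf, None, None)
for mu in np.linspace(-4.0, 4.0, 81):
    for sigma in np.linspace(0.3, 3.0, 28):
        log_q = -0.5 * ((grid - mu) / sigma) ** 2 \
                - np.log(sigma * np.sqrt(2.0 * np.pi))
        kl = np.trapz(np.exp(log_q) * (log_q - log_p), grid)
        if kl < best[0]:
            best = (kl, mu, sigma)
print("Variational fit (deterministic): mu=%.2f sigma=%.2f" % (best[1], best[2]))
</pre>
The sampler's estimate fluctuates from run to run but approaches the true value with more steps; the optimization returns the same answer every time, and (since KL(q||p) is mode-seeking) it settles on a single mode of the bimodal target.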
- On page 3 it mentions a compatibility function <math>\psi_C(x_C)</math> when describing the factorization of a probability distribution in an undirected graphical model. What does this function represent?
- (Piyush): The compatibility function for each clique can be thought of as an (unnormalized) probability distribution over the set of nodes in that clique. A high value of the compatibility function, for a certain configuration of the nodes, would indicate that such a configuration is highly likely.
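- For reference, the factorization being described, in its standard form (which I believe matches the paper's page 3; <math>\mathcal{C}</math> is the set of cliques and the variables are assumed discrete): <math>p(\vec x) = \frac{1}{Z} \prod_{C \in \mathcal{C}} \psi_C(x_C), \qquad Z = \sum_{\vec x} \prod_{C \in \mathcal{C}} \psi_C(x_C).</math> The normalizer <math>Z</math> is what lets each <math>\psi_C</math> be an arbitrary nonnegative function rather than a proper distribution.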
John
- Given the factorization of a directed graph model, it seems obvious to me that it would need to be acyclic. How would a non-acyclic directed graph model work? Would it be anything like the undirected graph model (working with SCC's perhaps)? Or are non-acyclic directed graph models just rare enough that they don't need to be defined?
- (hal) Cyclic (or "non-acyclic"!) graphs definitely exist and they will be a fairly major problem for the simple algorithms! I usually think of cycles as a sort of "feedback loop." However, in practice, it's fairly rare that we end up with cyclic (aka "loopy") directed models. Almost all examples of loopy models that we'll see are undirected.
- (John) Ok, fine, I'll just call them "loopy." "Cyclic" implies exclusivity to me. :-p
Suresh
- Why is the undirected graphical model formulated in terms of cliques instead of in terms of neighbors, as the directed model is?
- (Piyush) In the parent-child relationships of directed graphical models, each term in the factorized distribution represents the probability of a node given all its parents. In undirected graphical models, however, we have no such parent-child relationships. We therefore define the factorization in terms of a set of cliques, with each clique's compatibility function scoring how likely a particular configuration of the nodes in that clique is. Also, during factorization, we can restrict attention to maximal cliques, since every smaller clique is a subset of some maximal clique (see the example below).
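- A small example of the maximal-clique point (my own illustration): in a triangle on <math>x_1, x_2, x_3</math>, any pairwise compatibility functions can be absorbed into a single potential on the one maximal clique, <math>\psi_{123}(x_1,x_2,x_3) := \psi_{12}(x_1,x_2)\,\psi_{23}(x_2,x_3)\,\psi_{13}(x_1,x_3),</math> so restricting the factorization to maximal cliques loses no generality.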
John
- Factorizing distributions: this seems to be (especially in the directed case) a chained application of Bayes' rule. Is my characterization very far off?
- (hal) It's basically chain rule + independence assumptions. Order the variables topologically and apply the chain rule. Now, remove from the conditioning side anything that is not a parent. Voilà.
- (sam) Where does the independence assumption come into play? Doesn't the topology of the graph imply the dependencies? In the directed case a node depends on its parents; in the undirected case a node depends on its cliques and the compatibility functions on them.
- (John) If I understand correctly, you'd get the chain rule if you had a graph where every vertex had as parents every vertex that preceded it topologically, thus implying maximal dependency (because that's a maximal DAG). Removing edges from such a graph allows you to remove terms from the individual factors in the product. I have to admit that I'm still a little confused about the clique relationships (though Hal indicated that more clarity on cliques in GMs was forthcoming).
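- For concreteness, hal's recipe written out (a sketch, assuming the vertices are indexed <math>x_1,\dots,x_n</math> in topological order, with <math>\pi(i)</math> denoting the parents of node <math>i</math>): <math>p(x_1,\dots,x_n) = \prod_{i=1}^n p(x_i \mid x_1,\dots,x_{i-1}) = \prod_{i=1}^n p(x_i \mid x_{\pi(i)}).</math> The first equality is the exact chain rule; the second is where the independence assumptions enter: each <math>x_i</math> is assumed conditionally independent of its non-parent predecessors given its parents (which is where sam's question gets answered).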
- Is <math>p(\vec x)</math> the probability of all the vertices' r.v.'s matching the values in <math>\vec x</math>? (i.e., does <math>[x_s] = \vec x</math>?)
- (hal) <math>\vec x</math> is just the collection of all the rvs.
- (John) Ok. I think that's what I was getting at. I was getting hung up on notation a little.
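- In symbols (my reading of the notation, for a graph with vertex set <math>V = \{1,\dots,n\}</math>): <math>\vec x = (x_1,\dots,x_n)</math>, so <math>p(\vec x) = p(x_1,\dots,x_n)</math>, and <math>x_s</math> is the component of <math>\vec x</math> at vertex <math>s</math>.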