Probability Overview


Probability is the likelihood of an event occurring.  This site discusses the following topics:

          Random variables

          Joint and conditional distributions

          Invariance

          Bayes’ rule

          Independence

 

A probability is writen in the following manner: P(R).  This is read as the probability of the event R.

 

Random variables – an aspect of the world that is uncertain, denoted by capital letters.

          Random variable can have a domain, or range of possible values.

          For true/ false, r = true, ¬r = false or not true

 

Basic Properties of Probability

: event A or B occurs, or both

* : both events A and B occur

* : event A happens, but not B

 : event A does not happen, S is the space that contains all possibilities

 

Additional Properties

If  then

 

Joint and conditional distributions

Probabilistic models will have a variety of likely outcomes.  Joint distributions are likelihoods of those outcomes occurring.  Joint distributions of probabilistic models are normalized to 1.  In other words, the probabilities of the different outcomes add up to 1.

 

An example of this is as following:  A survey is performed over a group of people over what time of food they like.  From the survey, if a person is picked at random he is likely to prefer the following type of food an any given night with the following distribution:

 

 

 

 

Should a probability have multiple factors, it is written as P(F, G). This is read as the probability of F and G. 

 

 

Many probabilities have multiple conditions or factors.  An example of this is as follows: it is more likely to be good skiing if it snowed heavily last night, it is clear, sunny, and a regular work day today.  A distribution of probabilities that contains multiple factors or conditions is called a conditional distribution.  Conditional Distributions are probability distributions that have dependencies. Continuing the previous example:

 

 

 

 

One can query the table for a conditional probability like P(p|m).  This is read as the probability of pizza given the person is male.  In order to do so, the following equation is used:

 

 

The last part of the equation, , is the probabilities summed over the domain of A.  To apply this equation to the given example, P(p, m) = .36 will be divided by the sum over the domain of F, P(p, m) + P(h, m).  Or, P(p, m) will be divided by the sum of P(p, m) = .36 and P(h, m) = .24.

 

 

 

 

 

Marginalization

Joint distributions can be marginalized, or broken up into smaller joint distributions. This is done my summing out unwanted variables.

 

 

 

 

Normalization

Should a table of conditional probabilities be desired, do the following:

 

 

 

 

Normalization works by the same equation as querying over conditional probabilities,

 

 

 

Inference

Inference with joint distributions is simply looking up the probabilities in the distributions table then summing, multiplying and dividing the numbers to achieve the desired result. Finding P(p | m) from above, repeated below, is an example of inference.

 

 

 

 

Another example is P(h | f):

 

 

 

 

Product Rule

Should one have a table of conditional probability but desires joint distributions, do the reverse as above.

 

 

 

 

 

 

When multiplying joint distributions together the different variables have to matched up,

P(male)*P(pizza | male) = P(pizza, male),

P(male)*P(hamburgers | male) = P(hamburgers, male), etc.

 

 

Bayes’ Rule

Should one conditional distribution be known and another desired, Bayes’ Rule is an effective method of obtaining the desired conditional distribution.

 

 

 

 

Independence

Two variables are independent if they obey .  If this is the case then, , A is independent of B.  In the given example, Food is independent of Gender.  It does not matter if the person is a male or not, pizza was more likely to be chosen over hamburgers. 

 

True independence of variables is rare. When independence exists, it can be used with marginalization to store several smaller distributions.  This speeds up calculation time and shrinks the necessary storage needed.

 

While true independence of variables may be rare, conditional independence is a little more common.  A conditional independence is where two variables are independent if a third variable is known.  Mathematically this is

 

Extending the given example, Food and Gender are independent if it is the weekend.

 

 

 

 

Chain Rule

A joint distribution can be represented as a product of conditional distributions

 

 

The following is how the chain rule can be used to express the given example:

 

 

                                                              

 

 

 

For more information please see the following

 

http://www.cs.utah.edu/~hal/courses/2009S_AI/cs5300-day14-probability.pdf

 

Please also see Chapter 13 sections 1 through 6 of Artificial Intelligence: a Modern Approach (Second Edition) by Stuart Russell and Peter Norvig.