Discrete Distributions in R

The “seq” command creates a sequence of numbers. Lets do that with integers from 0 to 20.

x = seq(0, 20)

If you ever want to read the help file for a command, type a question mark followed by the command name. For example:

?seq

R has built in functions for the pdf or cdf for many different types of random variables. A prefix of ‘d’ is for the density (or pdf), and the prefix of ‘p’ is for the cdf. Here is an example of the binomial distribution pdf:

dbinom(5, 10, 0.5)
## [1] 0.2460938

The 5 is the ‘k’ variable (number of successes), the 10 is the ‘n’ variable (number of trials), and the 0.5 is the ‘p’ variable (probability of success) Now lets plot the pdf for Bin(20, 0.25):

plot(x, dbinom(x, 20, 0.25))

Notice we are using the ‘x’ sequence we defined above, and plotting all values for x = 0,1,…,20

We can also do fancy things like change the color (using col=“red”) or shape (using pch=16)

plot(x, dbinom(x,20,0.25), col="red", pch=16)

I can also add a line to connect the dots. Is that a good idea?

plot(x, dbinom(x,20,0.25), col="red", pch=16)
lines(x, dbinom(x,20,0.25), col="red")

Similarly we can plot the cdf:

plot(x, pbinom(x, 20, 0.25))

Or with a stair-step function. I like this, since it always makes sense to evaluate a cdf for any value.

plot(x, pbinom(x,20,0.25), type="s")

Note: if you want to plot in a new window, first call “dev.new()”

You might also try out the geometric distribution Notice there is no longer an ‘n’ parameter

x = seq(0,20)
plot(x, dgeom(x, 0.5)) 

plot(x, pgeom(x, 0.5))

Try plotting these distributions with different values of ‘p’ and ‘n’. How does the shape of the pdf change?

Another cool feature is drawing a random element from a distribution. This code draws 5 random samples from a binomial and then a geometric distribution

rbinom(5,20,0.25)
## [1] 4 3 3 4 5
rgeom(5,0.25)
## [1] 4 0 1 0 1

Now let’s try the Monty Hall example where we repeat the experiment 46 times, and each time we use the “switch doors” strategy. This should follow a Bin(46, 2/3) distribution. First let’s plot the pdf and cdf:

x = seq(0, 46) 
plot(x, dbinom(x, 46, 2/3)) 

plot(x, pbinom(x, 46, 2/3))

Now what is the probability that the “switching door” strategy will actually be a losing strategy (that is, we lose more than we win)? We want to know the probability $P(X < 23)$. Since events are discrete and integer, \(P(X < 23) = P(X \leq 22)\). Moreover, we know the cdf of a random variable \(F_X(a) = P(X \leq a)\). We for a binomial random variable with \(2/3\) success probability and \(46\) trials, the probability at most \(22\) are successful is \(F_X(22)\) for \(X \sim Bin(46,2/3)\). As an R command, this is

pbinom(22, 46, 2/3)
## [1] 0.006405318

You should read the help file for all the commands in this example (using ‘?’)