## This is how to do Fisher's exact test. First construct a 2x2 contingency
## table (this is a perfect result for the lady tasting tea)
(C = matrix(c(4,0,0,4), 2, 2))

## Now run Fisher's test on the table. The alternative = "greater" option means
## that we are testing the probability of getting an equal or even better table
## by random luck (i.e., one with higher numbers on the diagonal of the table)
fisher.test(C, alternative = "greater")

## Notice this is the same as the probability of getting exactly this table by
## random luck, because there is no better table. There are "8 choose 4"
## possible guesses, so the probability of the correct guess is:
(p = 1 / choose(8, 4))

## Now try if she misclassifies two cups
(C = matrix(c(3,1,1,3), 2, 2))
fisher.test(C, alternative = "greater")

## And now 4 misclassified cups
(C = matrix(c(2,2,2,2), 2, 2))
fisher.test(C, alternative = "greater")

## Now 6 misclassified cups. Notice the p-value is just 1 - p, where p was the
## probability above of a perfect answer. This is because there is exactly one
## table that is worse (the one with everything wrong).
(C = matrix(c(1,3,3,1), 2, 2))
fisher.test(C, alternative = "greater")

## Everything wrong! Notice the probability of doing this well or better is 1.
(C = matrix(c(0,4,4,0), 2, 2))
fisher.test(C, alternative = "greater")


## Testing the sample mean from a Gaussian RV

## Let's test the hypothesis that the Michelson-Morley data was on average
## higher than the true speed of light
## Null hypothesis, H0 : mu = "true speed"
## Alternative hypothesis, H1 : mu > "true speed"

## True speed of light (in km/s minus 299,000 km/s)
trueSpeed = 792.458

## Let's test the sample mean from just the 4th run (this was the closest to
## correct)
x = morley$Speed[morley$Expt == 4]
(sampleMean = mean(x))
sampleSigma = sd(x)
n = length(x)

## Here is the critical value for an alpha = 0.05 significance level
(criticalValue = trueSpeed + sampleSigma / sqrt(n) * qt(1 - 0.05, df = n - 1))

## We reject the null hypothesis if the sampleMean is greater than this (it is)
(sampleMean > criticalValue)

## Use a one-sided t test to get a p-value
(tStat = (sampleMean - trueSpeed) / (sampleSigma / sqrt(n)))
(pValue = 1 - pt(tStat, df = n - 1))

## We reject the null hypothesis if this p-value is < 0.05 (our significance
## level). Note: the final answer (reject or don't reject) is equivalent to the
## critical value test
(pValue < 0.05)

## All of these steps can be verified with a single R command. (This is what you
## would use in practice if you did not want to do all of the individual steps)
t.test(x - trueSpeed, alternative = "greater")


## Testing a sample proportion from a Bernoulli RV

## Let's test the hypothesis that Obama was below 50% in Florida.
## We'll use the last Rasmussen poll, which had him at 48%.
## The sample size was 750.
sampleMean = 0.48
n = 750
sampleSigma = sqrt(0.48 * (1 - 0.48))

## Null hypothesis, H0 : p = 0.5
## Alternative hypothesis, H1 : p < 0.5

## Let's use the t test again on the sample mean
## Same procedure as above (just notice the sign change because we are testing
## the hypothesis that it is less)
(criticalValue = 0.5 - sampleSigma / sqrt(n) * qt(1 - 0.05, df = n - 1))

## Also the test is now if we are less than the critical value
(sampleMean < criticalValue)

## Same pValue procedure as above, but now with the "left tail"
## (notice the "1 - " goes away)
(tStat = (sampleMean - 0.5) / (sampleSigma / sqrt(n)))
(pValue = pt(tStat, df = n - 1))
(pValue < 0.05)
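
## As with t.test in the Gaussian example, the proportion test can be
## cross-checked with built-in R commands. This is only a sketch: it assumes the
## 48% corresponds to exactly 360 favorable answers out of 750 (the poll's raw
## counts are not given here), and the name nYes is introduced just for this
## example.
nYes = round(0.48 * 750)

## Exact binomial test of H0 : p = 0.5 against H1 : p < 0.5
binom.test(nYes, 750, p = 0.5, alternative = "less")

## Normal-approximation test of the same hypotheses. Unlike the t test above, it
## uses the standard error under H0, sqrt(0.5 * 0.5 / 750), so its p-value can
## differ slightly from the hand-computed one
prop.test(nYes, 750, p = 0.5, alternative = "less", correct = FALSE)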