Visualizing Statistical Power
Alpha and beta are both probabilities of making an error, but each assumes a different reality:
- Alpha is the probability that you will decide that a difference between group means exists in the population, when the reality is that there is no such difference.
- Beta is the probability that you will decide that no difference between group means exists in the population, when the reality is that there is at least one such difference.
A Basic Analysis
To visualize statistical power, it helps to show the distribution of your test statistic in each sort of reality: no difference versus at least one difference. We start here with a simple situation. Suppose that you have developed a new medication that you believe lowers "bad" cholesterol levels. You randomly select 40 people and randomly assign 20 of them to each of two groups: a treatment group that takes your medication and a comparison group that takes a placebo.
After one month of treatment, you get cholesterol levels from each of the 40 participants, calculate the mean cholesterol level of each group, and subtract the treatment group's mean from the comparison group's mean.
You expect that the treatment subjects will have a lower cholesterol level than the comparison subjects. Therefore, you also expect that subtracting the treatment mean from the comparison mean will result in a positive number.
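The sign convention matters for everything that follows, so here is a minimal sketch of the statistic described above. The function name `mean_difference` and the readings are hypothetical, invented only to show that a lower treatment mean produces a positive result.

```python
from statistics import mean


def mean_difference(comparison_levels, treatment_levels):
    """Return the comparison-group mean minus the treatment-group mean."""
    return mean(comparison_levels) - mean(treatment_levels)


# Hypothetical readings, invented purely to illustrate the sign convention.
comparison = [150, 160, 170]
treatment = [140, 150, 160]

# Positive when the treatment group has the lower mean.
difference = mean_difference(comparison, treatment)
print(difference)
```

Reversing the two arguments flips the sign, which is why the text fixes the order as comparison minus treatment.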
Now, there are two possible realities that your hypotheses describe:
- Your null hypothesis is that in the populations from which you took your samples, the mean cholesterol level for the population that (hypothetically) takes your medication is the same as the mean cholesterol level of the population that (hypothetically) takes a placebo.
- Your alternative hypothesis is that the hypothetical treatment population has a lower mean cholesterol level than the hypothetical placebo population.
These two states of nature show up in Figure 1.
Figure 1. The curve on the left represents the no-difference reality. The curve on the right represents the different-means reality.
The two populations might really have the same mean cholesterol level at the end of the experiment. In that case, doing the same experiment many, many times would tend to result in a mean difference of zero, or close to it. Some replications of the experiment would result in a positive difference, and some a negative difference, simply due to sampling error.
No Difference Between Population Means
If you repeated the experiment many times when the population means did not differ, and plotted the results, you would get a curve like the one on the left in Figure 1. The mean of that curve would be zero, because the two populations have the same mean cholesterol level, but sampling error would cause some results smaller than zero and some larger than zero.
If you adopted an alpha level of 5%, you would reject the null hypothesis, when it is true, 5% of the time. This happens because sampling error makes some of the mean differences so large (greater than 13, the critical value associated with alpha in this case) that the null hypothesis no longer looks plausible. Because you can get results like that even when there is no difference between the population means, concluding in those cases that a population difference exists is an error. By tradition, it's called a Type I error. The probability that it will occur is called alpha, symbolized as α.
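The repeated-experiment idea can be sketched as a small simulation. The text gives alpha (5%) and the critical value (13) but not the standard error of the mean difference, so the sketch backs one out under two assumptions that are not stated in the text: the sampling distribution is normal, and the test is one-tailed. A t distribution would give a slightly different figure.

```python
import random
from statistics import NormalDist

ALPHA = 0.05           # from the text
CRITICAL_VALUE = 13.0  # from the text

# Assumption: normal sampling distribution, one-tailed test.
z_crit = NormalDist().inv_cdf(1 - ALPHA)  # about 1.645
std_error = CRITICAL_VALUE / z_crit       # about 7.9; implied, not given in the text


def simulate_null_rejections(n_reps=100_000, seed=1):
    """Fraction of replications that exceed the critical value when the
    population means are equal (the true mean difference is zero)."""
    rng = random.Random(seed)
    rejections = sum(
        rng.gauss(0.0, std_error) > CRITICAL_VALUE for _ in range(n_reps)
    )
    return rejections / n_reps


type_i_rate = simulate_null_rejections()
print(f"implied standard error: {std_error:.2f}")
print(f"simulated Type I error rate: {type_i_rate:.3f}")
```

The simulated rate should land close to 5%, since every rejection in this no-difference reality is a Type I error.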
Actual Difference Between Population Means
What if the two populations really had different cholesterol levels after being treated with either your medication or a placebo? Then it might be that the comparison population has a cholesterol level that's about 8 points higher than the treatment population. Over repeated, hypothetical replications of the experiment, the mean difference between the sample means, comparison minus treatment, would tend to be 8 or close to it.
But when there is a difference of 8 points in the populations, some replications of the experiment would return a difference of more than 8 points, and some would return considerably less, perhaps only 1 or 2 points.
In the end, if you charted the results, you would likely get a curve much like the one on the right in Figure 1. Its mean would be 8 because the difference between the population means in that reality is 8. Some replications of the experiment would have a mean difference smaller than zero, and others would have a mean difference larger than 13.
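A rough simulation of the right-hand curve shows how often each of those outcomes turns up. The true difference (8) and the critical value (13) come from the text. The standard error is an assumption: it is the value implied by a one-tailed 5% test on a normal sampling distribution with a critical value of 13.

```python
import random
from statistics import NormalDist

TRUE_DIFFERENCE = 8.0  # from the text
CRITICAL_VALUE = 13.0  # from the text

# Assumption: standard error implied by a one-tailed 5% normal test (about 7.9).
STD_ERROR = CRITICAL_VALUE / NormalDist().inv_cdf(0.95)


def simulate_alternative(n_reps=100_000, seed=2):
    """Simulate mean differences when the population difference is 8.

    Returns the average simulated difference, the share of replications
    below zero, and the share above the critical value.
    """
    rng = random.Random(seed)
    diffs = [rng.gauss(TRUE_DIFFERENCE, STD_ERROR) for _ in range(n_reps)]
    average = sum(diffs) / n_reps
    below_zero = sum(d < 0 for d in diffs) / n_reps
    above_critical = sum(d > CRITICAL_VALUE for d in diffs) / n_reps
    return average, below_zero, above_critical


average, below_zero, above_critical = simulate_alternative()
print(f"average difference: {average:.2f}")
print(f"share below zero: {below_zero:.3f}")
print(f"share above {CRITICAL_VALUE:g}: {above_critical:.3f}")
```

Under these assumptions, only a minority of replications clear the critical value, even though the population difference is real.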
Power is the probability of rejecting the null hypothesis when it is false. A statistical test therefore has power only when the null hypothesis is false and the alternative hypothesis is true. In the situation that Figure 1 depicts, you would reject the null hypothesis if the result of your experiment were that the treatment group had a cholesterol level at least 13 points lower than the comparison group.
As shown in Figure 1, that would happen less than half the time even if the population mean for the treatment group is as much as 8 points lower than that for the placebo group. This experiment has relatively low statistical power. If you knew that beforehand, you might not go to the trouble and expense of running the experiment as it's designed. You have less than a 50-50 chance of concluding that the medication makes a difference, even if it actually does.
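The "less than half the time" claim can be checked directly rather than read off the chart. The sketch below computes the area of the right-hand curve that lies beyond the critical value, using the 8-point population difference behind that curve. The function name `power_one_tailed` is hypothetical, and the standard error is again an assumption, implied by a one-tailed 5% test on a normal sampling distribution.

```python
from statistics import NormalDist

CRITICAL_VALUE = 13.0  # from the text
TRUE_DIFFERENCE = 8.0  # population difference behind the right-hand curve

# Assumption: standard error implied by a one-tailed 5% normal test (about 7.9).
STD_ERROR = CRITICAL_VALUE / NormalDist().inv_cdf(0.95)


def power_one_tailed(true_difference, critical_value, std_error):
    """Probability that the observed mean difference exceeds the critical
    value, given the true population difference."""
    sampling_dist = NormalDist(mu=true_difference, sigma=std_error)
    return 1.0 - sampling_dist.cdf(critical_value)


power = power_one_tailed(TRUE_DIFFERENCE, CRITICAL_VALUE, STD_ERROR)
beta = 1.0 - power  # probability of missing a real difference

print(f"power: {power:.3f}")
print(f"beta:  {beta:.3f}")
```

Calling the same function with a true difference of zero returns alpha, which is a handy sanity check: the critical value was chosen so that exactly that share of the no-difference curve lies beyond it.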
The fact is that sometimes you're going to decide that a treatment makes no difference, even though omniscience would tell you otherwise. That decision would be an error. By tradition, it's known as a Type II error. The probability that it will occur is called beta, symbolized as β, and statistical power is its complement, 1 − β.