The Concept of Statistical Power
When you undertake a true experiment, you often make a random selection of potential subjects from a population that interests you, and assign them at random to one of two or more groups. Often, those groups might be a treatment group and a control group, or they might be two or more treatment groups and a control group.
Sampling error in forming your groups can lead you to conclude that your treatments have a reliable, replicable effect on the population when in fact they don't. That mistake is called a "Type I error." You can quantify the probability of making a Type I error, and that probability is often called "statistical significance" or alpha (symbolized as α).
There's another sort of error, conceptually similar to Type I error. It is the error that you make when your experimental results lead you to conclude that your treatments will have no effect if applied to the population, when in fact they would. You can also quantify this "Type II" error and determine the probability that it will occur. That probability is often called beta and symbolized as β. The probability of not making a Type II error, 1 - β, is called statistical power.
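The two error rates can be made concrete with a small simulation. The sketch below is illustrative only and is not part of the Excel workflow this series uses. It assumes two groups of 30 subjects, a normally distributed outcome with a known standard deviation of 1, a two-sided z-test at α = 0.05, and a true mean difference of 0.5 when the treatment works. The function name and all of these values are hypothetical choices.

```python
import random
from statistics import NormalDist, fmean


def rejection_rate(true_diff, n=30, sd=1.0, alpha=0.05, reps=2000, seed=1):
    """Share of simulated experiments in which a two-sided z-test rejects H0."""
    rng = random.Random(seed)                    # fixed seed so runs are repeatable
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sd * (2 / n) ** 0.5                     # standard error of the mean difference
    rejections = 0
    for _ in range(reps):
        control = [rng.gauss(0.0, sd) for _ in range(n)]
        treated = [rng.gauss(true_diff, sd) for _ in range(n)]
        z = (fmean(treated) - fmean(control)) / se
        if abs(z) > z_crit:
            rejections += 1
    return rejections / reps


# No real effect: every rejection is a Type I error, so the rate estimates alpha.
type_1_rate = rejection_rate(true_diff=0.0)

# Real effect (assumed difference of 0.5): rejections are correct decisions.
power = rejection_rate(true_diff=0.5)
beta = 1 - power                                 # misses are Type II errors

print(f"Type I error rate: {type_1_rate:.3f}")
print(f"beta: {beta:.3f}   power: {power:.3f}")
```

Under these assumptions the first rate should land near the chosen α, while β and power depend on the sample size, the true difference, and the standard deviation.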
You are reading the first in a series of four articles that discuss statistical power, and how to quantify and visualize it using Microsoft Excel. I suggest that you read this article first. Then, if you wish, continue (in order) with The Statistical Power of t-Tests, The Noncentrality Parameter in the F Distribution, and Calculating the Power of the F Test.
Controlling the Risk
You set the probability of a Type I error yourself when you choose alpha. The probability of a Type II error is then determined by several factors, including that alpha level, the size of the samples you take, the size of the differences between the group means, and the size of the standard deviation of the outcome measure.
With that information in hand, you can use Excel to calculate statistical power, 1 - β. Power is the sensitivity of your statistical test: its ability to detect a true, replicable difference between a treatment group and a comparison group. When the test fails to detect a difference that really exists, you make a Type II error, and that happens with probability β. The smaller you can make β, the larger 1 - β becomes, and the greater your test's statistical power.
Statistical power, or simply power, is a matter of great concern when you're designing experiments. Two of the most important factors that determine power are described next.
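As a rough illustration of how those factors combine, here is a sketch that approximates power for a two-group comparison with a two-sided z-test. It assumes equal group sizes and a known standard deviation. The function name `power_two_group` and the input values are hypothetical, and a t-test on small samples would give somewhat lower figures.

```python
from statistics import NormalDist


def power_two_group(mean_diff, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two group means."""
    std_normal = NormalDist()
    se = sd * (2 / n_per_group) ** 0.5           # standard error of the difference
    delta = mean_diff / se                       # expected value of the z statistic
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Probability that the statistic falls in either rejection region.
    return std_normal.cdf(delta - z_crit) + std_normal.cdf(-delta - z_crit)


baseline = power_two_group(mean_diff=5, sd=10, n_per_group=40)
print(f"baseline:                    {baseline:.3f}")
print(f"larger samples (n = 80):     {power_two_group(5, 10, 80):.3f}")
print(f"larger difference (8):       {power_two_group(8, 10, 40):.3f}")
print(f"smaller std. deviation (6):  {power_two_group(5, 6, 40):.3f}")
```

Each change to the baseline (more subjects, a larger mean difference, a smaller standard deviation) raises power. In Excel, NORM.S.INV and NORM.S.DIST play the roles that `inv_cdf` and `cdf` play here.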
Directional and Nondirectional Hypotheses
The type of alternative hypothesis you choose affects power. You might choose a nondirectional hypothesis (for example, "We hypothesize that our treatment group will have a different mean than our control group"). Or you might choose a directional hypothesis (for example, "We hypothesize that our treatment group will have a higher mean than our control group").
Choosing a nondirectional instead of a directional hypothesis lowers your experiment's statistical power, provided the true effect lies in the direction you would have predicted. In a large-sample test with α = 0.05, for example, power can fall from 80% to roughly 70%. You would go from recognizing real treatment effects in 80% of repeated experiments to recognizing them in about 70%.
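To see where the difference comes from, the following sketch compares the two kinds of hypothesis under a normal approximation. A directional test puts all of α in one tail, so its critical value is smaller and a real effect clears it more often. The value of `delta` (the expected z statistic under the true effect) is an assumed figure, and the function names are hypothetical.

```python
from statistics import NormalDist

Z = NormalDist()


def power_directional(delta, alpha=0.05):
    """One-tailed z-test, with the effect in the predicted direction."""
    return Z.cdf(delta - Z.inv_cdf(1 - alpha))


def power_nondirectional(delta, alpha=0.05):
    """Two-tailed z-test: alpha is split between the two tails."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    return Z.cdf(delta - z_crit) + Z.cdf(-delta - z_crit)


delta = 2.5  # assumed expected z statistic under the true effect
print(f"directional:    {power_directional(delta):.3f}")
print(f"nondirectional: {power_nondirectional(delta):.3f}")
```

The size of the gap depends on α and on the effect. The directional advantage disappears, and becomes a liability, if the true effect runs opposite to the predicted direction.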
Changing the Sample Size
The size of the sample you take also affects power. There is usually an optimum sample size for a desired level of power. An analysis of the power of a statistical test can tell you when you have chosen too small a sample: your statistical power might be only 20%, and you would miss too many real treatment effects.
Equally important, it can tell you when you are about to select too large a sample. It may be that you are planning on 50 subjects per group, and that would get you to 90% power. A power analysis could show that you would have 85% power if you cut the sample size in half and used only 25 subjects per group. You might easily decide that the larger groups would waste resources when the gain in statistical power is only 5 percentage points.
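One way to explore this trade-off is to search for the smallest group size that reaches each target level of power. The sketch below does that under a normal approximation for a two-sided test. It assumes a standardized effect size (mean difference divided by standard deviation) of 0.5, which is an illustrative value rather than one taken from this article, and the function names are hypothetical.

```python
from statistics import NormalDist

Z = NormalDist()


def power(n_per_group, effect_size, alpha=0.05):
    """Approximate power of a two-sided, two-group z-test."""
    delta = effect_size * (n_per_group / 2) ** 0.5
    z_crit = Z.inv_cdf(1 - alpha / 2)
    return Z.cdf(delta - z_crit) + Z.cdf(-delta - z_crit)


def smallest_n(target_power, effect_size, alpha=0.05):
    """Smallest per-group sample size whose power meets the target."""
    n = 2
    while power(n, effect_size, alpha) < target_power:
        n += 1
    return n


EFFECT_SIZE = 0.5  # assumed standardized effect, for illustration only
for target in (0.70, 0.80, 0.90, 0.95):
    n = smallest_n(target, EFFECT_SIZE)
    print(f"target power {target:.2f}: {n} subjects per group "
          f"(achieved {power(n, EFFECT_SIZE):.3f})")
```

Reading down the output shows how many additional subjects each increment in power costs, which is the comparison you would weigh against the expense of recruiting them.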