Charting Numeric Variables in Excel
Several chart types in Excel lend themselves beautifully to the visual representation of numeric variables. This book relies heavily on charts of that type because most of us find statistical concepts that are difficult to grasp in the abstract are much clearer when they’re illustrated in charts.
Charting Two Variables
Earlier this chapter briefly discussed two chart types that use a category variable on one axis and a numeric variable on the other: Column charts and Bar charts. There are other, similar types of charts, such as Line charts, that are useful for analyzing a numeric variable in terms of different categories—especially time categories such as months, quarters, and years. However, one particular type of Excel chart, called an XY (Scatter) chart, shows the relationship between exactly two numeric variables. Figure 1.8 provides an example.
Figure 1.8 In an XY (Scatter) chart, both the horizontal and vertical axes are value axes.
The markers in an XY chart show where a particular person or object falls on each of two numeric variables. The overall pattern of the markers can tell you quite a bit about the relationship between the variables, as expressed in each record’s measurement. Chapter 4, “How Variables Move Jointly: Correlation,” goes into considerable detail about this sort of relationship.
In Figure 1.8, for example, you can see the relationship between a person’s height and weight: Generally, the greater the height, the greater the weight. The relationship between the two variables differs fundamentally from those discussed earlier in this chapter, where the emphasis is placed on the sum or average of a numeric variable, such as number of vehicles, according to the category of a nominal variable, such as make of car.
However, when you are interested in the way that two numeric variables are related, you are asking a different sort of question, and you use a different sort of statistical analysis. How are height and weight related, and how strong is the relationship? Does the amount of time spent on a cell phone correspond in some way to the likelihood of contracting cancer? Do people who spend more years in school eventually make more money? (And if so, does that relationship hold all the way from elementary school to post-graduate degrees?) This is another major class of empirical research and statistical analysis: the investigation of how different variables change together—or, in statistical jargon, how they covary.
Figure 1.9 A trendline graphs a numeric relationship, which is almost never an accurate way to depict reality.
The diagonal line you see in Figure 1.9 is a trendline. It is an idealized representation of the relationship between men’s height and weight, at least as determined from the sample of 17 men whose measures are charted in the figure. The trendline is based on this formula:
- Weight = 5.2 * Height – 152
Excel calculates the formula based on what’s called the least squares criterion. You’ll see much more about this in Chapter 4.
Suppose that you picked several—say, 20—different values for height in inches, plugged them into that formula, and then used the formula to calculate the resulting weight. If you now created an Excel XY chart that shows those values of height and weight, you would get a chart that shows a straight line similar to the trendline you see in Figure 1.9.
That’s because arithmetic is nice and clean and doesn’t involve errors. The formula applies arithmetic which results in a set of predicted weights that, plotted against height on a chart, describe a straight line. Reality, though, is seldom free from errors. Some people weigh more than a formula thinks they should, given their height. Other people weigh less. (Statistical analysis terms these discrepancies errors or deviations.) The result is that if you chart the measures you get from actual people instead of from a mechanical formula, you’re going to get a set of data that looks like the somewhat scattered markers in Figures 1.8 and 1.9.
Reality is messy, and the statistician’s approach to cleaning it up is to seek to identify regular patterns lurking behind the real-world measures. If those real-world measures don’t precisely fit the pattern that has been identified, there are several explanations, including these (and they’re not mutually exclusive):
- People and things just don’t always conform to ideal mathematical patterns. Deal with it.
- There may be some problem with the way the measures were taken. Get better yardsticks.
- Some other, unexamined variable may cause the deviations from the underlying pattern. Come up with some more theory, and then carry out more research.