Business Statistics: Visualizing Profit and Performance
In This Chapter
Making data visually understandable
Representing data in useful ways
Displaying data in various charts, graphs, and plots
Understanding the concept of "skew"
If you've been given a list of daily numbers for all the sales in the country over the last two work weeks for your new battery product, you'd have little hope of understanding the numbers by themselves. You need to summarize the numbers in some way. Suppose you have observations of the following sales numbers over ten days (two work weeks) for boxes of batteries sold by your western regional distributors:
49, 37, 89, 63, 65, 55, 66, 104, 41, 66
This list of numbers is an example of raw data, as you might remember from Chapter 1, "Statistics and Business Go Hand in Hand." Raw data are numbers that haven't been transformed with other statistical (mathematical) operations. How can you see underlying patterns in a row of naked numbers? There must be a more productive way to view the information.
Reduce the Risk
Before any statistical calculation—even the simplest—is performed, your data should be tabulated, graphed, or plotted.
Turning Raw Data into Information
Raw numbers need to be organized in a way that makes them understandable. You could simply state that on Monday of the first week we sold 49 boxes of batteries, on Tuesday 37, on Wednesday 89, and so on—but there's got to be a better way to represent and understand the data than a simple narrative. There are three main ways to present raw statistical data such as this: in tables, graphs, and charts. I'll start with tables, move on to graphs, and then discuss pie charts (which look exactly like pizza pies without the pepperoni) and other charting options.
Using Tables
Tables provide an easy format for presenting raw data in an orderly way that (hopefully) is also easy to read. However, tables that contain hundreds or thousands of numbers might not be easy to understand, so the data must be summarized (which I'll talk about later in this chapter). You'll be working with simple tables for now. The following table displays the number of boxes sold on each day of the two weeks, using the same data from the beginning of this chapter.
Western Region Battery Sales in Boxes for Two Weeks
Days | Week One | Week Two |
---|---|---|
Monday | 49 | 55 |
Tuesday | 37 | 66 |
Wednesday | 89 | 104 |
Thursday | 63 | 41 |
Friday | 65 | 66 |
Totals for the Week | 303 | 332 |
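If you'd rather let a computer do the adding, the totals row can be checked with a few lines of Python. This is only a minimal sketch: the names `days`, `week_one`, and `week_two` are made up for illustration, and the figures are copied from the table.

```python
# Daily battery sales (boxes) for the Western region, copied from the table.
days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]
week_one = [49, 37, 89, 63, 65]
week_two = [55, 66, 104, 41, 66]

# Print one row per day, then the weekly totals.
print(f"{'Day':<10}{'Week One':>10}{'Week Two':>10}")
for day, first, second in zip(days, week_one, week_two):
    print(f"{day:<10}{first:>10}{second:>10}")
print(f"{'Total':<10}{sum(week_one):>10}{sum(week_two):>10}")
```

Keeping the days and the sales in parallel lists mirrors the columns of the table, which makes the code easy to compare against the printed numbers.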
Sometimes you might want to use tables to make comparisons. Suppose you want to compare the sales of the Western and Eastern regions. Here's a table that presents and compares two weeks of sales for both regions:
Western and Eastern Region Battery Sales by Boxes for Two Weeks
Days | Week One Western | Week One Eastern | Week Two Western | Week Two Eastern |
---|---|---|---|---|
Monday | 49 | 102 | 55 | 97 |
Tuesday | 37 | 95 | 66 | 89 |
Wednesday | 89 | 37 | 104 | 42 |
Thursday | 63 | 41 | 41 | 45 |
Friday | 65 | 55 | 66 | 66 |
Totals for the Weeks | 303 | 330 | 332 | 339 |
You can graph the numbers with dots for each number or actually connect the dots (as shown in later examples in this chapter) with a line that makes a pattern of what is happening with the data. One of the most basic (and important) statistical tables is the frequency table. You can construct this type of table by dividing scores or instances into intervals and counting the number of scores or instances in each interval. An interval can be as narrow as a single value, but in large frequency tables the scores are usually grouped into ranges such as 1-5, 6-10, and so on. The actual number and percentage of scores in each interval typically are displayed.
Cumulative frequencies are also displayed in a frequency table. The following table, which groups the players in a chess tournament by the number of moves per game, is an example of a typical frequency table.
Chess Moves by Number of Players: Cumulative Frequencies
Lower | Upper | Players | Cumulative Players | Percentage | Cumulative Percentage |
---|---|---|---|---|---|
25 | 35 | 1 | 1 | 5 | 5 |
35 | 45 | 3 | 4 | 20 | 25 |
55 | 65 | 5 | 10 | 50 | 75 |
75 | 85 | 9 | 19 | 45 | 95 |
85 | 95 | 1 | 20 | 5 | 100 |
Note: Values are > lower limit and < upper limit of moves per game.
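To see how a frequency table is put together, here is a rough Python sketch that builds one from the ten days of Western region battery sales at the start of this chapter. The function name `frequency_table`, the starting point of 30, and the interval width of 20 boxes are all my own choices for illustration. I've also assumed that a value belongs to an interval when it is at least the lower limit and less than the upper limit.

```python
from itertools import accumulate

# Ten days of Western region battery sales (boxes) from the start of the chapter.
sales = [49, 37, 89, 63, 65, 55, 66, 104, 41, 66]

def frequency_table(values, start, width, n_intervals):
    """Return rows of (lower, upper, count, cumulative count,
    percentage, cumulative percentage).

    Assumed convention: a value is counted when lower <= value < upper.
    """
    counts = [0] * n_intervals
    for value in values:
        index = (value - start) // width
        if 0 <= index < n_intervals:
            counts[index] += 1
    cumulative = list(accumulate(counts))  # running total of the counts
    total = len(values)
    rows = []
    for i, (count, running) in enumerate(zip(counts, cumulative)):
        lower = start + i * width
        rows.append((lower, lower + width, count, running,
                     100 * count / total, 100 * running / total))
    return rows

table = frequency_table(sales, start=30, width=20, n_intervals=4)
for lower, upper, count, running, pct, cum_pct in table:
    print(f"{lower:>4}-{upper:<4} {count:>3} {running:>3} {pct:>6.1f}% {cum_pct:>6.1f}%")
```

The cumulative columns are just running totals of the count and percentage columns, so the last row should always reach the full number of observations and 100 percent.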
You'll probably agree that, simple or complex, tables generally are boring. However, you can add color and dimension to them with today's software—even Microsoft Word 2000 will enable you to do that. Even better than playing with various designs for the tables, you can turn the same data into more interesting graphs and plots that help you interpret the data quite easily.
Using Pie Charts
Pie charts, also called circle graphs, are a good way to show the relative percentages of a total amount that has been sold, delivered, or manufactured in a business, among other business uses. The following figure shows a simple pie chart that represents the Western region's first week of sales of boxes of batteries.
Figure 3.1 Example of a pie chart.
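The size of each slice is simply that day's share of the weekly total. The following Python sketch works out the shares for the Western region's first week. The names `week_one` and `shares` are made up for illustration, and the conversion to degrees assumes the usual 360-degree circle.

```python
# Western region, week one (boxes of batteries), from the first table in this chapter.
week_one = {"Monday": 49, "Tuesday": 37, "Wednesday": 89, "Thursday": 63, "Friday": 65}

total = sum(week_one.values())

# Each slice is the day's share of the weekly total, expressed as a percentage.
shares = {day: 100 * boxes / total for day, boxes in week_one.items()}

for day, share in shares.items():
    # One percent of a full circle is 3.6 degrees.
    print(f"{day:<10}{share:5.1f}%  {share * 3.6:6.1f} degrees")
```

Because every slice is a fraction of the same total, the percentages always add up to 100.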
Line graphs (also called plots) are another simple way of representing data. The following figure shows a line graph that represents the first week of sales for both the Western and Eastern regions on one graph. You can see that even though the total sales for the week are very similar, the days on which the most boxes are sold vary. This is the power of graphing raw data: the ability to see things more easily than you can see them in tables or as raw data.
Figure 3.2 Graph of Western and Eastern regional sales for one week.
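What the line graph shows at a glance can also be confirmed with a little arithmetic. This Python sketch, using the week one figures from the comparison table, finds each region's total and its best day. The helper name `peak_day` is hypothetical.

```python
days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]

# Week one sales (boxes) for each region, from the comparison table.
week_one = {
    "Western": [49, 37, 89, 63, 65],
    "Eastern": [102, 95, 37, 41, 55],
}

def peak_day(sales):
    """Return the (day, boxes) pair with the highest sales."""
    return max(zip(days, sales), key=lambda pair: pair[1])

for region, sales in week_one.items():
    day, boxes = peak_day(sales)
    print(f"{region}: total {sum(sales)}, peak {boxes} on {day}")
```

The totals are close, but the peaks fall on different days, which is exactly the pattern the two lines on the graph make visible.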
A polygon plot is skewed if one of its tails is longer than the other. The first of the following three figures shows a graph with a positive skew, which means it has a long tail in the positive direction. The distribution in the second figure has a negative skew because it has a long tail in the negative direction. Finally, the distribution in the third figure is symmetric and has no skew: the tails are the same length and shape on each side. Distributions with positive skew sometimes are called "skewed to the right"; distributions with negative skew are called "skewed to the left."
This is a little bit confusing. Remember, it's the long tail—not the big area of the plot—that determines the direction of the skew. You'll be learning more about skewed distributions in Chapter 6, "Solving Problems with Curves and z-Scores."
Figure 3.3 Graph of positive skew.
Figure 3.4 Graph of negative skew.
Figure 3.5 Graph of symmetric distribution (no skew).
Understanding skew will become more important and meaningful as you learn more about inferential statistics (including variance) later in the book.
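If you want a number to go with the picture, one common measure is the moment coefficient of skewness, which is positive for a long right tail, negative for a long left tail, and zero for a symmetric distribution. That measure isn't introduced in this chapter, so treat the following Python sketch as an illustration only. The three data sets are hypothetical and were invented just to show the three cases.

```python
def skewness(values):
    """Moment coefficient of skewness: the third central moment divided by
    the second central moment raised to the power 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

# Hypothetical data sets, invented only to show the three cases.
long_right_tail = [1, 2, 2, 3, 3, 3, 10]
long_left_tail = [-x for x in long_right_tail]  # mirror image of the first set
symmetric = [1, 2, 3, 4, 5]

for name, data in [("right", long_right_tail),
                   ("left", long_left_tail),
                   ("symmetric", symmetric)]:
    print(f"{name:<10}{skewness(data):+.2f}")
```

Notice that the sign follows the long tail, not the big area of the plot: the single value of 10 far out to the right is what makes the first data set positively skewed.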
Using Histograms
You can create many different charts and graphs from a frequency table. A histogram is one of the basic graphs that can be constructed from a frequency table. The intervals are shown on the X axis; the number of scores in each interval is represented by the height of a rectangle located above the interval. The following chart is a histogram for the number of moves by the players in a chess tournament.
Figure 3.6 Histogram of chess moves by number of players in a tournament with that frequency per game.
Histograms vary based on the class intervals you use. For example, a histogram of the sales of your boxes of batteries by quarter might look much different from one by month. This is because the shapes of histograms vary depending on the size of the intervals you choose. In the first quarterly histogram, I've used intervals of 500 boxes on the X axis. In the second histogram of the same data, I've used intervals of 100 boxes on the X axis. For the monthly histogram, I'm using an interval of 10. Look at the following examples to see how the histograms change based on the size of the intervals.
Monthly Battery Cases Sold in 2002
Month | Cases Sold |
---|---|
Jan | 700 |
Feb | 800 |
March | 700 |
April | 600 |
May | 757 |
June | 550 |
July | 867 |
August | 1067 |
Sept | 883 |
October | 567 |
November | 933 |
December | 683 |
Sales Boxes Sold by Quarter
Quarter | Boxes Sold |
---|---|
Qtr 1 | 2200 |
Qtr 2 | 1907 |
Qtr 3 | 2817 |
Qtr 4 | 2183 |
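The quarterly figures are just the monthly figures added up three at a time. Here is a minimal Python sketch of that step; the list names `monthly` and `quarterly` are made up for illustration.

```python
# Monthly battery sales for 2002, January through December, from the monthly table.
monthly = [700, 800, 700, 600, 757, 550, 867, 1067, 883, 567, 933, 683]

# Sum each run of three consecutive months to get one quarter.
quarterly = [sum(monthly[i:i + 3]) for i in range(0, len(monthly), 3)]

for number, total in enumerate(quarterly, start=1):
    print(f"Qtr {number}: {total}")
```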
Figure 3.7 Histograms of quarterly sales of batteries.
Figure 3.8 Histogram of monthly sales of batteries with 10 unit intervals.
You can see how the length of observation and the interval can affect someone's perception of sales during the various months and quarters in 2002. Thus, choosing the interval is important in developing histograms. Here are some helpful guidelines for you to follow:
Use intervals of equal length with midpoints at convenient round numbers.
For a small data set, use a small number of intervals.
For a large data set, use more intervals.
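You can watch the interval size change the shape of a histogram with a quick text-only sketch in Python. It groups the twelve monthly sales figures for 2002 by sales volume, first in intervals of 500 boxes and then in intervals of 100. The function name `histogram` is hypothetical, and I've assumed each interval includes its lower limit but not its upper limit.

```python
from collections import Counter

# Monthly battery sales for 2002, from the monthly table earlier in this section.
monthly = [700, 800, 700, 600, 757, 550, 867, 1067, 883, 567, 933, 683]

def histogram(values, width):
    """Count how many values fall in each interval [lower, lower + width)."""
    counts = Counter((value // width) * width for value in values)
    lowest, highest = min(counts), max(counts)
    # Include empty intervals so gaps in the data stay visible.
    return {lower: counts.get(lower, 0)
            for lower in range(lowest, highest + width, width)}

for width in (500, 100):
    print(f"Interval width {width}")
    for lower, count in histogram(monthly, width).items():
        print(f"  {lower:>4}-{lower + width - 1:<4} {'#' * count}")
```

With the wide interval, nearly every month lands in the same bar and the detail disappears. With the narrower interval, you can see how the months spread out.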
Another way to display frequencies is with the cumulative frequency distribution. This is a plot or histogram of the number of observations falling on, in, or below an interval. The graph shown in the following figure is a cumulative frequency distribution in the form of a histogram of the scores on a single statistics test. Forty students took the test. The X axis shows various intervals of scores (the interval labeled 35 includes any score from 32.5 to 37.5). The Y axis shows the number of students scoring in the interval or below the interval.
Any cumulative frequency distribution can be displayed as either the actual frequencies at or below each interval (as shown here) or the percentage of the scores at or below each interval. A cumulative frequency distribution can be a histogram, as shown in Figure 3.9, or a polygon plot, as shown in Figure 3.10.
Figure 3.9 Cumulative scores on a statistics test for forty students.
There are many ways to display frequencies in a series of related observations, such as people attending the doctor's office over a period of a month, numbers of people riding on the train each month over a year, or the number of boxes of batteries sold each month in a year. One way of displaying cumulative frequencies over a period is a frequency polygon, which is another graphical display of a frequency table.
Figure 3.10 The cumulative frequency of batteries sold over a period of twelve months.
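The numbers behind a cumulative plot like this one are running totals. As a minimal sketch, the following Python code uses the 2002 monthly sales figures from earlier in this chapter. The variable names are my own, and `accumulate` comes from Python's standard `itertools` module.

```python
from itertools import accumulate

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
monthly = [700, 800, 700, 600, 757, 550, 867, 1067, 883, 567, 933, 683]

# accumulate() yields the running total after each month.
cumulative = list(accumulate(monthly))

for month, sold, running in zip(months, monthly, cumulative):
    print(f"{month:<4}{sold:>6}{running:>7}")
```

A cumulative series can never go down, so the plotted line climbs from left to right and ends at the total for the year.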
In a frequency polygon the intervals are shown on the X axis; the number of scores, observations, or counts in each interval is represented by the height of a point located above the middle of the interval. The points are connected so that with the X axis they form a polygon, which sometimes looks like a mountain or two mountains; other times it looks like a hill or bell shape, depending on the way the frequencies distribute themselves on the plot. You'll see a lot of frequency diagrams in other chapters.
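The points of a frequency polygon can be worked out before anything is drawn. In this Python sketch I've grouped the 2002 monthly sales into intervals of 100 boxes (my own choice of interval), placed one point above the middle of each interval, and added a zero-height point at each end so the line comes back down to the X axis.

```python
from collections import Counter

# Monthly battery sales for 2002, from the monthly table earlier in this chapter.
monthly = [700, 800, 700, 600, 757, 550, 867, 1067, 883, 567, 933, 683]
width = 100  # assumed interval size, chosen for illustration

# Number of months whose sales fall in each interval, keyed by lower limit.
counts = Counter((value // width) * width for value in monthly)

# One point per interval, located above the middle of the interval.
points = [(lower + width / 2, count) for lower, count in sorted(counts.items())]

# Add a zero-height point one interval before and one after,
# so the line meets the X axis and closes the polygon.
first_mid, last_mid = points[0][0], points[-1][0]
polygon = [(first_mid - width, 0)] + points + [(last_mid + width, 0)]

for midpoint, count in polygon:
    print(f"{midpoint:>7.0f} {count}")
```

Connecting these points in order, left to right, gives the outline of the polygon.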
Using the Bar Graph
A bar graph is much like a histogram, except the columns are separated from each other by a short distance. Bar graphs are commonly used for qualitative variables such as colors, brand names of cars, or other such nominal data. The following chart is a bar graph of toy wagons sold by color.
Figure 3.11 Sample of a simple bar chart.
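With a qualitative variable there are no intervals to choose; you simply count how many items fall into each category. The sketch below does this in Python for toy wagons. The sales list is hypothetical (the actual figures behind the chart aren't given here), and the names `wagons_sold` and `by_color` are made up for illustration.

```python
from collections import Counter

# Hypothetical sales records, one entry per toy wagon sold. Invented for illustration.
wagons_sold = ["red", "blue", "red", "green", "red",
               "blue", "yellow", "red", "green", "blue"]

# Count the wagons in each color category.
by_color = Counter(wagons_sold)

# most_common() orders the bars from tallest to shortest.
for color, count in by_color.most_common():
    print(f"{color:<8}{'#' * count} {count}")
```

Because colors have no natural order, you're free to sort the bars by height, alphabetically, or in whatever order makes the comparison easiest to read.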