Time Series Charts
Time series charts are probably the best-known technique for visualizing security metrics. They remain the most common form of exhibit in security information reports, and they figure prominently in products for measuring compliance or tracking vulnerabilities.
Basic Time Series Charts
Chapter 5 discussed how a time series captures a set of consistently measured data records over an interval of time. Each record contains a number of data attributes. Time series charts simply graph an attribute (or set of attributes) over a time interval. The time interval (generally days, months, quarters, or years) serves as the independent variable and usually appears on the horizontal axis. The attribute(s) that vary over time serve as the dependent variable(s) and appear on the vertical axis.
Variations on the basic time series chart exist. Clever analysts occasionally add a second vertical axis on the right side of the exhibit to display a contrasting attribute in the same exhibit. Most readers are likely familiar with financially oriented time series charts (The Economist, Business Week) that show, for example, interest rates on the left and money supply on the right.
Time series charts accommodate a number of formats, depending on the preferences of the security analyst. Formats that work well include:
- Line charts
- Area charts
- Bar charts
Each format has strengths and weaknesses. Personally, I prefer line charts for exhibits used in isolation. In an individual exhibit, the direction and tendency of the series line matters most; bars and colored chart areas distract the reader. But for small-multiple exhibits (discussed later in this chapter), area charts can help imbue individual exhibits with stronger "shapes" that are better distinguished by the eye.
Figure 6-8 depicts a sample time series graphic, drawn as a line chart. It shows the number of infections for the 2003 Slammer worm, based on data from the Cooperative Association for Internet Data Analysis (CAIDA).4 When I prepared this chart, I wanted the infection trend line to be the most prominent characteristic. Note how the x- and y-axes are relatively plain and thin, while the data series itself appears as a thick line drawn in saturated blue.
Figure 6-8 Time Series Chart of the Slammer Worm
Time series charts are perhaps the easiest-to-understand form of information graphics. Everyone, from managers to staff to laypersons, knows how to interpret them. Every graphics package worth its salt supports one or more of their forms. And unless the analyst commits a horrible labeling blunder, they are nearly impossible to screw up.
Indexed Time Series Charts
Popularity and wide tool support mean that time series charts make a good starting point for visualizing security metrics. One of the more common applications of time series charts is for displaying improvement over time against a baseline. By "baseline" I mean a set of measurements taken at a particular point in time. A twist on the venerable time series chart, therefore, is an "indexed" version that charts each data series relative to the baseline.
To create the baseline, the analyst selects a starting point in time and normalizes all dependent data series values at that point to some "base" index value. I prefer normalizing to the number 100 because it corresponds to the "report card" or "IQ score" scales that most people are familiar with. As a side benefit, it displays fairly nicely and can show up to two significant digits of precision if required.
Normalization of the data series values to the baseline value produces a chart in which all values emanate from a common baseline origin point and diverge from that point forward. The normalization, in effect, encourages the viewer to trace the pathways of each series over time.
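The arithmetic behind the normalization is simple enough to sketch. The snippet below is a minimal illustration, not a prescribed method: the function name `index_to_baseline`, the product names, and the quarterly counts are all hypothetical, and it assumes the baseline is the average of the first four quarters of each series.

```python
from statistics import mean


def index_to_baseline(series, baseline, base_index=100.0):
    """Rescale every value so that the baseline value maps to base_index."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return [round(value / baseline * base_index, 1) for value in series]


# Hypothetical quarterly vulnerability counts (illustrative numbers only).
counts = {
    "Product A": [38, 42, 44, 36, 30, 24],
    "Product B": [8, 10, 12, 10, 14, 15],
}

# Assumption: the baseline is the average of the first four quarters.
indexed = {
    name: index_to_baseline(values, mean(values[:4]))
    for name, values in counts.items()
}

for name, values in indexed.items():
    print(name, values)
```

With these made-up numbers, Product A ends at an indexed value of 60 (40 percent below its baseline), while Product B ends at 150 (50 percent above). Because both series are expressed against 100, they can share one vertical axis even though their raw counts differ by a factor of four.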
Figure 6-9 shows a sample indexed time series chart depicting the number of security vulnerabilities in several types of software for the period of 2001 through the first quarter of 2005. It uses 2001 quarterly averages as the baseline values. The chart clearly shows that the number of vulnerabilities for Microsoft products dropped well below its 2001 baseline early in 2003 and has not yet returned to that level. In contrast, the number of vulnerabilities for security products increased in early 2005 to just over 50 percent (indexed value: 151) above the 2001 baseline.
Figure 6-9 Indexed Time Series Chart
Indexed time series charts challenge readers to compare and contrast rates of change among divergent data series over time. As a result, they are best used to show comparisons of measurements taken over time against understood baselines.
Quartile Time Series Charts
Indexed time series charts showcase one way to revisualize sets of time series data by normalizing to a baseline. Another variation on the time series chart, which I refer to as the "quartile time series chart," showcases another technique. It uses quartile information from data sets to show broader measures of performance over time.
As you may recall from Chapter 5, quartiles group data into four bins: the top 25% of the data points in the sample make up the first or "top" quartile, and the bottom 25% form the fourth. The last element in the second quartile, in fact, is the median data point in the set.
To create a quartile time series chart, the analyst calculates the first, second, third, and fourth quartile numbers for each time interval in the data set. The resulting exhibit simply graphs the first, second, and third quartiles. Figure 6-10 shows a sample quartile time series chart. Notice how the exhibit omits the fourth quartile; since it represents the upper bound of the data set, including it would only add visual noise.
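As a rough sketch of the calculation, the three plotted lines can be derived per time interval with Python's standard `statistics.quantiles` function. The scores below are hypothetical, and the choice of the "inclusive" interpolation method is an assumption; other quartile conventions give slightly different cut points.

```python
from statistics import quantiles

# Hypothetical assessment scores per year (illustrative numbers only).
scores_by_year = {
    2000: [20, 40, 60, 80, 100],
    2001: [15, 25, 30, 45, 70],
    2002: [12, 22, 28, 38, 50],
}

# For each interval, compute the three cut points that become the
# plotted lines: lower boundary, median, and upper boundary.
quartile_lines = {
    year: tuple(quantiles(scores, n=4, method="inclusive"))
    for year, scores in scores_by_year.items()
}

# Gap between the outer lines; a shrinking gap means the lines converge.
spread = {year: cuts[2] - cuts[0] for year, cuts in quartile_lines.items()}

for year, (lower, median, upper) in quartile_lines.items():
    print(year, lower, median, upper, spread[year])
```

Each tuple holds the values for one point on each of the three lines. The `spread` figure is an extra convenience, not something the exhibit plots directly: it puts a number on how far apart the outer lines sit in each period.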
Figure 6-10 Quartile Time Series Chart
The way to read the exhibit is straightforward: the thick line represents the median values that separate the second and third quartiles. The thin line below the median separates the first and second quartiles, and the thin line above the median separates the third from the fourth. Based on the positions of the lines, viewers can quickly identify the correct quartile that any other data point falls into. Although the time series interval in the example I have provided is fairly broad (yearly samples), the broad headlines from this exhibit announce themselves:
- The period from 2000 until 2001 saw the most dramatic improvement (a 50% drop in median scores).
- Since 2001, median scores have stayed fairly flat.
- The worst applications (fourth quartile) demonstrated continuous improvement through all periods.
- All quartiles appear to be converging, which means that application security scores are generally improving across the board (specifically, the difference between the first and third quartile lines decreases over time).
- The first quartile has worsened in the most recent year (2003) relative to the previous one.
This chart contains two minor graphical refinements worth mentioning. First, all the quartile lines appear in the same color (black). However, the line that arguably provides the most context—the median—was drawn thicker (a 3-point line instead of 1-point). Second, I have added free-form text labels (italicized) to clearly establish the territories occupied by each quartile and to identify the median (italicized and bold).
In addition to these refinements, analysts can plot additional data points to show which quartiles they fall into. This is extremely useful for answering a common question from management about a particular item (namely, "How did we do?"). In fact, an analyst could combine the quartile time series line plots with a scatter plot showing the scores for selected (or all) data points in the set.
Alternatively, the analyst could create what I refer to as the "You Are Here" benchmarking chart by adding a horizontal line representing the score for a particular data point being benchmarked. The line crosses the y-axis and extends the width of the chart. When I was a consultant at @stake, for example, we used this technique to show how a client's freshly assessed application scored relative to our first/second/third/fourth quartile benchmarks. Clients liked the "You Are Here" chart because it showed how their applications ranked—that is, which quartile they fell into. From the consultant's point of view, the "You Are Here" chart helped drive business because it made the point that the client's application would have ranked better (or worse) in different periods.
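The "You Are Here" placement can also be computed rather than read off the chart. The following is a minimal sketch under stated assumptions: the function `quartile_of`, the benchmark cut points, and the client score are all hypothetical; lower scores are assumed to be better, so the first quartile holds the lowest scores; and a score that lands exactly on a boundary is assigned to the lower-numbered quartile, consistent with the median being the last element of the second quartile.

```python
from bisect import bisect_left


def quartile_of(score, cut_points):
    """Return the quartile (1 to 4) that a score falls into.

    cut_points holds the three boundary values (lower, median, upper).
    A score equal to a boundary is placed in the lower-numbered quartile.
    """
    return bisect_left(sorted(cut_points), score) + 1


# Hypothetical benchmark cut points per year (lower, median, upper).
benchmarks = {
    2001: (25, 30, 45),
    2002: (22, 28, 38),
}

# Hypothetical score for the application being benchmarked.
client_score = 24

placement = {
    year: quartile_of(client_score, cuts) for year, cuts in benchmarks.items()
}
print(placement)
```

With these made-up benchmarks, the same score lands in the first quartile against the 2001 cut points but only in the second against 2002, which is the kind of period-to-period shift the horizontal line makes visible.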
Quartile charts excel in revealing how data change over time. They dig below the surface by graphing more than just simplistic averages. I rarely see them used, and that is a shame. Make them part of your toolset.