Curious people like to probe, ask questions, and understand why something is the way it is. One of the most powerful ways to satisfy that curiosity is to give people things to compare and contrast. People instinctively know how to compare before with after, apples with oranges, and like with unlike. Why? In part because spotting the differences helps them understand relationships.
In the field of information graphics, a concept called small multiples provides a natural, instinctive way to show how things relate and, more importantly, how they differ. First popularized by Edward Tufte, small multiples plot several cross sections of data in separate mini-charts and then combine them into a single exhibit. As a result, readers can sweep back and forth across the exhibit at a glance, looking for patterns, similarities, and differences. An important property of a small-multiple chart is that the axes stay constant: every mini-chart uses the same units of measure and the same scales. Only the cross section of data being plotted changes.
Figure 6-14, a screen capture from the distributed network intrusion detection project DShield, shows how small-multiple exhibits work. The small multiples are in the column labeled "Activity Past Month." They show the relative number of hostile scans encountered for the network services enumerated in the "Service Name" column. Although the x- and y-axis labels are not shown, it seems clear enough what they must be: the vertical axis starts at 0 and rises linearly (or possibly logarithmically) to a maximum held constant in all graphs, and the horizontal axis shows how the relative number of scans varies over time. Minor quibbles about labeling aside, the use of small multiples in this exhibit lets the reader quickly get a sense of which ports are most likely to be scanned. In this case, they are 445 and 135, two ports associated with Windows services that are often prone to compromise. A network administrator running an all-Windows environment, for example, might see this exhibit and decide to push out a group policy temporarily restricting access to these ports.
One can easily imagine how this exhibit could be enhanced. Instead of simply showing the "top 10" most-scanned ports, we could show the top 100, or a subset of the most common well-known ports. Doing so would require some graphical nips and tucks: the "Explanation" column would need to vanish, and we would want to combine the "Service Name" and "Port Number" columns. Aesthetically, drawing the scan results as solid filled area charts on a white background (instead of black) could make the small multiples easier to read.
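As a rough sketch of the data side of that change, the following Python ranks ports by total scan count and merges the service name and port number into a single label. The function `top_ports` and the shape of its input dictionaries are hypothetical; nothing here assumes anything about DShield's actual data feeds.

```python
from collections import Counter


def top_ports(scan_counts, service_names, n=100):
    """Return (label, count) pairs for the n most-scanned ports, busiest first.

    scan_counts maps port number -> total scans in the period.
    service_names maps port number -> service name; unknown ports get "unknown".
    Each label merges the "Service Name" and "Port Number" columns into one.
    """
    ranked = Counter(scan_counts).most_common(n)
    return [(f"{service_names.get(port, 'unknown')} ({port})", count)
            for port, count in ranked]
```

Raising `n` from 10 to 100 is the only change needed to widen the exhibit; the layout work of fitting that many rows remains a separate problem.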
An intriguing small-multiple format that would work well here is the sparkline, a minimalist format of "simple, intense, word-sized graphics" invented by Tufte.7 Figure 6-15 shows a fictitious redrawing of the preceding exhibit in sparkline format, constructed using Excel. Each mini-chart includes a dark gray line to show the trend for its cross section, as well as a light gray band denoting the "normal" range, that is, the mean plus or minus one standard deviation. To put each line in context, the final data point in each series is highlighted with a red marker and a numeric label.
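The same sparkline design can also be scripted rather than built by hand in Excel. The following is a minimal sketch, assuming Python 3.8 or later and only the standard library, that renders one sparkline as an SVG string with the three layers described above. The use of the population standard deviation is an assumption (the text does not say which variant), and the function name, sizes, and colors are illustrative choices.

```python
import statistics


def sparkline_svg(values, width=100, height=20, y_max=None):
    """Render one sparkline as a standalone SVG string.

    Layers, back to front: a light gray band for the "normal" range
    (mean plus or minus one population standard deviation), a dark gray
    trend line, and a red marker with a numeric label on the last point.
    Pass the same y_max to every sparkline so all multiples share one scale.
    """
    if len(values) < 2:
        raise ValueError("a sparkline needs at least two data points")
    top = y_max if y_max is not None else max(values)
    top = top or 1  # avoid dividing by zero when the maximum is 0

    def px(i):
        return i * width / (len(values) - 1)

    def py(v):
        # SVG y grows downward, so invert; clamp values to the plot area
        v = min(max(v, 0), top)
        return height - v * height / top

    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    band_top, band_bottom = py(mean + sd), py(mean - sd)
    points = " ".join(f"{px(i):.1f},{py(v):.1f}" for i, v in enumerate(values))
    last_x, last_y = px(len(values) - 1), py(values[-1])
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width + 40}" height="{height}">'
        f'<rect x="0" y="{band_top:.1f}" width="{width}" '
        f'height="{band_bottom - band_top:.1f}" fill="#ddd"/>'
        f'<polyline points="{points}" fill="none" stroke="#444" stroke-width="1"/>'
        f'<circle cx="{last_x:.1f}" cy="{last_y:.1f}" r="2" fill="red"/>'
        f'<text x="{width + 4}" y="{height / 2}" font-size="10" '
        f'dominant-baseline="middle">{values[-1]}</text>'
        "</svg>"
    )
```

Generating one such SVG per port with a common `y_max`, and placing them in a table column, would approximate the look of Figure 6-15 without any hand formatting.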
Figure 6-15 Sample Small-Multiple Exhibit (Sparklines)
Quartile-Plot Small Multiples
The time-series small multiples in Figures 6-14 and 6-15 help the reader understand the relative magnitude of activities over time. But time series charts are not the only type of graphic that can serve as a small multiple. Figure 6-16 shows a hand-drawn small-multiple exhibit using bar charts that compares and contrasts the distribution of security flaws across nine different application security areas for a selected group of applications.8 Each multiple is a vertical bar chart that shows, for one area, the values for the first and fourth quartiles of the data set.
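The text does not say exactly how applications were assigned to quartiles. One plausible approach, sketched below purely as an assumption, ranks applications by total flaw count, treats the quarter with the fewest flaws as the first quartile and the quarter with the most as the fourth, and then averages each security area's flaw count within those two groups. The `quartile_means` function and its input shape are hypothetical.

```python
import statistics


def quartile_means(apps, areas):
    """Average flaws per security area for first- and fourth-quartile apps.

    apps is a list of dicts, one per application, mapping area -> flaw count.
    Applications are ranked by total flaws across the given areas; the quarter
    with the fewest flaws is the first quartile, the quarter with the most is
    the fourth. Returns area -> (first-quartile mean, fourth-quartile mean).
    """
    if len(apps) < 4:
        raise ValueError("need at least four applications to form quartiles")
    ranked = sorted(apps, key=lambda app: sum(app.get(a, 0) for a in areas))
    k = len(ranked) // 4  # apps per end quartile; any remainder falls in the middle
    first, fourth = ranked[:k], ranked[-k:]
    return {
        area: (
            statistics.fmean(app.get(area, 0) for app in first),
            statistics.fmean(app.get(area, 0) for app in fourth),
        )
        for area in areas
    }
```

Each (first, fourth) pair in the result corresponds to the two bars in one multiple.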
Figure 6-16 Sample Small-Multiple Exhibit (with Quartiles)
The combination of the small-multiple format with a first-versus-fourth comparison yields an extremely powerful graphic. A simple glance at the exhibit reveals the headline: fourth-quartile applications are much worse than their first-quartile counterparts in some areas, but not others. For example, the "best" applications contain 90% fewer authentication defects, have 90% fewer problems related to sensitive data handling, and suffer from 80% fewer session management issues. In contrast, cryptographic issues are few across the board, and the difference between the best and worst applications is not large.
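Statements such as "90% fewer authentication defects" are relative reductions: one minus the ratio of the first-quartile value to the fourth-quartile value. With hypothetical averages of 1 and 10 authentication flaws, for example, 1 - 1/10 = 0.9, or 90% fewer. A small helper, assuming the per-area averages have already been computed, might look like this (the name `percent_fewer` is illustrative):

```python
def percent_fewer(area_means):
    """Relative reduction, in whole percent, of first- vs fourth-quartile means.

    area_means maps area -> (first-quartile mean, fourth-quartile mean).
    A result of 90 means the best applications average 90% fewer flaws.
    Areas where the fourth quartile averages zero have no defined ratio (None).
    """
    result = {}
    for area, (best, worst) in area_means.items():
        result[area] = None if worst == 0 else round(100 * (1 - best / worst))
    return result
```

Small values for an area (as with cryptography) signal that the best and worst applications behave alike there, which is exactly what the exhibit conveys visually.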
The exhibit is interesting for another reason: it sports a "layered" macro/micro design that shows both the overall total (on the left) and the contributions made by the individual areas. The y-axis scale is the same for both, and the quartile labels on the "overall total" graph serve as a key for the rest. Not all small-multiple exhibits lend themselves to such an elegant format, but it is nice when they do.
Small multiples, while powerful, are not well supported by mainstream spreadsheet packages. For example, lacking better tools, a security analyst would need to hand-draw Figure 6-16 using Visio or a similar drawing package. Careful spreadsheet jockeying in Excel might also work, but it would require the analyst to painstakingly format and align each multiple down to the pixel, and then pray that Excel doesn't move or reformat it. In most cases, analysts would do better to generate the individual multiples using a script and then stitch them together programmatically into a web page or PDF.
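What such a script might look like is sketched below, under several assumptions: Python 3 with only the standard library, inline SVG as the chart format, and an HTML page (rather than a PDF) as the stitched output. Each multiple is a small bar chart, and all of them share one y-axis maximum so that, as small multiples require, only the data changes from chart to chart. The function names, sizes, and grid layout are hypothetical choices.

```python
from html import escape


def bar_chart_svg(title, bars, y_max, width=120, height=80):
    """Render one multiple: a titled vertical bar chart on a fixed y-scale.

    bars maps a bar label (for example "Q1" or "Q4") to its value. Every
    multiple receives the same y_max, so equal values get equal bar heights.
    """
    slot = width / max(len(bars), 1)  # horizontal space per bar
    bar_w = slot / 2                  # each bar fills half its slot
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height + 20}">',
        f'<text x="0" y="12" font-size="11">{escape(title)}</text>',
    ]
    for i, (label, value) in enumerate(bars.items()):
        h = value / y_max * height if y_max else 0.0
        x = i * slot + bar_w / 2
        y = 20 + height - h           # bars grow upward from the baseline
        parts.append(
            f'<rect x="{x:.1f}" y="{y:.1f}" width="{bar_w:.1f}" height="{h:.1f}" fill="#555">'
            f"<title>{escape(str(label))}: {value}</title></rect>"
        )
    parts.append("</svg>")
    return "".join(parts)


def small_multiples_page(data, columns=5):
    """Stitch one bar chart per entry in data into a single HTML page.

    data maps a chart title to its bars; the largest value anywhere in data
    becomes the shared y-axis maximum for every multiple.
    """
    y_max = max((v for bars in data.values() for v in bars.values()), default=0)
    cells = "".join(
        f"<div>{bar_chart_svg(title, bars, y_max)}</div>" for title, bars in data.items()
    )
    return (
        "<!DOCTYPE html><html><body>"
        f'<div style="display:grid;grid-template-columns:repeat({columns}, auto);gap:8px">'
        f"{cells}</div></body></html>"
    )
```

Writing the result to disk, for example with `pathlib.Path("multiples.html").write_text(page)`, produces a page that any browser can display or print to PDF. Putting an "Overall" entry first in `data` mimics the layered macro/micro layout of Figure 6-16, because the shared maximum then covers the totals as well.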