
Software [In]security: BSIMM2

Since the release of the original Building Security In Maturity Model (BSIMM) in 2009, the size of the study has tripled. BSIMM2 is the second iteration of the BSIMM project and can be used as a measuring stick for software security. Gary McGraw, author of Software Security: Building Security In, describes the BSIMM2 along with Brian Chess, Sammy Migues, and Elizabeth Nichols.


The Building Security In Maturity Model (BSIMM, pronounced "bee simm") is an observation-based scientific model directly describing the collective software security activities of thirty software security initiatives. Twenty of the thirty firms we studied have graciously allowed us to use their names. They are:

Adobe, AON, Bank of America, Capital One, The Depository Trust & Clearing Corporation (DTCC), EMC, Google, Intel, Intuit, Microsoft, Nokia, QUALCOMM, Sallie Mae, Standard Life, SWIFT, Symantec, Telecom Italia, Thomson Reuters, VMware, and Wells Fargo.

BSIMM2 is the second iteration of the BSIMM project. The original BSIMM described the software security initiatives underway in nine firms. Since the release of the original model in March 2009, the size of the study has tripled.

BSIMM2 can be used as a measuring stick for software security. As such, it is useful for comparing software security activities observed in a target firm to those activities observed among the thirty firms (or various subsets of the thirty firms). A direct comparison using the BSIMM is an excellent tool for devising a software security strategy.

Just the Facts, Ma'am

By contrast with prescriptive, "faith-based" approaches to software security, the BSIMM is directly descriptive. That is, it does not tell you what you should do; instead, it tells you what everyone else is actually doing. As a descriptive model, BSIMM2 has accumulated a number of observed facts that we share here.

BSIMM2 describes the work of 635 people whose firms have a collective 130 years of experience working on software security. On average, the target organizations have practiced software security for four years and five months (with the newest initiative being three months old and the oldest initiative being fourteen years old in September 2009). All thirty agree that the success of their program hinges on having an internal group devoted to software security — the Software Security Group (SSG). SSG size on average is 21.9 people (smallest 0.5, largest 100, median 13) with a "satellite" of others (developers, architects and people in the organization directly engaged in and promoting software security) of 39.7 people (smallest 0, largest 300, median 11). The average number of developers among our targets was 5061 people (smallest 40, largest 30,000, median 3000), yielding an average percentage of SSG to development of just over 1%.
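Note that the "just over 1%" figure is presumably the average of each firm's own SSG-to-developer ratio, not the ratio of the two averages (21.9 / 5061 is closer to 0.4%). A minimal sketch with purely hypothetical firm sizes shows why the two computations differ:

```python
# Hypothetical (SSG size, developer count) pairs -- illustrative numbers only.
# Small firms with proportionally large SSGs pull the per-firm average up.
firms = [
    (2, 80),       # 2.5%
    (10, 1000),    # 1.0%
    (30, 10000),   # 0.3%
]

# Average of the per-firm ratios (how the ~1% figure is presumably computed).
avg_of_ratios = sum(s / d for s, d in firms) / len(firms)

# Ratio of the averages (what dividing average SSG size by average
# developer count would give instead).
avg_ssg = sum(s for s, _ in firms) / len(firms)
avg_dev = sum(d for _, d in firms) / len(firms)
ratio_of_avgs = avg_ssg / avg_dev

print(avg_of_ratios, ratio_of_avgs)  # the first is roughly triple the second
```

With skewed firm sizes the two statistics can differ substantially, which is why the per-firm extremes (0.05% to 2.6%) bracket the 1% average even though 21.9/5061 does not match it.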

BSIMM2 describes 109 activities organized in twelve practices according to the Software Security Framework. During the study, we kept track of how many times each activity was observed in the thirty firms. Here are the resulting data (to interpret individual activities, download a copy of the BSIMM document, which carefully describes the 109 activities).

As you can see, fifteen of the 109 activities are highlighted. These are the most commonly observed activities. We describe them in some detail in an article titled What Works in Software Security.

The thirty executives in charge of the software security initiatives we studied have a variety of titles, including: Director of IT Security and Risk Management, Director of Application Controls, Product Security Manager, Sr. Manager of Product Security, SVP of Global Risk Management, Global Head of Information Security Markets, and CISO. We observed a fairly wide spread in exactly where the SSG is situated in the firms we studied as well. In particular, 7 SSGs exist in the CIO's organization, 7 exist in the CTO's organization, 6 report to CSOs, 3 exist in either the General Counsel's office or the Office of Compliance and Risk Management, and 2 report directly to the founder or CEO. Five of the companies we studied did not specify where their SSG fits in the larger organization.

Every single one of the thirty successful programs we describe in the BSIMM has a sizeable SSG. We have never observed a firm successfully carrying out the BSIMM activities without an SSG. Though none of the thirty SSGs we examined had exactly the same structure (suggesting that there is no one set way to structure an SSG), we did observe some commonalities worth mentioning. At the highest level of organization, SSGs come in three major flavors: those organized according to technical SDLC duties, those organized by operational duties, and those organized according to internal business units. Some SSGs are highly distributed across a firm, and others are very centralized and policy-oriented. Looking across all thirty SSGs in our study, several common subgroups appear repeatedly:

  • people dedicated to policy/strategy and metrics
  • internal "services" groups that (often separately) cover tools, penetration testing, and middleware development/shepherding
  • incident response groups
  • groups responsible for training development and delivery
  • externally-facing marketing/communications groups
  • vendor-control groups

For more about this issue, see the article You Really Need an SSG.

In the numbers reported above, we noted that the ratio of SSG members to developers was just over 1% in the thirty organizations we studied. This is a number we originally uncovered in the initial BSIMM work (a study of nine companies). Much to our surprise, it has held steady as we collected data from twenty-one more firms. That means roughly one SSG member for every 100 developers. The largest ratio we observed was 2.6% and the smallest was 0.05%. To remind you of the particulars in terms of actual bodies, average SSG size among the thirty firms is 21.9 people (smallest 0.5, largest 100, median 13).


A sample size of 30 or greater is the standard statistical rule of thumb for when normal-approximation methods begin to be meaningful. Because the BSIMM2 data set includes data from 30 firms, we can perform some statistical analysis on the model and check its findings mathematically. Some of our results are presented here.

The first metric we computed was a Recursive Mean Score (RMS) for each firm. Starting with the 109 activities, each firm is scored 0 or 1 per activity, with 1 meaning "observed." Activities are grouped into practices, which are grouped into domains as described in the Software Security Framework, and each activity is also assigned a "level" inside its practice (see below). The RMS metric produced 53 values between 0 and 100 for each of the 30 firms: numbers that support analysis of activity density across many dimensions.
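The exact recursion behind the RMS is not spelled out here, but the flavor of the scoring can be sketched as follows, with a tiny hypothetical data set standing in for the real 109 activities (the practice names are real BSIMM practices; the 0/1 observations are invented for illustration):

```python
# Each practice maps level -> list of 0/1 activity observations for one firm.
# Illustrative data only; the real model has 12 practices and 109 activities.
practices = {
    "Strategy & Metrics": {1: [1, 1, 0], 2: [1, 0], 3: [0]},
    "Attack Models":      {1: [1, 0, 0], 2: [0, 0], 3: [0]},
}

def level_score(observed):
    # Fraction of activities observed at this level, scaled to 0-100.
    return 100.0 * sum(observed) / len(observed)

def practice_score(levels):
    # Mean of the per-level scores: a "recursive" mean rather than a flat
    # mean over all activities, so each level carries equal weight.
    return sum(level_score(obs) for obs in levels.values()) / len(levels)

def overall_score(practices):
    # Mean of the per-practice scores, again one level up the recursion.
    return sum(practice_score(p) for p in practices.values()) / len(practices)

print(overall_score(practices))
```

Scoring this way yields one number per level, per practice, per domain, and overall, which is how a handful of practices and levels fans out into dozens of 0-to-100 values per firm.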

The distribution of RMS numbers is amazing. RMS scores fall neatly into a symmetric, normal (Gaussian) distribution as shown here. This seems to indicate that the scoring has been very consistent and neither too lenient nor too harsh. Mathematically, this result validates the math we used to come up with our sample size estimate, and it satisfies the prerequisites for a wide range of statistical models that we can use to analyze the data.

We can zoom in to the practice level and begin to get a feel for the relative impact that various practices have on BSIMM2 scores.

This whisker chart takes some unpacking. There is one blue whisker for each BSIMM practice. At first glance, we get primarily a sense of comparative activity density (or prevalence in the population) across the twelve BSIMM practices for all 30 firms. Now we know which practices have the highest and lowest density and which practices had the highest and lowest observed variation in density.

The yellow annotations point out precisely what each whisker depicts. Each whisker shows the minimum, 1st quartile, median, 3rd quartile, maximum, and outliers of the practice score. Outliers are observations more than 1.5 times the inter-quartile range (IQR) above the 3rd quartile or below the 1st quartile. The notches extend to +/- 1.58 * IQR/sqrt(n), which is designed to give a 95% confidence interval for the estimate of the median.

If the notches of two whiskers do not overlap, this is strong evidence (95% confidence) that the two medians differ. Some observations:

  1. The greatest activity appears to be in the Strategy & Metrics practice, with Security Features & Design a very close second — but our confidence in saying this is less than 95%.
  2. The least activity appears to be in the Attack Models and Training practices — again, the overlapping notches prevent us from stating this with very high confidence.
  3. The greatest variation occurs in the Software Environment and CM&VM practices.
  4. For least variation, there are several candidates: Architecture Analysis and Strategy & Metrics are two.
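The notch rule described above is easy to compute directly. Here is a minimal sketch using the Python standard library and hypothetical practice-score samples (the real chart was of course produced from the 30-firm data):

```python
import math
import statistics

def notch_interval(scores):
    """Median notch: median +/- 1.58 * IQR / sqrt(n), the conventional
    approximation of a 95% confidence interval for the median."""
    n = len(scores)
    q1, med, q3 = statistics.quantiles(scores, n=4)  # quartiles
    iqr = q3 - q1
    half = 1.58 * iqr / math.sqrt(n)
    return med - half, med + half

def medians_differ(a, b):
    # Non-overlapping notches are strong evidence the two medians differ.
    lo_a, hi_a = notch_interval(a)
    lo_b, hi_b = notch_interval(b)
    return hi_a < lo_b or hi_b < lo_a

# Hypothetical per-firm scores for two practices:
print(medians_differ([1, 2, 3, 4, 5], [100, 101, 102, 103, 104]))  # clearly differ
print(medians_differ([1, 2, 3, 4, 5], [1, 2, 3, 4, 5]))            # identical samples
```

This is exactly the overlap test behind observations 1 and 2: when two practices' notches overlap, we cannot claim their medians differ at the 95% level.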

Now you can see why sample size can be so important. As more companies participate in the BSIMM, margins of error about our estimates will narrow, and our mathematical statements can reflect that.

There Are No Special Snowflakes

Most firms feel that they are not at all like any other firm (ironically, they are all alike in that feeling), but the BSIMM2 data show that when it comes to software security, firms generally do the same thing even when they are in different industry verticals. To demonstrate that "there are no special snowflakes," we ran a pairwise t-test comparing the activities undertaken in the Financial Services vertical (twelve firms) to the activities undertaken by the Independent Software Vendors (seven firms). As you can see in the Venn diagram (Figure 4) below, there is plenty of overlap even among firms in different verticals.

The numbers shown below are activity counts. The activity count in the middle of the two overlapping circles is the count of activities with similar levels of observation for the two verticals; a perfect overlap would be 109. We used Student's t-test to accept or reject, at a 95% level of confidence, the null hypothesis H0: Vertical X's effort on activity A is equal to Vertical Y's effort on activity A, where "effort" for a vertical on activity A is the average of the associated 0s and 1s for activity A across all firms in that vertical. If we fail to reject the null hypothesis, the activity is counted in the overlap of the two circles; otherwise, it is not.

Looking at the results here for the FI versus ISV comparison, there are 100 activities for which we could not, at a 95% level of confidence, reject this hypothesis. Put another way: 100 out of 109 activities in these two verticals have the same level of effort applied. The numbers 5 and 4 are counts of activities where we could reject the hypothesis: five activities had a higher concentration in FI than in ISV, and four were higher in ISV than in FI. In sum, there are no special snowflakes.
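The per-activity test can be sketched with a pooled-variance t statistic over the 0/1 observation vectors. Everything below is illustrative: the firm observations are made up, and the 2.110 critical value is an assumption corresponding to df = 12 + 7 - 2 = 17 at the two-sided 5% level:

```python
import math
import statistics

T_CRIT = 2.110  # two-sided 5% critical value for t with df = 12 + 7 - 2 = 17

def pooled_t(x, y):
    """Two-sample pooled-variance t statistic for per-activity 0/1 vectors."""
    nx, ny = len(x), len(y)
    mx, my = statistics.mean(x), statistics.mean(y)
    sp2 = ((nx - 1) * statistics.variance(x) +
           (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    if sp2 == 0:  # both verticals unanimous on this activity
        return 0.0 if mx == my else math.inf
    return (mx - my) / math.sqrt(sp2 * (1 / nx + 1 / ny))

def same_effort(x, y):
    # Failing to reject H0 (equal effort) counts the activity in the overlap.
    return abs(pooled_t(x, y)) <= T_CRIT

# Hypothetical 0/1 observations: 12 FI firms and 7 ISVs, two activities.
fi  = {"act_1": [1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1], "act_2": [1] * 12}
isv = {"act_1": [1, 1, 0, 1, 1, 1, 0],                "act_2": [0, 0, 0, 0, 0, 0, 1]}

overlap = sum(same_effort(fi[a], isv[a]) for a in fi)
print(overlap)  # act_1 falls in the overlap; act_2 does not
```

Running this over all 109 activities and counting the failures to reject is what produces the 100 / 5 / 4 split in the Venn diagram.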

Self Reflection

The BSIMM2 breaks activities down into maturity levels that are meant mostly as a guide. The levels provide a natural progression through the activities associated with each practice. However, it is not at all necessary to carry out all activities in a given level before moving on to activities at a higher level in the same practice. That said, the levels we have identified hold water under statistical scrutiny: level one activities (straightforward and simple) are commonly observed, level two activities (more difficult, requiring more coordination) slightly less so, and level three activities (rocket science) are observed much more rarely.

A quick look back at the activities chart at the top of the article shows how well the levels map onto our data set. There were two level demotions and two level promotions, encompassing four activities that we changed for BSIMM2.

Remember, the BSIMM represents observations from the field. To see how well we did setting levels, we drew lines across the activities per practice based on our experience. The diagram here shows how well we have done so far.

We scored observations and coded them with a color spectrum ranging from red to black to green with bright red being no adoption and bright green being 100% adoption. Based on observations per maturity level, this picture tells us that we did a pretty good job assigning maturity levels. The fact that you see very little red in maturity level 1 and lots of red in maturity level 3 is good.

Of course, this picture will almost certainly change over time based on two occurrences: there may be more activities added to the model, and some things will just get easier over time. If these things happen, we will be sure to report them.

Creating A Community

All 30 firms that participated in the BSIMM have expressed a desire to create a community of interest around software security and the BSIMM. We have already held some informal events where participants swapped software security war stories and got to know each other. We plan to host a conference in the Fall for the BSIMM participants.

Our long-term plan is to continue to aggressively expand the model, adding as many firms as we can while retaining data integrity. We intend to take on the hard problem of measuring efficiency and effectiveness of observed activities.

As we continue to gel as a BSIMM community, we have formed a BSIMM Advisory Board made up of Steve Lipner from Microsoft, Eric Baize from EMC, Jeff Cohen from Intel, Janne Uusilehto from Nokia and Brad Arkin from Adobe. Together we will continue to evolve and grow the BSIMM study.


