
The Role of Architectural Risk Analysis in Software Security

By Gary McGraw
Mar 3, 2006


This chapter is from the book Software Security: Building Security In.

Touchpoint Process: Architectural Risk Analysis

Architectural risk analysis as practiced today is usually performed by experts in an ad hoc fashion. Such an approach does not scale, nor is it in any way repeatable or consistent. Results are deeply constrained by the expertise and experience of the team doing the analysis. Every team does its own thing. For these reasons, the results of disparate analyses are difficult to compare (if they are comparable at all). That’s not so good.

As an alternative to the ad hoc approach, Cigital uses the architectural risk analysis process shown in Figure 5-4. This process complements and extends the RMF of Chapter 2. Though the process described here is certainly not the “be all, end all, one and only” way to carry out architectural risk analysis, the three subprocesses described here are extraordinarily powerful.

Figure 5-4 A simple process diagram for architectural risk analysis.

A risk analysis should be carried out only once a reasonable, big-picture overview of the system has been established. The idea is to forget about the code-based trees of bugland (temporarily at least) and concentrate on the forest. Thus the first step of the process shown in the figure is to build a one-page overview of the system under analysis. Sometimes a one-page big picture exists, but more often it does not. The one-page overview can be developed through a process of artifact analysis coupled with interviews. Inputs to the process are shown in the leftmost column of Figure 5-4.

Three critical steps (or subprocesses) make up the heart of this architectural risk analysis approach:

Attack resistance analysis
Ambiguity analysis
Weakness analysis

Don’t forget to refer back to Figure 5-4 as you read about the three subprocesses.

Attack Resistance Analysis

Attack resistance analysis is meant to capture the checklist-like approach to risk analysis taken by Microsoft’s STRIDE. The gist of the idea is to use information about known attacks, attack patterns, and vulnerabilities during the process of analysis. That is, given the one-page overview, how does the system fare against known attacks? Four steps are involved in this subprocess.

1. Identify general flaws using secure design literature and checklists (e.g., cycling through the Spoofing, Tampering, ... categories from STRIDE). A knowledge base of historical risks is particularly useful in this activity.
2. Map attack patterns using either the results of abuse case development (see Chapter 8) or a list of attack patterns.
3. Identify risks in the architecture based on the use of checklists.
4. Understand and demonstrate the viability of these known attacks (using something like exploit graphs; see the Exploit Graphs box).
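The first step above, cycling through the STRIDE categories, can be sketched in a few lines of Python. This is only a minimal illustration, not part of the process in Figure 5-4: the component names, the `ChecklistItem` type, and the `build_checklist` helper are all hypothetical. The idea is simply to pair every component on the one-page overview with every category so that no combination is skipped.

```python
from dataclasses import dataclass
from itertools import product

# The six STRIDE threat categories.
STRIDE = (
    "Spoofing",
    "Tampering",
    "Repudiation",
    "Information disclosure",
    "Denial of service",
    "Elevation of privilege",
)


@dataclass(frozen=True)
class ChecklistItem:
    """One question for the analyst to answer (hypothetical type)."""
    component: str
    category: str
    question: str


def build_checklist(components):
    """Pair every overview component with every STRIDE category."""
    return [
        ChecklistItem(comp, cat, f"How does '{comp}' fare against {cat.lower()}?")
        for comp, cat in product(components, STRIDE)
    ]


# Hypothetical components taken from a one-page overview.
components = ["browser client", "web tier", "database"]
checklist = build_checklist(components)
```

A list like this finds only what the checklist already knows about, which is exactly the limitation of this subprocess.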

Note that this subprocess is very good at finding known problems but is not very good at finding new or otherwise creative attacks.

Example flaws uncovered by the attack resistance subprocess, in my experience, include the following.

Transparent authentication token generation/management: In this flaw, tokens meant to identify a user are easy to guess or otherwise simple to misuse. Web-based programs that use “hidden” variables to preserve user state are a prime example of how not to do this. A number of these flaws are described in detail in Exploiting Software [Hoglund and McGraw 2004].
Misuse of cryptographic primitives: This flaw is almost self-explanatory. The best example is the seriously flawed WEP protocol found in 802.11b, which misused cryptography to such an extent that the security was completely compromised [Stubblefield, Ioannides, and Rubin 2004].
Easily subverted guard components, broken encapsulation: Examples here are slightly more subtle, but consider a situation in which an API is subverted and functionality is either misused or used in a surprising new way. APIs can be thought of as classical “guards” in some cases, as long as they remain a choke point and single point of entry. As soon as they can be avoided, they cease to be useful.
Cross-language trust/privilege issues: Flaws arise when language boundaries are crossed but input filtering and state-preservation mechanisms fail.
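To make the first flaw in this list concrete, here is a minimal Python sketch contrasting a guessable token with two safer alternatives. It assumes nothing about any particular Web framework. The function names and the `state|tag` layout are hypothetical choices for illustration; the standard library calls (`secrets.token_urlsafe`, `hmac.new`, `hmac.compare_digest`) are real.

```python
import hashlib
import hmac
import secrets


def weak_token(user_id: int) -> str:
    """Anti-pattern: a token derived from a guessable value."""
    return f"user-{user_id}"


def new_session_token() -> str:
    """An unguessable token drawn from the OS random source."""
    return secrets.token_urlsafe(32)


def sign_state(state: str, key: bytes) -> str:
    """Attach an HMAC tag so that state held by the client cannot be altered silently."""
    tag = hmac.new(key, state.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{state}|{tag}"


def verify_state(signed: str, key: bytes):
    """Return the state if its tag checks out, otherwise None."""
    state, sep, tag = signed.rpartition("|")
    if not sep:
        return None  # no tag present at all
    expected = hmac.new(key, state.encode("utf-8"), hashlib.sha256).hexdigest()
    # Compare as bytes in constant time.
    if hmac.compare_digest(tag.encode("utf-8"), expected.encode("utf-8")):
        return state
    return None
```

A “hidden” form variable protected this way is still visible to the user, so the sketch addresses tampering and guessing only, not disclosure.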

Exploit Graphs

An exploit graph helps an analyst understand what kind of access and/or pattern is required to carry out an attack given a software risk. Flowcharts are very useful in describing an exploit and should include some basics such as attack delivery (payloads), gaining access, privilege escalation, subverting protections, descriptions of architectural failure, and discussion of any existing mitigations (and their effectiveness). Charts help. Figure 5-5 shows a simple exploit graph that illustrates a mobile code attack.

Figure 5-5 An exploit graph showing one of the mobile code attacks described in Securing Java [McGraw and Felten 1999]. The section numbers refer to entries in an associated table (in this case, Table 5-1). John Steven of Cigital created this graph.

Table 5-1 A Partial Exploit Graph Table to Accompany Figure 5-5

Step #	Detail: How/What	Conditions	Protection
Delivery 1	Deliver attack: get attack code onto machine with Jewel.	Client must have Internet access.
Delivery 1.1	Trick user to point browser to JSP.	Browser must have “run JSP” enabled.	Disable JSP in browser. NOTE: doing so prevents other sites from working.
Delivery 1.2	Send victim e-mail containing malicious JSP.	User’s mail reader must interpret JSP.	Disable JSP execution in mail reader.
Note: JSP refers to Java Server Page.

Exploit graphs also require some explanation in text as briefly described earlier. Table 5-1 is a partial view (attack delivery only) of the table meant to accompany Figure 5-5.

Though exploit graphs are not yet a mechanism in widespread use, they do help in a risk analysis. Their most important contribution lies in allowing an analyst to estimate the level of effort required to exploit a flaw. When it comes to exploit development, having a set of exploit graphs on hand can help determine which one exploit (usually of many) is the best to develop when some kind of “proof” is required. Sometimes you will find that exploit development is required to convince skeptical observers that there is a serious problem that needs to be fixed.
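One way to picture the level-of-effort estimate is to treat an exploit graph as a weighted directed graph and look for the cheapest path to the attacker’s goal. The Python sketch below does that with a standard shortest-path search. It rests on assumptions: the step names loosely follow the delivery steps in the table, the effort numbers are invented for illustration, and nothing here is a scoring scheme prescribed by the process.

```python
import heapq


def cheapest_exploit(graph, start, goal):
    """Return (total_effort, path) for the lowest-effort path, or None if unreachable."""
    queue = [(0, start, [start])]
    settled = {}
    while queue:
        effort, node, path = heapq.heappop(queue)
        if node == goal:
            return effort, path
        if node in settled and settled[node] <= effort:
            continue  # already reached this step at least as cheaply
        settled[node] = effort
        for nxt, cost in graph.get(node, {}).items():
            heapq.heappush(queue, (effort + cost, nxt, path + [nxt]))
    return None


# Hypothetical graph: edge weights are made-up effort estimates.
graph = {
    "start": {"trick user into visiting JSP": 2, "e-mail malicious JSP": 3},
    "trick user into visiting JSP": {"attack code on machine": 1},
    "e-mail malicious JSP": {"attack code on machine": 1},
    "attack code on machine": {"privilege escalation": 5},
}

effort, path = cheapest_exploit(graph, "start", "privilege escalation")
```

With these invented weights the search prefers the browser route over the e-mail route. Adding a protection from the table could be modeled by raising the weight of the edge it guards, or by removing that edge.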

Ambiguity Analysis

Ambiguity analysis is the subprocess capturing the creative activity required to discover new risks. This process, by definition, requires at least two analysts (the more the merrier) and some amount of experience. The idea is for each team member to carry out separate analysis activities in parallel. Only after these separate analyses are complete does the team come together in the “unify understanding” step shown in Figure 5-4.

We all know what happens when two or more software architects are put in a room together ... catfight—often a catfight of world-bending magnitude. The ambiguity analysis subprocess takes advantage of the multiple points of view afforded by the art that is software architecture to create a critical analysis technique. Where good architects disagree, there lie interesting things (and sometimes new flaws).

In 1998, when performing an architectural risk analysis on early Java Card systems with John Viega and Brad Arkin (their first), my team started with a process very much like STRIDE. The team members each went their solitary analysis ways with their own private list of possible flaws and then came together for a whiteboard brainstorming session. When the team came together, it became apparent that none of the standard-issue attacks considered by the new team members were directly applicable in any obvious fashion. But we could not very well declare the system “secure” and go on to bill the customer (Visa)! What to do?!

As we started to describe together how the system worked (not how it failed, but how it worked), disagreements cropped up. It turns out that these disagreements and misunderstandings were harbingers of security risks. The creative process of describing to others how the system worked (well, at least how we thought it worked) was extremely valuable. Any major points of disagreement or any clear ambiguities became points of further analysis. This evolved into the subprocess of ambiguity analysis.
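The bookkeeping side of that discussion can be sketched in Python, with the caveat that the real work is the conversation, not the tooling. The sketch assumes each analyst has independently written short answers to the same questions about how the system works; the questions, answers, and the `find_disagreements` helper are hypothetical. Anything the analysts answer differently, or that one of them did not describe at all, is flagged for further analysis.

```python
def find_disagreements(descriptions):
    """Map each contested question to {answer: [analysts who gave it]}.

    descriptions: {analyst name: {question: answer}}
    """
    questions = set()
    for answers in descriptions.values():
        questions.update(answers)

    flagged = {}
    for question in sorted(questions):
        by_answer = {}
        for analyst, answers in descriptions.items():
            answer = answers.get(question, "<not described>")
            by_answer.setdefault(answer, []).append(analyst)
        if len(by_answer) > 1:  # more than one distinct answer
            flagged[question] = by_answer
    return flagged


# Hypothetical independent descriptions of the same system.
descriptions = {
    "analyst_a": {
        "who validates the PIN?": "card",
        "is the channel encrypted?": "yes",
        "where is the key stored?": "EEPROM",
    },
    "analyst_b": {
        "who validates the PIN?": "terminal",
        "is the channel encrypted?": "yes",
        "where is the key stored?": "EEPROM",
    },
    "analyst_c": {
        "who validates the PIN?": "card",
        "where is the key stored?": "EEPROM",
    },
}

flagged = find_disagreements(descriptions)
```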

Ambiguity analysis helps to uncover ambiguity and inconsistency, identify downstream difficulty (through a process of traceability analysis), and unravel convolution. Unfortunately, this subprocess works best when carried out by a team of very experienced analysts. Furthermore, it is best taught in an apprenticeship situation. Perhaps knowledge management collections will make this all a bit less arbitrary (see Chapter 11).

Example flaws uncovered by the ambiguity analysis subprocess, in my experience, include the following.

Protocol, authentication problems: One example involved key material used to (accidentally) encrypt itself in a complex new crypto system. It turns out that this mistake cut down the possible search space for a key from extremely large to manageably small.^[10] This turned out to be a previously unknown attack, but it was fatal.
Java Card applet firewall and Java inner class issues: Two examples. The first was a problematic object-sharing mechanism that suffered from serious transitive trust issues, the gist being that class A shared method foo with class B, and class B could then publish the method to the world (something A did not necessarily condone). The second involved the way that inner classes were actually implemented (and continue to be implemented) in various Java compilers. Turns out that package scoping in this case was somewhat counterintuitive and that inner classes had a privilege scope that was surprisingly large.
Type safety and type confusion: Type-safety problems in Java accounted for a good portion of the serious Java attacks from the mid-1990s. See Securing Java [McGraw and Felten 1999].
Password retrieval, fitness, and strength: Why people continue to roll their own password mechanisms is beyond me. They do, though.
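The transitive trust problem in the Java Card item above is easy to reproduce in miniature. The Python sketch below is only an analogy: it does not use the Java Card object-sharing mechanism, and the `Purse`, `LoyaltyApp`, and `public_registry` names are hypothetical. It shows class A handing a method to class B, and B republishing that method to everyone without A having any say.

```python
class Purse:
    """Plays the role of class A: owns a sensitive method."""

    def __init__(self, balance):
        self._balance = balance

    def debit(self, amount):
        if amount <= 0 or amount > self._balance:
            raise ValueError("invalid amount")
        self._balance -= amount
        return self._balance

    def share_with(self, partner):
        # A intends to trust B, and only B, with debit().
        partner.accept(self.debit)


class LoyaltyApp:
    """Plays the role of class B: trusted by A, but free to re-share."""

    def __init__(self, registry):
        self._registry = registry

    def accept(self, method):
        # B publishes A's method to the world; A is never consulted.
        self._registry["debit"] = method


public_registry = {}
purse = Purse(balance=100)
purse.share_with(LoyaltyApp(public_registry))

# Any code that can see the registry can now debit the purse.
remaining = public_registry["debit"](30)
```

Once a reference has been handed over, the original owner has no way to limit where it travels next, which is the gist of the flaw.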

Weakness Analysis

Weakness analysis is a subprocess aimed at understanding the impact of external software dependencies. Software is no longer created in giant monolithic a.out globs (as it was in the good old days). Modern software is usually built on top of complex middleware frameworks like .NET and J2EE. Furthermore, almost all code counts on outside libraries like DLLs or common language libraries such as glibc. To make matters worse, distributed code—once the interesting architectural exception—has become the norm. With the rapid evolution of software has come a whole host of problems caused by linking in (or otherwise counting on) broken stuff. Leslie Lamport’s definition of a distributed system as “one in which the failure of a computer you didn’t even know existed can render your own computer unusable” describes exactly why the weakness problem is hard.

Uncovering weaknesses that arise by counting on outside software requires consideration of:

COTS (including various outside security feature packages like the RSA libraries or Netegrity’s authentication modules)
Frameworks (J2EE, .NET, and any number of other middleware frameworks)
Network topology (modern software almost always exists in a networked environment)
Platform (consider what it’s like to be application code on a cell phone or a smart card)^[11]
Physical environment (consider storage devices like USB keys and iPods)
Build environment (what happens when you rely on a broken or poisoned compiler? what if your build machine is running a rootkit?)

In the coming days of Service Oriented Architectures (SOAs), understanding which services your code is counting on and exactly what your code expects those services to deliver is critical. Common components make particularly attractive targets for attack. Common mode failure goes global.

The basic idea here is to understand what kind of assumptions you are making about outside software, and what will happen when those assumptions fail (or are coerced into failing). When assumptions fail, weaknesses are often revealed in stark relief. A large base of experience with third-party software libraries, systems, and platforms is extremely valuable when carrying out weakness analysis. Unfortunately, no perfect clearinghouse of security information for third-party software exists. One good idea is to take advantage of public security discussion forums such as BugTraq <http://www.securityfocus.com/archive/1>, comp.risks <http://catless.ncl.ac.uk/Risks>, and SecurityTracker <http://www.securitytracker.com>.^[12]
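One assumption worth writing down explicitly is what your code does when an outside service misbehaves. The Python sketch below shows a fail-closed wrapper around a hypothetical authorization service. The service functions and the "ALLOW" reply convention are invented for illustration; the point is that a crash or an unexpected reply is treated as a denial rather than quietly treated as success.

```python
def is_authorized(auth_service, user, action):
    """Return True only on an explicit ALLOW from the outside service."""
    try:
        reply = auth_service(user, action)
    except Exception:
        return False  # the service failed, so we deny (fail closed)
    return reply == "ALLOW"  # anything unexpected is also a denial


# Hypothetical stand-ins for an outside service in three states.
def healthy_service(user, action):
    return "ALLOW" if (user, action) == ("alice", "read") else "DENY"


def crashing_service(user, action):
    raise ConnectionError("service unreachable")


def confused_service(user, action):
    return None  # a reply the caller never planned for
```

A wrapper like this does not make the dependency trustworthy. It only pins down what happens when the assumption about it fails.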

Example flaws uncovered by the weakness analysis subprocess, in my experience, include the following.

Browser and other VM sandboxing failures: Browsers are overly complex pieces of software rivaled in complexity only by operating systems. Browsers have so many moving parts that finding unexplored niches and other “between the seams” flaws is easy.
Insecure service provision—RMI, COM, and so on: Protocols and communications systems are often a standard feature of modern software. When Java’s RMI was found to fail open <http://www.cs.princeton.edu/~balfanz>, the systems counting on RMI were all subject to the same kind of attack.
Debug (or other operational) interfaces: Debugging code is always as useful to the attacker as it is to the maintainer. Don’t send error reports to your (mis)user.
Unused (but privileged) product “features”: If you put overly powerful features into your design, don’t be surprised when they are turned against you. See Building Secure Software for a good story of what happened when old-fashioned bulletin board systems allowed a user to invoke emacs [Viega and McGraw 2001].
Interposition attacks—DLLs, library paths, client spoofing: Person-in-the-middle attacks are very popular, mostly because they are very effective. Same goes for PATH hacking, spoofing, and other low-hanging fruit. Carefully consider what happens when an attacker gets between one component and the other components (or between one level of the computing system and the others).
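As a small illustration of the PATH hacking mentioned in the last item, the Python sketch below resolves a helper program against a fixed list of directories instead of whatever PATH the caller’s environment supplies. It assumes a POSIX-style system. The `TRUSTED_DIRS` list and the `resolve_trusted` name are hypothetical; `shutil.which` and its `path` argument are part of the standard library.

```python
import os
import shutil

# Assumption: directories that only an administrator can write to.
TRUSTED_DIRS = ("/usr/bin", "/bin")


def resolve_trusted(name):
    """Return the absolute path of `name` found in TRUSTED_DIRS, or None."""
    # Reject anything that is not a bare program name, because
    # shutil.which() would otherwise look in the directory given.
    if not name or os.path.basename(name) != name:
        raise ValueError("expected a bare program name")
    return shutil.which(name, path=os.pathsep.join(TRUSTED_DIRS))
```

The caller would then execute the returned absolute path rather than the bare name. This narrows one interposition point only; an attacker who can write to a trusted directory, or who sits elsewhere between components, is not stopped by it.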

By applying the simple three-step process outlined here, you can greatly improve on a more generic checklist-based approach. There is no substitute for experience and expertise, but as software security knowledge increases, more and more groups should be able to adopt these methods as their own.
