The previous section discussed foundations: basic issues of hardware attacks and defenses. However, when putting together a secure system, one typically thinks of larger-scale components. Rather than worrying only about how to build a chip that resists an attacker, one might worry about how to use an attack-resistant chip to do something useful within a larger system. In this section, we take a look at some of the components in the toolbox.
16.3.1 Secure Coprocessors
If we’re thinking about trying to protect computation from an adversary with direct physical access to the computer, the most "natural" approach might be to think about putting armor around the entire computer. However, since effective physical security raises issues about heat dissipation and internal maintenance, we usually can’t count on armoring the entire computer system in question, so a more practical compromise is to armor a smaller subsystem and use that in conjunction with a larger host. This is the approach taken by secure coprocessors. Commercial examples include the IBM 4758 [SW99] and its more recent follow-on, the IBM 4764 [AD04]. (As the reader may conclude from checking out the citations in the bibliography, yes, the authors of this book had something to do with this.)
Generally, this type of device works by hiding secrets inside the armored device and using an interleaving of tamper-protection techniques to ensure that, under attack, the secrets are destroyed before the adversary can get to them. Owing to the relative ease of zeroizing SRAM compared to other forms of storage, secure coprocessors typically end up with a tiered memory architecture: a small amount of battery-backed SRAM contains the nonvolatile but tamper-protected secret; larger DRAM contains runtime data, and FLASH holds nonvolatile but non-secret data.
As a consequence, perhaps the most natural application of a secure coprocessor is to obtain confidentiality of stored data. This can be useful. However, one can also use this "protected secret" architecture to provide other properties. For example:
- Integrity of public data. If the secret in question is the private half of a key pair, then the coprocessor can use it to sign statements. A relying party that verifies the signature and believes that the device's physical security works and that its software is trustworthy can believe that this statement came from an untampered device. If the statement pertains to the value of a stored data item, then the relying party can trust in the integrity of that value. This property may be useful in such scenarios as metering.
- Integrity of executing program. Is the device still running the correct, untampered software? A side effect of the private-key approach just discussed is that the relying party can also verify that the software inside the device is still correct: if an adversary has tampered with it, then the private key would, in theory, have been zeroized and thus not be available to the modified software.
This property can be useful in many scenarios, such as a trustworthy SSL-protected Web server. With more complex devices that permit updates and reinstallation of software and permit nontrivial software architectures, making this scheme work can become rather tricky. This idea of outbound authentication—enabling the untampered entity to authenticate itself as such to the outside world—foreshadowed the subsequent emphasis on attestation.
- Privacy of program execution. Some scenarios call for the program itself to be public but its execution to be private—that is, not only selected parameters but also operational details, such as which branch is taken after a comparison. For example, consider an auction. The program may need to be public, as all participants need to trust that the program evaluating the bids works correctly. However, exactly what it does when it runs on the secret bids should be secret; otherwise, observers would know details of the bids.
Outbound authentication, combined with a self-contained computing environment, can provide this property.
- Secrecy of program code. Typically, the device may store its software in internal FLASH. However, the device could store much of this software in encrypted form and use its protected secret to decrypt it into DRAM before execution—thus using the protected-secret architecture to provide secrecy of program executables. This property may be useful for protecting proprietary pricing algorithms for insurance or pharmaceuticals.
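The protected-secret architecture behind all these properties can be illustrated with a toy model. This is a sketch only: the class name and methods below are hypothetical, and a real device performs the zeroization in hardware the instant tamper is sensed, not in software after the fact.

```python
class SecureCoprocessorModel:
    """Toy model of the protected-secret architecture (illustrative only).

    A real secure coprocessor zeroizes its battery-backed SRAM in
    hardware the moment tamper is detected; this sketch just mimics
    the externally visible behavior.
    """

    def __init__(self, secret: bytes):
        self._sram_secret = secret   # battery-backed, tamper-protected SRAM
        self._tampered = False

    def tamper(self):
        # Tamper response: destroy the secret before the adversary reads it.
        self._sram_secret = b"\x00" * len(self._sram_secret)
        self._tampered = True

    def use_secret(self) -> bytes:
        if self._tampered:
            raise RuntimeError("secret destroyed by tamper response")
        return self._sram_secret


dev = SecureCoprocessorModel(b"private-key-bytes")
assert dev.use_secret() == b"private-key-bytes"
dev.tamper()
# After tampering, the secret is gone; modified software cannot use it.
```

The point of the model is the invariant: once `tamper()` fires, no later code path can recover the secret, which is exactly why a relying party can treat a valid signature as evidence of an untampered device.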
Using a secure coprocessor in real-world applications may require dealing with some subtle design and architecture issues, owing to the exigencies of commercially feasible physical security. One basic problem is that the device may be too small to accommodate the necessary data; this problem drives some current research, as we discuss later. Another problem arises from the typical lack of human I/O on devices. If an enterprise runs a stand-alone application that has one trusted coprocessor installed but depends on input from an untrustworthy host, then the enterprise may not be benefiting much from the physical security. Nearly anything the adversary might have wanted to do by attacking the coprocessor can be achieved by attacking the host. The true value of the physical security comes into play when other parties and/or other trusted devices come into the picture: for example, remote clients connecting to a coprocessor-hardened server.
Another real-world issue with using a commercial secure-coprocessor platform is believing that it works. In our case, we had it validated against FIPS 140-1; however, going from such a validation to the conclusion that a system using such a device is sufficiently secure is a big step—see Chapter 11.
16.3.2 Cryptographic Accelerators
As discussed earlier in the book, cryptography is a fundamental building block of security in many modern computing scenarios. However, as Chapter 7 made clear, it is based on tasks that are by no means easy for traditional computers. For a basic example, RSA requires modular exponentiation: taking X and Y to X^Y mod N, where X, Y, and N are all very large integers. By current standards, RSA requires integers at least 1024 bits long to be deemed secure; currently, however, standard desktop computers operate on 32-bit words. Implementing 1024-bit modular exponentiation on a 32-bit machine is rather inefficient; this inefficiency can become an obstacle for applications, such as SSL Web servers, that must do this repeatedly.
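To see where the work goes, here is a sketch of the standard square-and-multiply algorithm that implementations (in software or hardware) use for modular exponentiation. A 1024-bit exponent means roughly a thousand modular squarings, each on 1024-bit operands that a 32-bit CPU must emulate with many word-sized operations; that is the cost a hardware accelerator absorbs.

```python
def modexp(x: int, y: int, n: int) -> int:
    """Square-and-multiply modular exponentiation: x^y mod n.

    Processes the exponent one bit at a time; for a 1024-bit y this
    is ~1024 modular squarings plus up to ~1024 modular multiplies,
    each on integers far wider than a 32-bit machine word.
    """
    result = 1
    base = x % n
    while y > 0:
        if y & 1:                      # low bit set: fold base into result
            result = (result * base) % n
        base = (base * base) % n       # square for the next exponent bit
        y >>= 1
    return result


# Sanity check against Python's built-in three-argument pow()
assert modexp(7, 560, 561) == pow(7, 560, 561)
```

Python's built-in `pow(x, y, n)` performs the same computation in optimized C; the explicit loop is shown only to make the operation count visible.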
These issues drive the idea of creating special-purpose hardware to accelerate such otherwise inefficient operations. Hardware for such operations as symmetric encryption and hashing can also be inserted in-line with data transmission (e.g., in a network card or in a disk drive) to make the use of encryption in these aspects of system operation more affordable. (For example, building hardware acceleration for digital signature generation and verification into edge routers can greatly improve the performance cost of S-BGP compared to standard BGP—recall Chapter 5.)
Both the nature and the applications of cryptography introduce issues of physical security for cryptographic accelerators. For one thing, cryptographic parameters, such as private keys, may be long-lived, mission-critical data items whose compromise may have serious ramifications. For another thing, application domains, such as banking and the postal service, have a long history of relying on physical security as a component of trying to assure trustworthiness. As a consequence, cryptographic accelerators may tout tamper protection and feature APIs to protect installation and usage of critical secrets. As we noted, such devices tend to be called hardware security modules (HSMs) in the literature and in discussions of best practices for such application installations as certification authorities. The same architecture issues we noted earlier apply here as well. Physical security may protect against an adversary directly extracting the keys from the device and may protect against more esoteric attacks, such as subverting the key-generation code the device uses in the first place, in order to make the "randomly" generated keys predictable to a remote adversary. However, physical security on the HSM does not protect against attacks on its host.
For using cryptographic accelerators or HSMs in the real world, we advise consideration of many questions.
- Should you trust that the HSM works? Researchers have shown that one can build a crypto black box that appears to work perfectly but has adversarial back doors, like the one discussed earlier [YY96]. Here, we recommend that you look for FIPS validations—both of the overall module (e.g., via FIPS 140-N) and of the individual cryptographic algorithms used (recall Chapter 11).
- Should you trust that the HSM works too well? From a straightforward security perspective, it's better for a device to have false positives, destroying secrets even though no attack was occurring, than the other way around. From a business perspective, however, this may be a rather bad thing. The necessity to preserve the operational envelope in effective tamper protection may create even more opportunities for such false positives (e.g., if the building heat fails at Dartmouth College in the winter, an IBM 4758 would not last more than a day). Using HSMs requires thinking beforehand about continuity of operations.
- What if the manufacturer goes out of business or the device reaches its end of life? In order to make its physical security mean something, an HSM design may make it impossible to export private keys to another type of device. However, what happens should the vendor cease supporting this HSM? (This happened to colleagues of ours.)
- Exactly how can you configure the cryptographic elements? Having hardware support for fast operations does not necessarily mean that you can do the combination of operations you would like to. For example, the IBM 4758 Model 2 featured fast TDES and fast SHA-1, both of which could be configured in-line with the buffers bringing data into or through the device. Doing cryptography this way on large data was much faster than bringing the data into the device's DRAM and then using the relatively slow internal architecture to drive the operation. However, in practical settings, one usually does not want just encryption: One wants to check integrity as well. One natural way to do this might be to hash the plaintext and then encrypt it along with its hash. However, doing something like this with the fast IBM hardware requires being able to bring the data through the TDES engine and then sneak a copy of the plaintext into the hash engine on its way out. Unfortunately, our fast hardware did not support this!
- What if new algorithms emerge? For example, the TDES engine in the IBM 4758 Model 2 includes support for standard chaining, such as CBC. Subsequently, Jutla invented a slower chaining method that provided integrity checking for free [Jut01]. We would have liked to use this chaining method, but the hardware did not support it. For another example, one need only consider the recent demise of MD5 hashing and fears of the future demise of SHA-1.
- Should you believe performance benchmarks? The problem here is that cryptographic operations may feature several parameters; in practice, many operations may be joined together (e.g., signatures or hybrid encryption); and HSMs may include internal modules, thus confusing which boundaries we should measure across.
For example, if one wants to attach a number to an implementation of a symmetric cryptosystem, the natural measure might be bytes per second. IBM did this for the DES engine in the IBM 4758. A customer complained; on examination, we found that the touted speed was what one could get if operations were done with very long data items. Informally, the device had a per-byte cost on the data as well as a per-operation cost for the overhead of setting up the keys and such. For small data, the per-operation cost dominates, and the effective throughput could drop by an order of magnitude or more.
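The effect of fixed per-operation overhead is easy to model. The numbers below are made up purely for illustration; the shape of the curve, not the constants, is the point.

```python
def effective_throughput(n_bytes: float, per_op_s: float, per_byte_s: float) -> float:
    """Effective bytes/second when every call pays a fixed setup cost."""
    return n_bytes / (per_op_s + n_bytes * per_byte_s)


# Illustrative (made-up) numbers: 100 microseconds of key-setup overhead
# per call, and a raw engine speed of 20 MB/s (5e-8 seconds per byte).
PER_OP = 100e-6
PER_BYTE = 5e-8

big = effective_throughput(1_000_000, PER_OP, PER_BYTE)   # approaches raw speed
small = effective_throughput(64, PER_OP, PER_BYTE)        # overhead dominates

assert big / small > 10   # small requests run over an order of magnitude slower
```

A benchmark quoted only at the large-buffer end of this curve says little about a workload, such as an SSL server handling many small records, that lives at the other end.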
16.3.3 Extra-CPU Functionality
These armoring approaches run into some fundamental limitations. It seems that the computational power of what can fit inside the armor always lags behind the power of a current desktop system. This delta is probably an inevitable consequence of Moore’s Law (see Section 16.5.3) and the economics of chip manufacturing: What gets packaged inside armor lags behind the latest developments.
This situation raises a natural question: Can we use hardware techniques to improve the security of general systems without wrapping the CPUs in armor? In the commercial and research space here, the general trend is to use hardware to increase assurance about the integrity and correctness of the software on the machine.
Currently, the dominant approach is to consider boot-time protections. Figure 16.1 sketches an example sequence of what software gets executed when a system boots. The time order of this execution creates a dependency order: If software module S1 executes before software module S2, then correct execution of S2 depends on S1; if the adversary attacks or modifies S1, then the corrupted S1 might change S2 before loading it, or might load something else altogether.
Figure 16.1 At boot time, a well-defined sequence of software modules get executed.
Boot-time approaches exploit the inductive nature of this sequence. By magic, or perhaps by hardware, we check the integrity and correctness of the first element of this chain. Then, before we grow the chain with a new element, a chain element that has been already checked checks this next candidate element. Figure 16.2 sketches an example. (In our 4758 work, we got rather formal about this and included hardware elements in this "chain.")
Figure 16.2 In the typical approach to system integrity checking, each element in the boot sequence checks the next before invoking it.
At the end of the process, we might have some assurance that the system is running correct, unaltered software—that is, if we have some way of knowing whether this verification process succeeded. (One will see the terms trusted boot and secure boot used for this process—sometimes as synonyms, sometimes to denote slightly different versions of this idea.)
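The inductive chain of checks sketched in Figure 16.2 can be written down concisely. This is a sketch under assumed names (`verified_boot`, the sample stage images); a real implementation would execute each verified stage, which in turn checks the next.

```python
import hashlib

def verified_boot(modules, expected_hashes) -> bool:
    """Inductive boot-time verification: each already-verified stage
    vouches for the next by comparing its hash to a known-good value."""
    for image, expected in zip(modules, expected_hashes):
        if hashlib.sha1(image).hexdigest() != expected:
            return False          # chain broken: refuse to continue booting
        # (A real system would now execute `image`, which performs this
        #  same check on the following stage before invoking it.)
    return True


stages = [b"bios image", b"bootloader image", b"kernel image"]
golden = [hashlib.sha1(m).hexdigest() for m in stages]

assert verified_boot(stages, golden)
assert not verified_boot([b"bios image", b"evil loader", b"kernel image"], golden)
```

The base case of the induction—trusting the hashes of stage 0 and the list of known-good values—is exactly what the hardware (the "magic" above) must supply.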
One way to know whether verification succeeded is to add hardware that releases secrets depending on what happens. In the commercial world, the Trusted Computing Group (TCG) consortium has developed—and still is developing, for that matter—an architecture to implement this idea in standard commercial machines. The TCG architecture adds a trusted platform module (TPM)—a small, inexpensive chip—to the motherboard. At the first level of abstraction, we can think of the TPM as a storehouse that releases secrets, depending on the state of the TPM's platform configuration registers (PCRs). Each PCR can contain an SHA-1 hash value but has some special restrictions regarding how it can be written.
- At boot time, the PCRs are reset to 0s.
- If a PCR currently contains a value v, the host can extend it by providing a new value w. However, rather than replacing v with w, the TPM replaces v with the hash of the concatenation of v with w:
PCR ← H (PCR || w).
This approach to "writing" PCRs allows the system to use them to securely measure software and other parameters during the boot process (see Figure 16.1). At step i − 1, the system could hash the relevant software from module i and store this hash in PCR i. Suppose that module 3 is supposed to hash to h3 but that, in fact, the adversary has substituted an untrustworthy version that hashes instead to h′3. If the PCRs permitted ordinary writing, nothing would stop adversarial software later from simply overwriting h′3 with h3 in PCR 3. However, because the PCRs permit writing only via hash extension, the PCR will contain H(0 || h′3); if the hash function is secure, the adversary will not be able to calculate a v such that H(H(0 || h′3) || v) = H(0 || h3).
In fact, this hash-extension approach allows the system to measure platform configuration into the PCRs using two dimensions. The system could use each PCR i to record the hash of a critical piece of the boot process. However, the system could also record a sequence of measurements within a single PCR, by successively hash-extending in each element of the sequence. By the properties of cryptographically secure hash functions, the end result of that PCR uniquely reflects that sequence of values, written in that order.
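The extend operation and its order sensitivity are both a few lines of code. This sketch models a PCR as a SHA-1 digest, per the operation PCR ← H(PCR || w) above.

```python
import hashlib

PCR_SIZE = 20   # SHA-1 digest length in bytes

def extend(pcr: bytes, measurement: bytes) -> bytes:
    """PCR <- H(PCR || w): the only 'write' operation a PCR allows."""
    return hashlib.sha1(pcr + measurement).digest()


zero = b"\x00" * PCR_SIZE             # PCR value after reset at boot

# Measure the same two modules in different orders:
a = extend(extend(zero, b"module-A"), b"module-B")
b = extend(extend(zero, b"module-B"), b"module-A")
assert a != b   # same measurements, different order: different final PCR

# An adversary can always extend further, but can never drive the PCR
# *back* to a chosen value without inverting the hash function.
```

This is why a single PCR can faithfully record an entire ordered sequence of measurements: the final value is a commitment to every element and to the order in which each arrived.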
As mentioned, we can think of the TPM as essentially a place to store secrets. When we store a secret here, we can tie it to a specified subset of the PCRs and list a value for each. Subsequently, the TPM will reveal a stored secret only if each PCR in that subset has that specified value. (Note that we qualify this statement with "essentially": The actual implementation of this functionality is a bit more convoluted.) If such a secret is an RSA private key, then it can be stored with a further provision: When the PCRs are correct, the TPM will use it on request from the system but will never actually release its plaintext value.
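The "release only if the PCRs match" behavior—often called sealed storage—can be sketched as a toy model. The class and method names here are hypothetical, and, as the text notes, the real TPM interface is considerably more convoluted than this.

```python
class TPMModel:
    """Toy model of TPM sealed storage (illustrative; not the real API)."""

    def __init__(self, pcrs):
        self.pcrs = pcrs              # current PCR values, index -> value
        self._vault = []              # stored (secret, pcr_policy) pairs

    def seal(self, secret: bytes, policy: dict):
        """policy maps a subset of PCR indices to their required values."""
        self._vault.append((secret, dict(policy)))

    def unseal(self):
        """Release each secret only if every PCR in its policy matches."""
        return [secret
                for secret, policy in self._vault
                if all(self.pcrs.get(i) == v for i, v in policy.items())]


tpm = TPMModel(pcrs={0: b"good-boot-hash"})
tpm.seal(b"disk key", {0: b"good-boot-hash"})
assert tpm.unseal() == [b"disk key"]       # correct boot: secret released

tpm.pcrs[0] = b"evil-boot-hash"            # tampered boot measurement
assert tpm.unseal() == []                  # secret stays locked away
```

The TPM-resident private-key variant mentioned above strengthens this further: even on a match, the key is used internally on request but its plaintext never leaves the chip.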
The ability of the PCRs to reflect system configuration and the ability of the TPM to bind things such as RSA private keys to specific configurations enables several usage scenarios.
- Binding a key to a software configuration on that machine enables us to do similar things to what we did with secure coprocessors. The entity consisting of that software on that device can now authenticate itself, make verifiable statements about things, and participate in cryptographic protocols.
- If we cook things up so that we have a trusted entity that is much smaller than the entire platform, we can use a TPM-bound private key to make signed attestations about the rest of the platform configuration, as expressed by the PCRs. In the TCG architecture, this entity is part of the TPM itself, but it could also be a separate software module protected by the TPM.
Moving from a rather special-purpose and expensive device (a secure coprocessor) to a generic, ubiquitous platform (standard desktops and laptops) changes the flavor of potential applications, as well. For example, moving Yee’s partitioned-computation idea from a coprocessor to an encrypted subsystem or tables (protected by a TPM) can enable a software vendor to lock an application to a particular machine or OS. Attestation can enable an enterprise to shunt unpatched machines to a remedial network, thus promoting better network hygiene—this is called trusted network connect (TNC). Attestation might also enable a powerful corporation to monitor everything on your machine. (Of course, all these scenarios are based on the assumption that the adversary cannot subvert the TPM’s security protections!)
Realizing this approach in the real world requires worrying about exactly how to map platform configuration into the PCRs. This part of the design is rather complex and keeps changing, so we won't bother going through it all here. The initial BIOS reports itself to a PCR; as a consequence, the BIOS can break everything and thus is called the root of trust measurement (RTM). Subsequently, things already measured turn around and measure other things; what they are and which PCR they get measured into appear to be determined both by platform-specific specifications and random vendor choices. Platform elements also factor into the hashes; we discovered that doing things as simple as removing a keyboard or replacing a memory card caused the PCRs to change.
As Chapter 4 described, however, the software that comprises a particular application running on a contemporary operating system is by no means a monolithic entity or even a simple stack. How to glue TPM measurements to this complex structure is an area of ongoing research. In our early work here, we introduced a level of indirection—the TPM protects a trusted kernel-based module, which in turn evaluates higher-level entities [MSMW03, MSWM03]. In contrast, our colleagues at IBM Watson extended the hash-extension idea all the way up into Linux application environments [SZJv04].
We stress again that this is an area of active research by many parties. Stay tuned. In particular, as this book goes to press, researchers have developed ways to break the security of current TPM-based PCs simply by using a wire to ground the reset line on the Low Pin Count (LPC) bus that connects the TPM to the rest of the system.
This fools the TPM into thinking that the system has rebooted, at which point, the TPM resets all its PCRs, and the host can feed it measurements that simulate booting of the system it would like to impersonate. It looks as though Bernard Kauer [Kau07] got there first, but we were the first to do it on YouTube [Spa].
Using hardware to assist with runtime checks of platform integrity is an area that has also received renewed interest lately. CoPilot, an academic project currently being commercialized, is a good example of this [PFMA04]. One adds to the standard platform a separate PCI card with bus-mastering capabilities, so it can take over the PCI bus and probe system memory. At regular intervals, this auxiliary card probes the system memory and looks for signs of malware and corruption.
Realizing this approach in the real world requires intricate knowledge of what the system memory image should look like and requires that what image the card sees is the same reality the host CPU sees. Neither of these tasks is trivial. For example, rootkits typically attack systems not by inserting themselves into something big and relatively static, like executable code, but rather by making subtle modifications to dynamic data structures. The fact that these data structures are supposed to change makes it hard for the coprocessor to determine when bad changes have occurred. For another example, malware might restore correct-looking data structures when the coprocessor examines memory or might even maintain a decoy set of structures where the coprocessor expects to find them. Combating this latter set of issues may require using the software-based attestation ideas from earlier to establish a dynamic root of trust within the host CPU.
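The easy part of such a monitor—checking regions that are supposed to be static—amounts to comparing hashes against a known-good baseline. This sketch (all names hypothetical) shows that part; as the text explains, the hard problems are the dynamic data structures and the possibility that what the card sees is not what the CPU sees.

```python
import hashlib

def snapshot(regions: dict) -> dict:
    """Hash each (name -> bytes) memory region the monitor card can read."""
    return {name: hashlib.sha1(data).hexdigest()
            for name, data in regions.items()}

def changed_regions(baseline: dict, regions: dict) -> list:
    """Names of regions whose contents no longer match the known-good image."""
    now = snapshot(regions)
    return [name for name in baseline if now.get(name) != baseline[name]]


memory = {"kernel_text": b"\x90" * 64, "syscall_table": b"\x01\x02\x03"}
baseline = snapshot(memory)

memory["syscall_table"] = b"\xff\x02\x03"   # rootkit hooks a table entry
assert changed_regions(baseline, memory) == ["syscall_table"]
```

Note what this cannot do: a region that legitimately changes has no stable baseline, and malware that restores or decoys the expected bytes at probe time defeats the comparison entirely—hence the interest in establishing a dynamic root of trust on the host CPU itself.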
Strictly speaking, the runtime approach is not necessarily disjoint from the boot-time approach. As the experimental approaches we just discussed illustrate, a boot-time-verified module can easily turn around and verify changes and events during runtime. Even standard uses of a TPM can update PCRs during runtime. As we also mentioned earlier, the TCG is currently examining approaches whereby some PCRs can be reset during special conditions at runtime; such an approach could also extend to doing regular remeasurements during runtime.
So far, we’ve looked at approaches that use hardware either to directly harden the traditional computing platform or to detect tampering afterward. Ongoing research has been looking at more unconventional approaches: transforming the computation somehow so that a conventional, untrusted host does most of the work, but a smaller, trusted unit participates in such a way as to still provide the overall security property. We offer some examples.
- Cryptographic operations can lend themselves to situations in which part of the work can be blinded and then outsourced to a less trusted host. This approach might provide higher throughput and lower latency, while still protecting the private keys within a small hardware TCB.
- Our own tiny trusted third party project (e.g., [IS06]) builds on ORAM and Yao’s secure multiparty computation to compile a program into a blinded circuit, which a fast untrusted host can execute with the assistance of a small piece of special-purpose hardware. This approach might provide privacy of computational details, even if the computation doesn’t fit inside a small hardware TCB.
- In general, one might speculate about the space of functions in which calculating an answer requires significant resources, but verifying it requires very little. Can we build a method to provide integrity in such calculations, with only limited trusted hardware?
At some point, this approach starts to merge into the partitioned computation model with secure coprocessors.
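Sorting gives a simple concrete instance of "expensive to compute, cheap to verify": the untrusted host pays O(n log n) to sort, while the trusted unit pays only a linear pass to check. The function name below is our own, for illustration.

```python
from collections import Counter

def verify_sorted(original, claimed) -> bool:
    """O(n) check that an untrusted host really sorted `original`.

    The trusted verifier confirms (1) the claimed output is a
    permutation of the input and (2) it is in nondecreasing order --
    far less work than redoing the sort itself.
    """
    if Counter(original) != Counter(claimed):
        return False    # host added, dropped, or altered elements
    return all(claimed[i] <= claimed[i + 1] for i in range(len(claimed) - 1))


data = [5, 3, 8, 1]
assert verify_sorted(data, sorted(data))
assert not verify_sorted(data, [1, 3, 5, 9])   # host cheated: 8 became 9
```

The open question posed above is whether broad classes of computation admit this shape, so that a small trusted unit can certify integrity of results produced by a large untrusted one.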
16.3.4 Portable Tokens
It's almost a cliché that computing hardware has been getting small enough and cheap enough that substantial computing power now fits in a pocket. The truth that underlies this cliché also affects hardware-based security. Putting substantial computing and memory, perhaps with physical security, in a package that users can carry around is economically feasible in many situations; the near-ubiquity of USB ports on PCs and laptops—and the emerging ubiquity of Bluetooth and other short-range wireless links, such as near-field communication (NFC)—make interaction with the standard computing environment rather easy.
Such devices have many security applications. They can be one factor for multifactor authentication. In enterprise-wide PKI installations, users might carry and wield private keys from a portable device rather than trying to bring data around. Perhaps a user’s portable device could verify the integrity of a broader and untrusted system (e.g., [SS05]).
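One simple way such a token can serve as an authentication factor is a challenge-response protocol: the token holds a key the host computer never sees and answers one challenge at a time. This sketch uses a shared-key HMAC for brevity; real deployments often use the public-key approach mentioned above, so that the verifier holds no long-term secret.

```python
import hashlib
import hmac
import os

def token_respond(token_key: bytes, challenge: bytes) -> bytes:
    """Computed inside the token; `token_key` never leaves the device."""
    return hmac.new(token_key, challenge, hashlib.sha256).digest()

def server_verify(shared_key: bytes, challenge: bytes, response: bytes) -> bool:
    """The verifier recomputes the expected response and compares safely."""
    expected = hmac.new(shared_key, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)


key = os.urandom(32)        # provisioned into the token at enrollment
challenge = os.urandom(16)  # fresh per login attempt, so replay fails
assert server_verify(key, challenge, token_respond(key, challenge))
```

Because each challenge is fresh, capturing one response gives an eavesdropping host nothing reusable—the property that makes a token stronger than a typed password on an untrusted machine.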
Such devices can also enable another type of security application: honey-tokens. Black-hat teams have penetrated enterprises by distributing "abandoned" USB memory sticks—with Trojan horses—in the parking lot. Employees of the target enterprise find the memory sticks, bring them inside the enterprise, insert them into computers, and unintentionally invoke the Trojan; the testers thus succeed in running their own code with insider privileges on inside-the-firewall systems. (See [Sta06] for more information and some commentary by such a pen tester.)