What Is Surreptitious Software?
In this first chapter we will talk about the basic techniques used to protect secrets stored in software, namely obfuscation, watermarking, tamperproofing, and birthmarking. These techniques have many interesting applications, such as the use of obfuscation and tamperproofing to protect media in digital rights management systems. What we think you will find particularly interesting is that obfuscation and the three other techniques “solve” problems that traditional computer security and cryptography can’t touch. We put “solve” in quotation marks because there are no known algorithms that provide complete security for an indefinite amount of time. At the present time, the best we can hope for is to be able to extend the time it takes a hacker to crack our schemes. You might think that this seems highly unsatisfactory (and you’d be right), but the bottom line is that there are interesting applications for which no better techniques are known.
1.1 Setting the Scene
When you hear the term computer security, you probably imagine a scenario where a computer (owned by a benign user we’ll call Alice) is under attack from an evil hacker (we’ll call him Bob), or from the viruses, worms, Trojan horses, rootkits, and keyloggers that he’s created. The goal of computer security research is to devise techniques for building systems that prevent Bob from taking over Alice’s computer or that alert her when he does. The basic idea behind such techniques is to restrict what Bob can do on Alice’s computer without unduly restricting what she can do herself. For example, a network firewall allows Alice to access other computers on the network but restricts the ways in which Bob can access hers. An intrusion detection system analyzes the network access patterns on Alice’s computer and alerts her if Bob appears to be doing something unusual or suspicious. A virus scanner refuses to run Bob’s program unless it can convince itself that the program contains no harmful code. In other words, Alice adds protective layers around her computer to prevent someone from entering, to detect that someone has entered, or to stop someone from doing harm once they’ve entered.
Now what happens if we invert the situation? What if, instead of Bob sending an evil program to penetrate the defenses around Alice’s computer, we have a software developer, Doris, who sends or sells Axel a benign program to run? To make this interesting, let’s assume that Doris’ program contains some secret S and that Axel can gain some economic advantage over Doris by extracting or altering S.
The secret could be anything: a new super-duper algorithm that makes Doris’ program much faster than Axel’s and that he would love to get his hands on; the overall architecture of her program, which would be useful to Axel as he starts building his own; a cryptographic key that is used to unlock some media in a digital rights management system; or a license check that prevents Axel from running the program after a certain period of time. What can Doris do to protect this secret?
At first blush, you might think that cryptography would solve the problem, since, after all, cryptography is concerned with protecting the confidentiality of data. Specifically, a cryptographic system scrambles a cleartext S into a cryptotext E_K(S) so that it can’t be read without access to a secret key K.
So why doesn’t Doris just protect the secret she has stored in her program by encrypting the program before selling it to Axel? Unfortunately, this won’t work, since Axel needs to be able to execute the program and hence, at some point, it—and Doris’ secret—must exist in cleartext!
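To make this concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: XOR stands in for the encryption function E_K and is not a secure cipher, and the key, the secret text, and the function names are all made up for the example.

```python
# Toy illustration only: XOR is NOT a secure cipher; it merely stands in for E_K.
def xor_bytes(data: bytes, key: bytes) -> bytes:
    """Toy stand-in for E_K and its inverse (XOR with the same key undoes itself)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

KEY = b"k3y"                                   # hypothetical key K; it has to ship with the program
SECRET = b"S: secret algorithm"                # hypothetical secret S
ENCRYPTED_SECRET = xor_bytes(SECRET, KEY)      # what Doris would store in the shipped program

def run_program() -> bytes:
    # To use the secret, the program first has to decrypt it...
    secret = xor_bytes(ENCRYPTED_SECRET, KEY)
    # ...so at this point S sits in memory in cleartext, where a debugger can read it.
    return secret

observed = run_program()                       # Axel simply runs the program and watches
```

The stored form differs from S, but anyone who can execute the program can also observe the decrypted value, because both the cryptotext and the key needed to undo it travel together.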
What makes software protection so different from cryptography and standard computer security is that once Axel has access to Doris’ program, there is no limit to what he can do to it: He can study its code (maybe first disassembling or decompiling it); he can execute the program to study its behavior (perhaps using a debugger); or he can alter the code to make it do something different than what the original author intended (such as bypassing a license check).
There are three components to a typical attack in a software protection scenario against Doris’ program P, namely, analysis, tampering, and distribution:
Axel starts by analyzing P, extracting algorithms, design, and other secrets such as cryptographic keys or the location of license-checking code. Next, he modifies Doris’ code (he may, for example, remove the license check) or incorporates pieces of it into his own program. Finally, Axel distributes the resulting program, thereby violating Doris’ intellectual property rights.
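The analysis and tampering steps can be sketched in a few lines of Python. This is a deliberately simplified model, not a real attack: the expiry date and the names license_ok and run are hypothetical, and rebinding a function name stands in for patching a binary.

```python
import datetime

EXPIRY = datetime.date(2000, 1, 1)   # hypothetical trial end date, long past

def license_ok(today: datetime.date) -> bool:
    """Doris' check: the program may run only before the expiry date."""
    return today < EXPIRY

def run(today: datetime.date) -> str:
    if not license_ok(today):
        return "license expired"
    return "program output"

today = datetime.date(2024, 1, 1)
before = run(today)                  # the unmodified program refuses to run

# Analysis: Axel locates the check (here, trivially, by its name).
# Tampering: he replaces it with a version that always succeeds.
license_ok = lambda today: True
after = run(today)                   # the tampered program runs normally
```

In a compiled program the same effect is typically achieved by changing a conditional branch after locating it with a disassembler or debugger; the sketch only mirrors the logic of that change.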
There are many variants of this scenario, of course. Axel could remove a license check without redistributing the hacked program and just enjoy it for his own pleasure. He could resell the program along with a known license password, without ever having to tamper with the code. Finally, he could decompile and analyze the program to verify its safety (for example, that it doesn’t contain damaging viruses or spyware, or, in the case of voting software, that it correctly counts every vote), without using this information to improve on his own programs. While these attacks occur in a variety of guises, they’re all based on the following observation: Once a program leaves the hands of its author, any secrets it contains become open to attack.
In the scenarios we study, there is usually some financial motive for Axel to extract or alter information in the program. There is also typically a certain period of time during which Doris wants to protect this information. It is the goal of software protection to provide technical means for keeping the valuable information safe from attack for this period of time. A computer game developer, for example, may be happy if software protection prevents his program from being pirated for a few extra weeks, since most of the revenue is generated during a short time period after the release.
In a typical defense scenario, Doris adds confusion to her code to make it more difficult for Axel to analyze, tamper-protection to prevent him from modifying it, and finally marks the code (for example, with her copyright notice or Axel’s unique identifier) to assert her intellectual property rights.
In this book we will consider five methods for Doris to protect her program: code obfuscation for preventing analysis; software watermarking, fingerprinting, and birthmarking for detecting and tracing illegal distribution; and software- and hardware-based protection against tampering.
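To give a rough feel for what software-based tamper detection can look like, here is a minimal sketch in Python. It assumes one simple approach, comparing a hash of a function’s compiled code against a value recorded at build time; the names check_license, code_digest, and is_intact are hypothetical, and this is not presented as the method used later in the book.

```python
import hashlib

def check_license(days_left: int) -> bool:
    return days_left > 0

def code_digest(func) -> str:
    """Hash the function's compiled bytecode together with its constants."""
    code = func.__code__
    return hashlib.sha256(code.co_code + repr(code.co_consts).encode()).hexdigest()

EXPECTED = code_digest(check_license)   # recorded by Doris when she builds the program

def is_intact(func) -> bool:
    """Report whether the function still matches the recorded digest."""
    return code_digest(func) == EXPECTED

def cracked(days_left: int) -> bool:    # Axel's replacement for the check
    return True

intact_before = is_intact(check_license)   # True: the code is unmodified
intact_after = is_intact(cracked)          # False: the replacement is detected
```

A check this naive is itself just code that Axel can find and remove, which is why tamperproofing is usually combined with obfuscation rather than used alone.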
Although the primary motivation for the techniques developed in software protection has been protecting the secrets contained within computer programs, they also have applications to protecting the distribution chain of digital media (digital rights management), protecting against computer viruses, steganographic transfer of secret messages, and protecting against cheating in online computer games. We will also show how these techniques can be used maliciously to create stealthy computer viruses and to cheat in computer-based voting.
Software protection is related both to computer security and cryptography, but it has most in common with steganography, the branch of cryptography that studies how to transfer a secret stealthily. This is often illustrated by the so-called prisoners’ problem. Here, Alice and Bob are planning a prison break by passing notes through their warden, Wendy.
Of course, if Wendy finds that a purported love note mentions a prison break, she will immediately stop any further messages and put Alice and Bob in solitary confinement. So what can the two conspirators do? They can’t use cryptography, since as soon as Wendy sees a garbled message she will become suspicious and put an end to further communication. Instead, they must communicate surreptitiously, by sending their secrets hidden inside innocuous-looking messages. For example, Alice and Bob could agree on a scheme where the secret message (the payload) is hidden in the first letter of each sentence in the cover message:
Easter is soon, dear! So many flowers! Can you smell them? Are you cold at night? Prison food stinks! Eat well, still! Are you lonely? The prison cat is cute! Don’t worry! All is well! Wendy is nice! Need you! ):
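Bob’s side of the scheme is easy to sketch in Python. The function name null_cipher_decode and the rule used to split sentences (any run of periods, exclamation marks, or question marks ends a sentence) are our assumptions for the example.

```python
import re

cover = ("Easter is soon, dear! So many flowers! Can you smell them? "
         "Are you cold at night? Prison food stinks! Eat well, still! "
         "Are you lonely? The prison cat is cute! Don't worry! "
         "All is well! Wendy is nice! Need you!")

def null_cipher_decode(text: str) -> str:
    """Return the first letter of every sentence, joined into one string."""
    # Split on sentence-ending punctuation and drop empty fragments.
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    return "".join(s[0].upper() for s in sentences)

payload = null_cipher_decode(cover)
print(payload)   # prints ESCAPEATDAWN
```

Applied to the note above, the sketch recovers ESCAPEATDAWN, that is, “escape at dawn,” the payload hidden in the first letter of each sentence.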
This is called a null cipher. There are many other possible types of cover messages. For example, Alice could send Bob a picture of the prison cat in which she has manipulated the low-order bits to encode the payload. Or she could send him an MP3 file of their favorite love song in which she has added inaudible echoes: a short one for every 0-bit of the payload, a longer one for every 1-bit. Or she could subtly manipulate the line spacing in a PDF file, 12.0 points representing a 0, 12.1 points representing a 1. Or she could be even sneakier and ask Wendy to pass along a Tetris program she’s written to help Bob while away the long hours in solitary. However, unbeknownst to Wendy, the program not only plays Tetris, but inside its control or data structures Alice has hidden the secret payload detailing their escape plan. In this book we will consider exactly this scenario and many like it. We call a program that contains both a secret and any technique for preventing an attack against this secret surreptitious software.
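The low-order-bit idea mentioned above for the picture of the prison cat can be sketched as follows. This is a simplified model that treats the image as a flat list of 8-bit gray values; the pixel values and the function names embed_bits and extract_bits are made up for the example, and real image formats need more care.

```python
def embed_bits(pixels, bits):
    """Overwrite the lowest bit of each leading pixel with one payload bit."""
    if len(bits) > len(pixels):
        raise ValueError("cover too small for payload")
    marked = list(pixels)
    for i, bit in enumerate(bits):
        marked[i] = (marked[i] & ~1) | bit   # clear the low bit, then set it to the payload bit
    return marked

def extract_bits(pixels, count):
    """Read back the lowest bit of the first `count` pixels."""
    return [p & 1 for p in pixels[:count]]

cover_pixels = [200, 13, 77, 254, 91, 128, 33, 60]   # made-up 8-bit gray values
payload_bits = [0, 1, 0, 0, 0, 1, 0, 1]              # ASCII "E" (0x45), most significant bit first
stego_pixels = embed_bits(cover_pixels, payload_bits)
recovered = extract_bits(stego_pixels, len(payload_bits))
```

Each pixel changes by at most one gray level, which is why the altered picture looks the same to Wendy while Bob, who knows where to look, can read the payload back.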