Hiding Data in Executables: Stego and Polymorphism
So far in this chapter, we have focused on Trojan horses that disguise some sort of remote control or command shell backdoor, but that's not the full extent of what Trojan horse techniques could disguise. Beyond hidden executables for remotely taking over a system, attackers could embed hidden messages inside programs. The program looks like a nice, happy executable, but in fact contains a hidden message. Therefore, this executable fits our definition of a Trojan horse, and also acts as a covert channel for communication.
The art and science of hiding messages is called steganography, from the Greek words for hidden writing. Steganography is often referred to as stego for short. To get a feel for its use, consider this scenario. Suppose a military general wants to send the message "Attack at dawn" to another general without their mutual adversary knowing about their communication. Of course, they could just encrypt the message so the adversary wouldn't know for sure whether the message says "Attack at dawn" or "Gee, you smell funny." Still, by analyzing the traffic between the two generals and seeing the encrypted message sent across the network, the adversary could figure out that something significant is afoot. Traditional cryptography mathematically transforms the message so the adversary cannot read its contents, but can still see that some form of information is being exchanged. Steganography conceals the message so that the adversary doesn't even know that there is data being exchanged in the first place. Of course, clever generals would use steganography to hide a message and cryptography to transform the message just in case it is discovered. Detecting and eliminating all such covert communication is an extremely difficult endeavor.
Steganographic techniques have been used for thousands of years. However, in the field of computer science, they've really gotten a lot more attention in just the last few years. Typical computer steganography techniques hide information in pictures, such as BMP, JPEG, or GIF files. Other techniques hide information in sound files, such as MP3, WAV, or other formats. However, newer techniques stash information inside of computer executable programs without altering the program's function or size.
Hydan and Executable Steganography
In February 2003, Rakan El-Khalil released a program called Hydan to stash messages inside of executable programs written for x86 processors, such as Intel's or AMD's popular chips. The tool stores hidden information inside of executables for the Linux, Windows XP, NetBSD, FreeBSD, and OpenBSD operating systems. Available at http://www.crazy-boy.com/hydan, Hydan implements this steganography by using polymorphic coding techniques. There's that fancy-sounding word again: polymorphic. We saw it before in Chapter 2 associated with viruses, and in Chapter 3 on worms. Remember, polymorphic code simply means that you can have multiple different pieces of computer code that all do the exact same thing. By carefully selecting certain variations of that functionally equivalent code, we can transmit a message in the executable. In other words, there's more than one way to skin a cat, and Hydan embeds messages by selecting specific cat-skinning techniques. Figure 6.10 illustrates how Hydan works.
The process starts with an executable program, such as a word processor, backdoor, or operating system command. Really, any x86 executable will do. Hydan's not too picky. Hydan also needs some secret information to hide, such as a message, a picture, some other executable code, or anything else. The user feeds both the executable and the secret information into the Hydan tool. Hydan prompts the user for a passphrase, which is used to encrypt the message before the stego process begins. Hydan first encrypts the message with the Blowfish encryption algorithm, using this passphrase as the encryption key.
Hydan then works its magic by embedding the encrypted secret information inside the executable program. For this embedding, Hydan defines two different sets of CPU instructions that have exactly the same function, Set 0 and Set 1. For example, when you add two numbers, you can use the add or subtract instructions. You could add X and Y, or you could subtract negative Y from X. If you remember your high school algebra class, these two different instructions have the exact same result. So, we could put the add instruction into Set 0 and the subtract instruction into Set 1. Hydan takes the original executable and rebuilds it by choosing instructions from Set 0 or Set 1 based on the particular bits from the secret information to hide. It looks for the first instruction in the executable that is represented in one of the sets, such as an add instruction. If a given bit to be hidden is a zero, we will choose an instruction from the Set 0 group of instructions to replace the existing instruction. If the bit is a one, we will choose a functionally equivalent instruction from Set 1.
Figure 6.10 How Hydan embeds data using polymorphic coding techniques.
Then, after the entire code is rebuilt with instructions from these two sets, the new executable is rewritten to the hard drive. Because each instruction in Set 0 is chosen so that it has the same size as its functionally equivalent counterpart in Set 1, the resulting executable program has exactly the same size, and exactly the same function! However, it is a brand new piece of code. Most important, by using Hydan again in reverse mode, the original secret information can be retrieved from the resulting executable if the proper passphrase is typed in.
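To make the Set 0 and Set 1 idea concrete, here is a minimal sketch in Python. It is a toy model, not Hydan's actual code: instructions are represented as made-up (mnemonic, register, immediate) tuples rather than real x86 encodings, the function names embed, extract, and run are hypothetical, and details a real tool must handle (such as CPU flags, instruction encodings, and the encryption step) are ignored.

```python
# Toy model (an assumption for illustration): each "instruction" is a
# (mnemonic, register, immediate) tuple, not a real x86 encoding.
# Set 0 is the add form; Set 1 is the equivalent sub with a negated immediate.

SUBSTITUTABLE = ("add", "sub")

def to_set(instr, bit):
    """Rewrite an add/sub instruction into its Set 0 (bit 0) or Set 1 (bit 1) form."""
    op, reg, imm = instr
    # Normalize to "how much is added", then pick the form that encodes the bit.
    added = imm if op == "add" else -imm
    return ("add", reg, added) if bit == 0 else ("sub", reg, -added)

def embed(program, bits):
    """Return a same-length program whose add/sub choices encode the given bits."""
    pending = list(bits)
    out = []
    for instr in program:
        if instr[0] in SUBSTITUTABLE and pending:
            out.append(to_set(instr, pending.pop(0)))
        else:
            out.append(instr)
    if pending:
        raise ValueError("not enough substitutable instructions to hold the message")
    return out

def extract(program, count):
    """Read back the first `count` hidden bits: add means 0, sub means 1."""
    bits = [0 if instr[0] == "add" else 1
            for instr in program if instr[0] in SUBSTITUTABLE]
    return bits[:count]

def run(program):
    """Tiny interpreter used only to show that behavior is unchanged."""
    regs = {}
    for op, reg, imm in program:
        if op == "mov":
            regs[reg] = imm
        elif op == "add":
            regs[reg] = regs.get(reg, 0) + imm
        elif op == "sub":
            regs[reg] = regs.get(reg, 0) - imm
    return regs
```

Feeding a short program and the bits 1, 0, 1 into embed yields a program of the same length that produces the same register values under run, while extract recovers the three bits from the pattern of adds and subtracts alone.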
Hydan's stego technique, implemented with polymorphic instructions, isn't the only way to hide messages, of course. Data can be embedded inside of nonexecutable files as well, such as pictures, sounds, and other data types. For these other types of files, the stego technique might alter the color or sound frequency distribution of the image or other mathematical properties to hide data, using techniques analogous to Hydan's instruction substitution. Because our focus in this book is on malware (i.e., malicious programs), we've addressed hiding data inside of programs. For more information about stego techniques for other types of files, I highly recommend that you consult Eric Cole's book, Hiding in Plain Sight.
Hydan in Action
Look at Figure 6.11 to get a feel for Hydan in action on Linux. The Windows version of Hydan is virtually identical to this Linux version. In this example, I created a small file called hideme.txt that contains my supersecret text. I then used Hydan to embed hideme.txt inside a GUI calculator named xcalc. Note that it put 40 bytes into the file, but it could have stored up to 72 bytes. The total storage capacity of an executable is based on the number of adds and subtracts, as well as other related polymorphic instructions, in that executable. After it ran, Hydan generated a new copy of the xcalc tool, which I named xcalc-steg. This version is exactly the same size (29,874 bytes) and has the same functionality as the original xcalc. I ran a copy of the new calculator so you can see that it is, in fact, a calculator. However, this xcalc-steg also includes my hidden supersecret message. By using the hydan-decode routine, I can recover my original message, the contents of hideme.txt. So, the new calculator program is now a Trojan horse: It still runs as a program, but I could send this program to other people to transmit my secret information.
Figure 6.11 Hydan in action on Linux: Hydan encrypts and hides a message inside of a calculator program.
Hydan is capable of stashing one byte of the secret information in approximately 150 to 250 bytes of executable code, depending on the particular instructions used by that executable. That's not nearly as efficient as more traditional stego techniques for hiding data inside of pictures (which often get up to one byte hidden in 20 bytes of image). Still, it's not a bad ratio for hiding data.
It's also important to note that Hydan does alter the statistical distribution of instructions used in the Trojan horse executable. By creating a histogram showing how frequently various instructions are used in that executable, an investigator could determine that the program just doesn't look right. For an analogy, think of the use of various letters of the alphabet in standard English text: There are lots of es and ts, but not very many qs or zs. We could graph the relative occurrences of letters to create a histogram. By analyzing the histogram of a sample file, we could get a good feel for whether the sample is English text or something else, such as an encrypted file, an executable, or even non-English text. If the histogram matches what we'd expect for the alphabetic distribution for English, it's probably an English text file. You could do a similar analysis with x86 instructions. "Normal" programs have a certain predictable usage pattern for various instructions. There are lots of add and move instructions, but somewhat fewer subtracts. In this way, an analyst or automated tool might be able to detect the presence of hidden data in an executable without knowing what that hidden data is. This statistical analysis technique would certainly work, but no current tool is available for such analysis on executable programs. For similar types of analysis of images with hidden data, however, there is a popular analysis tool called StegDetect by Niels Provos available at http://www.outguess.org/detection.php.
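The histogram comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names letters, histogram, and distance are hypothetical, the distance measure (half the sum of absolute differences between two histograms) is one simple choice among many, and any threshold for "doesn't look right" would have to be calibrated against a baseline of known-clean samples.

```python
from collections import Counter

def letters(text):
    """Return the lowercase alphabetic characters of a text sample."""
    return [ch for ch in text.lower() if ch.isalpha()]

def histogram(items):
    """Map each distinct item (a letter, an instruction mnemonic, ...) to its relative frequency."""
    counts = Counter(items)
    total = sum(counts.values())
    return {item: count / total for item, count in counts.items()}

def distance(h1, h2):
    """Half the sum of absolute frequency differences: 0.0 means identical, 1.0 means disjoint."""
    keys = set(h1) | set(h2)
    return 0.5 * sum(abs(h1.get(k, 0.0) - h2.get(k, 0.0)) for k in keys)
```

The same two functions work for letters or for instruction mnemonics. An analyst could build a histogram from a list of mnemonics disassembled from known-good programs, build another from the suspect executable, and treat an unusually large distance (for example, far more subtracts than normal) as a reason to look closer.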
You might be wondering what an attacker could do with a Hydan-generated program containing hidden text. There are several possibilities, including the following:
Hiding Information for Covert Communication: Two people might have login access to a single machine somewhere on the Internet. One user could cram secret information inside a user program, service, or even a kernel module and install the resulting program on the shared machine. The other user could log in, analyze the appropriate executable, and retrieve the message. An eavesdropper looking to see if the two parties are communicating might not notice this subtle covert channel.
Watermarking or Signing an Executable: By using Hydan, a software developer could mark an executable with an identification code unique to that instance of the program so that a copy of the program can be easily correlated with the original. Furthermore, by using Hydan to embed a digital signature inside the executable, a developer can prove that he or she was the author of that executable. Suppose I'm a software vendor. If I ever want to prove that I was the one who compiled a particular version of a program, I can digitally sign a document saying so, and then embed this document inside of the executable itself. When I want to prove that I compiled the executable, I could extract the document and show that it was signed with my own key. This technique could be applied to copyright protection mechanisms and digital rights management for executables.
Evading Signatures: Finally, and perhaps most ominously, the technique could be extended to implement evasion of signature-based antivirus tools and network-based IDS tools. Many antivirus and IDS tools look for specific sequences of bits to identify malicious software. By using the polymorphic techniques included in Hydan, an attacker can morph an executable so that it no longer matches the signatures and therefore evades detection. It's important to note that Hydan doesn't yet do this. It lacks enough different types of polymorphic substitutions to do effective signature evasion. When Hydan is used, enough of the original program survives so that signature matching still works. However, in the near future, these Hydan concepts could be extended to achieve true signature evasion ... stay tuned!
To check if someone has been altering your critical executables with a tool like Hydan, you really need to use a file integrity checking tool, such as Tripwire, AIDE, or Osiris. We'll discuss these tools briefly here, but will cover them in far more depth in Chapter 7 when we deal with RootKits. At this point, though, we need to note that these file integrity checking tools create a database of hashes of your critical system files, which you can store on secure media (e.g., a write-protected floppy disk or write-once CD-ROM). Then, you run a check against this database on a regular basis (every hour, day, or week) to see if someone has altered your files. If you spot changes, you need to figure out whether a system administrator or an attacker made them. If an attacker tries to use Hydan to embed data in any of your critical executables, you'll notice the change the next time you run the file integrity checker, even though the file's size and function are unchanged. Of course, this technique will only detect problems associated with those programs that you actually analyze with the file integrity checking tool, such as your operating system commands and important applications. Changes to any other programs on your system would fly under your file integrity checking radar.
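The core idea behind these file integrity checkers can be sketched with Python's standard hashlib module. This is a bare-bones illustration, not how Tripwire, AIDE, or Osiris are actually implemented: the function names hash_file, build_baseline, and find_changes are hypothetical, SHA-256 is simply my choice of hash for the example, and a real tool also protects its database and tracks permissions, ownership, and other attributes.

```python
import hashlib
from pathlib import Path

def hash_file(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_baseline(paths):
    """Record a known-good hash for each critical file (store the result on secure media)."""
    return {str(path): hash_file(path) for path in paths}

def find_changes(baseline):
    """Return the files that are now missing or whose contents no longer match the baseline."""
    changed = []
    for path, known_hash in baseline.items():
        if not Path(path).is_file() or hash_file(path) != known_hash:
            changed.append(path)
    return changed
```

Because the comparison is based on a hash of the file's contents, a modification that preserves the file's size, as Hydan's does, still shows up in the list returned by find_changes.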