In this section, we dissect a number of fuzzing frameworks to gain an understanding of what already exists. We do not cover every available fuzzing framework, but instead, we examine a sampling of frameworks that represent a range of different methodologies. The following frameworks are listed in a general order of maturity and feature richness, starting with the most primitive.
Written in Python, antiparser is an API designed to assist in the creation of random data specifically for the construction of fuzzers. This framework can be used to develop fuzzers that will run across multiple platforms as the framework depends solely on the availability of a Python interpreter. Use of the framework is pretty straightforward. An instance of the antiparser class must first be created; this class serves as a container. Next, antiparser exposes a number of fuzz types that can be instantiated and appended to the container. Available fuzz types include the following:
- apChar(): An eight-bit C character.
- apCString(): A C-style string; that is, an array of characters terminated by a null byte.
- apKeywords(): A list of values each coupled with a separator, data block, and terminator.
- apLong(): A 32-bit C integer.
- apShort(): A 16-bit C integer.
- apString(): A free form string.
Of the available data types, apKeywords() is the most interesting. With this class, you define a list of keywords, a block of data, a separator between keywords and the data, and finally an optional data block terminator. The class will generate data in the format [keyword] [separator] [data block] [terminator].
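The output format described above can be sketched in a few lines of plain Python. This is only an illustration of the [keyword] [separator] [data block] [terminator] layout; the function name `build_keyword_payloads` is hypothetical and is not part of the antiparser API.

```python
def build_keyword_payloads(keywords, separator, data, terminator=""):
    """Yield one payload per keyword in the order keyword, separator, data, terminator.

    Hypothetical helper for illustration only; antiparser's own apKeywords()
    class is not used here.
    """
    for keyword in keywords:
        yield keyword + separator + data + terminator


if __name__ == "__main__":
    for payload in build_keyword_payloads(["CWD", "MKD"], " ", "%n" * 4, "\r\n"):
        print(repr(payload))
```

The terminator defaults to an empty string to mirror the text's note that the data block terminator is optional.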
antiparser distributes an example script, evilftpclient.py, that leverages the apKeywords() data type. Let's examine portions of the script to gain a better understanding of what development on this framework looks like. The following Python code excerpt shows the relevant portions of evilftpclient.py responsible for testing an FTP daemon for format string vulnerabilities in the parsing of FTP verb arguments. This excerpt does not display the functionality for authenticating with the target FTP daemon, for example. For a complete listing, refer to the source.
from antiparser import *

CMDLIST = ['ABOR', 'ALLO', 'APPE', 'CDUP', 'XCUP', 'CWD', 'XCWD', 'DELE',
           'HELP', 'LIST', 'MKD', 'XMKD', 'MACB', 'MODE', 'MTMD', 'NLST',
           'NOOP', 'PASS', 'PASV', 'PORT', 'PWD', 'XPWD', 'QUIT', 'REIN',
           'RETR', 'RMD', 'XRMD', 'REST', 'RNFR', 'RNTO', 'SITE', 'SIZE',
           'STAT', 'STOR', 'STRU', 'STOU', 'SYST', 'TYPE', 'USER']

SEPARATOR  = " "
TERMINATOR = "\r\n"

for cmd in CMDLIST:
    ap = antiparser()
    cmdkw = apKeywords()
    cmdkw.setKeywords([cmd])
    cmdkw.setSeparator(SEPARATOR)
    cmdkw.setTerminator(TERMINATOR)
    cmdkw.setContent(r"%n%n%n%n%n%n%n%n%n%n%n%n%n%n%n%n")
    cmdkw.setMode('incremental')
    cmdkw.setMaxSize(65536)
    ap.append(cmdkw)

    sock = apSocket()
    sock.connect(HOST, PORT)

    # print FTP daemon banner
    print sock.recv(1024)

    # send fuzz test
    sock.sendTCP(ap.getPayload())

    # print FTP daemon response
    print sock.recv(1024)

    sock.close()
The code begins by importing all the available data types and classes from the antiparser framework, then defines a list of FTP verbs, the verb argument separator character (space), and the command termination sequence (carriage return followed by newline). Each listed FTP verb is then iterated and tested separately, beginning with the instantiation of a new antiparser container class. Next, the apKeywords() data type comes into play. A list defining a single item, the current verb being tested, is specified as the keywords (keyword in this case). The appropriate verb–argument separation and command termination characters are then defined. The data content for the created apKeywords() object is set to a sequence of format string tokens. If the target FTP server exposes a format string vulnerability in the parsing of verb arguments, this data content is likely to trigger it.
The next two calls, setMode('incremental') and setMaxSize(65536), specify that on permutation the data block should grow sequentially to the maximum value of 65,536. However, in this specific case, those two calls are irrelevant, as the fuzzer does not loop through a number of test cases or permutations by calling ap.permute(). Instead, each verb is tested with a single data block.
The remaining lines of code are mostly self-explanatory. The single data type, apKeywords(), is appended to the antiparser container and a socket is created. Once a connection has been established, the test case is generated in the call to ap.getPayload() and transmitted via sock.sendTCP().
It is obvious that antiparser has a number of limitations. The example FTP fuzzer can be easily reproduced in raw Python without assistance from the framework. The ratio of framework-specific code versus generic code when developing fuzzers on antiparser is significantly low in comparison to other frameworks. The framework also lacks many of the desired automations listed in the previous section, such as the ability to automatically calculate and represent the common TLV protocol format. On a final note, the documentation for this framework is slim and, unfortunately, only a single example is available as of version 2.0, which was released in August 2005. Although this framework is simple and provides some benefits for generating simple fuzzers, it is inadequate for handling complex tasks.
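To illustrate the claim that the example can be reproduced without the framework, the following is a minimal raw-Python sketch. It is not a port of evilftpclient.py: the shortened verb list, the function names `build_test_cases` and `send_test_case`, and the choice to grow the format string token count from 1 to a small maximum are all assumptions made for illustration. Like the excerpt above, it omits authentication with the target.

```python
import socket

# Shortened verb list for illustration; the original script tests many more.
FTP_VERBS = ["CWD", "MKD", "STAT"]


def build_test_cases(verbs, token="%n", max_tokens=16):
    """Yield one FTP command per verb and per token repetition count."""
    for verb in verbs:
        for count in range(1, max_tokens + 1):
            yield (verb + " " + token * count + "\r\n").encode("ascii")


def send_test_case(host, port, payload, timeout=5.0):
    """Connect, read the banner, send one payload, and return the reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.recv(1024)         # FTP daemon banner
        sock.sendall(payload)   # fuzz test
        return sock.recv(1024)  # FTP daemon response


if __name__ == "__main__":
    # Only print the generated cases; call send_test_case() against a
    # target you are authorized to test.
    for case in build_test_cases(FTP_VERBS, max_tokens=2):
        print(repr(case))
```

Separating payload generation from transmission keeps the test cases inspectable without a live target.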
Written in C by Diego Bauche, Dfuz is an actively maintained and frequently updated fuzzing framework. This framework has been used to uncover a variety of vulnerabilities affecting vendors such as Microsoft, Ipswitch, and RealNetworks. The source code is open and available for download but a restrictive license prevents any duplication or modification without express permission of the author. Depending on your needs, this limited license may dissuade you from utilizing the framework. The motivating factor behind this rigid licensing appears to be that the author is unhappy with the quality of his own code (according to the README file). It might be worth a shot to contact him directly if you wish to reuse portions of his code. Dfuz was designed to run on UNIX/Linux and exposes a custom language for developing new fuzzers. This framework is not the most advanced fuzzing framework dissected in this chapter. However, its simple and intuitive design makes it a good framework design case study, so let's examine it in detail.
The basic components that comprise Dfuz include data, functions, lists, options, protocols, and variables. These various components are used to define a set of rules that the fuzzing engine can later parse to generate and transmit data. The following familiar and simple syntax is used to define variables within rule files:
var my_variable = my_data
var ref_other = "1234",$my_variable,0x00
Variables are defined with the simple prefix var and can be referenced from other locations by prefixing the variable name with a dollar sign ($, like in Perl or PHP). Fuzzer creation is entirely self-contained. This means that unlike antiparser, for example, building a fuzzer on top of this framework is done entirely in its custom scripting language.
Dfuz defines various functions to accomplish frequently needed tasks. Functions are easily recognizable as their names are prefixed with the percent character (%). Defined functions include the following:
- %attach(): Wait until the Enter or Return key is pressed. This is useful for pausing the fuzzer to accomplish another task. For example, if your fuzzing target spawns a new thread to handle incoming connections and you wish to attach a debugger to this new thread, insert a call to %attach() after the initial connection is established, then locate and attach to the target thread.
- %length() or %calc(): Calculate and insert the size of the supplied argument in binary format. For example, %length("AAAA") will insert the binary value 0x04 into the binary stream. The default output size for these functions is 32 bits, but this can be modified to 8 bits by calling %length:uint8() or to 16 bits by calling %length:uint16().
- %put:<size>(number): Insert the specified number into the binary stream. The size can be specified as one of byte, word, or dword, which are aliases of uint8, uint16, and uint32, respectively.
- %random:<size>(): Will generate and insert a random binary value of the specified size. Similar to %put(), size can be specified as one of byte, word, or dword, which are aliases of uint8, uint16, and uint32, respectively.
- %random:data(<length>,<type>): Generate and insert random data. Length specifies the number of bytes to generate. Type specifies the kind of random data to generate and can be specified as one of ascii, alphanum, plain, or graph.
- %dec2str(num): Convert a decimal number to a string and insert it into the binary stream. For example, %dec2str(123) generates 123.
- %fry(): Randomly modify the previously defined data. The rule "AAAA",%fry(), for example, will cause a random number of the characters in the string "AAAA" to be replaced with a random byte value.
- %str2bin(): Parses a variety of hexadecimal string representations into their raw value. For example: 4141, 41 41, and 41-41 would all translate to AA.
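The behavior described for %length() and %str2bin() can be approximated in Python as follows. This is a sketch of the described semantics only, not Dfuz code: the function names `length_field` and `str2bin` are hypothetical, and little-endian output is assumed as the default because the text names it as the engine's default byte order.

```python
import re
import struct

# Maps the size names used in the text to struct format characters.
SIZE_FORMATS = {"uint8": "B", "uint16": "H", "uint32": "I"}


def length_field(data, size="uint32", big_endian=False):
    """Return the length of data packed as a binary integer of the given size."""
    order = ">" if big_endian else "<"
    return struct.pack(order + SIZE_FORMATS[size], len(data))


def str2bin(text):
    """Parse hex strings such as '4141', '41 41', or '41-41' into raw bytes."""
    return bytes.fromhex(re.sub(r"[\s-]", "", text))


if __name__ == "__main__":
    print(length_field(b"AAAA"))
    print(length_field(b"AAAA", size="uint8"))
    print(str2bin("41-41"))
```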
Data can be represented in a number of ways. The custom scripting language supports syntax for specifying strings, raw bytes, addresses, data repetitions, and basic data looping. Multiple data definitions can be strung together to form a simple list by using the comma separator. The following examples demonstrate a variety of ways that data can be defined (refer to the documentation for further details):
var my_variable1 = "a string"
var my_variable2 = 0x41,|0xdeadbeef|,[Px50],[\x41*200],100
Lists are declared with the keyword list followed by the list name, the keyword begin, a list of data values separated by newlines, and are finally terminated with the keyword end. A list can be used to define and index a sequence of data. For example:
list my_list:
begin
    some_data
    more_data
    even_more_data
end
Like variables, lists can be referenced from other locations by prefixing the list name with a dollar sign ($). With similar syntax to other scripting languages, such as Perl and PHP, a list member can be indexed through square brackets, for example $my_list[1]. Random indexes within a list are also supported through the rand keyword: $my_list[rand].
A number of options for controlling the behavior of the overall engine exist, including the following:
- keep_connecting: Continue fuzzing the target even if a connection cannot be established.
- big_endian: Changes the byte order of data generation to big endian (little endian is the default).
- little_endian: Changes the byte order of data generation to little endian (the default).
- tcp: Specifies that socket connections should be established over TCP.
- udp: Specifies that socket connections should be established over UDP.
- client_side: Specifies that the engine will be fuzzing a server and thereby acting as a client.
- server_side: Specifies that the engine will be fuzzing a client and thereby acting as a server waiting for a connection.
- use_stdout: Generate data to standard output (console) as opposed to a socket-connected peer. This option must be coupled with a host value of "stdout."
To ease the burden of reproducing frequently fuzzed protocols, Dfuz can emulate the FTP, POP3, Telnet, and Server Message Block (SMB) protocols. This functionality is exposed through functions such as ftp:user(), ftp:pass(), ftp:mkd(), pop3:user(), pop3:pass(), pop3:dele(), telnet:user(), telnet:pass(), smb:setup(), and so on. Refer to the Dfuz documentation for a complete listing.
These basic components must be combined with some additional directives to create rule files. As a simple, yet complete example, consider the following rule file (bundled with the framework) for fuzzing an FTP server:
port=21/tcp

peer write: @ftp:user("user")
peer read
peer write: @ftp:pass("pass")
peer read
peer write: "CWD /", %random:data(1024,alphanum), 0x0a
peer read
peer write: @ftp:quit()
peer read

repeat=1024
wait=1
# No Options
The first directive specifies that the engine must connect over TCP port 21. As no options are specified, it defaults to behaving as a client. The peer read and peer write directives indicate to the engine when data should be read from and written to the fuzz target, respectively. In this specific rule file, the FTP protocol functionality is used to authenticate with the target FTP server. Next, the Change Working Directory (CWD) command is manually constructed and transmitted to the server. The CWD command is fed 1,024 random bytes of alphanumeric data followed by a terminating newline (0x0a). Finally the connection is closed. The final repeat directive specifies that the peer read and write block should be executed 1,024 times. With every test case, Dfuz will establish an authenticated connection with the FTP server, issue a CWD command with a random 1,024-byte alphanumeric string as the argument, and tear down the connection.
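For comparison with the other frameworks, the data produced by this rule file can be approximated in Python. This sketch only generates the messages and does not talk to a server. The names `cwd_test_case` and `session_messages` are hypothetical, and the exact bytes emitted by Dfuz's FTP helper functions are an assumption (standard USER, PASS, and QUIT commands terminated by CRLF).

```python
import random
import string

ALPHANUM = string.ascii_letters + string.digits


def cwd_test_case(rng, length=1024):
    """Build 'CWD /' + random alphanumeric data + newline (0x0a)."""
    data = "".join(rng.choice(ALPHANUM) for _ in range(length))
    return b"CWD /" + data.encode("ascii") + b"\x0a"


def session_messages(rng):
    """Return the client-side messages for one test case, in order."""
    return [
        b"USER user\r\n",    # assumed output of the FTP user helper
        b"PASS pass\r\n",    # assumed output of the FTP pass helper
        cwd_test_case(rng),  # the fuzzed command
        b"QUIT\r\n",         # assumed output of the FTP quit helper
    ]


if __name__ == "__main__":
    rng = random.Random()
    for _ in range(3):  # the rule file repeats 1,024 times
        print(session_messages(rng)[2][:40])
```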
Dfuz is a simple and powerful fuzzing framework that can be used to replicate and fuzz many protocols and file formats. The combination of stdout (standard output) support with some basic command-line scripting can transform this framework into a file format, environment variable, or command-line argument fuzzer as well. Dfuz has a relatively quick learning curve and fast development time. The fact that fuzzer development is accomplished entirely in its own scripting language is a double-edged sword. It is a positive in that nonprogrammers can describe and fuzz protocols on this framework and it is a negative in that experienced programmers cannot leverage the innate powers and features exposed by a mature programming language. Dfuz promotes some code reuse, but not nearly as much as other frameworks, such as Peach. A key feature currently lacking is the availability of an intelligent set of attack heuristics. Overall, Dfuz is an interesting case study for a well-designed fuzzing framework and a good tool to keep in the back pocket.
SPIKE, written by Dave Aitel, is probably the most widely used and recognized fuzzing framework. SPIKE is written in C and exposes an API for quickly and efficiently developing network protocol fuzzers. SPIKE is open source and released under the flexible GNU General Public License (GPL).7 This favorable licensing has allowed for the creation of SPIKEfile, a repurposed version of the framework designed specifically for file format fuzzing (see Chapter 12, "File Format Fuzzing: Automation on UNIX"). SPIKE utilizes a novel technique for representing and thereafter fuzzing network protocols. Protocol data structures are broken down and represented as blocks, each of which is also referred to as a SPIKE and contains both binary data and the block size. Block-based protocol representation allows for abstracted construction of various protocol layers with automatic size calculations. To better understand the block-based concept, consider the following simple example from the whitepaper "The Advantages of Block-Based Protocol Analysis for Security Testing":8
s_block_size_binary_bigendian_word("somepacketdata");
s_block_start("somepacketdata");
s_binary("01020304");
s_block_end("somepacketdata");
This basic SPIKE script (SPIKE scripts are written in C) defines a block named somepacketdata, pushes the four bytes 0x01020304 into the block and prefixes the block with the block length. In this case the block length would be calculated as 4 and stored as a big endian word. Note that most of the SPIKE API is prefixed with either s_ or spike_. The s_binary() API is used to add binary data to a block and is quite liberal with its argument format, allowing it to handle a wide variety of copied and pasted inputs such as the string 4141 \x41 0x41 41 00 41 00. Although simple, this example demonstrates the basics and overall approach of constructing a SPIKE. As SPIKE allows blocks to be embedded within other blocks, arbitrarily complex protocols can be easily broken down into their smallest atoms. Expanding on the previous example:
s_block_size_binary_bigendian_word("somepacketdata");
s_block_start("somepacketdata");
s_binary("01020304");
s_blocksize_halfword_bigendian("innerdata");
s_block_start("innerdata");
s_binary("00 01");
s_binary_bigendian_word_variable(0x02);
s_string_variable("SELECT");
s_block_end("innerdata");
s_block_end("somepacketdata");
In this example, two blocks are defined, somepacketdata and innerdata. The latter block is contained within the former block and each individual block is prefixed with a size value. The newly defined innerdata block begins with a static two-byte value (0x0001), followed by a four-byte variable integer with a default value of 0x02, and finally a string variable with a default value of SELECT. The s_binary_bigendian_word_variable() and s_string_variable() APIs will loop through a predefined set of integer and string variables (attack heuristics), respectively, that have been known in the past to uncover security vulnerabilities. SPIKE will begin by looping through the possible word variable mutations and then move on to mutating the string variable. The true power of this framework is that SPIKE will automatically update the values for each of the size fields as the various mutations are made. To examine or expand the current list of fuzz variables, look at SPIKE/src/spike.c. Version 2.9 of the framework contains a list of almost 700 error-inducing heuristics.
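The automatic size recalculation is the core of the block-based idea, and it can be sketched independently of SPIKE. The following Python is not SPIKE's implementation; `sized_block` and `render` are hypothetical names, and the layout assumes, as in the example, a four-byte big-endian outer size, a two-byte big-endian inner size, and size fields that count only the bytes of the block they describe.

```python
import struct


def sized_block(size_format, *parts):
    """Concatenate parts and prefix them with their combined length."""
    body = b"".join(parts)
    return struct.pack(size_format, len(body)) + body


def render(string_value=b"SELECT", word_value=0x02):
    """Render the nested example; both size fields follow the fuzzed values."""
    inner = sized_block(
        ">H",                          # two-byte big-endian size of innerdata
        b"\x00\x01",                   # static two-byte value
        struct.pack(">I", word_value), # four-byte variable integer
        string_value,                  # string variable
    )
    # The outer block holds the static bytes, the inner size field, and innerdata.
    return sized_block(">I", b"\x01\x02\x03\x04", inner)


if __name__ == "__main__":
    print(render().hex())
    print(render(string_value=b"A" * 100)[:10].hex())
```

Because the size prefixes are computed at render time, swapping in a longer fuzz string changes both length fields without any further bookkeeping.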
Using the basic concepts demonstrated in the previous example, you can begin to see how arbitrarily complex protocols can be modeled in this framework. A number of additional APIs and examples exist. Refer to the SPIKE documentation for further information. Sticking to the running example, the following code excerpt is from an FTP fuzzer distributed with SPIKE. This is not the best showcase of SPIKE's capabilities, as no blocks are actually defined, but it helps to compare apples with apples.
s_string("HOST ");
s_string_variable("10.20.30.40");
s_string("\r\n");

s_string_variable("USER");
s_string(" ");
s_string_variable("bob");
s_string("\r\n");

s_string("PASS ");
s_string_variable("bob");
s_string("\r\n");

s_string("SITE ");
s_string_variable("SEDV");
s_string("\r\n");

s_string("ACCT ");
s_string_variable("bob");
s_string("\r\n");

s_string("CWD ");
s_string_variable(".");
s_string("\r\n");

s_string("SMNT ");
s_string_variable(".");
s_string("\r\n");

s_string("PORT ");
s_string_variable("1");
s_string(",");
s_string_variable("2");
s_string(",");
s_string_variable("3");
s_string(",");
s_string_variable("4");
s_string(",");
s_string_variable("5");
s_string(",");
s_string_variable("6");
s_string("\r\n");
SPIKE is sporadically documented and the distributed package contains many deprecated components that can lead to confusion. However, a number of working examples are available and serve as excellent references for becoming familiar with this powerful fuzzing framework. The lack of complete documentation and the disorganization of the distribution package have led some researchers to speculate that SPIKE is purposefully broken in a number of areas to prevent others from uncovering vulnerabilities privately discovered by the author. The veracity of this claim remains unverified.
Depending on your individual needs, one major pitfall of the SPIKE framework is the lack of support for Microsoft Windows, as SPIKE was designed to run in a UNIX environment, although there are mixed reports of getting SPIKE to function on the Windows platform through Cygwin.9 Another factor to consider is that even minor changes to the framework, such as the addition of new fuzz strings, require a recompilation. On a final negative note, code reuse between developed fuzzers is a manual copy-and-paste effort. New elements such as a fuzzer for e-mail addresses cannot simply be defined and later referenced globally across the framework.
Overall, SPIKE has proven to be effective and has been used by both its author and others to uncover a variety of high-profile vulnerabilities. SPIKE also includes utilities such as a proxy, allowing a researcher to monitor and fuzz communications between a browser and a Web application. SPIKE's fault-inducing capabilities have gone a long way in establishing the value of fuzzing as a whole. The block-based approach to fuzzing has also gained popularity, as is evident from the number of fuzzing frameworks that have adopted the technique since the initial public release of SPIKE.
Peach, released by IOACTIVE, is a cross-platform fuzzing framework written in Python and originally released in 2004. Peach is open source and openly licensed. Compared with the other available fuzzing frameworks, Peach's architecture is arguably the most flexible and promotes the most code reuse. Furthermore, in the author's opinion, it has the most interesting name (peach, fuzz—get it?). The framework exposes a number of basic components for constructing new fuzzers, including generators, transformers, protocols, publishers, and groups.
Generators are responsible for generating data ranging from simple strings to complex layered binary messages. Generators can be chained together to simplify the generation of complex data types. Abstraction of data generation into its own object allows for easy code reuse across implemented fuzzers. Consider, for example, that an e-mail address generator was developed during an SMTP server audit. That generator can be transparently reused in another fuzzer that requires generation of e-mail addresses.
Transformers change data in a specific way. Example transformers might include a base64 encoder, gzip, and HTML encoding. Transformers can also be chained together and can be bound to a generator. For example, a generated e-mail address can be passed through a URL-encoding transformer and then again through a gzip transformer. Abstraction of data transformation into its own object allows for easy code reuse across implemented fuzzers. Once a given transformation is implemented, it can be transparently reused by all future developed fuzzers.
Publishers implement a form of transport for generated data through a protocol. Example publishers include file publishers and TCP publishers. Again, the abstraction of this concept into its own object promotes code reuse. Although not possible in the current version of Peach, the eventual goal for publishers is to allow transparent interfacing with any publisher. Consider, for example, that you create a GIF image generator. That generator should be able to publish to a file or post to a Web form by simply swapping the specified publisher.
Groups contain one or more generators and are the mechanism for stepping through the values that a generator is capable of producing. Several stock group implementations are included with Peach. An additional component, the Script object, is a simple abstraction that reduces the redundant code otherwise needed to loop through data via calls to group.next() and protocol.step().
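The separation of generators, transformers, and publishers can be illustrated with a small Python sketch. None of the names below (`ListGenerator`, `chain`, `CollectingPublisher`, `run`) come from Peach; they are hypothetical stand-ins that only show how the pieces compose. Base64 is used in place of gzip as the second transformer so that the output is deterministic.

```python
import base64
import urllib.parse


class ListGenerator:
    """Stand-in generator: produces each value from a fixed list."""

    def __init__(self, values):
        self.values = list(values)

    def __iter__(self):
        return iter(self.values)


def url_encode(data):
    """Transformer: percent-encode every reserved character."""
    return urllib.parse.quote(data, safe="")


def b64_encode(data):
    """Transformer: base64-encode a text string."""
    return base64.b64encode(data.encode("utf-8")).decode("ascii")


def chain(*transformers):
    """Combine transformers so that each one feeds the next."""
    def apply(data):
        for transform in transformers:
            data = transform(data)
        return data
    return apply


class CollectingPublisher:
    """Stand-in publisher: records data instead of writing to a file or socket."""

    def __init__(self):
        self.sent = []

    def send(self, data):
        self.sent.append(data)


def run(generator, transform, publisher):
    """Push every generated value through the transformer to the publisher."""
    for value in generator:
        publisher.send(transform(value))


if __name__ == "__main__":
    publisher = CollectingPublisher()
    emails = ListGenerator(["a@b.com", "user@example.com"])
    run(emails, chain(url_encode, b64_encode), publisher)
    print(publisher.sent)
```

Swapping `CollectingPublisher` for a class that writes to a file or a socket would leave the generator and transformer code untouched, which is the reuse argument made above.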
As a complete, but simple example, consider the following Peach fuzzer designed to brute force the password of an FTP user from a dictionary file:
from Peach import *
from Peach.Transformers import *
from Peach.Generators import *
from Peach.Protocols import *
from Peach.Publishers import *

loginGroup = group.Group()
loginBlock = block.Block()
loginBlock.setGenerators((
    static.Static("USER username\r\nPASS "),
    dictionary.Dictionary(loginGroup, "dict.txt"),
    static.Static("\r\nQUIT\r\n")
    ))

loginProt = null.NullStdout(ftp.BasicFtp('127.0.0.1', 21), loginBlock)

script.Script(loginProt, loginGroup, 0.25).go()
The fuzzer begins by importing the various components of the Peach framework. Next, a new block and a new group are instantiated. The block is defined to pass a username and then the verb of the password command. The next element of the block imports a dictionary of potential passwords. This is the block element that will be iterated during fuzzing. The final element of the block terminates the password command and issues the FTP quit command to disconnect from the server. A new protocol is defined, extending from the already available FTP protocol. Finally, a script object is created to orchestrate the looping of connections and iterations through the dictionary. The first thought that might come to mind after looking over this script is that interfacing with the framework is not very intuitive. This is a valid criticism and probably the biggest pitfall of the Peach framework. Developing your first fuzzer in Peach will definitely take you longer than developing your first fuzzer in Autodafé or Dfuz.
The Peach architecture allows a researcher to focus on the individual subcomponents of a given protocol, later tying them together to create a complete fuzzer. This approach to fuzzer development, although arguably not as fast as the block-based approach, certainly promotes the most code reuse of any fuzzing framework. For example, if a gzip transformer must be developed to test an antivirus solution, then it becomes available in the library for transparent use later on to test an HTTP server's capability of handling compressed data. This is a beautiful facet of Peach. The more you use it, the smarter it gets. Thanks to its pure Python implementation, a Peach fuzzer can be run from any environment with a suitable Python installation. Furthermore, by leveraging existing interfaces such as Python's, Microsoft's COM,11 or Microsoft .NET packages, Peach allows for direct fuzzing of ActiveX controls and managed code. Examples are also provided for directly fuzzing Microsoft Windows DLLs as well as embedding Peach into C/C++ code for creating instrumented clients and servers.
Peach is under active development and as of the time of publication the latest available version is 0.5 (released in April 2006). Although Peach is highly advanced in theory, it is unfortunately not thoroughly documented, nor is it widely used. This lack of reference material results in a substantial learning curve that might dissuade you from considering this framework. The author of Peach has introduced some novel ideas and created a strong foundation for expansion. On a final note, a Ruby port of the Peach framework has been announced, although no further details were available at the time of writing.
General Purpose Fuzzer12
General Purpose Fuzzer (GPF), released by Jared DeMott of Applied Security, is named as a play on words on the commonly recognized term general protection fault. GPF is actively maintained, available as open source under the GPL license, and is developed to run on a UNIX platform. As the name implies, GPF is designed as a generic fuzzer; unlike SPIKE, it can generate an infinite number of mutations. This is not to say that generation fuzzers are superior to heuristic fuzzers, as both methodologies have pros and cons. The major advantage of GPF over the other frameworks listed in this section is the low cost of entry in getting a fuzzer up and running. GPF exposes functionality through a number of modes, including PureFuzz, Convert, GPF (main mode), Pattern Fuzz, and SuperGPF.
PureFuzz is an easy-to-use purely random fuzzer, similar to attaching /dev/urandom to a socket. Although the generated input space is unintelligent and infinite, this technique has discovered vulnerabilities in the past, even in common enterprise software. The main advantage of PureFuzz over the netcat and /dev/urandom combination is that PureFuzz exposes a seed option allowing for pseudo-random stream replay. Furthermore, in the event that PureFuzz is successful, the specific packets responsible for causing an exception can be pinpointed through the use of a range option.
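The value of a seed option is easiest to see in code. The sketch below is not GPF's implementation and does not reflect its command-line options; `random_packets` and `replay_range` are hypothetical names showing how a seeded pseudo-random stream can be regenerated and narrowed to a range of packets.

```python
import random


def random_packets(seed, count, size=64):
    """Generate count pseudo-random packets; the same seed gives the same packets."""
    rng = random.Random(seed)
    return [bytes(rng.getrandbits(8) for _ in range(size)) for _ in range(count)]


def replay_range(seed, start, stop, size=64):
    """Regenerate the stream and return only packets start..stop-1."""
    return random_packets(seed, stop, size)[start:stop]


if __name__ == "__main__":
    first_run = random_packets(seed=1234, count=5)
    second_run = random_packets(seed=1234, count=5)
    print(first_run == second_run)
    print(replay_range(1234, 2, 4) == first_run[2:4])
```

Bisecting the range passed to `replay_range` is one way a crash could be attributed to specific packets, which is the workflow the range option is described as supporting.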
Convert is a GPF utility that can translate libpcap files, such as those generated by Ethereal14 and Wireshark,15 into a GPF file. This utility alleviates some tedium from the initial stages of protocol modeling by converting the binary pcap (packet capture) format into a human-readable and ready-to-modify text-based format.
In GPF's main mode, a GPF file and various command-line options are supplied to control a variety of basic protocol attacks. Captured traffic is replayed, portions of which are mutated in various ways. Mutations include insertion of progressively larger string sequences and format string tokens, byte cycling, and random mutations, among others. This fuzzing mode is manually intensive as the bulk of its logic is built on the analyst's instinct.
Pattern Fuzz (PF) is the most notable GPF mode due to its ability to automatically tokenize and fuzz detected plain text portions of protocols. PF examines target protocols for common ASCII boundaries and field terminators and automatically fuzzes them according to a set of internal rules. The internal rules are defined in C code named tokAids. The ASCII mutation engine is defined as a tokAid (normal_ascii) and there are a few others (e.g., DNS). To accurately and intelligently model and fuzz a custom binary protocol, a tokAid must be written and compiled.
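The idea of tokenizing a plain text protocol at common boundaries and then fuzzing each field in turn can be sketched as follows. This is not how GPF's tokAids are written (those are C code); the delimiter set and the names `tokenize` and `mutate_each_field` are assumptions chosen for illustration.

```python
import re

# Assumed set of common ASCII boundaries: space, CRLF, colon, and slash.
DELIMITERS = re.compile(r"( |\r\n|:|/)")


def tokenize(message):
    """Split a message into fields and delimiters, keeping the delimiters."""
    return [token for token in DELIMITERS.split(message) if token]


def mutate_each_field(message, replacement):
    """Yield one mutated message per field, leaving delimiters untouched."""
    tokens = tokenize(message)
    for index, token in enumerate(tokens):
        if DELIMITERS.fullmatch(token):
            continue  # skip separators and terminators
        yield "".join(tokens[:index] + [replacement] + tokens[index + 1:])


if __name__ == "__main__":
    for mutated in mutate_each_field("USER fuzzy\r\n", "A" * 8):
        print(repr(mutated))
```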
SuperGPF is a Perl script GPF wrapper written to address situations where a specific socket endpoint has been targeted for fuzzing but the researcher has no idea where to begin. SuperGPF combines a GPF capture with a text file containing valid protocol commands and generates thousands of new capture files. The script then starts multiple instances of GPF in varying fuzz modes to bombard the target with a wide variety of generated data. SuperGPF is limited to the fuzzing of ASCII protocols only.
Again, we provide an example GPF script to compare and contrast with previously shown FTP fuzzers:
Source:S Size:20 Data:220 (vsFTPd 1.1.3)
Source:C Size:12 Data:USER fuzzy
Source:S Size:34 Data:331 Please specify the password.
Source:C Size:12 Data:PASS wuzzy
Source:S Size:33 Data:230 Login successful. Have fun.
Source:C Size:6 Data:QUIT
Source:S Size:14 Data:221 Goodbye.
The biggest setback to working with GPF is its complexity. The learning curve required to get started with GPF is significant. Modeling proprietary binary protocols with tokAids is not as simple as other alternatives, such as SPIKE's block-based approach, and furthermore requires compilation to use. Finally, a heavy dependence on command-line options results in unwieldy commands such as the following:
GPF ftp.gpf client localhost 21 ? TCP 8973987234 100000 0 + 6 6 100 100 5000 43 finish 0 3 auto none -G b
Overall, GPF is a valuable tool due to both its flexibility and extensibility. The varying modes allow a researcher to start fuzzing quickly while working on setting up a second and more intelligent wave of attacks. The capability to automatically process and fuzz ASCII protocols is powerful and stands out among other frameworks. As of the time of writing, the author of GPF is currently developing a new fuzzing mode that leverages the fundamentals of evolutionary computing to automatically dissect and intelligently fuzz unknown protocols. This advanced approach to fuzzing is discussed in more detail in Chapter 22, "Automated Protocol Dissection."
Quite simply, Autodafé can be described as the next generation of SPIKE and can be used to fuzz both network protocols and file formats. Autodafé is released under the flexible GNU GPL by Martin Vuagnoux and is designed to run on UNIX platforms. Similar to SPIKE, Autodafé's core fuzzing engine takes a block-based approach for protocol modeling. The following excerpt from the whitepaper "Autodafé, an Act of Software Torture"17 demonstrates the block-based language used by Autodafé and will look familiar to SPIKE users:
string("dummy"); /* define a constant string */ string_uni("dummy"); /* define a constant unicode string */ hex(0x0a 0a \x0a); /* define a hexadecimal constant value */ block_begin("block"); /* define the beginning of a block */ block_end("block"); /* define the end of a block */ block_size_b32("block"); /* 32 bits big-endian size of a block */ block_size_l32("block"); /* 32 bits little-endian size of a block */ block_size_b16("block"); /* 16 bits big-endian size of a block */ block_size_l16("block"); /* 16 bits little-endian size of a block */ block_size_8("block"); /* 8 bits size of a block */ block_size_8("block"); /* 8 bits size of a block */ block_size_hex_string("a"); /* hexadecimal string size of a block block_size_dec_string("b"); /* decimal string size of a block */ block_crc32_b("block"); /* crc32 of a block in big-endian */ block_crc32_l("block"); /* crc32 of a block in little-endian */ send("block"); /* send the block */ recv("block"); /* receive the block */ fuzz_string("dummy"); /* fuzz the string "dummy" */ fuzz_string_uni("dummy"); /* fuzz the unicode string "dummy" */ fuzz_hex(0xff ff \xff); /* fuzz the hexadecimal value */
This functionality is deceptively simple: with these few primitives, it is possible to represent the majority of binary and plain text protocol formats.
The main goal of the Autodafé framework is to reduce the size and complexity of the total fuzzing input space to more efficiently focus on areas of the protocol that are likely to result in the discovery of security vulnerabilities. Calculating the input space, or complexity, for a complete fuzz audit is simple. Consider the following simple Autodafé script:
fuzz_string("GET");
string(" /");
fuzz_string("index.html");
string(" HTTP/1.1");
hex(0d 0a);
Once launched against a target Web server, this script will first iterate through a number of HTTP verb mutations followed by a number of verb argument mutations. Assume that Autodafé contains a fuzz string substitution library with 500 entries. The total number of test cases required to complete this audit is 500 times the number of variables to fuzz; with the two fuzz variables in this script, that is a total of 1,000 test cases. In most real-world cases, the substitution library will be at least double that size and there could be hundreds of variables to fuzz. Autodafé applies an interesting technique, called the markers technique, to assign a weight to each fuzz variable. Autodafé defines a marker as data, string or numeric, that is controllable by the user (or fuzzer). The applied weights determine the order in which the fuzz variables are processed, focusing first on those that are more likely to result in the discovery of security vulnerabilities.
To accomplish this task, Autodafé includes a debugger component named adbg. Debugging technologies have traditionally been used alongside fuzzers and even included with specific fuzzing tools such as FileFuzz (see Chapter 13, "File Format Fuzzing: Automation on Windows"), but Autodafé is the first fuzzing framework to explicitly include a debugger. The debugger component is used by Autodafé to set breakpoints on common dangerous APIs such as strcpy(), which is a known source of buffer overflows, and fprintf(), which is a known source of format string vulnerabilities. The fuzzer transmits test cases to both the target and the debugger simultaneously. The debugger then monitors dangerous API calls, looking for strings originating from the fuzzer.
Every fuzz variable is considered a marker. The weight for an individual marker detected to pass through a dangerous API is increased. Markers that do not touch dangerous APIs are not fuzzed during the first pass. Markers with heavier weights are given priority and fuzzed first. In the event that the debugger detects an access violation, the fuzzer is automatically informed and the responsible test case is recorded. By prioritizing the fuzz variables and ignoring those that never cross a dangerous API, the total input space can be drastically reduced.
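To make the weighting scheme more concrete, the following Python sketch models it under simplifying assumptions. It is not Autodafé's implementation: the function names are hypothetical, the trace is a hand-written list of (API name, argument) pairs standing in for what a debugger such as adbg would observe, and a marker is matched with a plain substring test.

```python
# The two dangerous APIs named in the text; a real list would be longer.
DANGEROUS_APIS = {"strcpy", "fprintf"}


def weigh_markers(markers, traced_calls):
    """Count how often each marker reaches an argument of a dangerous API."""
    weights = {marker: 0 for marker in markers}
    for api_name, argument in traced_calls:
        if api_name not in DANGEROUS_APIS:
            continue  # calls to other functions do not affect the weights
        for marker in markers:
            if marker in argument:
                weights[marker] += 1
    return weights


def first_pass_order(weights):
    """Skip markers that never touched a dangerous API; heaviest go first."""
    touched = [marker for marker, weight in weights.items() if weight > 0]
    return sorted(touched, key=lambda marker: weights[marker], reverse=True)


markers = ["GET", "index.html"]
trace = [
    ("strcpy", "/index.html"),               # marker copied into a buffer
    ("fprintf", "request for index.html"),   # marker written to a log
    ("strlen", "GET"),                       # not a dangerous API, ignored
]
weights = weigh_markers(markers, trace)
print(weights)
print(first_pass_order(weights))
```

In this toy trace only the verb argument reaches strcpy() and fprintf(), so it is the only variable fuzzed during the first pass and the verb is deferred.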
Autodafé includes three additional tools to help with quick and efficient fuzzer development. The first tool, PDML2AD, can parse Packet Details Markup Language (PDML) files exported by Ethereal and Wireshark into the block-based Autodafé language. If your target protocol is among the more than 750 protocols that these popular network sniffers recognize, then the majority of the tedious block-based modeling can be handled automatically. Even in the event that your target protocol is unrecognized, PDML2AD can still provide a few shortcuts, as it will automatically detect plain text fields and generate the appropriate calls to hex(), string(), and so on. The second tool, TXT2AD, is a simple shell script that will convert a text file into an Autodafé script. The third and final tool, ADC, is the Autodafé compiler. ADC is useful when developing complex Autodafé scripts as it can detect common errors such as incorrect function names and unclosed blocks.
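The kind of conversion PDML2AD performs can be illustrated with a short Python sketch. This is not the PDML2AD source and makes several assumptions: the PDML fragment is hand-written for illustration (the field names are invented), only flat, non-nested fields carrying a hexadecimal value attribute are handled, and the plain text test is a simple printable-ASCII check.

```python
import xml.etree.ElementTree as ET

# Hand-written PDML-like fragment; field names are illustrative only.
SAMPLE_PDML = """<pdml>
  <packet>
    <proto name="example">
      <field name="example.verb" show="GET" value="474554"/>
      <field name="example.version" show="HTTP/1.1" value="485454502f312e31"/>
      <field name="example.eol" show="" value="0d0a"/>
    </proto>
  </packet>
</pdml>"""


def is_plain_text(data):
    """True for non-empty printable ASCII without quotes or backslashes."""
    return len(data) > 0 and all(
        0x20 <= byte <= 0x7E and byte not in (0x22, 0x5C) for byte in data
    )


def pdml_to_script(pdml_text):
    """Emit one string() or hex() statement per field with a value."""
    statements = []
    root = ET.fromstring(pdml_text)
    # Note: iter() also visits nested fields; the sample is flat, so no
    # bytes are emitted twice here.
    for field in root.iter("field"):
        value = field.get("value")
        if not value:
            continue
        data = bytes.fromhex(value)
        if is_plain_text(data):
            statements.append('string("%s");' % data.decode("ascii"))
        else:
            statements.append("hex(%s);" % " ".join("%02x" % b for b in data))
    return statements


for statement in pdml_to_script(SAMPLE_PDML):
    print(statement)
```

The generated statements are all constants. Turning selected string() calls into fuzz_string() calls, and wrapping fields in blocks, would still be left to the researcher.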
Autodafé is a well-thought-out, advanced fuzzing framework that expands on the groundwork laid by SPIKE, and it shares many of SPIKE's pros and cons. The most impressive feature of this framework is the debugging component, which stands out among the other frameworks. Again, depending on individual needs, the lack of Microsoft Windows support might immediately disqualify this framework from consideration. Modifications to the framework require recompilation and, once again, code reuse between fuzz scripts is not as transparent and effortless as it could be.