2.4. Mitigation Strategies for Strings

Because errors in string manipulation have long been recognized as a leading source of buffer overflows in C and C++, a number of mitigation strategies have been devised. These include mitigation strategies designed to prevent buffer overflows from occurring and strategies designed to detect buffer overflows and securely recover without allowing the failure to be exploited.

Rather than completely relying on a given mitigation strategy, it is often advantageous to follow a defense-in-depth tactic that combines multiple strategies. A common approach is to consistently apply a secure technique to string handling (a prevention strategy) and back it up with one or more runtime detection and recovery schemes.

String Handling

The CERT C Secure Coding Standard [Seacord 2008], “STR01-C. Adopt and implement a consistent plan for managing strings,” recommends selecting a single approach to handling character strings and applying it consistently across a project. Otherwise, the decision is left to individual programmers who are likely to make different, inconsistent choices. String-handling functions can be categorized according to how they manage memory. There are three basic models:

  • Caller allocates, caller frees (C99, OpenBSD, C11 Annex K)
  • Callee allocates, caller frees (ISO/IEC TR 24731-2)
  • Callee allocates, callee frees (C++ std::basic_string)

It is debatable whether the first model is more secure than the second, or vice versa. The first model makes it clearer when memory needs to be freed, and it is more likely to prevent leaks, but the second model ensures that sufficient memory is available (except when a call to malloc() fails).

The third memory management model, in which the callee both allocates and frees storage, is the most secure of the three solutions but is available only in C++.
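The three models can be contrasted in a short sketch. The function names below are hypothetical and chosen only for illustration; the first two functions use standard C library calls, and the third relies on std::string.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <string>

// Model 1: caller allocates, caller frees. The caller supplies the buffer
// and its size; the function reports whether the whole result fit.
bool greet_caller_allocates(char *dst, std::size_t dst_size, const char *name) {
  int n = std::snprintf(dst, dst_size, "hello, %s", name);
  return n >= 0 && static_cast<std::size_t>(n) < dst_size;
}

// Model 2: callee allocates, caller frees. Returns a malloc()ed string,
// or a null pointer if allocation fails; the caller must call free().
char *greet_callee_allocates(const char *name) {
  static const char prefix[] = "hello, ";
  std::size_t len = (sizeof(prefix) - 1) + std::strlen(name) + 1;
  char *result = static_cast<char *>(std::malloc(len));
  if (result != nullptr) {
    std::snprintf(result, len, "%s%s", prefix, name);
  }
  return result;
}

// Model 3: callee allocates, callee frees. The std::string object owns
// its storage and releases it in its destructor.
std::string greet_string(const std::string &name) {
  return "hello, " + name;
}
```

In the first model the caller must check the result for truncation, in the second the caller must remember to call free(), and in the third neither obligation falls on the caller.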

C11 Annex K Bounds-Checking Interfaces

The first memory management model (caller allocates, caller frees) is implemented by the C string-handling functions defined in <string.h>, by the OpenBSD functions strlcpy() and strlcat(), and by the C11 Annex K bounds-checking interfaces. Memory can be statically or dynamically allocated before invoking these functions, making this model optimally efficient. C11 Annex K provides alternative library functions that promote safer, more secure programming. The alternative functions verify that output buffers are large enough for the intended result and return a failure indicator if they are not. Data is never written past the end of an array. All string results are null-terminated.

C11 Annex K bounds-checking interfaces are primarily designed to be safer replacements for existing functions. For example, C11 Annex K defines the strcpy_s(), strcat_s(), strncpy_s(), and strncat_s() functions as replacements for strcpy(), strcat(), strncpy(), and strncat(), respectively, suitable in situations when the length of the source string is not known or guaranteed to be less than the known size of the destination buffer.

The C11 Annex K functions were created by Microsoft to help retrofit its existing legacy code base in response to numerous well-publicized security incidents. These functions were subsequently proposed to the ISO/IEC JTC1/SC22/WG14 international standardization working group for the programming language C for standardization. These functions were published as ISO/IEC TR 24731-1 and later incorporated in C11 in the form of a set of optional extensions specified in a normative annex. Because the C11 Annex K functions can often be used as simple replacements for the original library functions in legacy code, The CERT C Secure Coding Standard [Seacord 2008], “STR07-C. Use TR 24731 for remediation of existing string manipulation code,” recommends using them for this purpose on implementations that implement the annex. (Such implementations are expected to define the __STDC_LIB_EXT1__ macro.)

Annex K also addresses another problem that complicates writing robust code: functions that are not reentrant because they return pointers to static objects owned by the function. Such functions can be troublesome because a previously returned result can change if the function is called again, perhaps by another thread.

C11 Annex K is a normative but optional annex—you should make sure it is available on all your target platforms. Even though these functions were originally developed by Microsoft, the implementation of the bounds-checking library that ships with Microsoft Visual C++ 2012 and earlier releases does not conform completely with Annex K because of changes to these functions during the standardization process that have not been retrofitted to Microsoft Visual C++.

Example 2.1 from the section “Improperly Bounded String Copies” can be reimplemented using the C11 Annex K functions, as shown in Example 2.5. This program is similar to the original example except that the array bounds are checked. There is implementation-defined behavior (typically, the program aborts) if eight or more characters are input.

Example 2.5. Reading from stdin Using gets_s()

#define __STDC_WANT_LIB_EXT1__ 1
#include <stdio.h>
#include <stdlib.h>

void get_y_or_n(void) {
  char response[8];
  size_t len = sizeof(response);
  puts("Continue? [y] n: ");
  gets_s(response, len);
  if (response[0] == 'n')
    exit(0);
}

Most bounds-checking functions, upon detecting an error such as invalid arguments or not enough bytes available in an output buffer, call a special runtime-constraint-handler function. This function might print an error message and/or abort the program. The programmer can control which handler function is called via the set_constraint_handler_s() function and can make the handler simply return if desired. If the handler simply returns, the function that invoked the handler indicates a failure to its caller using its return value. Programs that install a handler that returns must check the return value of each call to any of the bounds-checking functions and handle errors appropriately. The CERT C Secure Coding Standard [Seacord 2008], “ERR03-C. Use runtime-constraint handlers when calling functions defined by TR24731-1,” recommends installing a runtime-constraint handler to eliminate implementation-defined behavior.
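Because many implementations do not provide Annex K, the handler mechanism can be illustrated with a small stand-in. Everything named demo_* below is hypothetical and is not part of any standard library; the sketch only mimics the pattern described above, in which a violation invokes the installed handler and, if the handler returns, the function reports failure through its return value.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical stand-ins for the Annex K machinery (assumed names).
using demo_handler_t = void (*)(const char *msg);

static int g_violations = 0;

// A handler that records the violation and simply returns.
void demo_counting_handler(const char *) { ++g_violations; }

static demo_handler_t g_handler = demo_counting_handler;

// Installs a new handler and returns the previously installed one.
demo_handler_t demo_set_handler(demo_handler_t handler) {
  demo_handler_t old = g_handler;
  g_handler = handler;
  return old;
}

// Copies src into dst only if src fits, including the null terminator.
// On a violation, empties dst when possible, calls the handler, and
// returns nonzero if the handler returns.
int demo_copy(char *dst, std::size_t dst_size, const char *src) {
  if (dst == nullptr || src == nullptr || dst_size == 0 ||
      std::strlen(src) >= dst_size) {
    if (dst != nullptr && dst_size > 0) {
      dst[0] = '\0';
    }
    if (g_handler != nullptr) {
      g_handler("demo_copy: constraint violated");
    }
    return 1;
  }
  std::memcpy(dst, src, std::strlen(src) + 1);
  return 0;
}
```

A caller that installs a returning handler such as demo_counting_handler() must still test the return value of every demo_copy() call, which mirrors the obligation the text describes for the real bounds-checking functions.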

Example 2.5, which reads from stdin using the C11 Annex K bounds-checking functions, can be improved to remove the implementation-defined behavior at the cost of some additional complexity, as shown by Example 2.6.

Example 2.6. Reading from stdin Using gets_s() (Improved)

#define __STDC_WANT_LIB_EXT1__ 1
#include <stdio.h>
#include <stdlib.h>

void get_y_or_n(void) {
  char response[8];
  size_t len = sizeof(response);

  puts("Continue? [y] n: ");
  if ((gets_s(response, len) == NULL) || (response[0] == 'n')) {
     exit(0);
  }
}

int main(void) {
  constraint_handler_t oconstraint =
    set_constraint_handler_s(ignore_handler_s);
  get_y_or_n();
}

This example adds a call to set_constraint_handler_s() to install the ignore_handler_s() function as the runtime-constraint handler. If the runtime-constraint handler is set to the ignore_handler_s() function, any library function in which a runtime-constraint violation occurs will return to its caller. The caller can determine whether a runtime-constraint violation occurred on the basis of the library function’s specification. Most bounds-checking functions return a nonzero errno_t. The gets_s() function instead returns a null pointer so that it can serve as a close drop-in replacement for gets().

In conformance with The CERT C Secure Coding Standard [Seacord 2008], “ERR00-C. Adopt and implement a consistent and comprehensive error-handling policy,” the constraint handler is set in main() to allow for a consistent error-handling policy throughout the application. Custom library functions may wish to avoid setting a specific constraint-handler policy because it might conflict with the overall policy enforced by the application. In this case, library functions should assume that calls to bounds-checked functions will return and check the return status accordingly. In cases in which the library function does set a constraint handler, the function must restore the original constraint handler (returned by the function set_constraint_handler_s()) before returning or exiting (in case there are atexit() registered functions).

Both the C string-handling and C11 Annex K bounds-checking functions require that storage be preallocated. It is impossible to add new data once the destination memory is filled. Consequently, these functions must either discard excess data or fail. It is important that the programmer ensure that the destination is of sufficient size to hold the character data to be copied and the null-termination character, as described by The CERT C Secure Coding Standard [Seacord 2008], “STR31-C. Guarantee that storage for strings has sufficient space for character data and the null terminator.”

The bounds-checking functions defined in C11 Annex K are not foolproof. If an invalid size is passed to one of the functions, it could still suffer from buffer overflow problems while appearing to have addressed such issues. Because the functions typically take more arguments than their traditional counterparts, using them requires a solid understanding of the purpose of each argument. Introducing the bounds-checking functions into a legacy code base as replacements for their traditional counterparts also requires great care to avoid inadvertently injecting new defects in the process. It is also worth noting that it is not always appropriate to replace every C string-handling function with its corresponding bounds-checking function.

Dynamic Allocation Functions

The second memory management model (callee allocates, caller frees) is implemented by the dynamic allocation functions defined by ISO/IEC TR 24731-2. ISO/IEC TR 24731-2 defines replacements for many of the standard C string-handling functions that use dynamically allocated memory to ensure that buffer overflow does not occur. Because the use of such functions requires introducing additional calls to free the buffers later, these functions are better suited to new development than to retrofitting existing code.

In general, the functions described in ISO/IEC TR 24731-2 provide greater assurance that buffer overflow problems will not occur, because buffers are always automatically sized to hold the data required. Applications that use dynamic memory allocation might, however, suffer from denial-of-service attacks in which data is presented until memory is exhausted. They are also more prone to dynamic memory management errors, which can also result in vulnerabilities.

Example 2.1 can be implemented using the dynamic allocation functions, as shown in Example 2.7.

Example 2.7. Reading from stdin Using getline()

#define __STDC_WANT_LIB_EXT2__ 1
#include <stdio.h>
#include <stdlib.h>

void get_y_or_n(void) {
  char *response = NULL;
  size_t len = 0;  /* initialized so getline() never sees an indeterminate size */

  puts("Continue? [y] n: ");
  if ((getline(&response, &len, stdin) < 0) ||
      (len && response[0] == 'n')) {
    free(response);
    exit(0);
  }
  free(response);
}

This program has defined behavior for any input, including the assumption that an extremely long line that exhausts all available memory to hold it should be treated as if it were a “no” response. Because the getline() function dynamically allocates the response buffer, the program must call free() to release any allocated memory.

ISO/IEC TR 24731-2 allows you to define streams that do not correspond to open files. One such type of stream takes input from or writes output to a memory buffer. These streams are used by the GNU C library, for example, to implement the sprintf() and sscanf() functions.

A stream associated with a memory buffer has the same operations for text files that a stream associated with an external file would have. In addition, the stream orientation is determined in exactly the same fashion.

You can create a string stream explicitly using the fmemopen(), open_memstream(), or open_wmemstream() function. These functions allow you to perform I/O to a string or memory buffer. The fmemopen() and open_memstream() functions are declared in <stdio.h> as follows:

FILE *fmemopen(
  void * restrict buf, size_t size, const char * restrict mode
);
FILE *open_memstream(
  char ** restrict bufp, size_t * restrict sizep
);

The open_wmemstream() function is declared in <wchar.h> and has the following signature:

FILE *open_wmemstream(wchar_t **bufp, size_t *sizep);

The fmemopen() function opens a stream that allows you to read from or write to a specified buffer. The open_memstream() function opens a byte-oriented stream for writing to a buffer, and the open_wmemstream() function creates a wide-oriented stream. When the stream is closed with fclose() or flushed with fflush(), the locations bufp and sizep are updated to contain the pointer to the buffer and its size. These values remain valid only as long as no further output on the stream takes place. If you perform additional output, you must flush the stream again to store new values before you use them again. A null character is written at the end of the buffer but is not included in the size value stored at sizep.

Input and output operations on a stream associated with a memory buffer by a call to fmemopen(), open_memstream(), or open_wmemstream() are constrained by the implementation to take place within the bounds of the memory buffer. In the case of a stream opened by open_memstream() or open_wmemstream(), the memory area grows dynamically to accommodate write operations as necessary. For output, data is moved from the buffer provided by setvbuf() to the memory stream during a flush or close operation. If there is insufficient memory to grow the memory area, or the operation requires access outside of the associated memory area, the associated operation fails.
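As a sketch of bounded input from a memory buffer, the following function opens a read-only stream over a caller-supplied buffer with fmemopen() and reads it line by line with getline(). It assumes a POSIX environment that declares both functions in <stdio.h>; the name lines_from_buffer() is hypothetical.

```cpp
#include <stdio.h>      // fmemopen(), getline(), fclose() (POSIX)
#include <stdlib.h>     // free()
#include <sys/types.h>  // ssize_t
#include <string>
#include <vector>

// Splits the first `size` bytes of `buf` into lines. Reads are confined
// to the buffer: the stream reports end-of-file after `size` bytes.
std::vector<std::string> lines_from_buffer(char *buf, size_t size) {
  std::vector<std::string> lines;
  FILE *in = fmemopen(buf, size, "r");
  if (in == NULL) {
    return lines;  // could not open the stream; report no lines
  }
  char *line = NULL;  // getline() allocates and grows this buffer
  size_t cap = 0;
  ssize_t n;
  while ((n = getline(&line, &cap, in)) >= 0) {
    if (n > 0 && line[n - 1] == '\n') {
      --n;  // drop the trailing newline
    }
    lines.emplace_back(line, static_cast<size_t>(n));
  }
  free(line);  // callee allocated, caller frees
  fclose(in);
  return lines;
}
```

Note that the buffer passed to getline() follows the second memory management model, so the sketch frees it once after the loop rather than after each line.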

The program in Example 2.8 opens a stream to write to memory on line 8.

Example 2.8. Opening a Stream to Write to Memory

01  #include <stdio.h>
02  #include <stdlib.h>
03  int main(void) {
04    char *buf;
05    size_t size;
06    FILE *stream;
07
08    stream = open_memstream(&buf, &size);
09    if (stream == NULL) { /* handle error */ return EXIT_FAILURE; }
10    fprintf(stream, "hello");
11    fflush(stream);
12    printf("buf = '%s', size = %zu\n", buf, size);
13    fprintf(stream, ", world");
14    fclose(stream);
15    printf("buf = '%s', size = %zu\n", buf, size);
16    free(buf);
17    return 0;
18  }

The string "hello" is written to the stream on line 10, and the stream is flushed on line 11. The call to fflush() updates buf and size so that the printf() function on line 12 outputs

buf = 'hello', size = 5

After the string ", world" is written to the stream on line 13, the stream is closed on line 14. Closing the stream also updates buf and size so that the printf() function on line 15 outputs

buf = 'hello, world', size = 12

The size is the cumulative (total) size of the buffer. The open_memstream() function provides a safer mechanism for writing to memory because it uses a dynamic approach that allocates memory as required. However, it does require the caller to free the allocated memory, as shown on line 16 of the example.

Dynamic allocation is often disallowed in safety-critical systems. For example, the MISRA standard requires that “dynamic heap memory allocation shall not be used” [MISRA 2005]. Some safety-critical systems can take advantage of dynamic memory allocation during initialization but not during operations. For example, avionics software may dynamically allocate memory while initializing the aircraft but not during flight.

The dynamic allocation functions are drawn from existing implementations that have widespread usage; many of these functions are included in POSIX.

C++ std::basic_string

Earlier we described a common programming flaw using the C++ extraction operator operator>> to read input from the standard std::cin iostream object into a character array. Although setting the field width eliminates the buffer overflow vulnerability, it does not address the issue of truncation. Also, unexpected program behavior could result when the maximum field width is reached and the remaining characters in the input stream are consumed by the next call to the extraction operator.

C++ programmers have the option of using the standard std::string class defined in ISO/IEC 14882. The std::string class is a specialization of the std::basic_string template on type char. The std::wstring class is a specialization of the std::basic_string template on type wchar_t.

The basic_string class represents a sequence of characters. It supports sequence operations as well as string operations such as search and concatenation and is parameterized by character type.

The basic_string class uses a dynamic approach to strings in that memory is allocated as required—meaning that in all cases, size() <= capacity(). The basic_string class is convenient because the language supports the class directly. Also, many existing libraries already use this class, which simplifies integration.
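The relationship between size() and capacity() can be observed directly. The sketch below appends characters one at a time and checks the invariant after each append; the function name is hypothetical.

```cpp
#include <cstddef>
#include <string>

// Appends n characters one at a time and verifies after each append that
// the string has not outgrown its allocated storage.
bool grows_within_capacity(std::size_t n) {
  std::string s;
  for (std::size_t i = 0; i < n; ++i) {
    s.push_back('x');  // reallocates automatically when storage is full
    if (s.size() > s.capacity()) {
      return false;    // would indicate a broken implementation
    }
  }
  return s.size() == n;
}
```

The caller never computes a buffer size; growth happens inside push_back() whenever the current capacity is exhausted.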

The basic_string class implements the “callee allocates, callee frees” memory management strategy. This is the most secure approach, but it is supported only in C++. Because basic_string manages memory, the caller does not need to worry about the details of memory management. For example, string concatenation is handled simply as follows:

string str1 = "hello, ";
string str2 = "world";
string str3 = str1 + str2;

Internally, the basic_string methods allocate memory dynamically through the string's allocator (by default std::allocator, which obtains storage with operator new); buffers are always automatically sized to hold the data required. These methods scale better than their C counterparts and do not discard excess data.

The following program shows a solution to extracting characters from std::cin into a std::string, using a std::string object instead of a character array:

#include <iostream>
#include <string>
using namespace std;

int main(void) {
  string str;

  cin >> str;
  cout << "str 1: " << str << '\n';
}

This program is simple and elegant, handles buffer overflows and string truncation, and behaves in a predictable fashion. What more could you possibly want?
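The same extraction can be exercised without a terminal by reading from a std::istringstream. The sketch below, with a hypothetical function name, shows that each extraction yields one whitespace-delimited token of whatever length the input supplies, with no truncation.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Extracts every whitespace-delimited token from input using operator>>.
std::vector<std::string> extract_tokens(const std::string &input) {
  std::istringstream in(input);
  std::vector<std::string> tokens;
  std::string token;
  while (in >> token) {  // the string grows to fit each token
    tokens.push_back(token);
  }
  return tokens;
}
```

Leading whitespace is skipped and extraction stops at the next whitespace character, so a line containing spaces produces several tokens rather than one.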

The basic_string class is less prone to security vulnerabilities than null-terminated byte strings, although coding errors leading to security vulnerabilities are still possible. One area of concern when using the basic_string class is iterators. Iterators can be used to iterate over the contents of a string:

string::iterator i;
for (i = str.begin(); i != str.end(); ++i) {
  cout << *i;
}

Invalidating String Object References

References, pointers, and iterators referencing string objects are invalidated by operations that modify the string, which can lead to errors. Using an invalid iterator is undefined behavior and can result in a security vulnerability.

For example, the following program fragment attempts to sanitize an e-mail address stored in the input character array before passing it to a command shell by copying the null-terminated byte string to a string object (email), replacing each semicolon with a space character:

extern char input[];  // null-terminated e-mail address, defined elsewhere
string email;
string::iterator loc = email.begin();
// copy into string converting ";" to " "
for (size_t i=0; i < strlen(input); i++) {
  if (input[i] != ';') {
    email.insert(loc++, input[i]); // invalid iterator
  }
  else email.insert(loc++, ' '); // invalid iterator
}

The problem with this code is that the iterator loc is invalidated after the first call to insert(), and every subsequent call to insert() results in undefined behavior. This problem can be easily repaired if the programmer is aware of the issue:

extern char input[];  // null-terminated e-mail address, defined elsewhere
string email;
string::iterator loc = email.begin();
// copy into string converting ";" to " "
for (size_t i=0; i < strlen(input); ++i) {
  if (input[i] != ';') {
    loc = email.insert(loc, input[i]);
  }
  else loc = email.insert(loc, ' ');
  ++loc;  // advance past the character just inserted
}

In this version of the program, the value of the iterator loc is properly updated as a result of each insertion, eliminating the undefined behavior. Most checked standard template library (STL) implementations detect common errors automatically. At a minimum, run your code using a checked STL implementation on a single platform during prerelease testing using your full complement of tests.
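An alternative that sidesteps iterator bookkeeping entirely is to construct the string first and then replace characters in place with the standard std::replace() algorithm, which never changes the string's size and therefore never invalidates its iterators. This is a different technique from the repaired loop above, sketched here under a hypothetical function name.

```cpp
#include <algorithm>
#include <string>

// Copies a null-terminated byte string into a std::string and converts
// each ';' to ' ' in place. A null input yields an empty string.
std::string sanitize_email(const char *input) {
  if (input == nullptr) {
    return std::string();
  }
  std::string email(input);
  std::replace(email.begin(), email.end(), ';', ' ');
  return email;
}
```

As in the original fragment, replacing semicolons alone is only an illustration and is not a complete defense for data passed to a command shell.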

The basic_string class generally protects against buffer overflow, but there are still situations in which programming errors can lead to buffer overflows. While C++ generally throws an exception of type std::out_of_range when an operation references memory outside the bounds of the string, for maximum efficiency, the subscript member std::string::operator[] (which does not perform bounds checking) does not. For example, the following program fragment can result in a write outside the bounds of the storage allocated to the bs string object if f() >= bs.size():

string bs("01234567");
size_t i = f();
bs[i] = '\0';

The at() method behaves in a similar fashion to the index operator[] but throws an out_of_range exception if pos >= size():

string bs("01234567");
try {
  size_t i = f();
  bs.at(i) = '\0';
}
catch (out_of_range& oor) {
  cerr << "Out of Range error: " << oor.what() << '\n';
}

Although the basic_string class is generally more secure, the use of null-terminated byte strings in a C++ program is generally unavoidable except in rare circumstances in which there are no string literals and no interaction with existing libraries that accept null-terminated byte strings. The c_str() method returns a pointer to a null-terminated array of characters with the same content as the string object:

string str = x;
cout << strlen(str.c_str());

The c_str() method returns a pointer to const char that refers to storage owned by the string object, which means that calling free() or delete on the returned pointer is an error. Modifying the characters through the returned pointer is also an error, so if you need to modify the string, make a copy first and then modify the copy.

Other Common Mistakes in basic_string Usage

Other common mistakes using the basic_string class include

  • Using an invalidated or uninitialized iterator
  • Passing an out-of-bounds index
  • Using an iterator range that really is not a range
  • Passing an invalid iterator position

These issues are discussed in more detail in C++ Coding Standards: 101 Rules, Guidelines, and Best Practices by Herb Sutter and Andrei Alexandrescu [Sutter 2005].
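Two of the mistakes listed above, an out-of-bounds index and an iterator range that is not a range, can be guarded against by validating positions before converting them to iterators. The following is a minimal sketch with a hypothetical function name.

```cpp
#include <cstddef>
#include <string>

// Erases the characters in [first, last) only if the positions form a
// valid range within s; otherwise leaves s unchanged and returns false.
bool erase_range(std::string &s, std::size_t first, std::size_t last) {
  if (first > last || last > s.size()) {
    return false;  // not a range, or out of bounds
  }
  typedef std::string::difference_type diff_t;
  s.erase(s.begin() + static_cast<diff_t>(first),
          s.begin() + static_cast<diff_t>(last));
  return true;
}
```

Checking first <= last <= size() before calling erase() turns what would otherwise be undefined behavior into an ordinary failure that the caller can handle.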

Finally, many existing C++ programs and libraries use their own string classes. To use these libraries, you may have to use these string types or constantly convert back and forth. Such libraries are of varying quality when it comes to security. It is generally best to use the standard library (when possible) or to understand completely the semantics of the selected library. Generally speaking, libraries should be evaluated on the basis of how easy or complex they are to use, the type of errors that can be made, how easy those errors are to make, and what the potential consequences may be.
