InformIT

Preventing Buffer Overflow In Visual C++ Applications

Date: Mar 19, 2004


Buffer overflows are currently the most common cause of security flaws in applications. Discover the techniques that professionals use to thwart this problem in this article by John Mueller.

One of the most common security issues today is the buffer overflow. This particular security problem is responsible for more virus infections than perhaps all other sources combined. Just about every application and operating system on the market has buffer overflow flaws that a cracker could exploit. The problem is so prevalent with Microsoft Windows that Microsoft is taking a different approach to the problem with releases of products such as Windows XP Service Pack (SP) 2.

NOTE

See "Windows XP Service Pack 2: A Developer's View" for a description of the security changes in Windows XP SP 2 from the developer perspective.

The purpose of this article is to help you understand buffer overflows more clearly and to give you simple techniques you can use to reduce (or hopefully eliminate) this problem in your applications.

Understanding Buffer Overflows

Buffer overflows illustrate the point that you can't know what the user will try to type into your application until you actually watch the user interact with it. These attacks rely on a somewhat strange idea. A cracker provides input to a program that exceeds the length of a buffer. The extra information ends up overwriting memory beyond the end of the buffer. In some cases, that memory lies in the heap and holds data the application later uses to decide what to run, such as a function pointer (heap memory overrun). Instead of running the original executable code, the application ends up running the cracker's code. In other cases, the cracker overwrites the application's stack frame (stack memory overrun), including the return address saved there. When the current function returns, the application jumps to the location where the cracker's code resides instead of returning to its caller.
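To make the stack case concrete: the classic vulnerable pattern copies untrusted input into a fixed-size local array with strcpy(), which never checks the size of the destination. The following sketch, written in standard C++ rather than the Managed Extensions code used in the listings (the function name CopyBounded() is hypothetical), shows the defensive alternative: measure the input first and refuse anything that won't fit, including its terminating null character.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper. Copies src into dest only when the whole string,
// plus its terminating '\0', fits in destSize bytes. strcpy() would
// instead keep writing past the end of dest when src is too long.
bool CopyBounded(char* dest, std::size_t destSize, const char* src)
{
    if (dest == nullptr || src == nullptr || destSize == 0)
        return false;

    std::size_t length = std::strlen(src);
    if (length >= destSize)          // no room left for the terminating '\0'
    {
        dest[0] = '\0';              // leave dest as a safe, empty string
        return false;
    }

    std::memcpy(dest, src, length + 1);  // copy the characters and the '\0'
    return true;
}
```

A caller that declares, say, char buffer[8] should treat a false return from CopyBounded(buffer, sizeof(buffer), input) as rejected input rather than as a truncated success.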

NOTE

The Cyberguard paper titled "Buffer Overrun Attacks" describes buffer overruns in greater detail.

Some crackers actually analyze your code, looking for places to exploit either a heap or stack memory overrun. However, in some cases, an exploit is discovered when a cracker tries typing something into a field to see what will happen. For example, a cracker might try to type a simple script to see if your application will execute it. No matter how the cracker discovers the exploit, the result is the same. Your application loses control to the cracker's code, and the cracker now enjoys whatever privileges your application had.

Many developers think that crackers hatch devious plots to make use of the exploits they create, but many exploits are simple. In some cases, just telling the operating system to display a command prompt is enough to gain control. If the system security is even a little lax, the cracker could gain control of the server. At the very least, a command prompt allows the cracker to probe the system looking for other ways to gain more access. Crackers don't have to gain control of your system on the first try. A little gain here and a little gain there is all they need.

It doesn't take much to understand that you must provide some kind of protection for your application to keep it safe from buffer overruns. The best way to control buffer overruns is to check every input your program receives, even from trusted sources. This article considers four basic checks that every program should perform: checking the data range, verifying the data length, excluding illegal characters, and providing the user with adequate help to ensure good input.

The problem of buffer overruns is so entrenched that you really can't trust any source of information—not even your own code—because some operating system layer could contaminate the data. If you want to write truly secure code, you need to make constant checks. Although this may sound paranoid, a little paranoia is good when working with security.

NOTE

Read Microsoft's "Avoiding Buffer Overruns" for a number of other suggestions.

Validating Data Ranges

Most data ranges provided by programming languages reflect the realities of the underlying hardware, not the requirements of the real world. For example, when you define a value in your code as Int32, the variable accepts any value the user enters from -2,147,483,648 through 2,147,483,647. This range is based on the requirements of the hardware, which stores the number in 32 bits, one of which indicates the sign. In the two's complement format the hardware uses, that yields a range of -2^31 through 2^31 - 1 (2^31 = 2,147,483,648). However, your application might not find this range acceptable.
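The limits quoted here follow directly from the type's width. As a quick check, the following standard C++ sketch (the helper FitsIn() is hypothetical) confirms them at compile time, along with the 16-bit maximum of 32,767 that matters for the 1-to-40,000 example that follows.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// A signed N-bit two's complement type holds -2^(N-1) through 2^(N-1) - 1.
static_assert(std::numeric_limits<std::int32_t>::min() == -2147483647 - 1,
              "Int32 minimum is -2^31");
static_assert(std::numeric_limits<std::int32_t>::max() == 2147483647,
              "Int32 maximum is 2^31 - 1");
static_assert(std::numeric_limits<std::int16_t>::max() == 32767,
              "Int16 maximum is 2^15 - 1");

// Hypothetical helper: reports whether a value fits in integer type T.
template <typename T>
constexpr bool FitsIn(long long value)
{
    return value >= std::numeric_limits<T>::min() &&
           value <= std::numeric_limits<T>::max();
}

static_assert(!FitsIn<std::int16_t>(40000), "40,000 exceeds the Int16 range");
static_assert(FitsIn<std::int32_t>(40000), "40,000 fits in the Int32 range");
```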

When the needs of the hardware don't match the real world needs of your application, you must include special code in your application to check for potential error conditions. You might want to accept numbers from 1 to 40,000 in your code, a range that exceeds the Int16 maximum of 32,767 but fits well within the Int32 range. Listing 1 shows an example of such a check for an input control.

Listing 1 Detecting range errors

System::Void btnDataRange_Click(System::Object * sender,
                                System::EventArgs * e)
{
  Int32 TestData;  // Holds the input value.

  try
  {
    // Always attempt to parse the data first.
    TestData = Int32::Parse(txtInput1->Text);
  }
  catch (System::OverflowException *OE)
  {
    // React to the overflow error (too large or too small for Int32).
    MessageBox::Show(S"Type a value between 1 and 40,000.",
                     S"Input Error",
                     MessageBoxButtons::OK,
                     MessageBoxIcon::Error);
    return;
  }
  catch (System::FormatException *FE)
  {
    // React to the format error (the input isn't a number).
    MessageBox::Show(S"Type the number without extra characters.",
                     S"Input Error",
                     MessageBoxButtons::OK,
                     MessageBoxIcon::Error);
    return;
  }

  // Test the specific data range.
  if (TestData < 1 || TestData > 40000)
  {
    // React to the data range error.
    MessageBox::Show(S"Type a value between 1 and 40,000.",
                     S"Input Error",
                     MessageBoxButtons::OK,
                     MessageBoxIcon::Error);
  }
}

Notice that the code converts the input to an Int32 first by using the Parse() method. This simple conversion locates many input problems. In this case, the code checks for values that are either too large or too small using the System::OverflowException exception and values that aren't in the right format using the System::FormatException exception. After the code ensures that the input value is actually a legitimate Int32 value, it then checks for the actual input range.
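For comparison, here is the same parse-then-check sequence sketched in standard C++17, outside the .NET Framework. The InputStatus enumeration and the ValidateQuantity() function are hypothetical names. std::from_chars() reports a value too large or too small for the type with std::errc::result_out_of_range and unparseable text with std::errc::invalid_argument, which play the roles of OverflowException and FormatException. Note that std::from_chars() is stricter than Int32::Parse(): it rejects leading whitespace and a leading plus sign.

```cpp
#include <cassert>
#include <charconv>
#include <cstdint>
#include <string>
#include <system_error>

// Hypothetical result codes for validating one input field.
enum class InputStatus { Ok, Overflow, BadFormat, OutOfRange };

// Parse first, then check the application-specific range of 1 through 40,000.
InputStatus ValidateQuantity(const std::string& text, std::int32_t& value)
{
    const char* first = text.data();
    const char* last = text.data() + text.size();
    std::int32_t parsed = 0;
    auto [ptr, ec] = std::from_chars(first, last, parsed);

    if (ec == std::errc::result_out_of_range)
        return InputStatus::Overflow;     // too large or too small for int32_t
    if (ec == std::errc::invalid_argument || ptr != last)
        return InputStatus::BadFormat;    // not a number, or extra characters
    if (parsed < 1 || parsed > 40000)
        return InputStatus::OutOfRange;   // legal int32_t, illegal for the app

    value = parsed;
    return InputStatus::Ok;
}
```

Checking that ptr reached the end of the text catches input such as 12abc, which std::from_chars() would otherwise partly accept as 12.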

Value data types are the easiest to check because they have specific ranges. Unlike an object, values have no hidden elements and few surprises for the developer.

NOTE

Microsoft provides a complete list of value types supported by the .NET Framework.

In general, all you need to do to validate a value data range is define the upper and lower boundaries, then check for them in your code.
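One way to make the boundaries part of the code itself is sketched below in standard C++. The BoundedInt class is hypothetical: it takes the lower and upper boundaries as template arguments and checks them once, in the constructor, so any function that accepts a BoundedInt<1, 40000> can rely on the value already being in range.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical wrapper: every instance holds a value within [Lower, Upper].
template <int Lower, int Upper>
class BoundedInt
{
    static_assert(Lower <= Upper, "Lower boundary must not exceed Upper");

public:
    explicit BoundedInt(int value) : value_(value)
    {
        // Check both boundaries before the object can be used.
        if (value < Lower || value > Upper)
            throw std::out_of_range("value outside the permitted range");
    }

    int Get() const { return value_; }

private:
    int value_;
};
```

Because the constructor is explicit, a plain int can't silently become a BoundedInt; every conversion passes through the check.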

The problems with data range validation begin with objects. For example, you might require that the user provide one of several strings as input. (Contrary to what some developers believe, strings are objects.) Using list boxes to reduce user choices to the options that you have in mind does help. A user can't enter invalid information, such as a script, when faced with a list box that allows a fixed number of choices.
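The same idea applies on the receiving side of the list box. A method that accepts one of several strings can't assume the value came from the list box, so the following standard C++ sketch checks it against a fixed list. The list contents (Ground, Express, Overnight) and the names kAllowedShipping and IsAllowedChoice() are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <string_view>

// Hypothetical fixed set of acceptable strings, the programmatic
// equivalent of the choices offered in a list box.
constexpr std::array<std::string_view, 3> kAllowedShipping = {
    "Ground", "Express", "Overnight"};

// Returns true only when input exactly matches one of the allowed strings.
bool IsAllowedChoice(std::string_view input)
{
    return std::find(kAllowedShipping.begin(), kAllowedShipping.end(), input)
           != kAllowedShipping.end();
}
```

The comparison is exact and case-sensitive, so anything other than the listed strings, including a script, is rejected.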

Sometimes you have to devise unique solutions for problems. For example, what if you need to ensure that a particular method accepts only a fixed set of values drawn from a discontinuous range? An enumeration can save the day in this case. Listing 2 shows how you can use an enumeration as an automatic range check in your code.

Listing 2 Using enumerations to check data ranges

// Create the enumerated values.
__value enum SomeStrings
{
  One,
  Two,
  Three,
  Four
};

System::Void DisplayString(SomeStrings Input)
{
  // Convert the input value to a string.
  String* DataStr = Enum::GetName(__typeof(SomeStrings), __box(Input));

  // Display the input value.
  MessageBox::Show(DataStr);
}

System::Void btnTestEnum_Click(System::Object * sender,
                               System::EventArgs * e)
{
  // Call the DisplayString function.
  DisplayString(SomeStrings::One);
}

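Notice that the declaration of DisplayString() requires the input of a SomeStrings enumeration type. The caller can't pass any other type as input, which means the DisplayString() method is automatically protected from many forms of bad input. For example, you couldn't supply a script as input because it isn't the correct type. A caller can still force an undefined value into the parameter with an explicit cast, however, so code that receives enumeration values from outside sources can add a check with Enum::IsDefined().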

After the DisplayString() method receives the correct input type, it converts it to a string using the Enum::GetName() method. Notice that you must __box() Input because Input is a value type, not an object type. The code simply displays the resulting DataStr object.
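Standard C++ offers the same type protection through enum class, though it has no built-in equivalent of Enum::GetName(). In the following sketch, GetName() and ToSomeStrings() are hypothetical helpers: the first maps each enumerator to its name with a switch statement, and the second converts an untrusted integer into a SomeStrings value only when it names a defined enumerator. The second check matters because a C++ caller can still cast any integer to the enumeration type.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Standard C++ counterpart of the SomeStrings enumeration.
enum class SomeStrings { One, Two, Three, Four };

// Maps each enumerator to its name, much as Enum::GetName() does in .NET.
std::string GetName(SomeStrings value)
{
    switch (value)
    {
    case SomeStrings::One:   return "One";
    case SomeStrings::Two:   return "Two";
    case SomeStrings::Three: return "Three";
    case SomeStrings::Four:  return "Four";
    }
    return "";  // reached only if a caller cast an undefined value
}

// Converts an untrusted integer into the enumeration, rejecting
// anything outside the defined enumerators.
std::optional<SomeStrings> ToSomeStrings(int raw)
{
    if (raw < static_cast<int>(SomeStrings::One) ||
        raw > static_cast<int>(SomeStrings::Four))
        return std::nullopt;
    return static_cast<SomeStrings>(raw);
}
```

The simple range test in ToSomeStrings() works because the enumerators are contiguous; an enumeration with gaps would need an explicit list of valid values instead.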

Verifying Data Length

Some data types don't lend themselves to quick checks. For example, a string can contain any number of characters, at least up to the limit set by the .NET Framework and the machine. Of course, very few people really need a string that long. Normally, a developer needs a string with a minimum and maximum length. Consequently, you need to verify not only that you've received a string, but also that the string is the right length. Otherwise, someone could send a string of any length, and that could lead to a buffer overrun. Listing 3 shows how you can prevent this problem by validating the data length of each argument.

Listing 3 Verifying the data length

System::Boolean ProcessData(String *Input,
                            Int32 UpperLimit,
                            Int32 LowerLimit)
{
  StringBuilder *ErrorMsg; // Error message.

  // Reject a null reference before examining the string.
  if (Input == 0)
    throw new ArgumentNullException(S"Input");

  // Check for an input error in the limits.
  if (UpperLimit < LowerLimit)
  {
    // Create the error message.
    ErrorMsg = new StringBuilder();
    ErrorMsg->Append(S"The UpperLimit input must be greater than ");
    ErrorMsg->Append(S"or equal to the LowerLimit number.");

    // Define a new error.
    System::ArgumentException *AE;
    AE = new ArgumentException(ErrorMsg->ToString(),
                               S"UpperLimit");

    // Throw the error.
    throw(AE);
  }

  // Check for a data length error condition.
  if (Input->Length < LowerLimit || Input->Length > UpperLimit)
  {
    // Create the error message from the actual limits.
    ErrorMsg = new StringBuilder();
    ErrorMsg->Append(S"String is the wrong length. Use a string between ");
    ErrorMsg->Append(LowerLimit);
    ErrorMsg->Append(S" and ");
    ErrorMsg->Append(UpperLimit);
    ErrorMsg->Append(S" characters long.");

    // Define a new error.
    System::Security::SecurityException *SE;
    SE = new SecurityException(ErrorMsg->ToString());

    // Throw the error.
    throw(SE);
  }

  // If the data is correct, return true.
  return true;
}

System::Void btnDataLength_Click(System::Object * sender,
                                 System::EventArgs * e)
{
  try
  {
    // Process the input text.
    if (ProcessData(txtInput2->Text, 8, 4))
    {
      // Display a result message for correct input.
      MessageBox::Show(txtInput2->Text,
                       S"Input String",
                       MessageBoxButtons::OK,
                       MessageBoxIcon::Information);
    }
  }
  catch (System::Security::SecurityException *SE)
  {
    // Display an error message for incorrect input.
    MessageBox::Show(SE->Message,
                     S"Input Error",
                     MessageBoxButtons::OK,
                     MessageBoxIcon::Error);
  }
  catch (System::ArgumentException *AE)
  {
    // Display an error message for incorrect arguments
    // (this also catches ArgumentNullException).
    MessageBox::Show(AE->Message,
                     S"Argument Error",
                     MessageBoxButtons::OK,
                     MessageBoxIcon::Error);
  }
}

The validation occurs in the ProcessData() method, which accepts the input string, the maximum string length, and the minimum string length as inputs. Notice that the code first verifies that the input arguments are correct: the UpperLimit argument must be at least as large as the LowerLimit argument. This portion of the code demonstrates a good practice for all developers: never trust the input you receive. Notice that the code raises a System::ArgumentException exception, not a generic exception. Too many developers use a generic exception when a specific exception will work better. When the .NET Framework fails to provide a specific exception for your coding need, it's time to create a custom exception that does.

The code validates the string next. If the string doesn't have enough characters or if it has too many, the code raises a System::Security::SecurityException exception. A security exception is appropriate here because of the events that can lead to it: a user might deliberately input a long string in order to create a buffer overflow condition. Even when the user has simply made a mistake, raising the condition as a security exception means that you can at least investigate the reason for it, rather than simply pass it off as a user having a bad day.

The test code for this example appears in the btnDataLength_Click() method. The code executes within a try...catch block to ensure that the exceptions are trapped. The actual check is a simple if statement. The code includes one catch statement for each exception. Trapping the exceptions is important if you want to ensure that the application notes any security exceptions and handles them appropriately.
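Outside .NET, the same pattern translates to standard C++ with a small custom exception class, which also illustrates the advice about creating a custom exception when no suitable one exists. In this sketch, SecurityError is a hypothetical class derived from std::runtime_error (standard C++ has no SecurityException), and the error message is built from the actual limits rather than fixed text.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical custom exception for input that may be an attack attempt.
class SecurityError : public std::runtime_error
{
public:
    using std::runtime_error::runtime_error;  // inherit the message constructor
};

// Standard C++ counterpart of ProcessData(): validate the limits, then the length.
bool ProcessData(const std::string& input,
                 std::size_t upperLimit,
                 std::size_t lowerLimit)
{
    if (upperLimit < lowerLimit)
        throw std::invalid_argument(
            "upperLimit must be greater than or equal to lowerLimit");

    if (input.size() < lowerLimit || input.size() > upperLimit)
        throw SecurityError("String is the wrong length. Use a string between " +
                            std::to_string(lowerLimit) + " and " +
                            std::to_string(upperLimit) + " characters long.");

    return true;
}
```

As in the original, a caller wraps the call in try...catch with a separate handler for each exception type, so security failures can be logged apart from programming errors.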

Excluding Illegal Characters

Crackers often include extra illegal characters in their input to see what happens. For example, a cracker can often create a script by adding special characters. In many cases, the system will execute the script without any warning, giving the cracker access to the system. Web applications are more susceptible than desktop applications to this exploit, but you need to protect both.

NOTE

See this Web site for a description of one of the more interesting Web application exploits.

Fortunately, the .NET Framework provides great regular expression support. A regular expression defines the acceptable input for a string, so you can detect illegal characters easily. Listing 4 shows one method for using regular expressions.

Listing 4 Using regular expressions

System::Boolean CheckChars(System::String *Input)
{
  StringBuilder *ErrorMsg; // Error message.
  Regex         *R;        // Regular expression.

  // Create a regular expression that matches a single letter.
  R = new Regex(S"[A-Za-z]");

  // Each match is one legal character, so fewer matches than
  // characters means the string contains an illegal character.
  if (R->Matches(Input)->Count < Input->Length)
  {
    // Create the error message.
    ErrorMsg = new StringBuilder();
    ErrorMsg->Append(S"String contains incorrect characters. ");
    ErrorMsg->Append(S"Use only A through Z and a through z.");

    // Define the exception.
    System::Security::SecurityException *SE;
    SE = new SecurityException(ErrorMsg->ToString());

    // Throw the exception.
    throw(SE);
  }

  // If the data is correct, return true.
  return true;
}

The code begins by building a Regex object. In this case, the only acceptable inputs are letters (you can't even include spaces). Regular expressions can encompass a vast array of inputs. In fact, there are many default templates provided as part of the validator support for ASP.NET applications. The point is that you can build a string that defines acceptable input, including input patterns such as telephone numbers.

A Regex object can perform a number of comparisons. In this case, the code uses the Matches() method to find every letter in the string and compares the number of matches with the length of the string. When the two numbers are equal, every character is legal. Otherwise, the input contains illegal characters, and the CheckChars() method raises an exception.
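In standard C++, std::regex_match() succeeds only when the pattern matches the entire string, so a single character class with a * quantifier performs the same whole-string check that the listing gets by counting matches. This sketch returns a Boolean rather than throwing, leaving the choice of exception to the caller; the function name simply mirrors CheckChars().

```cpp
#include <cassert>
#include <regex>
#include <string>

// Accepts only strings made entirely of the letters A-Z and a-z.
// regex_match() must consume the whole string, so no anchors are needed.
bool CheckChars(const std::string& input)
{
    static const std::regex lettersOnly("[A-Za-z]*");
    return std::regex_match(input, lettersOnly);
}
```

Like the original, this accepts an empty string; pair it with a length check when empty input isn't acceptable.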

Providing Superior User Help

Many developers would never associate help with good security, but good help does improve security by making user mistakes a little less likely. For example, a good help file can prevent many kinds of user input errors by showing the user precisely what your application expects to receive. Reducing input errors makes it practical to analyze the errors that remain thoroughly, which ultimately reduces the security risk from incorrect input.

Help comes in all forms, including useful error messages. Some data types present special challenges that your application must handle to ensure data integrity, as well as address security concerns. For example, a date is a common data entry item that can present problems. First, you need to consider the format of the date. A user could type 1 June 2003, 06/01/2003, June 1, 2003, 2003/06/01, or any other acceptable variant, and some variants are ambiguous: 06/01/2003 means June 1 in the United States but January 6 in much of Europe. Restricting your application to allow only one date format makes it easier to check the data for invalid information. The error messages and help files must tell the user what form to use, however, so the user doesn't become frustrated when entering a valid date using an incorrect format.
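As an illustration of restricting input to one format, the following standard C++ sketch accepts dates only as YYYY-MM-DD (the choice of format and the function name IsValidIsoDate() are assumptions) and then confirms that the date actually exists, including leap years. When the function returns false, the application can show an error message that names the expected format.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Accepts dates only in the single format YYYY-MM-DD, then checks
// that the month and day describe a real calendar date.
bool IsValidIsoDate(const std::string& input)
{
    static const std::regex format(R"((\d{4})-(\d{2})-(\d{2}))");
    std::smatch parts;
    if (!std::regex_match(input, parts, format))
        return false;  // wrong format: tell the user which form to use

    int year = std::stoi(parts[1].str());
    int month = std::stoi(parts[2].str());
    int day = std::stoi(parts[3].str());

    if (month < 1 || month > 12 || day < 1)
        return false;

    static const int daysInMonth[] = {31, 28, 31, 30, 31, 30,
                                      31, 31, 30, 31, 30, 31};
    bool leap = (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
    int maxDay = daysInMonth[month - 1] + ((month == 2 && leap) ? 1 : 0);
    return day <= maxDay;
}
```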

No matter what you do, some users will attempt to abuse the system. They'll enter the date using the wrong format or even type something that has nothing to do with a date. However, by providing good help, you now have a basis to question the user. You can invoke security measures that make it clear to users that the behavior is unacceptable. Reducing buffer overflows is a proactive process. You must defend against the invalid input, provide good help to users who aren't informed, and be willing to take punitive measures against those users who decide to ignore the rules.
