- Overview
- Web and Application Servers
- Database Server
- Client Computer
- Secure Communications
- Network Security
- Verifying Site Security
- TBS Case Study
- Chapter Summary
- References
3.2 Web and Application Servers
Web servers and application servers, the front line of a Web system, have specific security requirements. Web and application server security involves the proper operation of user authentication and access control, as well as a detailed examination of the Web system components used to drive dynamic and interactive content.
Authentication
The security mechanism that verifies a user's identity is referred to as authentication. Users can prove their identity in several ways, most commonly through a user ID and password. Many Web systems provide a combination of public and private areas. The content in the public areas is accessible to the general public, whereas content in private areas requires users to authenticate themselves prior to being granted access. This section describes the two most common authentication methods used on public Internet servers: HTTP basic authentication and custom authentication forms.
HTTP Basic Authentication
This type of authentication is the standard method of access control provided by most major browsers. Basic authentication is supported at the HTTP level by most Web servers and requires little or no development effort to implement. Unfortunately, because HTTP basic authentication does not provide protection of the user ID and password during transmission from the user's computer to the site's Web server, that information may be intercepted by a third party. Such protection usually requires the establishment of a secure HTTP connection, typically through the use of SSL.
HTTP basic authentication is readily identifiable by its use of the access control dialog box. The standard Microsoft Internet Explorer Basic Authentication dialog is depicted in Figure 3-1. Basic authentication is also compatible with proxy servers and firewalls, so it is preferable to some of the other, platform-specific authentication techniques. When the Web system uses the Microsoft Internet Information Server (IIS) on a Windows NT/2000 platform, user accounts and permissions are integrated with the operating system's user database, allowing management of Web user accounts through the familiar, operating-system-provided tools. Most UNIX-based Web servers use a separate user ID and password file, although some are capable of using a Lightweight Directory Access Protocol (LDAP) directory server for authentication.
Figure 3-1 Basic Authentication Dialog Box
Figure 3-2 documents an exchange of packets, or messages, between a browser and a Web server engaging in HTTP basic authentication captured through the use of the Windows 2000 network monitor. Each frame number lists one packet sent from the source host to the destination host during the retrieval of a secured, or private, page. In this example, LOCAL is the client machine running the browser, and HOMER is the Web server machine.
Figure 3-2 Basic Authentication Network Capture
Frame 1 represents the initial request by the client computer for the secured page. In this example, LOCAL, the client computer, is performing a simple retrieval using the HTTP GET method, the most common way that Web browsers retrieve pages from Web servers. When HOMER, the Web server computer, receives the request, it examines the request data, checks for authorization, and then responds. Following is the message data sent from LOCAL to HOMER in the GET request:
GET /secured/secure.html HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/vnd.ms-powerpoint, application/vnd.ms-excel, application/msword
Referer: http://Homer/
Accept-Language: en-us
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)
Host: Homer
Connection: Keep-Alive
HOMER is configured to protect the page (that is, the page is not accessible by anonymous users), so it requires the client computer to provide authentication data. HTTP basic authentication enables the client computer to deliver this data within the HTTP request, using the Authorization header field of the request message. In this case, that field is not present in the initial message sent by LOCAL, as reflected in frame 1, so HOMER returns an error to LOCAL, indicating that the page cannot be delivered without this information. Frames 2 through 4 represent HOMER's response to LOCAL, conveying error 401.2 (Unauthorized). The message is spread across three packets, as the maximum transmission unit (MTU) of the network in this example is 1,500 bytes. HOMER's response includes a nicely formatted error page, so it is quite long (more than 4,000 bytes).
After receiving these three packets, LOCAL knows, from the WWW-Authenticate header that accompanies the 401 status, that it must prompt the user for a user ID and password in order to give HOMER the information it needs, and so it displays a login dialog box similar to that of Figure 3-1. Once the user enters the user ID and password and clicks OK, the browser joins them with a colon (userid:password) and lightly encodes the result, using the base64 encoding algorithm.1 The browser then resends the request for the page to the Web server, HOMER, including the encoded user ID and password. Note the term encoded, which is not the same as encrypted. Frame 5 reflects the action of LOCAL in resending the request to the Web server, this time with the encoded user ID and password. The message data sent in frame 5 is as follows:
GET /secured/secure.html HTTP/1.1
. . .
Authorization: BASIC aG10aGVyZTplbmNvZGVk
HOMER receives this request and validates the user ID and password that the user entered, by decoding the authorization string and attempting to perform a logon with the supplied credentials. In our example, the logon succeeds, and HOMER sends back the requested page, in frame 6. An important thing to note here is that the next time a page is requested from the "secured" directory on HOMER, the browser will automatically include the encoded user ID and password string within the request. This condition remains true until the browser is closed, at which point the encoded user ID and password string is discarded.
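To make the encoding step concrete, here is a minimal C sketch of what the browser does before sending frame 5: it joins the user ID and password with a colon, base64-encodes the result, and places it in an Authorization header. The function names base64_encode and basic_auth_header are hypothetical, and the 256-byte limit on the joined credentials is an arbitrary assumption. Note that no key is involved anywhere.

```c
#include <stdio.h>

/* Standard base64 alphabet (RFC 4648). */
static const char B64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Base64-encode len bytes from in into out and NUL-terminate it.
   out must hold at least 4 * ((len + 2) / 3) + 1 bytes. */
static void base64_encode(const unsigned char *in, size_t len, char *out)
{
    size_t i = 0, o = 0;
    for (; i + 2 < len; i += 3) {            /* full 3-byte groups */
        unsigned long v = ((unsigned long)in[i] << 16) |
                          ((unsigned long)in[i + 1] << 8) | in[i + 2];
        out[o++] = B64[(v >> 18) & 63];
        out[o++] = B64[(v >> 12) & 63];
        out[o++] = B64[(v >> 6) & 63];
        out[o++] = B64[v & 63];
    }
    if (i < len) {                           /* 1 or 2 bytes left over */
        unsigned long v = (unsigned long)in[i] << 16;
        if (i + 1 < len)
            v |= (unsigned long)in[i + 1] << 8;
        out[o++] = B64[(v >> 18) & 63];
        out[o++] = B64[(v >> 12) & 63];
        out[o++] = (i + 1 < len) ? B64[(v >> 6) & 63] : '=';
        out[o++] = '=';
    }
    out[o] = '\0';
}

/* Build "Authorization: Basic <base64(user:password)>" in hdr.
   Returns 0 on success, -1 if a buffer is too small. */
static int basic_auth_header(const char *user, const char *password,
                             char *hdr, size_t hdrsize)
{
    char joined[256];                        /* assumed size limit */
    char encoded[4 * (sizeof joined / 3 + 1) + 1];
    int n = snprintf(joined, sizeof joined, "%s:%s", user, password);
    if (n < 0 || (size_t)n >= sizeof joined)
        return -1;
    base64_encode((const unsigned char *)joined, (size_t)n, encoded);
    n = snprintf(hdr, hdrsize, "Authorization: Basic %s", encoded);
    return (n < 0 || (size_t)n >= hdrsize) ? -1 : 0;
}
```

Calling basic_auth_header("hmthere", "encoded", ...) produces exactly the Authorization value captured in frame 5, which shows that the string in the capture carries the user ID hmthere and the password encoded.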
As illustrated in frame 5's message data, the Authorization field carries the encoded user ID and password. Because base64 is an encoding rather than encryption, this field is effectively plaintext and, as we have seen, can be captured by a third party using a network monitor or any one of a number of other tools.
Once this plaintext is captured, the third party can either decode the user ID and password and attempt to use this information to log on to the system or replay the authorization string to retrieve pages from the server.
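Reversing the encoding is just as easy. The following C sketch (hypothetical function names, standard RFC 4648 alphabet assumed) decodes a captured Authorization value; no secret of any kind is required.

```c
#include <stddef.h>

/* Map one base64 character (RFC 4648 alphabet) to its 6-bit value,
   or return -1 if the character is not part of the alphabet. */
static int b64_value(char c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

/* Decode a NUL-terminated base64 string into out and NUL-terminate it.
   Returns the number of decoded bytes, or -1 if the input is malformed
   or out (outsize bytes) is too small. No key is involved. */
static int base64_decode(const char *in, char *out, size_t outsize)
{
    unsigned acc = 0;   /* pending bits, most recent in the low end */
    int bits = 0;       /* number of pending bits in acc */
    size_t o = 0;

    if (outsize == 0)
        return -1;
    for (; *in != '\0' && *in != '='; in++) {
        int v = b64_value(*in);
        if (v < 0)
            return -1;
        acc = ((acc << 6) | (unsigned)v) & 0xFFFFu;
        bits += 6;
        if (bits >= 8) {            /* a full byte is available */
            bits -= 8;
            if (o + 1 >= outsize)
                return -1;
            out[o++] = (char)((acc >> bits) & 0xFFu);
        }
    }
    out[o] = '\0';
    return (int)o;
}
```

Decoding the frame 5 string aG10aGVyZTplbmNvZGVk yields hmthere:encoded; everything before the first colon is the user ID and everything after it is the password.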
This potential situation highlights the primary security issue with HTTP basic authentication: plaintext transmission of user IDs and passwords across the Internet. To protect this information, the only real option is to use SSL or an equivalent secure protocol. Using basic authentication over an unsecured connection is extremely hazardous, because a third party can intercept the request and decode the user ID and password. Note that when using HTTP basic authentication, it is not sufficient to use SSL only during the initial logon, since the Authorization string is retransmitted with each request. Therefore, it is necessary to encrypt the entire session with SSL in order to protect the user's credentials.
An SSL-encrypted session would render the captured packets unreadable by a third party. For example, if the same sequence of frames were exchanged over an SSL connection, frame 5 would look something like the following string of text:
________©____h&|3⁄4e˙ Eo©H@ qA˘ R"Px_Y_}y'Q{||.N_Z:0%_?_ {_Ò&]RüP]yl/ô?@e˙ 9j{z2ee˘ $_˘N
If a third party captures that string of text, decrypting it into something meaningful without the session key would be impractical.
Custom Authentication Form
Some Web systems incorporate the use of a login interface by creating a customized form that is used to obtain a user ID and password from the user. Custom authentication forms are more attractive than the HTTP basic authentication dialog box, providing a professional look and a better end user experience (Figure 3-3). In this figure, the User ID field is a standard <INPUT TYPE="text"> HTML element, whereas the Password field is defined as <INPUT TYPE="password">, which instructs the browser to hide the password characters from the user as the user types in the password. To initiate the login process, the user clicks the Login button.
Figure 3-3 Custom Authentication Form
This approach does have quite a few security implications, however. Because authentication is not supported at the HTTP level with this approach, a standard HTML form on a Web page must be used to create the login page. From the browser and the server perspective, these authentication forms are handled just like any other form on a Web page. As with HTTP basic authentication, the password is not encrypted, so this method is typically combined with a secure HTTP connection in order to protect the password during transmission.
A server-side component, such as Active Server Pages (ASP) script, Component Object Model (COM) object, or Web server extension, needs to be created to perform the authentication, which verifies the user ID and password through an external mechanism, such as the Web server or the operating system's API. In addition, Web server components must verify that the user has logged in prior to allowing access to any page in a secured area. Without this check, a user could simply bypass the logon form and directly request a page. In contrast, HTTP basic authentication performs this check automatically. Because of these additional requirements, custom authentication forms are more expensive to implement from a development perspective.
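The check described above can be sketched in C as a gate that runs before any page is served. The session table, the /secured/ prefix, and the function names are hypothetical. The sketch assumes the request path has already been normalized (for example, with ../ sequences resolved) before it is checked, and that the session token was issued by the login component after it verified the user ID and password.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory session record created by the login component
   after it has verified the user ID and password. */
struct session {
    const char *id;      /* session token sent to the browser */
    const char *user;    /* authenticated user ID */
};

/* Return the session matching token, or NULL if there is none. */
static const struct session *find_session(const struct session *tab,
                                          size_t n, const char *token)
{
    if (token == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        if (strcmp(tab[i].id, token) == 0)
            return &tab[i];
    return NULL;
}

/* Gate run before serving ANY page. Pages under /secured/ require a
   valid session, so requesting such a URL directly, without going
   through the login form, fails. Returns 1 to serve the page, 0 to
   send the user to the login form instead. */
static int may_serve(const char *path, const struct session *tab,
                     size_t n, const char *token)
{
    static const char prefix[] = "/secured/";
    if (strncmp(path, prefix, sizeof prefix - 1) != 0)
        return 1;                            /* public area */
    return find_session(tab, n, token) != NULL;
}
```

The essential design point is that the check lives in the request path for every secured page, not in the login page itself; HTTP basic authentication gets this for free because the Web server enforces it.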
Figure 3-4 shows the message dialog captured by the network monitor, reflecting the communication between the client browser and the Web server following the user's clicking the Login button.
Figure 3-4 Network Monitor Capture
Frame 1 represents the POST request being transmitted from LOCAL, the client computer, to HOMER, the Web server. Within the packet being sent, LOCAL provides the form values that the user provided for the user ID and password fields. The data of the packet transmitted in frame 1 is as follows:
POST /login.asp HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/vnd.ms-powerpoint, application/vnd.ms-excel, application/msword
Referer: http://Homer/
Accept-Language: en-us
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)
Host: Homer
Content-Length: 44
Connection: Keep-Alive

userid=James&password=secretword
The last line in the packet contains the form field values. The user ID entered on the form appears as the text James, and the password appears, just as readable, as the text secretword. This example clearly demonstrates the need for secure HTTP (SSL). Recall that when this type of exchange occurs between a client computer and a Web server over a secure connection, the contents of such a packet appear as a long string of unreadable characters.
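To underline how little effort an eavesdropper needs, this C sketch pulls a named field out of a captured form body. form_field is a hypothetical helper, and it deliberately skips URL decoding (%XX and + escapes), which a complete parser would need for values containing spaces or punctuation.

```c
#include <stddef.h>
#include <string.h>

/* Copy the value of field `name` from an application/x-www-form-urlencoded
   body into out. Returns 0 if found, -1 if absent or out is too small.
   For brevity, %XX and '+' escapes are left undecoded. */
static int form_field(const char *body, const char *name,
                      char *out, size_t outsize)
{
    size_t nlen = strlen(name);
    const char *p = body;

    while (p != NULL && *p != '\0') {
        const char *end = strchr(p, '&');          /* end of this pair */
        size_t len = end ? (size_t)(end - p) : strlen(p);
        if (len > nlen && strncmp(p, name, nlen) == 0 && p[nlen] == '=') {
            size_t vlen = len - nlen - 1;
            if (vlen >= outsize)
                return -1;
            memcpy(out, p + nlen + 1, vlen);
            out[vlen] = '\0';
            return 0;
        }
        p = end ? end + 1 : NULL;                  /* next name=value */
    }
    return -1;
}
```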
It is not always possible to tell whether the login is being performed over a secure connection simply by looking for the lock icon on the status bar. A page containing an HTML form can be delivered to the user's computer over a non-SSL connection. The form contains fields for the user to fill in and a way for the user to submit the contents of the form back to the Web server. The form also contains an action, which specifies the URL that the browser should use to submit the form contents. The form action can specify an SSL (https) URL, in which case the form data is submitted over an encrypted connection. Thus, the credentials are still transmitted securely, even though the browser does not display a secure icon while the user is entering them.
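The rule the browser applies can be summarized in a small C function (hypothetical, for illustration only): an absolute https action is always encrypted, an absolute http action never is, and a relative action inherits the scheme of the page that contains the form. The URLs in the example are assumptions modeled on the HOMER server used above.

```c
#include <ctype.h>

/* Case-insensitive check that s begins with prefix. */
static int has_prefix_ci(const char *s, const char *prefix)
{
    for (; *prefix != '\0'; s++, prefix++)
        if (tolower((unsigned char)*s) != tolower((unsigned char)*prefix))
            return 0;
    return 1;
}

/* Return 1 if submitting a form whose action attribute is `action`,
   on a page loaded from page_url, sends the data over SSL, else 0.
   The lock icon reflects page_url; the submission depends on action. */
static int submit_is_encrypted(const char *page_url, const char *action)
{
    if (has_prefix_ci(action, "https://"))
        return 1;
    if (has_prefix_ci(action, "http://"))
        return 0;
    /* Relative ("/login.asp"), scheme-relative ("//host/login.asp"),
       or empty actions inherit the scheme of the containing page. */
    return has_prefix_ci(page_url, "https://");
}
```

So a login page fetched from http://homer/login.html with action="https://homer/login.asp" shows no lock icon yet submits over SSL, while the reverse combination shows a lock icon yet submits in plaintext.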
After accepting the form information, the Web server may then redirect the user back to a non-SSL connection. This is usually done to avoid the server and network performance overhead of unnecessary encrypted communication. SSL is expensive for the server in terms of bandwidth and central processing unit (CPU) usage, so it can be beneficial to encrypt only the transmission of the user credentials, the user ID and password. In this case, specifying an encrypted connection for sending the credentials from the client to the server is all that is necessary. Encrypting only the credentials is not always appropriate, however. Web systems operated by most banks, brokerages, and similar businesses commonly use encryption on a full-time basis, as most of the data being passed back and forth is sensitive. As previously mentioned, sites that use HTTP basic authentication should encrypt the entire session with SSL to protect the user's credentials.
Authorization
For a Web system, authorization means granting a user permission to access a resource, such as a Web page, on the Web server. Web sites generally contain a combination of public and private content, requiring the user to log in to view the private content. Once the user has successfully logged in to a secure area, the system allows the user to view pages and to invoke back-end scripts to perform various tasks. How the system controls access from that point on varies, depending on its access control requirements.
In general, the system can invoke two categories of access control:
Web server access: Access to the files themselves (HTML pages, ASP scripts, and so on) on the Web server. The Web server restricts access to pages and scripts to authorized users.
Database access: Access to read or write data stored in the site's database. This access may vary, depending on the data the user is attempting to read or change, and may not be constant for the user across the entire database.
In the Technology Bookstore case study, described in Appendix C, users are either anonymous or authorized. Anonymous users are Internet users browsing the item catalog. They are accessing public data that does not require any security checks. Authorized users are customers who wish to make a purchase or access customer service functionality. These users possess a user ID and password and can log on to the site and access the customer service pages to view a previous order or to make a purchase. All authorized users are considered equal; no users are designated as being more privileged than others.
It is good user interface (UI) practice to disable or hide functions that the user is not allowed to perform. This makes for a more pleasant user experience, as users are not shown "access denied" messages, the cause of which may not be immediately obvious. It is not prudent in a Web application, however, to rely on the absence of a UI button or control as a means of enforcing security. Users lacking a Delete button may nonetheless attempt to access the delete script by trying to identify the appropriate URL. One way to restrict access to the script on the server is to use the server's access control mechanism to restrict certain users from accessing it at all. Users who do not have the appropriate permissions to access the script will receive a message indicating that they are not authorized to view the page. Unfortunately, this is not always an appropriate solution, as those users may need to access the script based on the data they are working with at the time.
The TBS case study, for example, allows users to view their previous orders. Although it is acceptable for all authorized users to execute the ViewPreviousOrder script, they must not be able to use it to view all the data the script can access. For example, a user is presented with the order list page, which contains hyperlinks for several orders the customer has placed. The user could modify the order numbers in these links and attempt to view orders placed by someone else. To prevent this type of security violation, the script must verify that the data the user is attempting to access does in fact belong to that user. In the case of ViewPreviousOrder, this can be accomplished by associating the user's ID with the order records in the site's database and checking it on every request.
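A minimal C sketch of the ownership check follows. The order structure, the in-memory array standing in for the site's database, and the function names are hypothetical; in a real database, the same effect is usually achieved by including the user ID in the query's selection criteria alongside the order number.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical order record: each order stores the ID of the customer
   who placed it, as the ViewPreviousOrder discussion suggests. */
struct order {
    long number;
    const char *owner;   /* user ID of the customer who placed it */
};

enum view_result { VIEW_OK, VIEW_NOT_FOUND, VIEW_DENIED };

/* The order number arrives in a link the user can edit, so look it up
   and confirm it belongs to the logged-in user before returning it. */
static enum view_result view_previous_order(const struct order *orders,
                                            size_t n, long number,
                                            const char *logged_in_user,
                                            const struct order **out)
{
    for (size_t i = 0; i < n; i++) {
        if (orders[i].number != number)
            continue;
        if (strcmp(orders[i].owner, logged_in_user) != 0)
            return VIEW_DENIED;              /* someone else's order */
        *out = &orders[i];
        return VIEW_OK;
    }
    return VIEW_NOT_FOUND;
}
```

Whether to report someone else's order as "denied" or as "not found" is a design choice; reporting "not found" in both cases avoids confirming that a guessed order number exists.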
Content Attacks
Most Web sites rely on dynamic pages (ASP, PHP, or JSP), CGI scripts or executables, and other forms of dynamic content delivery to provide a useful and interesting experience for the user. Unfortunately, these technologies are often the source of security holes.
These technologies are often used by Web sites to provide a mechanism for the user to send data to the Web server by submitting a form or clicking a link with variables in the URL. A component on the Web server will take these inputs and execute some useful business logic, such as retrieving a product page from a catalog or supplying bank account information for display back on the user's Web browser.
Depending on the technologies being used, a malicious user can manipulate these user-supplied inputs to cause the component to perform a function that it was not intended to perform. For example, an attacker could embed special characters, in addition to a system command, in a form field such as "customer name" or "address". When the form is submitted, the component, if susceptible to this kind of attack, will be tricked into executing the system command, as it was expecting a string containing the customer's name or address, not a special sequence of characters and a system command. In this way, an attacker could gain unauthorized access to the system by using a Web server component to invoke commands on the server.
System Command Execution
Many programming languages enable a developer to execute a system command from code. Perl, for example, provides several mechanisms for this purpose, including system(), exec(), and backtick (`) quotes. C and C++ also support this type of activity through the popen() and system() library functions.
Program code containing these types of instructions is a convenient way to implement scripts that need to perform functions that are already provided by the operating system or by other scripts. As a simple example, a Perl script could use the system() function to place the current time, according to the Web server, into a page that is being sent to the client browser. More complicated scripts could use system() to invoke mail programs, complex UNIX commands, or other scripts and programs.
The security implications associated with the use of these types of commands within script code can be quite serious when a script uses a portion of the form variables to invoke a system call, particularly on a UNIX-based Web server. The exact reasons for the vulnerabilities are quite specific to the script language being used; here, we consider an example that applies to Perl scripts. The basic problem from a security perspective is that the execution of a system command may often lead to the creation of a subshell (a child process containing a command processing environment) to carry out the system call. Consider the following line of Perl code:
system("echo $mail_message_text >> tmpfile");
At first glance, this code appears to be harmless, demonstrating a quick way to create a temporary file on the server disk containing the contents of a form field. When Perl sees this command, it will create a child process on the server and, on UNIX systems, will pass the command to the shell interpreter in the child process for processing. Because this command contains the text of a user form field, the shell interpreter in the child process must examine the contents of the field prior to processing the command. This is the source of the security problem.
On a UNIX system, the shell looks for certain characters, called metacharacters, in the string that it is processing; these characters allow a user to specify additional shell operations. One such character is the semicolon. Placing this character in a shell string causes the shell to break the command into two commands, one on each side of the semicolon. So, in our previous example, the user could enter something like the following into the form field:
; cp /etc/passwd /home/ftp/pub; echo gotcha!
During execution of the system command, the Perl interpreter will pass the following string to the child process's shell interpreter for processing:
echo ; cp /etc/passwd /home/ftp/pub; echo gotcha! >> tmpfile
The shell interpreter will break the shell string into three separate commands, as follows, based on the location of the semicolon characters:
echo
cp /etc/passwd /home/ftp/pub
echo gotcha! >> tmpfile
The first echo won't do much; it will simply print a blank line. The next command does the real damage, copying the system password file to the public directory of the FTP server, where the attacker can then retrieve it and perform dictionary or brute-force password cracking against it. The last command simply writes the text gotcha! to a temporary file. Granted, this exact sequence of events can be foiled by the most basic of security measures (shadow password files, running the Web server or CGIs as a nonroot user, properly securing the FTP public directory against writes), but with a little creativity, an attacker can probably find an opening with which to make full use of this kind of exploit.
It is important to avoid using user input in file-access-related system commands in Web system components. If it is absolutely necessary, however, it is critical to check the user input for shell metacharacters before passing the input to a system call. Alternatively, it may be easier and safer to filter for allowable characters instead of dangerous characters: the characters you wish to allow will probably vary by situation, but a list of allowed characters does not depend on anticipating every dangerous one (Garfinkel and Spafford 1997). In addition, it is good practice to check all inputs first, before performing any other component logic, because it is time consuming and difficult to track user input through all possible paths. Doing so also protects against future changes and unforeseen uses of user-supplied input opening up new security holes.
The following characters should be considered dangerous and filtered out of all inputs, preferably by rejecting inputs that contain any of them:
&;`'\"|*?~<>^()[]{}$\n\r
In addition, most form field values should not allow the user to specify file path operations, so the following character sequence should be checked as well:
/..
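A rejection filter built from the two lists above might look like the following C sketch. The function name is hypothetical, and it treats the backslash and double quote in the list as literal members. Note that a literal check for /.. does not catch an input that starts with ../, which is one reason filtering for allowable characters is often the safer approach.

```c
#include <string.h>

/* Characters listed above as dangerous to pass to a shell. The backslash
   and double quote are treated as literal members of the list. */
static const char SHELL_META[] = "&;`'\\\"|*?~<>^()[]{}$\n\r";

/* Return 1 if input may be used, 0 if it must be rejected because it
   contains a shell metacharacter or the "/.." path sequence. */
static int input_is_acceptable(const char *input)
{
    /* strcspn returns the index of the first metacharacter, or the
       string length if there is none. */
    if (input[strcspn(input, SHELL_META)] != '\0')
        return 0;
    if (strstr(input, "/..") != NULL)
        return 0;                 /* attempted directory traversal */
    return 1;
}
```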
When using system commands in program scripts, it is worthwhile to bypass the creation of a subshell in the child process, when supported by the language. For example, the Perl system command supports this capability by allowing arguments to be passed as individual parameters to the system call, as follows:
system("echo", "string1", "string2");
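In C, the equivalent of Perl's list form is to call fork() and one of the exec family of functions directly, so that the program receives its arguments as separate strings and no shell ever parses them. The following is a minimal POSIX sketch; run_without_shell is a hypothetical name.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run a program with an argument vector, without a shell. argv[0] is
   the program name and argv must end with NULL. Because no shell parses
   the arguments, characters such as ';' or '|' are passed through as
   plain text. Returns the child's exit status, or -1 on error. */
static int run_without_shell(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);   /* returns only if exec failed */
        _exit(127);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For instance, running test with the arguments a;b, =, and a;b returns 0, because test receives the semicolons as ordinary text rather than as command separators.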
It is also important to remember that system or third-party-supplied executables may contain flaws that may be exploitable by potential attackers. An attacker who learns that the script is invoking a potentially unsafe system executable may be able to exploit the flaws in that executable through user input (Kamthan 1999). Finally, it should be readily apparent that storing arguments to system commands or entire system command lines in hidden form fields (<INPUT TYPE="hidden">) should be avoided entirely, for the same reasons that the use of form input in system commands should be avoided.
Server-Side File Access
Server-side components often need to access files on the Web server in order to perform a task or to assist in creating a Web page for the user. This kind of file access can be initiated from just about any kind of Web system component, including ASP, Perl- or C/C++-based CGI executables, and so on. For example, a script that provides a simple Web voting or poll interface may store the votes in a text file on the server and embed the name of this file in the Web page sent to the user, most likely as a hidden form field in the voting form. This form field is sent to the server when the user votes, and the script accepts the user input and then updates the file with the new vote. This type of interaction is illustrated in Figure 3-5.
Figure 3-5 Server-Side File Access
Figure 3-6 shows how an attacker could simply save the page source to a local disk and modify the name of the file to point elsewhere on the server, perhaps to an important system configuration file or the password file, and then reload the page and submit the form. Depending on how the script is implemented, this action could damage the file's contents or possibly even display the file's contents to the attacker. Such file name vulnerabilities are a very serious security concern.
Figure 3-6 Exploiting Server-Side File Access
A slight variation of this kind of security problem pertains to the use of a hidden form field as a component of a file name. In the voting example, instead of placing the entire file name and path into the hidden form field, the name of the topic is stored in the hidden form field. This topic name is then concatenated onto the prefix of the file name, such as votes_, in order to be able to locate the correct file for this topic. In this case, the attacker could attempt to place a series of ../ character sequences onto the end of the topic name and attempt to navigate to another part of the file system on the server. This second type of exploit would be caught through the implementation of a proper user input filter, however, as discussed in the previous section. Because of the danger that this type of file access can impose, the best approach is to avoid storing file names and paths in hidden form fields or browser cookies. When storage of file names and paths is necessary, storing a file name suffix in the variable and filtering the input for file path characters as described is a much safer approach.
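Storing only a topic name and checking it against an allowlist might look like this C sketch. The votes_ prefix comes from the example above; the function name and the choice of letters, digits, and underscores as the allowed set are assumptions.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Build the vote file name "votes_<topic>" from a topic taken from a
   hidden form field. Only letters, digits, and underscores are allowed,
   so "../" sequences and shell metacharacters are rejected outright.
   Returns 0 on success, -1 if the topic is empty, invalid, or too long. */
static int vote_file_name(const char *topic, char *out, size_t outsize)
{
    if (*topic == '\0')
        return -1;
    for (const char *p = topic; *p != '\0'; p++)
        if (!isalnum((unsigned char)*p) && *p != '_')
            return -1;
    int n = snprintf(out, outsize, "votes_%s", topic);
    return (n < 0 || (size_t)n >= outsize) ? -1 : 0;
}
```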
Buffer Overflows
One common security hole exploited on Web systems is the buffer overflow, used mostly against compiled executables, particularly operating system tools and utilities. A buffer overflow occurs when the length of a program or function input exceeds the space allocated to store it. C/C++ programs are particularly vulnerable to this kind of attack, as developers often declare fixed-size arrays that reside on the program's stack. For example, consider the following C/C++ code:
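#include <string.h>

/* Vulnerable: fullName is a fixed-size stack buffer, and strcpy/strcat
   copy without checking how much room is left in it. */
void createFullName(char* firstName, char* lastName)
{
    char fullName[1024];
    strcpy(fullName, firstName);
    strcat(fullName, " ");
    strcat(fullName, lastName);
}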
This C/C++ code simply takes the supplied first and last names and joins them, separated by a space. Of particular importance is the fullName variable: because it is declared as a local array, it resides on the stack. The problem is that the combined names can easily exceed the 1,024 bytes allocated to fullName whenever firstName or lastName, or both, are too long.
In most cases, this situation will simply cause a program crash as the stack is corrupted by the strcpy or strcat function calls. However, if these arguments are carefully crafted, they can in fact be used to send malicious code to the program, embedded in the first- and last-name arguments. If the arguments manage to overflow the fullName stack variable, they can cause the execution of this code by manipulating the return address, which also resides on the stack.
The return address is a hidden piece of data that resides on the stack along with the function's arguments and local variables. Depending on the processor architecture and calling convention, compiled programs typically place this address on the stack when calling the function. That way, the program knows where to resume when the function is finished.
By overflowing one of the variables on the stack, a malicious input can overwrite the return address, as it resides on the stack as well. In the example of createFullName, by supplying firstName or lastName inputs with precisely the right number of characters to overflow fullName, an attacker can overwrite the return address and make it point back to a specific place in the data supplied in those inputs. With a little creativity, this data can be executable program code with malicious intent. Because the return address is simply a pointer to code, it is pulled off the stack when the function completes and used as the place to start executing the next sequence of instructions.2 Unfortunately, that next sequence of instructions is code written by the attacker and will be blindly executed by the server.
Once the malicious program argument has been submitted and the input buffer has been successfully overflowed, the attacker effectively has his or her own code executing on the site's Web server machine. Depending on the buffer size, a lot of bad things can happen. The code could read the contents of the password file, e-mail the file, make changes to configuration files, start up a TELNET session, or even connect to another Web server and download a larger, more damaging program, such as a Trojan horse.
Preventing buffer overflows consists mainly of checking the length of user-supplied input variables. In the previous example, limiting the size of the firstName and lastName inputs to 511 bytes each would protect against overflows (511 * 2 = 1,022, plus one byte for the space and one for the terminating null character, totals 1,024). In addition, using the strncpy and strncat functions instead of strcpy and strcat is also advised, as the former two functions limit the number of characters copied into the buffer. Keep in mind that it is not sufficient to restrict the input on the Web page or form, as a malicious user could simply alter the page on his or her local disk and remove the length restrictions on the form fields, or simply access the URL of the form action directly, without using the form itself. The only sure way to prevent buffer overflows is to examine and reject excessively long inputs in all Web system components.
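A bounded version of createFullName might look like the following C sketch. It rejects either name if it exceeds 511 bytes, then builds the result with snprintf, which never writes more than the buffer size and always terminates the string. snprintf is used here instead of strncpy and strncat because both are easy to misuse: strncpy does not add a terminating null when the source is too long, and the count passed to strncat limits the characters appended rather than the total buffer size. The name createFullNameSafe and the caller-supplied buffer are assumptions made so the result can be returned.

```c
#include <stdio.h>
#include <string.h>

#define FULLNAME_SIZE 1024
#define MAX_NAME 511    /* 511 + 1 (space) + 511 + 1 (null) = 1,024 */

/* Join firstName and lastName with a space into fullName, which must
   hold FULLNAME_SIZE bytes. Over-long inputs are rejected, not
   truncated. Returns 0 on success, -1 if an input is rejected. */
static int createFullNameSafe(const char *firstName, const char *lastName,
                              char *fullName)
{
    if (strlen(firstName) > MAX_NAME || strlen(lastName) > MAX_NAME)
        return -1;
    int n = snprintf(fullName, FULLNAME_SIZE, "%s %s", firstName, lastName);
    return (n >= 0 && n < FULLNAME_SIZE) ? 0 : -1;
}
```

Rejecting, rather than silently truncating, follows the advice above to examine and reject excessively long inputs; the up-front length check is what enforces the 511-byte limit, and snprintf is a second line of defense.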
Again, the most common source of buffer overflows is C/C++ code, particularly when string manipulation is involved. Scripting languages, such as Perl, and Java code are less of a risk, as they use dynamic memory management to allocate space for variables. This does not mean that input lengths should be ignored when using scripting languages or Java. It is still possible that these variables could be passed to other programs that are susceptible to buffer overflows.
Sometimes, variables are simply "passed through" a program. For example, a Perl script could take a form input, such as a customer name, and simply hand it off to another program, possibly one written in C++ or one that came with the operating system (which was probably also written in C or C++). The second program in the chain may be susceptible to buffer overflows, so the attacker could still cause damage; the Perl script itself would be unaffected, because it merely passes the value along. Therefore, it's important to always check input lengths regardless of language, function, and so on.