Debugging CGI Scripts
Some people would say that the toughest step in creating an application is debugging it after all the functionality you wanted is, in theory, complete. Certainly, working out all of the nagging bugs in an application can take as long as writing the main functionality.
One problem with CGI scripts is that they exist as part of a larger system. The actual CGI program is just one component of the system; there's also the operating system, the Web server, and the network connection between the Web browser and the Web server. A problem can occur with any of these components, and one of the toughest tasks facing a CGI programmer is determining where a problem occurred.
In many cases, problems crop up in the network connection between the Web server and Web browsers. In this book, I'm not going to discuss problems caused by bad Internet connections. Instead, I'm going to focus on problems that crop up on the Web server. However, you should be mindful of the fact that in some cases, the problem lies in the network connection, or even in the configuration of the browser or computer running the browser. Sometimes, when errors in your CGI scripts are reported, it turns out that the user just didn't understand how to use the script.
Finding the Source of an Error
Before you can fix an error, you have to figure out what caused it. Because this book focuses on CGI programming and not Web server administration, I'm not concerned with problems like your Web server crashing or your router preventing incoming connections. The important point to make is that if you can't connect to the Web server at all, either because the domain name in the URL you entered can't be looked up, or the server is refusing network connections, don't blame your CGI program. You've got some problem cropping up at a lower level that has to be fixed first.
Let's assume that the Web server is up and running, and there's some problem with your CGI script. Before you start digging through your program's source code, you should verify that a few common mistakes haven't been made. These mistakes plague everyone who writes CGI scripts, even old hands. You can learn a lot about what went wrong by looking at the status code that's returned with the response.
Examining the HTTP Status Code
Every HTTP response is accompanied by a status code that indicates what the result of the request was. The status code is part of the response headers that are sent from the server to the browser. The most common status code by far is 200 OK. You never see or hear about this one because it means that the request was valid and returned a successful response. Whenever you request a page that is displayed properly, the response had a status code of 200.
When the status is something other than 200 OK, the Web server generally sends an error document back with the response. Often, the status code will be displayed as part of the error document. If it is not, you have to check the server's access log to find the status code for the failed request. The most common error code encountered is 404 Not Found. This code is returned when the requested resource could not be located by the Web server. Usually this error message crops up when a user clicks on a link to a file that no longer exists.
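Pulling the status codes of failed requests out of an access log can be sketched with awk. This is only a sketch under some assumptions: the log is taken to be in Common Log Format, where the status code is the ninth whitespace-separated field and the requested path is the seventh. The function name failed_requests and the sample log lines are made up for illustration.

```shell
# Print "status path" for every request whose status code is 400 or higher.
# Assumes Common Log Format:
#   host ident user [date zone] "METHOD path PROTOCOL" status bytes
failed_requests() {
    awk '$9 >= 400 { print $9, $7 }' "$1"
}

# Hypothetical sample log, written to a throwaway file for the demo.
demo_log=$(mktemp)
cat > "$demo_log" <<'EOF'
192.0.2.1 - - [01/Dec/1999:21:33:18 -0500] "GET /index.html HTTP/1.0" 200 1043
192.0.2.1 - - [01/Dec/1999:21:33:19 -0500] "GET /old-page.html HTTP/1.0" 404 207
192.0.2.1 - - [01/Dec/1999:21:33:20 -0500] "GET /cgi-bin/bad.cgi HTTP/1.0" 500 531
EOF

# Lists the 404 and the 500, skipping the successful request.
failed_requests "$demo_log"
rm -f "$demo_log"
```

If your server uses a custom log format, the field numbers will differ, so check a line of the real log before relying on them.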
A 403 Forbidden error is returned when a user requests a file that the Web server is not allowed to read. The user that the Web server runs as must have read permission for a file in order to send it to a browser.
When a user tries to visit a site that is password protected using basic authentication and she enters an invalid account and password combination, she receives a 401 Unauthorized error.
None of these errors is specific to CGI programming; I'm just including them here so that you'll know what you're looking at when you see them. The most important error for CGI programmers is 500 Internal Server Error. It indicates that something went wrong when the server tried to execute the CGI script. It doesn't necessarily mean that the program itself is broken, just that the server had trouble running it and getting back the proper results.
Reading the Error Log
Web servers that support CGI programs maintain an error log of some kind. Any time a request fails, the error that occurred is stored in the log so that responsible system administrators and programmers can see what went wrong when their applications failed to work properly. The error log isn't restricted to CGI-related errors. It stores any error that the Web server sends as a response to a request, so all the 404 Not Found errors and other errors associated with regular requests go there as well.
The most important feature of the error log, from a CGI programmer's standpoint, is that it goes beyond storing the error code generated by the Web server to storing the error message produced by the program itself. This works in an interesting way. When a CGI program fails, it generally displays an error message. If the error message is displayed before (or instead of) the required Content-type header, the Web server reports an error to the user (because of the missing header). When the request for the CGI program fails, the server copies some of the output of the attempt to execute the program (in other words, the CGI program's error message) into the error log. This message is generally the most important clue for determining what went wrong with the CGI program.
The error log for the Apache Web server is generally found in the logs directory under the server root and is usually named error_log. However, the name and location of the error log can be changed to something else, so you can't count on that being the case. You can generally get the location of this file from the server administrator, or if you're able, by checking the Web server's configuration files. Some service providers don't turn on error logging by default, and you may have to ask them to enable error logs for your site so that you can debug your programs more easily.
Listing 3.1 contains an excerpt from an Apache error log. I'll discuss some of the common errors found in the error log a bit later.
Listing 3.1 An Excerpt from an Apache Error Log
[Wed Dec 1 21:33:20 1999] [error] (2)No such file or directory: exec of /web/cgi-bin/bad.cgi failed
[Wed Dec 1 21:33:20 1999] [error] [client 220.127.116.11] Premature end of script headers: /web/cgi-bin/bad.cgi
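On a busy server the error log fills up quickly, so it helps to look only at the entries that mention your script. The following is a rough sketch, not a standard tool: the function name errors_for_script is invented here, and the demo log reuses the entries from Listing 3.1 plus one hypothetical, unrelated entry.

```shell
# Show the most recent error-log entries that mention a given script.
# $1 is the path to the error log, $2 is the script path to look for.
errors_for_script() {
    # -F treats the script path as a fixed string, not a regular expression.
    grep -F -- "$2" "$1" | tail -n 20
}

# Throwaway log for the demo; the first line is a hypothetical unrelated error.
demo_log=$(mktemp)
cat > "$demo_log" <<'EOF'
[Wed Dec 1 21:30:02 1999] [error] [client 192.0.2.1] File does not exist: /web/htdocs/missing.html
[Wed Dec 1 21:33:20 1999] [error] (2)No such file or directory: exec of /web/cgi-bin/bad.cgi failed
[Wed Dec 1 21:33:20 1999] [error] [client 220.127.116.11] Premature end of script headers: /web/cgi-bin/bad.cgi
EOF

# Prints only the two entries for bad.cgi.
errors_for_script "$demo_log" /web/cgi-bin/bad.cgi
rm -f "$demo_log"
```

On a real server you would pass the actual location of the error log, which, as noted above, varies from one installation to another.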
Fixing Setup Errors
Now that you've learned how to find errors, I'm going to go over some of the common errors you might encounter and explain how to fix them. Because these common errors are generally related to features required of all CGI programs, they're easy to track down and fix. Errors in your application logic are far more insidious and can take an awful lot longer to fix than the simpler errors listed here. I will talk about some debugging techniques later that you can use to isolate these types of bugs in your code.
Setting the Proper File Permissions
One of the most common mistakes people make when they write CGI programs is setting the file permissions improperly. CGI scripts must be executable by the user that the Web server runs as. The easiest way around this is to make sure that all your CGI programs are executable by everyone. If the Web server isn't allowed to execute the program, the response to any request for it will be 500 Internal Server Error.
File Permissions Under UNIX-Based Operating Systems
If your CGI programs are installed on a server running some UNIX-based operating system, you can make your program executable by everyone using the following command:
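The usual tool for changing permissions on UNIX is chmod. As a minimal sketch, assuming a script named sample.cgi that lives in a throwaway directory created just for this demonstration, adding execute permission for everyone could look like this:

```shell
# Create a throwaway script so the example is self-contained.
demo_dir=$(mktemp -d)
script="$demo_dir/sample.cgi"
printf '%s\n' '#!/bin/sh' 'echo "Content-type: text/plain"' 'echo' 'echo "Hello"' > "$script"

# Add execute permission for the user, the group, and others.
chmod a+x "$script"

# The long listing should now show an x in all three permission groups.
ls -l "$script"

# When you are finished with the demo: rm -rf "$demo_dir"
```

For your own programs, only the chmod line matters; replace the path with the path to your script.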
You can tell if a program is executable by looking at the long version of the directory listing, which looks like this:
drwxr-xr-x   3 rafeco users   512 Dec  1 00:11 ./
drwxr-xr-x  26 rafeco users  1024 Nov 30 22:56 ../
-rwxr-xr-x   1 rafeco users  5018 Oct 27 11:21 archive.pl*
lrwxrwxrwx   1 rafeco users     9 Dec  1 00:11 example.sh@ -> simple.sh
drwxr-xr-x   2 rafeco users   512 Nov 30 22:56 guestbook/
-rwxr-xr-x   1 rafeco users   280 Oct 24 12:44 pinggeneric.sh*
-rwxr-xr-x   1 rafeco users   666 Aug 23 12:31 sample.cgi*
-rwxr-xr-x   1 rafeco users  3867 Oct 27 11:07 search.pl*
-rwxr-xr-x   1 rafeco users   156 Aug 22 23:51 simple.sh*
The file permissions are cryptically expressed by the string -rwxr-xr-x. Let me explain how this string is decoded. The first character, a dash in this case, indicates what type of file the current file is. The dash indicates that this is a normal file. Directories have a d in this space, and symbolic links have an l. The next nine characters are used to display the access permissions for the file.
The characters are divided into three groups of three permissions. There are three sets of people that can be granted permission for a file, and there are three types of permission for each file. From left to right, the three sets are user, group, and others.
The user permissions pertain to the owner of the file. The owner is listed in the third column of the long directory listing. The group permissions apply to the members of the group to which the file is assigned. The group associated with a file is listed in the fourth column of a long directory listing. The last group of permissions is for others. The others permissions apply to every other user on the system.
When you attempt to access a file, only one set of permissions applies to you: that of the most specific set of users of which you are a member. In other words, the others permissions are used only if you're neither the owner of the file nor a member of the group associated with it. Similarly, the group permissions do not apply to the file's owner, even if the owner is a member of the group associated with the file.
Now let me explain what the individual permissions mean. As I said before, there are three permissions for each set of users. The permissions are read, write, and execute. If that set of users has the permission, the appropriate letter will appear in that space. If they do not, a dash will appear there instead. For example, if the owner has the permissions rw-, he can read and write the file, but not execute it. Similarly, if others have r-x permissions, they can read and execute the file, but not modify it.
Permissions are slightly different for directories. The names of the permissions (read, write, and execute) are the same, but how they work differs. If you have read permission for a directory, you're allowed to list the files in that directory. If you have execute permission for a directory, you can make it your current directory and reach the files inside it. If you have write permission for a directory, along with execute permission, you can create and remove files in that directory. It's possible to access files in a directory if you have execute permission for the directory but not read permission. You just have to be allowed access to the file, and you have to know its name because you aren't allowed to get a listing.
So now let's go back and look at the full file permissions for a file. If a file has the permissions -rwxr-xr-x, it's a normal file, and the owner has read, write, and execute permission for the file. Both the group and others have read and execute permission, but not write permission.
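The decoding just described is mechanical enough to sketch as a small shell function. The name describe_perms is invented for this illustration, and the function only handles the three file types mentioned above; anything else is reported as "other".

```shell
# Split a 10-character permission string, as shown by a long directory
# listing, into its file type and its user, group, and others permissions.
describe_perms() {
    mode=$1
    # The first character is the file type.
    case $(printf '%s' "$mode" | cut -c1) in
        -) type="normal file" ;;
        d) type="directory" ;;
        l) type="symbolic link" ;;
        *) type="other" ;;
    esac
    # The next nine characters are three groups of three permissions.
    user=$(printf '%s' "$mode" | cut -c2-4)
    group=$(printf '%s' "$mode" | cut -c5-7)
    others=$(printf '%s' "$mode" | cut -c8-10)
    printf '%s: user=%s group=%s others=%s\n' "$type" "$user" "$group" "$others"
}

describe_perms "-rwxr-xr-x"
# prints: normal file: user=rwx group=r-x others=r-x
describe_perms "drwxr-xr-x"
# prints: directory: user=rwx group=r-x others=r-x
```

For a CGI script, the thing to look for is an x in whichever group applies to the user that the Web server runs as, which in most cases is others.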
Checking Your Headers
A common pitfall for CGI programmers is forgetting to include the code that produces the Content-type header; instead, the program goes straight to generating HTML or other output, which causes an error.
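To make the required order concrete, here is a minimal sketch of a CGI response written as a shell script. The function name emit_page and the page content are placeholders chosen for this illustration; the point is only that the header and a blank line come before anything else.

```shell
#!/bin/sh
# Minimal CGI response, wrapped in a function so it is easy to inspect.
emit_page() {
    # The header must come first...
    echo "Content-type: text/html"
    # ...followed by exactly one blank line to end the headers.
    echo ""
    # Only now is it safe to send the body.
    echo "<html><body><p>Hello from a CGI script.</p></body></html>"
}

emit_page
```

If the two header lines were moved below the HTML, or left out, the server would have no valid headers to pass along and would report an error instead.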
Anytime your program won't execute because of syntax errors, or because the Web server is unable to call it, you'll generally get an error complaining that the script didn't print the proper header. As described earlier, the script probably sent an error message instead, so you should look in the error log to find out what the error was.
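One quick way to see what a script really prints first is to run it from the command line and inspect the first line of its output. The sketch below assumes that the script is expected to emit Content-type as its very first header. The function name has_content_type and the two demo scripts are invented for this illustration.

```shell
# Succeed if the first line printed by the given command is a Content-type
# header. Header names are not case sensitive, so compare in lowercase.
# Anything the command writes to standard error still reaches the terminal.
has_content_type() {
    first_line=$("$@" | head -n 1 | tr 'A-Z' 'a-z')
    case $first_line in
        content-type:*) return 0 ;;
        *) return 1 ;;
    esac
}

# Two throwaway scripts for the demo: one correct, one that forgets the header.
demo_dir=$(mktemp -d)
printf '%s\n' '#!/bin/sh' 'echo "Content-type: text/plain"' 'echo' 'echo "ok"' > "$demo_dir/good.sh"
printf '%s\n' '#!/bin/sh' 'echo "<html>oops</html>"' > "$demo_dir/bad.sh"

for script in good.sh bad.sh; do
    if has_content_type sh "$demo_dir/$script"; then
        echo "$script: header present"
    else
        echo "$script: header missing"
    fi
done
rm -rf "$demo_dir"
```

A check like this only tells you that the header is missing; the error log is still the place to find out why.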
Checking the Path to Your Script Interpreter
A very common problem when you move scripts from one server to another, or install a script downloaded from the Internet, is that the path to the Perl interpreter (or whatever script interpreter you're using) is wrong in the shebang line.
If the path to your script interpreter is incorrect, instead of seeing the output of your script, you'll see something like the following in the error log:
[Wed Dec 1 21:33:20 1999] [error] (2)No such file or directory: exec of /web/cgi-bin/bad.cgi failed
[Wed Dec 1 21:33:20 1999] [error] [client 18.104.22.168] Premature end of script headers: /web/cgi-bin/bad.cgi
If you encounter one of these errors, you just need to enter the right path in the shebang line. To do so, find the directory where your script interpreter is installed, and use that path in your program.
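Before editing anything, you can confirm that the shebang line is the culprit by checking whether the interpreter it names actually exists. The following is a rough sketch: the function name check_shebang, the demo scripts, and the deliberately wrong path /no/such/dir/perl are all made up for illustration.

```shell
# Report whether the interpreter named on a script's shebang line exists
# and is executable. $1 is the path to the script.
check_shebang() {
    first_line=$(head -n 1 "$1")
    case $first_line in
        '#!'*) ;;
        *) echo "no shebang line"; return 1 ;;
    esac
    # Strip the leading "#!" and any spaces, then keep only the first word,
    # which is the interpreter path (any arguments after it are dropped).
    interp=$(printf '%s\n' "$first_line" |
        sed -e 's/^#![[:space:]]*//' -e 's/[[:space:]].*$//')
    if [ -x "$interp" ]; then
        echo "ok: $interp"
    else
        echo "missing interpreter: $interp"
        return 1
    fi
}

# Throwaway scripts for the demo: one with a valid path, one without.
demo_dir=$(mktemp -d)
printf '%s\n' '#!/bin/sh' 'echo hello' > "$demo_dir/good.cgi"
printf '%s\n' '#!/no/such/dir/perl -w' 'print "hello\n";' > "$demo_dir/bad.cgi"

check_shebang "$demo_dir/good.cgi"
check_shebang "$demo_dir/bad.cgi" || echo "bad.cgi needs a corrected first line"
rm -rf "$demo_dir"
```

When the check fails, the fix is the one described above: replace the path on the first line with the real location of the interpreter.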
If you have a UNIX shell account on the server where the CGI program resides, you can often find the location of executable files using the which command. The which command searches all the directories in your path for the program you specify, and tells you where it is. For example, the following sequence illustrates how which is used to find the Perl interpreter:
$ which perl
/usr/local/bin/perl