Mac OS X Unleashed

By John Ray and William C. Ray

Programming CGIs in Perl

This chapter assumes that you either know a reasonable amount of Perl basics or have diligently read the introduction to Perl scripts in Chapter 22, "Perl Scripting and SQL Connectivity." In addition, you must have enabled the ExecCGI option for the directory you are programming in and uncommented the CGI AddHandler directive, as described in Chapter 27, "Web Serving."

Let's start with the most basic example possible—Hello World. Your initial reaction is probably (hopefully!) to create a Perl script (helloworld.cgi) along the lines of

#!/usr/bin/perl
print "Hello World! I have a Mac, do you?";
exit;

After enabling execution (chmod +x helloworld.cgi), try running the application from the command line (./helloworld.cgi), and then by accessing its URL through a Web browser. Although the command-line version runs fine, the browser will report an execution error message, as shown in Figure 28.2.

Figure 28.2 A simple Hello World isn't quite so simple.

So, what went wrong? Why is this program, which runs perfectly from a command prompt, broken when it tries to send its results over the Web?

The answer lies in the way Web servers communicate their results back to a client browser.

HTTP Headers

For the simple Hello World application to work, it must produce the sort of output that a Web browser expects. To the browser, it should send the same response as when a standard .html static Web page is loaded. The easiest way to see that response is to generate one manually by using telnet to connect to a Web server and request a page. For example, to retrieve the primary page from the local Mac OS X box, you would telnet to localhost (or 127.0.0.1) on port 80, and then use GET / HTTP/1.0 to retrieve the root level of the Web site:

[localhost:~] jray% telnet localhost 80
Trying 127.0.0.1...
Connected to localhost.columbus.rr.com.
Escape character is '^]'.
GET / HTTP/1.0

HTTP/1.1 200 OK
Date: Thu, 24 May 2001 00:55:56 GMT
Server: Apache/1.3.14 (Darwin)
Content-Location: index.html.en
Vary: negotiate,accept-language,accept-charset
TCN: choice
Last-Modified: Fri, 31 Mar 2000 01:45:46 GMT
ETag: "3c390-54e-38e4034a;3aac49b0"
Accept-Ranges: bytes
Content-Length: 1358
Connection: close
Content-Type: text/html
Content-Language: en
Expires: Thu, 24 May 2001 00:55:56 GMT

There are quite a few interesting lines in the group of headers that are returned, such as the language content and an expiration date (used to keep a page from being cached beyond a certain day and time). Only one of these headers, however, is required.
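
To make the structure of that response a little more concrete, here is a minimal Perl sketch that splits a raw response into its status line and its headers. It is only an illustration: the parse_response name and the abbreviated sample text are my own assumptions, not part of the server or of any library.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# parse_response is a hypothetical helper: it takes the raw text of an
# HTTP response and returns the status line plus a hash of the headers.
sub parse_response {
    my ($raw) = @_;
    # The headers end at the first blank line; the body follows it.
    my ($head) = split(/\r?\n\r?\n/, $raw, 2);
    my @lines  = split(/\r?\n/, $head);
    my $status = shift @lines;
    my %headers;
    foreach my $line (@lines) {
        # Each header has the form "Name: value".
        my ($name, $value) = split(/:\s*/, $line, 2);
        # Header names are case-insensitive, so store them in lowercase.
        $headers{lc $name} = $value;
    }
    return ($status, \%headers);
}

# Sample input abbreviated from the telnet session shown earlier.
my $sample = "HTTP/1.1 200 OK\nServer: Apache/1.3.14 (Darwin)\n"
           . "Content-Length: 1358\nContent-Type: text/html\n\n<html>...";
my ($status, $headers) = parse_response($sample);
print "$status\n";
print "Type: $headers->{'content-type'}\n";
```

The blank line that separates the headers from the body is the detail to remember; it comes up again as soon as a CGI has to produce its own headers.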

The Content-type header tells the remote Web browser what MIME type of file it is about to receive. When a user requests a JPEG image file, the server sends a header that reads

Content-type: image/jpeg

Each type of file has a different MIME type (determined by the file /private/etc/mime.types). The server can decide what type of file it is about to serve based on the filename. Unfortunately, when working with CGIs, the Web server cannot be certain what type of information is going to be sent back. In fact, a single CGI could very easily send an image with one request, and an HTML page with another.
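
The filename-based lookup that the server performs can be sketched in a few lines of Perl. Treat this as a rough illustration only: the four-entry table and the content_type_for name are assumptions made for the example, and a real server consults a far longer list.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A small, hand-picked table of extensions and MIME types. A real server
# reads a much longer list from its mime.types file.
my %mime_types = (
    html => "text/html",
    txt  => "text/plain",
    jpg  => "image/jpeg",
    gif  => "image/gif",
);

# content_type_for is a hypothetical helper that builds the header for a
# filename, falling back to a generic binary type for unknown extensions.
sub content_type_for {
    my ($filename) = @_;
    my ($ext) = $filename =~ /\.([^.\/]+)$/;
    my $type = defined($ext) ? $mime_types{lc $ext} : undef;
    $type = "application/octet-stream" unless defined $type;
    # The second newline produces the blank line that ends the headers.
    return "Content-type: $type\n\n";
}

print content_type_for("flower.jpg");
```

A CGI has no filename for the server to inspect, which is why the script itself must state the type.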

To create a fully working CGI, the first thing that the Web application must send is an appropriate MIME type. The initial version of helloworld.cgi did nothing but print out the Hello World message. The Web server, however, was expecting the script's output to begin with a Content-type header; when the header didn't appear, an error was generated. To correct the problem, the Content-type header, followed by a blank line that marks the end of the headers, must be printed before any other output occurs:

#!/usr/bin/perl
print "Content-type: text/html\n\n";
print "Hello World! I have a Mac, do you?";
exit;

After making the small change to the script, this smallest of Web applications will happily run, as demonstrated in Figure 28.3.

Figure 28.3 When the appropriate header is added to the CGI script, everything works as planned.

HTML Output

Creating the output of a CGI is the second step of developing a Web application. Unlike normal Perl scripts that produce plain text output, Web applications produce HTML. This can take a while to get used to, but keep in mind that the goal is to produce a dynamic Web page, not a plain text file.

When creating output from a CGI script, you can use any tags that you normally would in an HTML document. The trouble with doing this in Perl is that you have to escape every double quote when printing the HTML from a double-quoted string.

For example:

<TABLE BORDER="0" CELLPADDING="0" CELLSPACING="0">

When printed in Perl, this becomes

print  "<TABLE BORDER=\"0\" CELLPADDING=\"0\" CELLSPACING=\"0\">";

When creating complicated output, this can get a bit tedious. It can also lead to programmers taking shortcuts and leaving out quotes around HTML tag attributes. The easiest way to display large amounts of complex HTML is to use a Perl here document, which prints every line up to a terminating marker without the need to escape quotes:

print <<ENDOFHTML;
   <TABLE BORDER="0" CELLPADDING="0" BGCOLOR="#FFDDDD" CELLSPACING="0">
   <TR><TD align="right">This is more HTML</TD></TR>
   </TABLE>
ENDOFHTML
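
A here document written this way behaves like a double-quoted string, so variables are still interpolated even though the quotes need no escaping. The short sketch below illustrates the point; the $bgcolor and $message variables are made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical values that a CGI might compute before printing.
my $bgcolor = "#FFDDDD";
my $message = "This is more HTML";

# Everything up to the ENDOFHTML marker is taken literally, except that
# $bgcolor and $message are replaced by their values.
my $html = <<ENDOFHTML;
<TABLE BORDER="0" BGCOLOR="$bgcolor">
<TR><TD align="right">$message</TD></TR>
</TABLE>
ENDOFHTML

print "Content-type: text/html\n\n";
print $html;
```

The closing marker must appear on a line by itself, starting in the first column, or Perl will not recognize the end of the text.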

So, let's go ahead and take a look at an example of CGI output in action. This example covers output only, but don't think that you won't be able to get information into your Web application. We're going to get there; just be patient!

Let's start with something simple, such as creating a script that will display all the images and the associated filenames in a given folder.

To start the CGI, build a simple Perl script that lists all the JPEG (.jpg) files in a folder. Listing 28.1 shows such a script.

Example 28.1. When Building a CGI, It's Often Easiest to Start with Something That Runs from the Command Line

1: #!/usr/bin/perl
2:
3: $imagedir="imagefolder";
4: @imagelist=glob("$imagedir/*jpg");
5:
6: for ($x=0;$x<@imagelist;$x++) {
7:         $imagename=$imagelist[$x];
8:         print "Image $x = $imagename\n";
9: }

Line 3 sets the variable $imagedir to the directory that contains the images. In this case, I'm using imagefolder inside my Sites directory, which is also where this script is located. I have not specified the entire path because I'm only interested in the location of the images relative to the script.

Line 4 loads all the filenames within $imagedir that end in .jpg into the array @imagelist. The Perl glob function takes a path and filename pattern as input, and then returns any results that match.

Lines 6–9 loop through each element in the @imagelist array, temporarily storing each one in the $imagename variable. Line 8 prints a line that displays the image's index and its filename.
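
The same listing can also be written with a foreach loop, which steps through the array without an index test. This is only a sketch of an alternative, not a correction: the list_images name is an assumption, and the pattern here is *.jpg (with the dot), so it matches only names that end in the .jpg extension.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# list_images is a hypothetical helper that returns the .jpg files in a
# directory, sorted by name. Call it in list context.
sub list_images {
    my ($imagedir) = @_;
    my @files = glob("$imagedir/*.jpg");
    return sort @files;
}

# Print one line per image, counting from zero as the original does.
my $x = 0;
foreach my $imagename (list_images("imagefolder")) {
    print "Image $x = $imagename\n";
    $x++;
}
```

If imagefolder does not exist, glob simply returns an empty list and the script prints nothing.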

When run, the CGI-in-the-making, which I've named showimages.cgi, produces the list we were hoping for:

[localhost:~/Sites] jray% ./showimages.cgi
Image 0 = imagefolder/4jr2.jpg
Image 1 = imagefolder/BLvividLotuses1600x1024.jpg
Image 2 = imagefolder/BLyellows1600x1024.jpg
Image 3 = imagefolder/berries.jpg
Image 4 = imagefolder/bluesilk.jpg
Image 5 = imagefolder/door.jpg
Image 6 = imagefolder/flower.jpg
Image 7 = imagefolder/flowers.jpg
Image 8 = imagefolder/forjohn.jpg
Image 9 = imagefolder/funnyflow.jpg
Image 10 = imagefolder/snow.jpg
Image 11 = imagefolder/snowstorm.jpg

So, how can this be translated into a CGI that displays the actual images in a Web browser? The first step, as mentioned earlier, is to produce a Content-type header. Without this information, the browser has no idea what type of data it is receiving. At the same time, it's a good idea to translate any \n (newline) characters in the program into their HTML equivalent: <br>. Listing 28.2 shows the new CGI, which is capable of running in a browser.

Example 28.2. Adding a Content-Type and Fixing Line Breaks Is All You Need to Turn a Simple Command-Line Script into a CGI

1: #!/usr/bin/perl
2: print "Content-type: text/html\n\n";
3: $imagedir="imagefolder";
4: @imagelist=glob("$imagedir/*jpg");
5:
6: for ($x=0;$x<@imagelist;$x++) {
7:         $imagename=$imagelist[$x];
8:         print "Image $x = $imagename<br>";
9: }

Figure 28.4 shows the result of running the new CGI in a Web browser.

Figure 28.4 The command-line application now runs within a Web browser.

Unfortunately, things still aren't quite where we want them. What good is a CGI that lists pictures but doesn't display them? To be able to show the pictures, the CGI must be modified so that the name is used within an <IMG> (image) tag, rather than just displayed on the screen. Try adding a new line that uses an image, rather than the image name, as seen in Listing 28.3.

Example 28.3. The Revised Code Will Display an Image as Well as Its Name

1: #!/usr/bin/perl
2: print "Content-type: text/html\n\n";
3: $imagedir="imagefolder";
4: @imagelist=glob("$imagedir/*jpg");
5:
6: for ($x=0;$x<@imagelist;$x++) {
7:         $imagename=$imagelist[$x];
8:         print "<IMG SRC=\"$imagename\" width=\"120\" height=\"90\"><br>";
9:         print "Image $x = $imagename<br>";
10: }

Line 8 performs the magic in the application. It takes the same $imagename variable that is used to print an image's name (now in line 9) and uses it to set the image source within an <IMG> tag. I've also added a width and height to the image tag to maintain some consistency in the display.

When viewed in a Web browser, the result resembles Figure 28.5.

Figure 28.5 With the addition of the IMG tag, the images themselves can now be seen in the listing.

Hopefully, by now, you're starting to see the method to the madness. CGIs are just applications that write HTML as their output. The example we've been looking at is barely modified from the original command-line version, yet it includes full images for each file it finds. To fully realize the potential of a CGI, you must use HTML to its fullest. So far, the Perl script we've been developing is nothing but a simple port of the initial command-line utility. With only a small amount of work, we can turn it into something far more useful. Listing 28.4 shows a more developed version of the application. Unlike the previous version of the CGI, this revision uses an HTML table to structure the layout of the images.

Example 28.4. With a Little Work, the CGI Can Take Advantage of All of HTML's Layout Capabilities

1: #!/usr/bin/perl
2: print "Content-type: text/html\n\n";
3:
4: $imagedir="imagefolder";
5: $columns=3;
6:
7: @imagelist=glob("$imagedir/*jpg");
8:
9: print "<TABLE BGCOLOR=\"#FFFFFF\" BORDER=\"1\" BORDERCOLOR=\"#000000\">";
10: while ($x<@imagelist) {
11:     print "<TR>";
12:     for ($y=0;$y<$columns;$y++) {
13:         $imagename=$imagelist[$x];
14:         if ($x<@imagelist) {
15:             $x++;
16:             print "<TD align=\"center\">";
17:             print "<IMG SRC=\"$imagename\" width=\"120\" height=\"90\"><br>";
18:             $imagename=~s/$imagedir\///;
19:             print "<FONT FACE=\"Arial\">$imagename</FONT>";
20:             print "</TD>";
21:         }
22:     }
23:     print "</TR>";
24: }
25: print "</TABLE>";

Line 5 sets a limit for the number of columns in the table (how many images will be displayed in a single line), whereas line 9 sets up the table structure using a table with a white (#FFFFFF) background and a black (#000000) border. In line 10, instead of using a for loop to go through each image, the counter $x is incremented when an image tag is output. The while loop will continue as long as the counter is less than the total number of images.

Line 11 starts a new table row (<TR>). Lines 12–22 loop through the number of columns set for the table. For each column, line 14 checks that $x has not yet reached the total number of images available; if it hasn't, line 15 increments $x and lines 16–20 output a table data cell (<TD>) that contains the image and its name. Line 18 removes the path from the image filename. This is done using a simple Perl regular expression search and replace. After displaying all the data cells for a row, line 23 ends the table row (</TR>). Line 24 closes the while loop, which repeats lines 11–23 until all images have been displayed, and line 25 ends the table (</TABLE>).
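
Another way to think about the row logic is to split the list into groups of $columns items first, and then print each group as one row. The following is a sketch of that idea rather than the listing's own approach; the rows_of and table_html names are assumptions made for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# rows_of is a hypothetical helper: it splits a list into array
# references holding at most $columns items each.
sub rows_of {
    my ($columns, @items) = @_;
    $columns = 1 if $columns < 1;   # guard against a loop that never advances
    my @rows;
    while (@items) {
        # splice removes up to $columns items from the front of @items.
        push @rows, [ splice(@items, 0, $columns) ];
    }
    return @rows;
}

# table_html builds the table markup for a list of image paths.
sub table_html {
    my ($columns, @images) = @_;
    my $html = "<TABLE BORDER=\"1\">\n";
    foreach my $row (rows_of($columns, @images)) {
        $html .= "<TR>";
        foreach my $image (@$row) {
            $html .= "<TD><IMG SRC=\"$image\"></TD>";
        }
        $html .= "</TR>\n";
    }
    return $html . "</TABLE>\n";
}

print table_html(3, "door.jpg", "flower.jpg", "snow.jpg", "berries.jpg");
```

With four images and three columns, the last row holds a single cell, just as the final row does in the listing's output.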

Figure 28.6 shows the output from the finalized CGI.

Figure 28.6 The final version of the CGI outputs the image directory in a nicely formatted table.

CGI Input

This quick-and-dirty image viewer provides a reasonable start to CGI programming, but it is lacking in the one area that can be used to create truly dynamic and user-driven sites—user input. Getting input into a CGI can be a bit of a challenge if you're starting from scratch.

Thankfully, others have been here before, so there are some well-developed routines that will help you get data into your CGIs in only a few minutes. Listing 28.5 contains the additional code that you'll need for parsing input from URLs and HTML forms.

Example 28.5. These Functions Enable Data Input from Remote Browsers

1: sub MethGet {
2:   return ($ENV{ 'REQUEST_METHOD'}  eq "GET");
3: }
4:
5: sub MethPost {
6:   return ($ENV{ 'REQUEST_METHOD'}  eq "POST");
7: }
8:
9: sub ReadParse {
10:   local (*variable) = @_ if @_;
11:   local ($i, $key, $val);
12:
13:   if (&MethGet) {
14:     $variable = $ENV{ 'QUERY_STRING'} ;
15:   }  elsif (&MethPost) {
16:     read(STDIN,$variable,$ENV{ 'CONTENT_LENGTH'} );
17:   }
18:
19:   @variable = split(/[&;]/,$variable);
20:
21:   foreach $i (0 .. $#variable) {
22:     $variable[$i] =~ s/\+/ /g;
23:     ($key, $val) = split(/=/,$variable[$i],2); # split on the equal sign
24:     $key =~ s/%(..)/pack("c",hex($1))/ge;
25:     $val =~ s/%(..)/pack("c",hex($1))/ge;
26:     $variable{ $key}  .= "\0" if (defined($variable{ $key} ));
27:     $variable{ $key}  .= $val;
28:   }
29:   return scalar(@variable);
30: }

You can place this code anywhere you want within your CGI. To save some space, place it in its own file, such as cgiinput.pl. You can then require this file at the start of any CGI that needs to access the routines. You will also need to add a line 1; at the bottom of the file. This is a requirement for any included Perl libraries—it produces a value of true when the file is read. If the line is missing, the CGI will exit with an error.

There are three functions in the code:

The MethGet and MethPost functions (lines 1–7 of Listing 28.5) should be mostly self-explanatory. Each checks the environment variable REQUEST_METHOD to determine how data was transferred to the CGI. This variable is automatically set by Apache. If the GET method was used, MethGet returns true. If POST is used, MethPost is true.

Because the real work occurs in ReadParse, let's take a look at how it works its magic:

Lines 13–17 of Listing 28.5 check MethGet to see whether GET was used to send the data. If it was, the variable $variable is set equal to the contents of the environment variable QUERY_STRING. If the MethPost function returns true, $variable is instead filled by reading in a number of bytes from standard input. The amount of input is determined by the environment variable CONTENT_LENGTH.

Line 19 creates an array @variable that contains each of the variable=value pairs.

Lines 21–28 loop through each of the variable pairs, extracting the key and value, and storing them in the %variable associative array.

Line 22 converts any encoded spaces (+) in the variable to real spaces. Line 23 sets $key and $val equal to the incoming variable and value, respectively, and line 24 decodes any hex characters in $key. For example, a space is encoded as %20 (hex 20 = 2x16 = 32, the ASCII code for a space).

Line 25 decodes any hex characters in the variable $val. If the key in the %variable array is already defined, line 26 appends a NUL character (\0) to separate the values of multivalue fields (such as <SELECT> fields), and line 27 appends the value to the key's entry in the %variable associative array. If you found that enlightening, great! If not, don't worry; there's no real need to know too much about these functions beyond typing them in and saving them. To read input into a CGI application, use ReadParse on a line by itself. That's all there is to it.
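
For readers who would like to see the decoding steps in isolation, the sketch below applies the same sequence to a single query string and returns an ordinary hash. It is a simplified illustration and not a replacement for ReadParse; the parse_query name is an assumption.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# parse_query is a hypothetical, stripped-down cousin of ReadParse: it
# decodes a query string and returns the fields as a hash.
sub parse_query {
    my ($query) = @_;
    my %fields;
    foreach my $pair (split(/[&;]/, $query)) {
        next if $pair eq "";
        $pair =~ s/\+/ /g;                        # "+" stands for a space
        my ($key, $val) = split(/=/, $pair, 2);   # split on the first "="
        $val = "" unless defined $val;
        foreach ($key, $val) {
            # Turn each %XX sequence back into the character it encodes.
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
        }
        # Repeated keys are joined with a NUL, as in ReadParse.
        $fields{$key} .= "\0" if defined $fields{$key};
        $fields{$key} .= $val;
    }
    return %fields;
}

my %fields = parse_query("name=John+Ray&city=New%20York");
print "$fields{name} lives in $fields{city}\n";
```

Note that the + signs are converted before the %XX sequences are decoded, so a literal plus sign sent as %2B survives intact.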

Let's take a look at practical CGI input by altering the Hello World application we used previously so that it personalizes the message. If your name happens to be World, you might skip this exercise. Listing 28.6 shows the helloworld.cgi modified to display a person's name. I'll refer to this new version as helloworld2.cgi.

Example 28.6. Using the ReadParse Function, Any Script Can Receive Input

1: #!/usr/bin/perl
2: require "cgiinput.pl";
3:
4: &ReadParse;
5: $myname=$variable{ "name"} ;
6:
7: print "Content-type: text/html\n\n";
8: print "Hello $myname! I have a Mac, do you?";

Although mostly apparent, the breakdown of the code is as follows:

Line 2 loads the input functions defined earlier in this section. The cgiinput.pl file must exist in the same directory as the CGI in order for the require statement to work.

Line 4 uses the ReadParse function to load the %variable associative array with any incoming variables and values. Line 5 sets the variable $myname to the submitted variable name. Line 7 sends the required content-type, and line 8 prints a greeting containing the name submitted to the CGI in the name variable.

As you can see, the number of changes to the original application is very small. This CGI should now correctly allow a name to be sent to it for use in a customized greeting. The question remains, however: How do you go about actually sending the variable and value to the application?

Because the ReadParse routine handles either POST or GET method transmission, there are two ways that this new CGI can be called. Using the URL to pass a variable is the easiest, so let's start there. Start a Web browser and enter the URL for the new CGI, adding ?name=John (or whatever is appropriate for you) to the end:

http://<your host>/<your cgi path>/helloworld2.cgi?name=<your name>

My test system, for example, looks like this:

http://primal.ag.ohio-state.edu/~jray/bookstuff/helloworld2.cgi?name=john

Figure 28.7 shows the new personalized message.

Figure 28.7 Providing an input method for CGIs enables you to customize their output.
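
If a value placed in the URL contains spaces or punctuation, it has to be encoded in the form that ReadParse later decodes. A minimal sketch of that encoding follows. The url_encode name is hypothetical, the set of characters left unencoded is a deliberately conservative assumption, and the code assumes plain single-byte text.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# url_encode is a hypothetical helper: it replaces every character that
# is not a letter, a digit, or one of - _ . ~ with a %XX hex sequence.
sub url_encode {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9\-_.~])/sprintf("%%%02X", ord($1))/ge;
    return $text;
}

# Build a link to the CGI with an encoded name value.
my $url = "helloworld2.cgi?name=" . url_encode("John Ray & Co.");
print "$url\n";
```

Here the spaces become %20 and the ampersand becomes %26, so the ampersand is not mistaken for a separator between variables.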

To use the POST method to send information to the CGI, create an HTML form that will submit its data to the Web application. For helloworld2.cgi, the form needs nothing more than a name field and a submit button:

<form action="helloworld2.cgi" method="post">
Enter your name: <input type="text" name="name">
<input type="submit" name="submit">
</form>

Save the form code in a new HTML file (hello.html) in the same directory as helloworld2.cgi. Open the new Web page in your browser, type a name, and click Submit. You should see results almost identical to the earlier URL-based input seen in Figure 28.7.

As it stands, if using a separate HTML page to submit information to the CGI, two files comprise the entire project: helloworld2.cgi and hello.html. This isn't excessive, but it can be consolidated. Rather than hello.html containing the form, it can be added directly to helloworld2.cgi. Listing 28.7 consolidates the form and application into a single CGI file.

Example 28.7. A CGI Can Encapsulate HTML and Application Logic

1: #!/usr/bin/perl
2: require "cgiinput.pl";
3:
4: &ReadParse;
5: $myname=$variable{ "name"} ;
6: print "Content-type: text/html\n\n";
7:
8: if ($myname eq "") {
9:     print <<ENDOFHTML;
10:         <form action="helloworld2.cgi" method="post">
11:         Enter your name: <input type="text" name="name">
12:         <input type="submit" name="submit">
13:         </form>
14: ENDOFHTML
15:     exit;
16: }
17:
18: print "Hello $myname! I have a Mac, do you?";

Consolidating the code into the single CGI brings into play some of the session management techniques discussed earlier in the chapter. This revision of helloworld2.cgi has two states—prior to entering the name and after entering the name. To determine what the program should be doing, it checks the value of $myname—if a name hasn't been set, the HTML form should be displayed. If a name is defined, the Hello message is shown. A more detailed analysis of the changes follows:

Line 8 checks to see whether the $myname variable is empty. If it is, this is the first time the CGI has been executed, and the user hasn't entered his name yet.

Lines 9–14 display the HTML form, and line 15 exits the CGI. This line is more important than it might appear. If it is not included, the CGI will continue to execute after displaying the HTML form; this will generate an empty hello message immediately following the form. Finally, line 18 displays the hello message with the user's name.

This demonstrates the fundamental workings of CGI applications. Although the example is only a two-step process, it could easily be extended to multiple steps by passing data from screen to screen. For an encore, let's add another form to the hello page that collects the user's age. After submitting this second form, a third page is shown with the user's name, age, and a few comments. Listing 28.8 shows the final version of this overly long Hello World application.

Example 28.8. The Extended Version of Hello World Now Includes Three Steps and Demonstrates CGI Input and Variable Passing

1: #!/usr/bin/perl
2: require "cgiinput.pl";
3:
4: &ReadParse;
5: $myname=$variable{ "name"} ;
6: $myage=$variable{ "age"} ;
7:
8: print "Content-type: text/html\n\n";
9:
10: if ($myname eq "") {
11:     print <<ENDOFHTML;
12:         <form action="helloworld2.cgi" method="post">
13:         Enter your name: <input type="text" name="name">
14:         <input type="submit" name="submit">
15:         </form>
16: ENDOFHTML
17:     exit;
18: }
19:
20: if ($myage eq "") {
21:     print "Hello $myname!";
22:     print "<BR>";
23:     print <<ENDOFHTML2;
24:         <form action="helloworld2.cgi" method="post">
25:         Enter your age: <input type="text" name="age"><br>
26:         <input type="hidden" name="name" value="$myname">
27:         <input type="submit" name="submit">
28:         </form>
29: ENDOFHTML2
30:     exit;
31: }
32:
33: $dayage=$myage*365;
34: $hourage=$dayage*24;
35: $minage=$hourage*60;
36: print "Hello again $myname!<BR>";
37: print "You have lived for $dayage days...<br>";
38: print "... or $hourage hours...<br>";
39: print "... or $minage minutes!<br>";

This final revision adds an additional form and output screen. Lines 20–31 display the standard hello message, but also show a form where the user is prompted for his age. What makes this form unique is that it includes a hidden name field that is set to the original $myname value. This shows how information can be carried from page to page.
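
Because the hidden field echoes back whatever the user typed, a name that contains a quote or an angle bracket would break the form's markup. One common precaution, sketched here as an assumption rather than something the listing does, is to escape those characters before printing them. The escape_html name is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# escape_html is a hypothetical helper that replaces the characters with
# special meaning in HTML by their entity equivalents.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # must come first so later entities survive
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# A name with embedded quotes, as a user might type it.
my $myname = 'John "JR" Ray';
my $safe   = escape_html($myname);
print "<input type=\"hidden\" name=\"name\" value=\"$safe\">\n";
```

The browser converts the entities back when it submits the form, so the CGI still receives the name exactly as it was typed.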

The final page, generated in lines 33–39, calculates the user's age in days, hours, and minutes. Because the greeting still includes the name, this demonstrates that the name has indeed been carried through each of the CGI screens.

As an exercise, you might want to try adding a search screen to the image catalog creator that was built earlier in the chapter. Suppose, for instance, that there are multiple image folders to view, a need for the number of columns to be adjusted, or even searching based on the image filename—these features can all be added very easily to the application. Listing 28.9 is a two-step version of the image catalog application.

Example 28.9. This New Version of the Image Catalog CGI Now Offers Searching and Display Settings

1: #!/usr/bin/perl
2:
3: require "cgiinput.pl";
4: &ReadParse;
5: $imagedir=$variable{ "imagedir"} ;
6: $imagename=$variable{ "imagename"} ;
7: $columns=$variable{ "columns"} ;
8: $match=$variable{ "match"} ;
9: if ($imagedir=~/\//) {  $imagedir="imagefolder"; }
10: if ($match=~/\//) {  $match=""; }
11:
12: print "Content-type: text/html\n\n";
13: if ($imagedir eq "") {
14:      print <<ENDOFHTML;
15:         <form action="showimages5.cgi" method="post">
16:         Choose image dir: <input type="text" name="imagedir" value="imagefolder"><br>
17:         Select the number of columns in the display: <select name="columns">
18:             <option>1</option>
19:             <option>2</option>
20:             <option>3</option>
21:             <option>4</option>
22:         </select><br>
23:         Show images that match: <input type="text" name="match">
24:         <input type="submit" name="submit">
25:         </form>
26: ENDOFHTML
27:     exit; }
28:
29: @imagelist=glob("$imagedir/*$match*jpg");
30:
31: print "<TABLE BGCOLOR=\"#FFFFFF\" BORDER=\"1\" BORDERCOLOR=\"#000000\">";
32: while ($x<@imagelist) {
33:     print "<TR>";
34:     for ($y=0;$y<$columns;$y++) {
35:         $imagename=$imagelist[$x];
36:         if ($x<@imagelist) {
37:             $x++;
38:             print "<TD align=\"center\">";
39:             print "<IMG SRC=\"$imagename\" width=\"120\" height=\"90\"><br>";
40:             $imagename=~s/$imagedir\///;
41:             print "<FONT FACE=\"Arial\">$imagename</FONT>";
42:             print "</TD>";
43:         }
44:     }
45:     print "</TR>";
46: }
47: print "</TABLE>";

The main modifications to the original image catalog are the addition of lines 3–27 and the revised glob pattern in line 29. The rest remains the same.

Lines 3–4 load the cgiinput library, and then use the ReadParse function to read any submitted form information.

Lines 5–8 store values for the columns to display, image directory to use, and a string to search for in the image names.

Lines 9–10 are very important. When processing user input, an application can never trust the incoming data. If the image catalog blindly accepted an arbitrary path, it could pose a serious security risk and give the user access to other parts of the file system. For that reason, any input that includes a / is disregarded. This eliminates the potential for the user to input any path information.
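
Rejecting slashes is a reasonable start, but a stricter approach is to accept only input that matches a known-safe pattern and to fall back to a default otherwise. The following sketch shows one way this might look. The sub names, the allowed character set, and the default values are all assumptions for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# safe_name is a hypothetical helper: it returns the input only when it
# consists entirely of letters, digits, underscores, and hyphens. That
# rules out slashes, "..", spaces, and glob wildcards in one test.
sub safe_name {
    my ($input, $default) = @_;
    return $default unless defined $input;
    return $input if $input =~ /\A[A-Za-z0-9_-]+\z/;
    return $default;
}

# safe_columns accepts a single digit from 1 to 9 and otherwise falls
# back to 3. A column count of 0 could leave the table loop unable to
# advance, so it is rejected as well.
sub safe_columns {
    my ($input) = @_;
    return 3 unless defined($input) && $input =~ /\A[1-9]\z/;
    return $input;
}

print safe_name("../../etc", "imagefolder"), "\n";
print safe_columns("0"), "\n";
```

In the catalog, the same helpers could be applied to the image directory, the search string, and the column count immediately after the call to ReadParse.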

If an image directory has not been set (that is, the application has not yet received the search criteria), lines 13–27 display a search form. This is a simple HTML form that includes elements for setting the image directory, number of columns, and a search string for the image name.

Line 29 is a variation on the original glob call: it adds the $match string to the pattern, displaying only images whose names contain the specified string.

By now, you should have a grasp of the basics of CGI programming, and how Perl can be used to create quick-and-dirty Web applications.

Although Perl is certainly capable of generating large-scale applications, it isn't necessarily the best choice in terms of speed and ease of use. It's time to look at something a bit more suited to Web development: PHP.
