Improving Performance with mod_perl
The primary goal in Chapter 2, “Getting Connected—Putting Your Database on the Web,” was to take care of the basic aspects of getting your MySQL server to work with Apache. The important thing there was just to get data in and out of your databases over the Web, without any particular regard for performance. In this chapter, we’ll consider a simple method for improving performance that can help you throughout the rest of this book: use the Apache mod_perl module. In other words, the chapter isn’t so much about how to write scripts as about how to make them run faster. I’ll describe what mod_perl is and how it changes the way Apache handles scripts to make them execute more quickly. Then we’ll see what configuration changes are necessary for using mod_perl and discuss some guidelines for making sure your scripts run properly.
For more information about mod_perl, see Appendix B, “References and Further Reading.” You’ll probably want to consult the mod_perl Guide as a general reference. Several other mod_perl documents make informative reading, too.
What mod_perl Is and How It Works
When Apache receives a request for a static HTML file, it can serve the request directly by opening the file and writing it out over the network to the client. This is not true for programs, such as the Perl cgi-bin scripts we wrote in Chapter 2. Those scripts aren’t processed by Apache itself. Instead, Apache starts up Perl as an external process to run the script and returns the script’s output to the requesting client. This works well for extending Apache’s capabilities, but invoking an external program taxes the Web server host and also introduces some delay into serving the request. This overhead is incurred repeatedly as script requests arrive, because Apache starts up a new Perl process to handle each one.
An alternative to running scripts using external processes is to make the script handler part of Apache itself. In the case of Perl scripts, we can use the mod_perl module to embed the Perl interpreter into Apache. The result is that Apache gains the capability to execute Perl scripts directly. This approach has several advantages:
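To make the per-request cost concrete, here is a small simulation. It is written in Python rather than Perl and involves no Apache server; it is a sketch under stated assumptions, not a model of Apache internals. `handle_request_cgi_style` is a hypothetical stand-in for Apache launching a cgi-bin script, and the “script” does nothing but print the process ID of the interpreter that runs it. Every simulated request pays for starting and tearing down a whole new interpreter process.

```python
import os
import subprocess
import sys

# Stand-in for a cgi-bin script: it only reports the process ID of the
# interpreter that runs it.
SCRIPT = "import os; print(os.getpid())"


def handle_request_cgi_style() -> int:
    """Serve one request the CGI way: launch a brand-new interpreter
    process, wait for it to finish, and return what it printed."""
    completed = subprocess.run(
        [sys.executable, "-c", SCRIPT],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(completed.stdout.strip())


if __name__ == "__main__":
    pids = [handle_request_cgi_style() for _ in range(3)]
    print("server PID:", os.getpid())
    # Three different PIDs: one new process per request.
    print("script PIDs:", pids)
```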
Apache can execute Perl scripts more quickly because it need not start up or wait for standalone external processes.
A given Apache process can serve many script requests because it doesn’t terminate when a script finishes; it just waits for the next request. This allows Apache to perform caching for scripts that are requested repeatedly, which results in a further performance improvement. (The script-execution process involves examining the script and compiling it to an internal form, and then running that form. When Perl runs as part of Apache, the compiled script remains loaded in memory and is immediately available for execution. It need not be recompiled if the server receives another request for it.)
Because the Perl interpreter doesn’t just exit when the script terminates the way it does when Perl is executed as a standalone process, the script-execution environment persists across scripts. This makes possible some things that can’t be done when scripts are executed individually by independent Perl processes. One of these is persistent database connections (connections to a database server that can be shared over successive scripts, minimizing overhead for setup and tear-down).
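The two persistence benefits in the list above, cached compiled code and persistent database connections, can be sketched with a toy model. This is an illustration in Python, not how mod_perl is implemented: `PersistentInterpreter` is a hypothetical class, a change in the script’s source text stands in for detecting that the script file was edited, and an in-memory sqlite3 database stands in for a MySQL server.

```python
import sqlite3


class PersistentInterpreter:
    """Toy model of an interpreter embedded in a long-lived server process.

    Compiled scripts are cached between requests and recompiled only when
    the source changes. A database connection opened by the first request
    is kept and reused by later ones.
    """

    def __init__(self) -> None:
        self._compiled = {}      # script name -> (source, code object)
        self._connection = None  # shared across requests once opened
        self.compile_count = 0
        self.connect_count = 0

    def get_connection(self) -> sqlite3.Connection:
        # Open the connection on first use, then hand back the same one.
        if self._connection is None:
            self._connection = sqlite3.connect(":memory:")
            self.connect_count += 1
        return self._connection

    def run(self, name: str, source: str) -> object:
        cached = self._compiled.get(name)
        if cached is None or cached[0] != source:
            # First request for this script, or the script was edited.
            code = compile(source, name, "exec")
            self._compiled[name] = (source, code)
            self.compile_count += 1
        else:
            code = cached[1]
        # Each run gets fresh globals; only the compiled code and the
        # database connection persist between requests.
        namespace = {"dbh": self.get_connection()}
        exec(code, namespace)
        return namespace.get("result")


SCRIPT = "result = dbh.execute('SELECT 6 * 7').fetchone()[0]"

if __name__ == "__main__":
    interp = PersistentInterpreter()
    results = [interp.run("answer.py", SCRIPT) for _ in range(3)]
    print(results, interp.compile_count, interp.connect_count)
```

Running the same script three times compiles it once and opens one connection; editing the source triggers a single recompile while the existing connection is reused.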
These benefits do come at a price, of course. There are also some disadvantages to using mod_perl:
Apache installation and configuration become more involved.
The script-execution environment persists across scripts. (Yes, I included this in the preceding list of advantages, but persistence of the Perl interpreter’s internal state also can cause problems if a script is a bad citizen that contaminates the environment of its successors.) Such scripts require some tweaking to behave better.
Embedding the Perl interpreter into Apache causes httpd processes to become larger and take more memory. You may find it prudent or necessary to perform some configuration tuning.
mod_perl scripts always run under the user and group IDs of the httpd process. You can’t execute them with the privileges of another user or group using the suEXEC mechanism the way you can with standalone scripts.
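The “bad citizen” problem from the list above is easy to reproduce in miniature. In this Python sketch (an analogy, not mod_perl code), a reused namespace dictionary plays the role of the persistent interpreter, and a fresh dictionary plays the role of a new standalone process. The hypothetical `BAD` script assumes its global `total` starts out empty; the `GOOD` script initializes its own state on every run.

```python
def serve(code, namespace):
    """Run one 'request' of a script in the given global namespace."""
    exec(code, namespace)
    return namespace["total"]


# Bad citizen: relies on the global 'total' starting out empty. The
# globals().get(...) call mimics a Perl package global that is used without
# being initialized (undef, which counts as 0 in numeric context).
BAD = compile("total = globals().get('total', 0) + 5", "bad", "exec")

# Good citizen: initializes its own state on every run.
GOOD = compile("total = 0\ntotal += 5", "good", "exec")


def run_three(code, persistent):
    """Serve three requests, reusing one namespace if persistent is true."""
    shared = {}
    results = []
    for _ in range(3):
        # A fresh namespace models a new standalone process per request.
        ns = shared if persistent else {}
        results.append(serve(code, ns))
    return results


if __name__ == "__main__":
    print(run_three(BAD, persistent=False))  # standalone: [5, 5, 5]
    print(run_three(BAD, persistent=True))   # persistent: [5, 10, 15]
    print(run_three(GOOD, persistent=True))  # fixed script: [5, 5, 5]
```

The script’s bug is invisible when each request gets a new process and shows up only once state survives between requests, which is why such scripts need fixing before they run well in a persistent environment.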
I’ve listed more disadvantages than advantages, which you may find alarming. However, most of the disadvantages are one-time issues. Apache configuration is more complicated with mod_perl, but after you get things set up the way you want them, you usually can leave your configuration alone. A script that needs some modification to run as a good citizen under mod_perl generally needs to be fixed once, not multiple times.
The benefits, on the other hand, are continuous. Having scripts run faster with mod_perl than when executed as standalone programs is a benefit you enjoy for as long as you continue to use Apache. The amount of improvement varies from site to site, but the mod_perl Guide indicates that developers report scripts running anywhere from 2 to 20 times faster than the equivalent standalone versions. See Appendix B for other reports of user experiences.
Should You Use mod_perl?
Now, having made an effort to convince you that mod_perl is a good thing, allow me to point out that you do not have to use it if you don’t want to. Indeed, if you run a low-traffic site, performance may be adequate as is, response time for clients may be perfectly satisfactory, and you may never experience any compelling reason to use mod_perl. If you don’t want to deal with mod_perl now, just skip ahead to the next chapter. If your site’s activity increases, however, you may find that performance becomes a concern. In that case, you can always return to this chapter and reconfigure Apache for mod_perl when you need it.
If you decide to use mod_perl and do find that it’s useful (as I expect you will), there are other Apache modules you may want to consider. Perl isn’t the only language that can be embedded into Apache as a module; languages such as PHP, Python, Ruby, and Java also can be used in module form for writing Web scripts, with advantages similar to those offered by mod_perl.
If You Are Not Using mod_perl…
You should know one thing about this chapter even if you decide to skip it for now: Most of the rest of this book assumes you’ll run your scripts under mod_perl. You can recognize such scripts, because they’ll be located in the cgi-perl directory, not in cgi-bin. To use any such script in standalone fashion (assuming it doesn’t require mod_perl, of course), just put it in your cgi-bin directory and adjust the URL accordingly.
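As a small illustration of that URL adjustment, the hypothetical helper below rewrites a /cgi-perl/ URL into the matching /cgi-bin/ URL. It assumes both directories sit directly under the server’s document root, as in the example; the host and script name are made up.

```python
from urllib.parse import urlsplit, urlunsplit


def to_standalone_url(url: str) -> str:
    """Map a script URL under /cgi-perl/ to the same script under /cgi-bin/.

    Assumes both directories are at the top of the server's URL space.
    The query string and any fragment are carried over unchanged.
    """
    parts = urlsplit(url)
    prefix = "/cgi-perl/"
    if not parts.path.startswith(prefix):
        raise ValueError(f"not a cgi-perl URL: {url}")
    new_path = "/cgi-bin/" + parts.path[len(prefix):]
    return urlunsplit(parts._replace(path=new_path))


if __name__ == "__main__":
    print(to_standalone_url("http://www.example.com/cgi-perl/hello.pl?name=Bob"))
    # http://www.example.com/cgi-bin/hello.pl?name=Bob
```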
Other Uses for mod_perl
As described thus far, mod_perl is a means for improving performance of Perl scripts on a Web site, and in fact, that’s the main reason we use it in this book. mod_perl actually is more than that, however, and the performance boost can be viewed as something of a side effect of its primary purpose.
Apache is written in C and provides a C application programming interface (API). Developers can extend Apache’s capabilities by writing modules in C that communicate through that API. But not everybody wants to write C code. This is where mod_perl comes in. Its principal function is to provide an alternative to writing in C by mapping the Apache API onto a Perl API so that you can extend Apache by writing Perl programs. That’s why mod_perl embeds the Perl interpreter into Apache: the interpreter provides a bridge between the Apache C API and programs written in Perl. The primary effect of this is to make Apache internals available through the Perl language. The side effect (the one we’re actually more interested in here) is that because the Perl interpreter is already loaded and running, Perl scripts execute significantly faster.
Perl programs that exploit the Perl API can do some clever things. For example, packages such as Embperl, ePerl, and Mason provide you with the ability to write HTML pages that contain embedded Perl code. They use the Perl API to access Apache’s page-processing mechanisms, which allows them to look for bits of embedded code, run the Perl interpreter to evaluate the code, and use the results in producing the page. A related module, AxKit, uses mod_perl to process XML pages. mod_perl allows AxKit to combine the power of the Apache API with Perl’s XML support to transform XML pages containing embedded Perl into a variety of formats. From the same XML source, for example, you can produce an HTML page for browser display, a printer-friendly version for hard copy, or a minimal-text version for handheld wireless devices.
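The basic idea behind these embedded-code packages (scan a page for code markers, evaluate the code, and splice the results into the output) can be sketched in a few lines. This Python version is only an analogy: the `[+ ... +]` delimiters are loosely modeled on Embperl’s output blocks, `render` is a hypothetical function, and the real packages do this work in Perl inside Apache with far richer syntax.

```python
import re

# Output blocks of the form [+ expression +]. Only this one construct is
# handled here; real embedded-code packages support many more.
EXPR = re.compile(r"\[\+(.*?)\+\]", re.DOTALL)


def render(page: str, variables: dict) -> str:
    """Replace each [+ expr +] in an HTML page with the value of expr."""

    def evaluate(match: re.Match) -> str:
        # eval() runs arbitrary code, so this is acceptable only for
        # trusted templates.
        return str(eval(match.group(1).strip(), {}, variables))

    return EXPR.sub(evaluate, page)


if __name__ == "__main__":
    page = "<p>Hello, [+ name +]! You have [+ len(items) +] items.</p>"
    print(render(page, {"name": "Fred", "items": ["a", "b", "c"]}))
    # <p>Hello, Fred! You have 3 items.</p>
```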
Alternatives to mod_perl
mod_perl isn’t the only mechanism available for speeding up Perl script execution. Others, such as FastCGI and VelociGen, can be used instead of mod_perl or in tandem with it. You can configure Apache to use mod_perl for some scripts and FastCGI or VelociGen for others. For more information, visit www.fastcgi.com and www.velocigen.com.
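FastCGI takes a different route from mod_perl: instead of embedding an interpreter in Apache, it keeps application processes running outside the server and hands requests to them. The Python sketch below models only that persistent-process idea, not the FastCGI protocol itself; `serve_batch` and the worker script are hypothetical. Unlike a process-per-request setup, every response here comes back from the same process.

```python
import subprocess
import sys

# Hypothetical long-running worker: it starts once, then loops, answering
# one request per input line and tagging each answer with its process ID.
WORKER = r"""
import os, sys
for line in sys.stdin:
    print(os.getpid(), line.strip().upper(), flush=True)
"""


def serve_batch(requests):
    """Send several requests to ONE persistent worker process and collect
    (pid, response) pairs."""
    proc = subprocess.Popen(
        [sys.executable, "-c", WORKER],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    out, _ = proc.communicate("".join(r + "\n" for r in requests))
    pairs = []
    for line in out.splitlines():
        pid, response = line.split(" ", 1)
        pairs.append((int(pid), response))
    return pairs


if __name__ == "__main__":
    for pid, response in serve_batch(["alpha", "beta", "gamma"]):
        print(pid, response)  # the same PID appears on every line
```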