2.2 Two-Phase Operation
Apache operation proceeds in two phases: start-up and operational. System start-up takes place as root, and includes parsing the configuration file(s), loading modules, and initializing system resources such as log files, shared memory segments, and database connections. For normal operation, Apache relinquishes its system privileges and runs as an unprivileged user before accepting and processing connections from clients over the network. This basic security measure helps to prevent a simple bug in Apache (or a module or script) from becoming a devastating system vulnerability, like those exploited by malware such as "Code Red" and "Nimda" in MS IIS.
This two-stage operation has some implications for applications architecture. First, anything that requires system privileges must be run at system start-up. Second, it is good practice to run as much initialization as possible at start-up, so as to minimize the processing required to service each request. Conversely, because so many slow and expensive operations are concentrated in system start-up, it would be hugely inefficient to try to run Apache from a generic server such as inetd or tcpserver.
One non-intuitive quirk of the architecture is that the configuration code is, in fact, executed twice at start-up (although not at restart). The first time through checks that the configuration is valid (at least to the point that Apache can successfully start); the second pass is "live" and leads into the operational phase. Most modules can ignore this behavior (standard use of APR pools ensures that it doesn't cause a resource leak), but it may have implications for some modules. For example, a module that dynamically loads new code at start-up may want to do so just once and, therefore, must use a technique such as setting and checking a static flag to ensure that critical initialization takes place just once.
2.2.1 Start-up Phase
The purpose of Apache's start-up phase is to read the configuration, load modules and libraries, and initialize required resources. Each module may have its own resources, and has the opportunity to initialize those resources. At start-up, Apache runs as a single-process, single-thread program and has full system privileges.
Apache's main configuration file is normally called httpd.conf. However, this nomenclature is just a convention, and third-party Apache distributions such as those provided as .rpm or .deb packages may use a different naming scheme. In addition, httpd.conf may be a single file, or it may be distributed over several files using the Include directive to include different configuration files. Some distributions have highly intricate configurations. For example, Debian GNU/Linux ships an Apache configuration that relies heavily on familiarity with Debian, rather than with Apache. It is not the purpose of this book to discuss the merits of different layouts, so we'll simply call this configuration file httpd.conf.
The httpd.conf configuration file is a plain text file and is parsed line-by-line at server start-up. The contents of httpd.conf comprise directives, containers, and comments. Blank lines and leading whitespace are also allowed, but will be ignored.
Most of the contents of httpd.conf are directives. A directive may have zero or more arguments, separated by whitespace. Each directive determines its own syntax, so different directives may permit different numbers of arguments, and different argument types (e.g., string, numeric, enumerated, Boolean on/off, or filename). Each directive is implemented by some module or the core, as described in Chapter 9.
LoadModule foo_module modules/mod_foo.so
This directive is implemented by mod_so and tells it to load a module. The first argument is the module name (string, alphanumeric). The second argument is a filename, which may be absolute or relative to the server root.
This directive is implemented by the core, and sets the directory that is the root of the main document tree visible from the Web.
SetEnv hello "Hello, World!"
This directive is implemented by mod_env and sets an environment variable. Note that because the second argument contains a space, we must surround it with quotation marks.
This directive is implemented by mod_choices (Chapter 6) and activates that module's options.
A container is a special form of directive, characterized by a syntax that superficially resembles markup, using angle brackets. Containers differ semantically from other directives in that they comprise a start and an end on separate lines, and they affect directives falling between the start and the end of the container. For example, the <VirtualHost> container is implemented by the core and defines a virtual host:
<VirtualHost 10.31.2.139> ServerName www.example.com DocumentRoot /usr/www/example ServerAdmin firstname.lastname@example.org CustomLog /var/log/www/example.log </VirtualHost>
The container provides a context for the directives within it. In this case, the directives apply to requests to www.example.com, but not to requests to any other names this server responds to. Containers can be nested unless a module explicitly prevents it. Directives, including containers, may be context sensitive, so they are valid only in some specified type of context.
Any line whose first character is a hash is read as a comment.
# This line is a comment
A hash within a directive doesn't in general make a comment, unless the module implementing the directive explicitly supports it.
If a module is not loaded, directives that it implements are not recognized, and Apache will stop with a syntax error when it encounters them. Therefore mod_so must be statically linked to load other modules. This is pretty much essential whenever you're developing new modules, as without LoadModule you'd have to rebuild the entire server every time you change your module!
2.2.2 Operational Phase
At the end of the start-up phase, control passes to the Multi-Processing Module (see Section 2.3). The MPM is responsible for managing Apache's operation at a systems level. It typically does so by maintaining a pool of worker processes and/or threads, as appropriate to the operating system and other applicable constraints (such as optimization for a particular usage scenario). The original process remains as "master," maintaining a pool of worker children. These workers are responsible for servicing incoming connections, while the parent process deals with creating new children, removing surplus ones as necessary, and communicating signals such as "shut down" or "restart."
Because of the MPM architecture, it is not possible to describe the operational phase in definite terms. Whereas the standard MPMs use worker children in some manner, they are not constrained to work in only one way. Thus another MPM could, in principle, implement an entirely different server architecture at the system level.
There is no shutdown phase as such. Instead, anything that needs be done on shutdown is registered as a cleanup, as described in Chapter 3. When Apache stops, all registered cleanups are run.