Is It Worth the Effort?
Reading the description in the preceding section, you might wonder whether it's worth the effort to implement proper HTTP caching. In most cases (more so if your content doesn't change rapidly), the answer is a resounding yes. The speed increase observed by your visitors, particularly the unhappy ones stuck with low-speed Internet connections, can be remarkable, especially if you separate the actual content (in XML format) from its presentation (in XSLT format), a technique that I'll cover in an upcoming article. In some cases, I've seen a tenfold increase in speed: my web site once downloaded the required information over a dial-up connection in seconds, whereas a competing web site needed minutes for the same task. (Admittedly, they also had a lot of bloated HTML code.)
What About User-Specific Views?
In many cases, the content of a web page shown to a logged-in user differs from the content presented to an anonymous visitor. For example, online support forums use this technique heavily, forcing you to register and disclose your email address before they show you a potential answer to your problem. Because the content-modification time hasn't changed just because you've logged in, neither the web server nor your browser can detect that the web page has changed. Fortunately, HTTP/1.1 provides a solution to this dilemma as well:
- When returning the content, the web server can use the ETag header to indicate the semantic context of a web page.
- The client has to store the value of the ETag header together with the value of the Last-Modified header.
- If the cached copy contains an ETag header together with the Last-Modified header, the client uses the If-None-Match header (carrying the cached ETag value) together with the If-Modified-Since header to identify the semantic context of its cached copy.
- The server uses the If-None-Match value together with the If-Modified-Since value to determine whether it should send a new copy of the content or respond with status code 304 (Not Modified).
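The server side of this exchange can be sketched in a few lines. The author's downloadable code is VBScript; as an illustration only, here is the same check in Python (the function name and the lowercase header-dictionary convention are assumptions, not part of any particular framework):

```python
from email.utils import parsedate_to_datetime

def check_not_modified(request_headers, current_etag, last_modified_http):
    """Return True if the client's cached copy is still valid (respond 304).

    request_headers:    dict of request headers with lowercase keys
    current_etag:       ETag the server would send now, e.g. '"anonymous"'
    last_modified_http: content modification time as an HTTP-date string
    """
    client_etag = request_headers.get("if-none-match")
    client_date = request_headers.get("if-modified-since")
    if client_etag is None or client_date is None:
        return False   # no conditional headers: send the full response
    if client_etag != current_etag:
        return False   # semantic context changed (e.g. the user logged in)
    # The cached copy is valid if it is at least as new as the content
    return parsedate_to_datetime(client_date) >= parsedate_to_datetime(last_modified_http)
```

Note that both conditions must hold: a matching ETag alone isn't enough if the content has been modified since the copy was cached, and a fresh timestamp alone isn't enough if the user's login state has changed.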
In our example, the web application would use the user ID as the value of the ETag header (or "anonymous" if the user hasn't logged in yet; note that ETag values are quoted strings). The initial content would thus be served with ETag set to "anonymous"; when the visitor logs in, new content would be sent with a different ETag, replacing the previously cached copy.
The ETag header is not a security measure, however. Once the content is cached, your visitor can access it until it expires (or even afterward, if she switches her browser to Work Offline mode).
There’s one last glitch we have to work out to complete this example: How do we force the browser to check for a changed copy of the page after the user has logged in? You might be tempted to pre-expire the content (setting the Expires header to a date in the past or setting the max-age parameter to zero), but then the browser wouldn’t cache the content, and the whole effort would be moot. The HTTP/1.1 standard again provides a clean solution: Use the must-revalidate keyword in the Cache-Control response header to ask the browser to check with the server every time the visitor wants to see the content. (Of course, the content would be downloaded only once.)
Last but not least, we don’t want public caches to store the content provided only to authenticated users, so the web server should add a Cache-Control: private header to the response.
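Putting the last two points together, the response headers for a user-specific page would combine must-revalidate (to force a check on every visit) with private (to keep public caches out) alongside the per-user ETag. A minimal Python sketch, with an illustrative helper name:

```python
def caching_headers(user_id, last_modified_http):
    """Build response headers for a page whose view depends on login state.

    user_id:            the logged-in user's ID, or None for an anonymous visitor
    last_modified_http: content modification time as an HTTP-date string
    """
    etag = '"%s"' % (user_id if user_id is not None else "anonymous")
    return {
        # Only private caches may store this, and they must revalidate each time
        "Cache-Control": "private, must-revalidate",
        "ETag": etag,
        "Last-Modified": last_modified_http,
    }
```

The browser still caches the content (so it is downloaded only once per change), but every subsequent view triggers a cheap conditional request that usually comes back as a 304.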
If I’ve persuaded you that your application can benefit from HTTP caching, you have to take only two more steps:
- For each dynamic web page that you want to have cached, you have to calculate when the content was last modified.
- You have to pass the calculated modification time to a library routine. You can download VBScript code from my web site. (I’m positive that PHP programmers will be able to write the PHP version easily.)
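The first step typically reduces to taking the most recent timestamp among the data sources the page draws on (database rows, template files, configuration), formatted as an HTTP date. A Python sketch of that calculation (the helper name is illustrative; the author's library expects the equivalent value in VBScript):

```python
from datetime import datetime, timezone
from email.utils import format_datetime

def page_last_modified(*source_timestamps):
    """Return the most recent of the given datetimes as an HTTP-date string."""
    latest = max(source_timestamps)
    # HTTP dates have one-second resolution and must be expressed in GMT
    return format_datetime(latest.astimezone(timezone.utc).replace(microsecond=0),
                           usegmt=True)
```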