Reap the Benefits of Web Caching, Part 1: Explicit Content Expiration
- Part 1 (this article) covers the explicit content expiration model.
- Part 2 will focus on the conditional GET and the Last-Modified header.
- Part 3 will demonstrate the integration of caching concepts with a back-end
Browser Caching Challenges
If you have ever had to browse the Web over a slow link (analog dial-up or a wireless connection is slow enough), you’ve probably noticed that pages with static content (usually pages with .htm or .html file types) can appear significantly faster than equivalent dynamic pages (for example, pages with .asp, .aspx, or .php file types). Furthermore, as you navigate between pages on a web site, static pages are almost always displayed immediately, indicating that they’ve been reloaded from the browser cache, whereas dynamic pages are usually fetched from the server again, resulting in longer download times.
As an application developer, you might not care about visitors on slow links if your applications are not targeted at consumers, among whom broadband penetration is still low. However, the continuous fetching of dynamic pages from the server also increases the server load and the bandwidth requirements on your end, an important factor in high-traffic environments.
Before we proceed to the technical details, you might want to check my claims. Here’s how you do it:
- Install a browser add-in that will help you watch the actual Hypertext Transfer Protocol (HTTP) headers exchanged between your browser and the web server. If you use Internet Explorer, you could use the ieHTTPHeaders toolbar; if you’re a Firefox user, download LiveHTTPHeaders. You might also find some other free or commercial debugging tools on the Internet.
- Using your selected debugging tool, check a set of static web pages and compare the HTTP requests (or lack of them) with the HTTP requests issued when viewing a set of dynamic ASP pages. (PHP is no better.)
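If you prefer a script to a browser add-in, the following minimal Python sketch prints the caching-related response headers for one or more URLs. It is only an approximation of the check described above: it shows what the server sends, not whether the browser decides to skip the request. The function names are hypothetical, and the list of headers (Date, Last-Modified, Expires, Cache-Control) is an assumption about which standard HTTP headers are worth watching.

```python
import sys
import urllib.request

# Standard response headers that influence browser caching decisions
# (this selection is an assumption made for the example).
CACHE_HEADERS = ("Date", "Last-Modified", "Expires", "Cache-Control")


def cache_headers(headers):
    """Return the caching-related headers found in a header mapping.

    Header names are matched case-insensitively, as HTTP requires.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    return {name: lowered[name.lower()]
            for name in CACHE_HEADERS if name.lower() in lowered}


def fetch_cache_headers(url):
    """Issue a GET request and return the caching-related response headers."""
    with urllib.request.urlopen(url) as response:
        return cache_headers(response.headers)


# Only touch the network when URLs are given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    for url in sys.argv[1:]:
        print(url)
        found = fetch_cache_headers(url)
        for name in CACHE_HEADERS:
            print(f"  {name}: {found.get(name, '(absent)')}")
```

Run it once against a static page and once against a dynamic page on the same site, then compare which of the headers are present in each response.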
In both cases, the web pages contain no explicit expiration date, so the browser cannot easily determine whether the cached copy is stale. All browsers fall back on a heuristic approach in this case (an example is documented in the HTTP standard), trying to estimate whether it’s likely that the page has changed. Web content served from static files normally includes a Last-Modified header, indicating the date and time at which the file was last modified. By reviewing this header, browsers can estimate the age of the content and the likelihood that it has changed. No such information is available for dynamic pages, however, as they’re always built on the fly; the browser-caching heuristic is thus unusable.
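The heuristic can be illustrated with a short Python sketch. It assumes the example fraction mentioned in the HTTP specification, where a response is treated as fresh for about 10 percent of the time elapsed since it was last modified. Actual browsers may use different fractions and upper limits, so treat the constant and the function name as hypothetical.

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime

# Assumed fraction of the content's age for which it is considered fresh;
# the HTTP specification cites 10% as a typical setting.
HEURISTIC_FRACTION = 0.1


def heuristic_freshness(headers):
    """Estimate how long a response without explicit expiration stays fresh.

    Returns a timedelta, or None when the response carries no Last-Modified
    (or Date) header, in which case the heuristic cannot be applied.
    """
    last_modified = headers.get("Last-Modified")
    date = headers.get("Date")
    if last_modified is None or date is None:
        return None
    age = parsedate_to_datetime(date) - parsedate_to_datetime(last_modified)
    if age < timedelta(0):
        # A modification time in the future gives no usable estimate.
        return timedelta(0)
    return age * HEURISTIC_FRACTION


# A static file last modified ten days before it was served.
static_page = {
    "Date": "Thu, 11 Jan 2024 00:00:00 GMT",
    "Last-Modified": "Mon, 01 Jan 2024 00:00:00 GMT",
}
# A dynamic page built on the fly: no Last-Modified header at all.
dynamic_page = {"Date": "Thu, 11 Jan 2024 00:00:00 GMT"}

print(heuristic_freshness(static_page))
print(heuristic_freshness(dynamic_page))
```

Under these assumptions the static page would be considered fresh for roughly one day, while the dynamic page yields no estimate at all, so the browser has to contact the server again.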