Reap the Benefits of Web Caching, Part 2: Reduce the Download Time
Content Caching in Action
In part 1 of this series, we barely scratched the surface of the Hypertext Transfer Protocol (HTTP) caching architecture by providing explicit content expiration dates, which make it easy for browsers and proxy caches to decide whether a cached copy is still valid or whether they should download a newer copy from the web server. In this article, you’ll learn how to use the conditional GET feature built into HTTP to further decrease the download time of your web pages.
Let’s assume that a visitor returns to your web site and the content in her browser’s cache has already expired. When the browser checks the web server for a new copy of the expired content, the content on the server might not have changed. In that case, it doesn’t make sense for the server to return the same content again; doing so would waste server bandwidth and slow down the visitor’s experience. The designers of HTTP recognized this challenge early on and gave HTTP a powerful mechanism for revalidating cached content:
- Whenever a web server provides the content to a client (usually a web browser), it computes the content’s modification date and time and indicates it with the Last-Modified HTTP header.
- If the client caches the content, it also stores the value of the Last-Modified header in the cache.
- If the client decides that the cached copy of the content has expired (based on the Expires header, the max-age option of the Cache-Control header, or its internal heuristic algorithms), it sends the modification date and time of its cached copy in the If-Modified-Since header of the HTTP request. (In effect, "Only serve me the content if it has been modified since I last saw it.")
- If the server discovers that the content’s modification date and time is more recent than the one provided by the client, it sends the modified content (with a new Last-Modified header) with status code 200 (OK); otherwise, it responds with status code 304 (Not Modified) and does not resend the content.
Web servers typically perform these actions automatically when they serve static files. (The file-modification timestamp is used as the content-modification date and time.) Most scripting environments (for example, ASP or PHP) provide no such support for dynamic web pages; you have to do all the work in your script.
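As a rough illustration of the work a script has to do, here is a minimal Python sketch of the server-side check described above. The function name `conditional_get`, its parameters, and the plain-dictionary representation of request headers are assumptions made for this example; they do not come from ASP, PHP, or any particular framework.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_get(request_headers, last_modified, render_body):
    """Return (status, headers, body) for a GET request.

    request_headers: dict mapping header names to values (assumed shape).
    last_modified:   timezone-aware datetime of the content's last change.
    render_body:     callable that builds the page body as bytes.
    """
    # HTTP dates have one-second resolution, so drop microseconds first.
    last_modified = last_modified.astimezone(timezone.utc).replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}

    since_raw = request_headers.get("If-Modified-Since")
    if since_raw:
        try:
            since = parsedate_to_datetime(since_raw)
        except (TypeError, ValueError):
            since = None  # Unparseable date: ignore it and send the full page.
        if since is not None:
            if since.tzinfo is None:
                # Treat a date without zone information as UTC.
                since = since.replace(tzinfo=timezone.utc)
            if last_modified <= since:
                # The client's copy is still current: no body is sent.
                return 304, headers, b""

    # No usable If-Modified-Since header, or the content is newer.
    return 200, headers, render_body()
```

A script would call this with the incoming request headers and its own computed modification time, then send the returned status, headers, and body. The sketch treats the cached copy as current when the client's date is the same as or later than the content's modification time.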
If you decide to implement HTTP caching for your dynamic web pages, you have to compute the actual content-modification time carefully for the solution to work properly. If you’ve misjudged the content-expiration date, visitors can still get the current content by forcing a page reload; if you fail to adjust the Last-Modified header accurately, visitors have no way of getting a new copy of the content.
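One way to approach the computation, sketched here in Python, is to take the most recent timestamp among all the inputs a page is built from. This assumes that the page has several inputs (for example, a template file and a few database records) and that each one records when it last changed. The function name `content_last_modified` and the sample timestamps are hypothetical.

```python
from datetime import datetime, timezone


def content_last_modified(input_timestamps):
    """Return the latest change time among all inputs a page is built from.

    input_timestamps: iterable of timezone-aware datetimes, one per input
    (template, database record, included file, ...). Hypothetical helper.
    """
    stamps = list(input_timestamps)
    if not stamps:
        # Without any known timestamp we cannot claim the page is unchanged.
        raise ValueError("at least one input timestamp is required")
    return max(stamps)


# Example: the template is old, but one record was edited more recently,
# so the page as a whole counts as modified at the record's timestamp.
template_changed = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
record_changed = datetime(2024, 3, 15, 17, 30, tzinfo=timezone.utc)
page_modified = content_last_modified([template_changed, record_changed])
```

Leaving out any input that affects the page is how a Last-Modified value ends up stale, so the list has to cover everything the page actually depends on.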