15.8 Validators and Freshness

Look back at Figure 15-8. The client does not initially have a copy of the resource, so it sends a request to the server asking for it. The server responds with Version 1 of the resource. The client can now cache this copy, but for how long?

Once the document has "expired" at the client (i.e., once the client can no longer consider its copy a valid copy), it must request a fresh copy from the server. If the document has not changed at the server, however, the client does not need to receive it again; it can just continue to use its cached copy.

This special request, called a conditional request, requires that the client tell the server which version it currently has, using a validator, and ask for a copy to be sent only if its current copy is no longer valid. Let's look at the three key concepts (freshness, validators, and conditionals) in more detail.

15.8.1 Freshness

Servers are expected to give clients information about how long clients can cache their content and consider it fresh. Servers can provide this information using one of two headers: Expires and Cache-Control.

The Expires header specifies the exact date and time when the document "expires," that is, when it can no longer be considered fresh. The syntax for the Expires header is:

    Expires: Sun, 18 Mar 2001 23:59:59 GMT

For a client and server to use the Expires header correctly, their clocks must be synchronized. This is not always easy, because neither may run a clock synchronization protocol such as the Network Time Protocol (NTP). A mechanism that defines expiration using relative time is more useful. The Cache-Control header can be used to specify the maximum age for a document in seconds: the total amount of time since the document left the server. Age is not dependent on clock synchronization and therefore is likely to yield more accurate results.

The Cache-Control header actually is very powerful. It can be used by both servers and clients to describe freshness using more directives than just
specifying an age or expiration time. Table 15-3 lists some of the directives that can accompany the Cache-Control header.

Table 15-3. Cache-Control header directives

Directive         Message type  Description
----------------  ------------  -----------
no-cache          Request       Do not return a cached copy of the document without first revalidating it with the server.
no-store          Request       Do not return a cached copy of the document. Do not store the response from the server.
max-age           Request       The document in the cache must not be older than the specified age.
max-stale         Request       The document may be stale based on the server-specified expiration information, but it must not have been expired for longer than the value in this directive.
min-fresh         Request       The document's freshness lifetime must be no less than its current age plus the specified amount. In other words, the response must be fresh for at least the specified amount of time.
no-transform      Request       The document must not be transformed before being sent.
only-if-cached    Request       Send the document only if it is in the cache, without contacting the origin server.
public            Response      Response may be cached by any cache.
private           Response      Response may be cached such that it can be accessed only by a single client.
no-cache          Response      If the directive is accompanied by a list of header fields, the content may be cached and served to clients, but the listed header fields must first be removed. If no header fields are specified, the cached copy must not be served without revalidation with the server.
no-store          Response      Response must not be cached.
no-transform      Response      Response must not be modified in any way before being served.
must-revalidate   Response      Response must be revalidated with the server before being served.
proxy-revalidate  Response      Shared caches must revalidate the response with the origin server before serving. This directive can be ignored by private caches.
max-age           Response      Specifies the maximum length of time the document can be cached and still considered fresh.
s-maxage          Response      Specifies the maximum age of the document as it applies to shared caches (overriding the max-age directive, if one is present). This directive can be ignored by private caches.

Caching and freshness were discussed in more detail in Chapter 7.

15.8.2 Conditionals and Validators

When a cache's copy is requested, and it is no longer fresh, the cache needs to make sure it has a fresh copy. The cache can fetch the current copy from the origin server, but in many cases, the document on the server is still the same as the stale copy in the cache. We saw this in Figure 15-8b; the cached copy may have expired, but the server content still is the same as the cache content. If a cache always fetches a server's document, even if it's the same as the expired cache copy, the cache wastes network bandwidth, places unnecessary load on the cache and server, and slows everything down.

To fix this, HTTP provides a way for clients to request a copy only if the resource has changed, using special requests called conditional requests. Conditional requests are normal HTTP request messages, but they are performed only if a particular condition is true. For example, a cache might send the following conditional GET message to a server, asking it to send the file /announce.html only if the file has been modified since June 29, 2002 (the date the cached document was last changed by the author):

    GET /announce.html HTTP/1.0
    If-Modified-Since: Sat, 29 Jun 2002 14:30:00 GMT

Conditional requests are implemented by conditional headers that start with "If". In the example above, the conditional header is If-Modified-Since. A conditional header allows a method to execute only if the condition is true. If the condition is not true, the server does not execute the method and instead sends back a status code saying so (for example, 304 Not Modified in reply to a conditional GET).

Each conditional works on a particular validator. A validator is a particular attribute of the document instance that is tested. Conceptually, you can think of the validator like the serial number, version number, or last change date of a document. A wise client in Figure 15-8b would send a
conditional validation request to the server saying, "send me the resource only if it is no longer Version 1; I have Version 1." We discussed conditional cache revalidation in Chapter 7, but we'll study the details of entity validators more carefully in this chapter.

The If-Modified-Since conditional header tests the last-modified date of a document instance, so we say that the last-modified date is the validator. The If-None-Match conditional header tests the ETag value of a document, which is a special keyword or version-identifying tag associated with the entity. Last-Modified and ETag are the two primary validators used by HTTP. Table 15-4 lists four of the HTTP headers used for conditional requests. Next to each conditional header is the type of validator used with the header.

Table 15-4. Conditional request types

Request type         Validator      Description
-------------------  -------------  -----------
If-Modified-Since    Last-Modified  Send a copy of the resource if the version that was last modified at the time in your previous Last-Modified response header is no longer the latest one.
If-Unmodified-Since  Last-Modified  Send a copy of the resource only if it is the same as the version that was last modified at the time in your previous Last-Modified response header.
If-Match             ETag           Send a copy of the resource if its entity tag is the same as that of the one in your previous ETag response header.
If-None-Match        ETag           Send a copy of the resource if its entity tag is different from that of the one in your previous ETag response header.

HTTP groups validators into two classes: weak validators and strong validators. Weak validators may not always uniquely identify an instance of a resource; strong validators must. An example of a weak validator is the size of the object in bytes. The resource content might change even though the size remains the same, so a hypothetical byte-count validator only weakly indicates a change. A cryptographic checksum of the contents of the resource (such as MD5), however, is a strong validator; it changes when the
document changes.

The last-modified time is considered a weak validator because, although it specifies the time at which the resource was last modified, it specifies that time to an accuracy of at most one second. Because a resource can change multiple times in a second, and because servers can serve thousands of requests per second, the last-modified date might not always reflect changes. The ETag header is considered a strong validator, because the server can place a distinct value in the ETag header every time the entity changes. Version numbers and digest checksums are good candidates for the ETag header, but the header can contain any arbitrary text. ETag headers are flexible; they take arbitrary text values ("tags"), and can be used to devise a variety of client and server validation strategies.

Clients and servers may sometimes want to adopt a looser version of entity-tag validation. For example, a server may want to make cosmetic changes to a large, popular cached document without triggering a mass transfer when caches revalidate. In this case, the server might advertise a "weak" entity tag by prefixing the tag with "W/". A weak entity tag should change only when the associated entity changes in a semantically significant way. A strong entity tag must change whenever the associated entity value changes in any way.

The following example shows how a client might revalidate with a server using a weak entity tag. The server would return a body only if the content changed in a meaningful way from Version 4.0 of the document:

    GET /announce.html HTTP/1.1
    If-None-Match: W/"v4.0"

In summary, when clients access the same resource more than once, they first need to determine whether their current copy still is fresh. If it is not, they must get the latest version from the server. To avoid receiving an identical copy in the event that the resource has not changed, clients can send conditional requests to the server, specifying validators that uniquely identify their current copies. Servers
will then send a copy of the resource only if it is different from the client's copy. For more details on cache revalidation, please refer back to Section 7.7.
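The freshness rules from Section 15.8.1 can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not how any particular cache is implemented: the function names (parse_cache_control, freshness_lifetime, is_fresh) are hypothetical, headers are passed as a plain dictionary with exact-case keys, the directive parser splits naively on commas (so it would mishandle a quoted header list such as the one no-cache can carry), and the age calculation ignores the Age header and network delay.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def parse_cache_control(value):
    """Split a Cache-Control value into {directive: argument-or-None}."""
    directives = {}
    for part in value.split(","):
        name, _, arg = part.strip().partition("=")
        if name:
            directives[name.lower()] = arg.strip('"') or None
    return directives


def freshness_lifetime(headers, received_at, shared=False):
    """Return the freshness lifetime in seconds, or None if none was given.

    s-maxage (shared caches only) overrides max-age; max-age overrides Expires.
    """
    cc = parse_cache_control(headers.get("Cache-Control", ""))
    if shared and cc.get("s-maxage") is not None:
        return int(cc["s-maxage"])
    if cc.get("max-age") is not None:
        return int(cc["max-age"])
    if "Expires" in headers:
        expires = parsedate_to_datetime(headers["Expires"])
        # Prefer the server's Date header as the reference point, so the
        # lifetime is computed from two timestamps taken on the same clock.
        sent = parsedate_to_datetime(headers["Date"]) if "Date" in headers else received_at
        return (expires - sent).total_seconds()
    return None


def is_fresh(headers, received_at, now, shared=False):
    """True if the cached response is still within its freshness lifetime."""
    lifetime = freshness_lifetime(headers, received_at, shared)
    if lifetime is None:
        return False  # assumption: no expiration information means revalidate
    age = now - received_at  # simplified age; ignores the Age header
    return age < timedelta(seconds=lifetime)


# Usage: a response received at noon with a one-hour max-age.
received = datetime(2001, 3, 18, 12, 0, tzinfo=timezone.utc)
headers = {"Cache-Control": "public, max-age=3600"}
print(is_fresh(headers, received, received + timedelta(minutes=30)))  # True
print(is_fresh(headers, received, received + timedelta(hours=2)))     # False
```

Note that the max-age path never consults an absolute server timestamp, which is why it does not depend on clock synchronization, whereas the Expires path has to compare two dates.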
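To make the server's side of a conditional GET concrete, here is a minimal Python sketch of how the two validators might be checked. It is an illustration, not a complete server: the function names are hypothetical, request headers are a plain dictionary, the If-None-Match list is split naively on commas, and only the status code is returned (304 Not Modified when the client's copy is still valid, 200 when a new body should be sent). Following current HTTP semantics, If-None-Match is compared using weak comparison and takes precedence over If-Modified-Since when both are present.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_etag(tag):
    """Return (is_weak, opaque_value) for an entity tag such as W/"v4.0"."""
    tag = tag.strip()
    weak = tag.startswith("W/")
    if weak:
        tag = tag[2:]
    return weak, tag.strip('"')


def etags_match(a, b, strong=False):
    """Compare two entity tags.

    Weak comparison ignores the W/ prefix; strong comparison additionally
    requires that neither tag is weak.
    """
    weak_a, value_a = parse_etag(a)
    weak_b, value_b = parse_etag(b)
    if strong and (weak_a or weak_b):
        return False
    return value_a == value_b


def evaluate_conditional_get(request_headers, current_etag, last_modified):
    """Return 304 if the client's cached copy is still valid, else 200."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        if if_none_match.strip() == "*":
            return 304  # any current representation counts as a match
        tags = [t for t in if_none_match.split(",") if t.strip()]
        if any(etags_match(t, current_etag) for t in tags):
            return 304
        return 200  # If-Modified-Since is ignored when If-None-Match is sent
    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        if last_modified <= parsedate_to_datetime(if_modified_since):
            return 304
    return 200


# Usage: the weak-tag revalidation request shown in this section.
last_changed = datetime(2002, 6, 29, 14, 30, tzinfo=timezone.utc)
request = {"If-None-Match": 'W/"v4.0"'}
print(evaluate_conditional_get(request, 'W/"v4.0"', last_changed))  # 304
print(evaluate_conditional_get(request, 'W/"v4.1"', last_changed))  # 200
```

Because Last-Modified has one-second resolution, the date check cannot notice two changes made within the same second; the entity-tag check does not have that limitation, provided the server changes the tag on every change.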