Server responses, like client requests, always contain HTTP headers and an optional body. Here is the server response from our earlier example:
    HTTP/1.1 200 OK
    Date: Sat, 18 Mar 2000 20:35:35 GMT
    Server: Apache/1.3.9 (Unix)
    Last-Modified: Wed, 20 May 1998 14:59:42 GMT
    ETag: "74916-656-3562efde"
    Content-Length: 141
    Content-Type: text/html

    <HTML>
    <HEAD><TITLE>Sample Document</TITLE></HEAD>
    <BODY>
    <H1>Sample Document</H1>
    <P>This is a sample HTML document!</P>
    </BODY>
    </HTML>
The structure of the headers for the response is the same as for requests. The first header line has a special meaning, and is referred to as the status line. The remaining lines are name-value header field lines. See Figure 2-8.
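To make this structure concrete, here is a minimal Python sketch that splits a raw response into its status line, header fields, and body. The function name `split_response` is hypothetical, and the sketch assumes the whole response is already in memory; it ignores details such as repeated field names and header values continued across lines.

```python
def split_response(raw: bytes):
    """Split a raw HTTP response into (status_line, headers, body)."""
    # The header block ends at the first blank line (CRLF CRLF).
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    status_line = lines[0]
    headers = {}
    for line in lines[1:]:
        # Each remaining line is a "Name: value" pair.
        name, _, value = line.partition(":")
        # Field names are case-insensitive, so store them in lowercase.
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body


raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Server: Apache/1.3.9 (Unix)\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b"<P>This is a sample HTML document!</P>"
)
status_line, headers, body = split_response(raw)
```

Splitting on the first colon only matters because header values, such as dates, can contain colons themselves.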
The first line of the header is the status line, which includes the protocol and version just as in HTTP requests, except that this information comes at the beginning instead of at the end. This string is followed by a space and the three-digit status code, as well as a text version of the status. See Figure 2-9.
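The layout of the status line can be sketched in a few lines of Python. The function name `parse_status_line` is our own, and the sketch assumes the server always sends a text version of the status after the code.

```python
def parse_status_line(line: str):
    """Return (protocol, version, status_code, reason) from a status line."""
    # Split on the first two spaces only; the text version of the
    # status ("Not Found") may itself contain spaces.
    proto_version, code, reason = line.split(" ", 2)
    protocol, _, version = proto_version.partition("/")
    return protocol, version, int(code), reason


result = parse_status_line("HTTP/1.1 404 Not Found")
```

Here `result` holds the protocol, the version, the numeric code, and the text, in that order.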
Web servers can send any of dozens of status codes. For example, the server returns a status of 404 Not Found if a document doesn't exist and 301 Moved Permanently if a document is moved. Status codes are grouped into five different classes according to their first digit:
100-series status codes were introduced for HTTP 1.1 and are used at a low level during HTTP transactions. You won't use 100-series status codes in CGI scripts.
200-series status codes indicate that all is well with the request.
300-series status codes generally indicate some form of redirection. The request was valid, but the browser should find the content of its response elsewhere.
400-series status codes indicate that there was an error and the server is blaming the browser for doing something wrong.
500-series status codes also indicate there was an error, but in this case the server is admitting that it or a CGI script running on the server is the culprit.
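Because the class is determined entirely by the first digit, a client can classify any status code with integer division. The following Python sketch illustrates the idea; the function name `status_class` is hypothetical, and the class labels are the ones used by the HTTP specification rather than terms from this chapter.

```python
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def status_class(code: int) -> str:
    """Map a three-digit status code to its class by its first digit."""
    return STATUS_CLASSES.get(code // 100, "unknown")
```

For example, `status_class(404)` falls in the client error class, while `status_class(500)` falls in the server error class.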
We'll discuss each of the common status codes and how to use them in your CGI scripts in the next chapter.
After the status line, the server sends its HTTP headers. Some of these server headers are the same headers that browsers send with their requests. The common server headers are listed in Table 2-3.
Header | Description |
---|---|
Content-Base | Specifies the base URL for resolving all relative URLs within the document |
Content-Length | Specifies the length (in bytes) of the body |
Content-Type | Specifies the media type of the body |
Date | Specifies the date and time when the response was sent |
ETag | Specifies an entity tag for the requested resource |
Last-Modified | Specifies the date and time when the requested resource was last modified |
Location | Specifies the new location for the resource |
Server | Specifies the name and version of the web server |
Set-Cookie | Specifies a name-value pair that the browser should provide with future requests |
WWW-Authenticate | Specifies the authorization scheme and realm |
The Content-Base field contains a URL to use as the base for relative URLs in HTML documents. Using the <BASE HREF=...> tag in the head of the document accomplishes the same thing and is more common.
As with request headers, the Content-Length field in response headers contains the length of the body of the response. Browsers use this to detect an interrupted transaction or to tell the user the percentage of the download that is complete.
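Both uses of Content-Length come down to comparing the number of bytes received with the number announced. Here is a rough Python sketch; the function names are hypothetical, and it assumes the headers have already been parsed into a dictionary keyed by lowercase field name.

```python
def download_progress(received: int, content_length: int) -> float:
    """Percentage of the body received so far."""
    if content_length == 0:
        return 100.0  # an empty body is complete by definition
    return 100.0 * received / content_length


def is_truncated(body: bytes, headers: dict) -> bool:
    """True if fewer bytes arrived than Content-Length announced."""
    expected = headers.get("content-length")
    if expected is None:
        return False  # no length was announced, so we cannot tell
    return len(body) < int(expected)
```

With the 141-byte body from our earlier example, a client that has read 47 bytes would report about 33 percent complete.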
You will use the Content-Type header very often in your CGI scripts. This field is provided with every response containing a body and must be included for all responses accompanied by a status code of 200. The most common value for this field is text/html, which is what is returned with HTML documents. Other examples are text/plain for text documents and application/pdf for Adobe PDF documents.
Because this field was originally derived from a similar MIME field, it is often referred to as the MIME type of the message. However, this term is not accurate because the possible values for this field differ between the Web and Internet email. The IANA maintains a registry of media types for the Web, which may be viewed at http://www.isi.edu/in-notes/iana/assignments/media-types/. Although you could invent your own media type values, it is a good idea to stick with the registered ones, since web browsers need to know how to handle the associated documents.
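As a small illustration of the Content-Type header in practice, here is a sketch of the output a CGI script produces: the header line, a blank line, and then the body. It is written in Python purely for illustration, and the function name `cgi_response` is hypothetical.

```python
import sys


def cgi_response(body: str, content_type: str = "text/html") -> str:
    """Build CGI output: a Content-Type header, a blank line, then the body."""
    return f"Content-Type: {content_type}\n\n{body}"


if __name__ == "__main__":
    # A CGI script writes its response to standard output.
    sys.stdout.write(cgi_response("<P>This is a sample HTML document!</P>\n"))
```

Passing a second argument such as `"text/plain"` changes the media type the browser is told to expect.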
HTTP 1.1 requires that servers send the Date header with all responses. It specifies the date and time the response is sent. Three different date formats are acceptable in HTTP:
    Fri, 06 Aug 1999 19:01:42 GMT
    Friday, 06-Aug-99 19:01:42 GMT
    Fri Aug  6 19:01:42 1999
The HTTP specification recommends the first option, but all should be supported by HTTP applications. The last is the format generated by Perl's gmtime function.[4]
[4]More specifically, gmtime generates a date string like this when it is called in a scalar context. In list context, it returns a list of date elements instead. If this distinction seems unclear, then you may want to refer to a good Perl book like Programming Perl for the difference between list and scalar context.
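An application that accepts all three formats can simply try each one in turn. The following Python sketch shows one way to do this, and always generates the recommended first format. The function names are hypothetical, and the sketch assumes the default C locale, since the day and month names parsed by `strptime` are locale dependent.

```python
from datetime import datetime, timezone

HTTP_DATE_FORMATS = (
    "%a, %d %b %Y %H:%M:%S GMT",  # recommended: Sat, 18 Mar 2000 20:35:35 GMT
    "%A, %d-%b-%y %H:%M:%S GMT",  # older style: Saturday, 18-Mar-00 20:35:35 GMT
    "%a %b %d %H:%M:%S %Y",       # gmtime style: Sat Mar 18 20:35:35 2000
)


def parse_http_date(value: str) -> datetime:
    """Parse any of the three HTTP date formats into a UTC datetime."""
    for fmt in HTTP_DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue  # not this format; try the next one
    raise ValueError(f"unrecognized HTTP date: {value!r}")


def format_http_date(when: datetime) -> str:
    """Format a timezone-aware datetime in the recommended format."""
    return when.astimezone(timezone.utc).strftime("%a, %d %b %Y %H:%M:%S GMT")
```

All three spellings of the same moment parse to the same value, so the rest of a program never needs to know which format a server chose.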
The ETag header specifies an entity tag corresponding to the requested resource. Entity tags were added to HTTP 1.1 to address problems with caching. Although HTTP 1.1 does not specify any particular way for a server to generate an entity tag, entity tags are analogous to a message digest or checksum for a file. Clients and proxies can assume that all copies of a resource with the same URL and same entity tag are identical. Thus, generating a HEAD request and checking the ETag header of the response is an effective way for a browser to determine whether a previously cached response needs to be fetched again. Web servers typically do not generate entity tags for CGI scripts, although you can generate your own if you wish to have greater control over how HTTP 1.1 clients cache your responses.
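If you do decide to generate your own entity tags, one simple approach that follows the message digest analogy is to hash the output itself. The Python sketch below is only one possibility, not the method any particular web server uses; the function names and the choice of SHA-1 are our own assumptions.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Derive an entity tag from a digest of the response body."""
    digest = hashlib.sha1(body).hexdigest()
    # Entity tags are sent as quoted strings.
    return f'"{digest}"'


def cache_is_current(cached_etag: str, body: bytes) -> bool:
    """True if a cached copy carrying cached_etag still matches the body."""
    return cached_etag == make_etag(body)
```

Identical output always produces the same tag, and any change to the output produces a different one, which is exactly the property a cache needs.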
The Last-Modified header returns the date and time that the requested resource was last updated. This was intended to support caching, but it did not always work as well as hoped in HTTP 1.0, so the ETag header now supplements it. The Last-Modified header is restrictive because it implies that HTTP resources are static files, which is obviously not always the case. For example, for CGI scripts the value of this field must reflect the last time the output changed (possibly due to a change in a data source), and not the date and time that the CGI script itself was last updated. Like the ETag header, the web server does not typically generate the Last-Modified header for your CGI scripts, although you can output it yourself if you desire.
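Should you choose to output this header yourself, the value should come from the time your data last changed. Here is a minimal Python sketch; the function name `last_modified_header` is hypothetical, and it assumes you can obtain that time as seconds since the epoch (for example, the modification time of a data file).

```python
import calendar
from email.utils import formatdate


def last_modified_header(data_mtime: float) -> str:
    """Build a Last-Modified header line from an epoch timestamp."""
    # usegmt=True yields the recommended HTTP date format ending in "GMT".
    return "Last-Modified: " + formatdate(data_mtime, usegmt=True)


# The timestamp from the earlier example response, expressed in GMT.
mtime = calendar.timegm((1998, 5, 20, 14, 59, 42, 0, 0, 0))
header = last_modified_header(mtime)
```

The important design choice is which timestamp you pass in: it should track the data source, not the script file.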
The Location header is used to inform a client that it should look elsewhere for the requested resource. The value should contain an absolute URL to the new location. This header should be accompanied by a 3xx series status code. Browsers generally fetch the resource from the new location automatically for the user. Responses with a Location field may also contain a body with instructions for the user, since very old browsers may not respond to the Location field.
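A redirection response therefore combines a 3xx status, the Location header, and a short fallback body. The following Python sketch shows what a CGI script might output; the function name `redirect_response` is hypothetical, and it assumes the web server honors the CGI `Status` header to set the status line.

```python
import html


def redirect_response(url: str, status: str = "301 Moved Permanently") -> str:
    """Build CGI output that redirects the browser to an absolute URL."""
    # Fallback instructions for browsers that ignore the Location field.
    safe_url = html.escape(url, quote=True)
    body = f'<P>The document has moved <A HREF="{safe_url}">here</A>.</P>\n'
    return (
        f"Status: {status}\n"
        f"Location: {url}\n"
        "Content-Type: text/html\n"
        "\n"
        f"{body}"
    )
```

The URL is escaped before being placed in the HTML body so that characters such as `&` in a query string do not corrupt the link.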
The Server header provides the name and version of the application acting as the web server. The web server automatically generates this for standard responses. There are circumstances when you should generate this yourself, which we will see in the next chapter.
The Set-Cookie header asks the browser to remember a name-value pair and send this data back on subsequent requests to this server. The server can specify how long the browser should remember the cookie and to what hosts or domains the browser should provide it. We'll discuss cookies in detail in Chapter 11, "Maintaining State".
As we discussed earlier in Section 2.3.2.4, "Authorization", web servers can restrict certain resources to users who provide a valid username and password. The WWW-Authenticate field is used along with a status code of 401 to indicate that the requested resource requires such a login. The value of this field should contain the form of authentication and the realm for which the authorization applies. An authorization realm generally maps to a certain directory on the web server, and a username and password pair should apply to all resources within a realm.
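To show how the status code and the WWW-Authenticate field fit together, here is a Python sketch of a response that asks for a login. The function name `auth_required_response` and the realm value are hypothetical, the Basic scheme is assumed as the form of authentication, and in practice the web server usually generates this response for you.

```python
def auth_required_response(realm: str) -> str:
    """Build CGI output asking the browser to authenticate for a realm."""
    return (
        "Status: 401 Unauthorized\n"
        # The field value names the scheme first, then the realm.
        f'WWW-Authenticate: Basic realm="{realm}"\n'
        "Content-Type: text/plain\n"
        "\n"
        "Authorization required.\n"
    )


response = auth_required_response("Members Only")
```

On receiving this, a browser typically prompts the user for a username and password and then repeats the request with those credentials.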
Copyright © 2001 O'Reilly & Associates. All rights reserved.