Apache: The Definitive Guide

Chapter 3. Toward a Real Web Site

Contents:

More and Better Web Sites: site.simple
Butterthlies, Inc., Gets Going
Block Directives
Other Directives
Two Sites and Apache
Controlling Virtual Hosts on Unix
Controlling Virtual Hosts on Win32
Virtual Hosts
Two Copies of Apache
HTTP Response Headers
Options
Restarts
.htaccess
CERN Metafiles
Expirations

3.1. More and Better Web Sites: site.simple

We are now in a position to start creating real(ish) web sites, which can be found on the accompanying CD-ROM. For the sake of a little extra realism, we will base them loosely round a simple web business, Butterthlies, Inc., that creates and sells picture postcards. We need to give it some web addresses, but since we don't yet want to venture into the outside world, they should be variants on your own network ID so that all the machines in the network realize that they don't have to go out on the Web to make contact. For instance, we edited the \windows\hosts file on the Win95 machine running the browser and the /etc/hosts file on the Unix machine running the server to read as follows:

127.0.0.1 localhost
192.168.123.2 www.butterthlies.com
192.168.123.2 sales.butterthlies.com
192.168.123.3 sales-IP.butterthlies.com
192.168.124.1 www.faraway.com

localhost is obligatory, so we left it in, but you should not make any server requests to it since the results are likely to be confusing.

You probably need to consult your network manager to make similar arrangements.
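To see how entries like these map names to addresses, here is a minimal Python sketch that parses the hosts text shown above. It is an illustration only: the function name parse_hosts and the variable names are our own assumptions, and real resolvers do more than this (for example, they consult DNS when a name is not listed).

```python
# Hypothetical helper: parse hosts-file text into a name -> IP table.
HOSTS_TEXT = """\
127.0.0.1 localhost
192.168.123.2 www.butterthlies.com
192.168.123.2 sales.butterthlies.com
192.168.123.3 sales-IP.butterthlies.com
192.168.124.1 www.faraway.com
"""


def parse_hosts(text):
    """Return a dict mapping each hostname (lowercased) to its IP address."""
    table = {}
    for line in text.splitlines():
        # Drop anything after a '#' comment marker, then surrounding blanks.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            # Hostnames are case-insensitive; the first entry for a name wins.
            table.setdefault(name.lower(), ip)
    return table


hosts = parse_hosts(HOSTS_TEXT)
print(hosts["www.butterthlies.com"])
print(hosts["sales-ip.butterthlies.com"])
```

Note that www.butterthlies.com and sales.butterthlies.com share one address, while sales-IP.butterthlies.com has its own; this distinction matters later in the chapter.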

site.simple is site.toddle with a few small changes. The script go is different in that it refers to ... /site.simple/conf/httpd.conf rather than ... /site.toddle/conf/httpd.conf.

Unix:

% httpd -d /usr/www/site.simple

Win32:

>apache -d c:/usr/www/site.simple

This will be true of each site in the demonstration setup, so we will not mention it again.

From here on there will be minimal differences between the server setups necessary for Win32 and those for Unix. Unless one or the other is specifically mentioned, you should assume that the text refers to both.

It would be nice to have a log of what goes on. In the first edition of this book we found that a file access_log was created automatically in ...site.simple/logs. In a rather bizarre move since then, the Apache Group has broken backward compatibility and now requires you to mention the log file explicitly in the Config file using the TransferLog directive.

The ... /conf/httpd.conf file now contains the following:

User webuser
Group webgroup
ServerName localhost
DocumentRoot /usr/www/site.simple/htdocs
TransferLog logs/access_log

In ... /htdocs we have, as before, 1.txt :

hullo world from site.simple!

Now, type go on the server. Switch to the client machine and retrieve http://www.butterthlies.com. You should see:

Index of /
. Parent Directory
. 1.txt

Click on 1.txt for an inspirational message as before.

This all seems satisfactory, but there is a hidden mystery. We get the same result if we connect to http://sales.butterthlies.com. Why is this? Why, since we have not mentioned either of these URLs or their IP addresses in the configuration file on site.simple, do we get any response at all?

The answer is that when we configured the machine the server runs on, we told the network interface to respond to any of these IP addresses:

192.168.123.2
192.168.123.3

By default Apache listens on all IP addresses belonging to the machine and responds in the same way to all of them. If there are virtual hosts configured (which there aren't, in this case), Apache runs through them, looking for one whose IP address or name corresponds to the incoming connection. Apache uses that configuration if it is found, or the main configuration if it is not. Later in this chapter, we look at more definite control with the directives BindAddress, Listen, and <VirtualHost>.
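The fallback rule just described can be modeled in a few lines of Python. This is a deliberately simplified sketch, not Apache's actual matching code: it considers only the IP address the connection arrived on, and the names MACHINE_ADDRESSES, MAIN_CONFIG, VIRTUAL_HOSTS, and select_config are all hypothetical.

```python
# Toy model of "use the matching virtual host, else the main configuration".
# Addresses the network interface was told to answer on.
MACHINE_ADDRESSES = {"192.168.123.2", "192.168.123.3"}

# The main server configuration of site.simple (only one setting shown).
MAIN_CONFIG = {"DocumentRoot": "/usr/www/site.simple/htdocs"}

# site.simple configures no virtual hosts, so this table is empty.
VIRTUAL_HOSTS = {}


def select_config(local_ip, virtual_hosts=VIRTUAL_HOSTS, main=MAIN_CONFIG):
    """Pick the configuration for a connection that arrived on local_ip."""
    if local_ip not in MACHINE_ADDRESSES:
        raise ValueError("the machine does not answer on " + local_ip)
    # A matching virtual host wins; otherwise fall back to the main config.
    return virtual_hosts.get(local_ip, main)


# Both addresses get the same answer because no virtual host claims either.
print(select_config("192.168.123.2") is select_config("192.168.123.3"))
```

With an empty virtual host table, every address falls through to the main configuration, which is why www.butterthlies.com and sales.butterthlies.com return the same page.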

It has to be said that working like this (that is, switching rapidly between different configurations) seemed to get Netscape or Internet Explorer into a rare muddle. To be sure that the server was functioning properly while using Netscape as a browser, it was usually necessary to reload the file under examination by holding down the Control key while clicking on Reload. In extreme cases, it was necessary to disable caching by going to Edit → Preferences → Advanced → Cache, setting the memory and disk caches to 0, and setting cache comparison to Every Time. In Internet Explorer, set Cache Compares to Every Time. If you don't, the browser tends to display a jumble of several different responses from the server. This occurs because we are doing what no user or administrator would normally do, namely, flipping around between different versions of the same site with different versions of the same file. Whenever we flip from a newer version to an older version, the browser is led to believe that its cached version is up-to-date.

Back on the server, stop Apache with ^C (or whatever your kill character is) and look at the log files. In ... /logs/access_log, you should see something like this:

192.168.123.1 - - [<date-time>] "GET / HTTP/1.1" 200 177

200 is the response code (meaning "OK, cool, fine"), and 177 is the number of bytes transferred. In ... /logs/error_log, there should be nothing because nothing went wrong. However, it is a good habit to look there from time to time, though you have to make sure that the date and time logged correspond to the problem you are investigating. It is easy to fool yourself with some long-gone drama.
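If you want to pull the response code and byte count out of such a line programmatically, a short Python sketch follows. It assumes the log line has the shape shown above (client address, two placeholder fields, a bracketed timestamp, the quoted request, status, and size); the pattern LOG_LINE and the function parse_access_line are our own names, not part of Apache.

```python
import re

# Pattern for an access_log line of the shape shown above.
LOG_LINE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]*)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_access_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    # A size of "-" means no body was sent; treat it as zero bytes here.
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry


sample = '192.168.123.1 - - [<date-time>] "GET / HTTP/1.1" 200 177'
entry = parse_access_line(sample)
print(entry["host"], entry["request"], entry["status"], entry["size"])
```

Converting the status and size to integers makes it easy to filter for errors (status 400 and above) or to total the bytes served.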

Life being what it is, things can go wrong, and the client can ask for something the server can't provide. It makes sense to allow for this with the ErrorDocument directive.

3.1.1. ErrorDocument

ErrorDocument error-code document
Server config, virtual host, directory, .htaccess

In the event of a problem or error, Apache can be configured to do one of four things:

  1. Output a simple hardcoded error message.

  2. Output a customized message.

  3. Redirect to a local URL to handle the problem/error.

  4. Redirect to an external URL to handle the problem/error.

The first option is the default, whereas options 2 through 4 are configured using the ErrorDocument directive, which is followed by the HTTP response code and a message or URL. Messages in this context begin with a double quotation mark ("), which does not form part of the message itself. Apache will sometimes offer additional information regarding the problem or error.

URLs can be local URLs beginning with a slash ("/") or full URLs that the client can resolve. For example:

ErrorDocument 500 http://foo.example.com/cgi-bin/tester
ErrorDocument 404 /cgi-bin/bad_urls.pl
ErrorDocument 401 /subscription_info.html
ErrorDocument 403 "Sorry can't allow you access today

Note that when you specify an ErrorDocument that points to a remote URL (i.e., anything with a method such as "http" in front of it), Apache will send a redirect to the client to tell it where to find the document, even if the document ends up being on the same server. This has several implications, the most important being that if you use an ErrorDocument 401 directive, it must refer to a local document. This results from the nature of the HTTP basic authentication scheme.
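The rules above for telling a message from a local URL from an external URL, together with the restriction on 401, can be summarized in a small Python sketch. It mirrors only what this section states and is not Apache's own parser; the function name classify_error_document and the labels "message", "local", and "external" are assumptions made for illustration.

```python
def classify_error_document(line):
    """Split an ErrorDocument line into (code, kind, value).

    kind is "message", "local", or "external", following the rules in the text.
    """
    parts = line.strip().split(None, 2)
    if len(parts) != 3 or parts[0] != "ErrorDocument":
        raise ValueError("not an ErrorDocument line: " + line)
    code = int(parts[1])
    target = parts[2]
    if target.startswith('"'):
        # The leading quotation mark is not part of the message itself.
        kind, value = "message", target[1:]
    elif target.startswith("/"):
        kind, value = "local", target
    elif "://" in target:
        # Anything with a method such as "http" in front triggers a redirect.
        kind, value = "external", target
    else:
        raise ValueError("unrecognized ErrorDocument target: " + target)
    if code == 401 and kind == "external":
        raise ValueError("ErrorDocument 401 must refer to a local document")
    return code, kind, value


for example in (
    "ErrorDocument 500 http://foo.example.com/cgi-bin/tester",
    "ErrorDocument 404 /cgi-bin/bad_urls.pl",
    "ErrorDocument 401 /subscription_info.html",
    "ErrorDocument 403 \"Sorry can't allow you access today",
):
    print(classify_error_document(example))
```

Rejecting an external target for 401 up front reflects the point made above: a redirect would replace the 401 response that basic authentication depends on.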




Copyright © 2001 O'Reilly & Associates. All rights reserved.