How Does Apache Work?
What to Know About TCP/IP
How Does Apache Use TCP/IP?
What the Client Does
What Happens at the Server End?
Which Unix?
Which Apache?
Making Apache Under Unix
Apache Under Windows
Apache Under BS2000/OSD and AS/400
When you connect to the URL of someone's home page -- say the notional http://www.butterthlies.com/ we shall meet later on -- you send a message across the Internet to the machine at that address. That machine, you hope, is up and running, its Internet connection is working, and it is ready to receive and act on your message.
URL stands for Uniform Resource Locator. A URL such as http://www.butterthlies.com/ comes in three parts:
<method>://<host>/<absolute path URL (apURL)>
So, in our example, <method> is http, meaning that the browser should use HTTP (Hypertext Transfer Protocol); <host> is www.butterthlies.com; and <apURL> is "/", meaning the top directory of the host. Using HTTP/1.1, your browser might send the following request:
GET / HTTP/1.1
Host: www.butterthlies.com
The request arrives at port 80 (the default HTTP port) on the host www.butterthlies.com. The message is again in three parts: a method (an HTTP method, not a URL method), which in this case is GET but could equally be PUT, POST, DELETE, or CONNECT; the Uniform Resource Identifier (URI) "/"; and the version of the protocol we are using. It is then up to the web server running on that host to make something of this message.
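To make the two three-part structures concrete, here is a minimal Python sketch. It is an illustration only, not how any browser or Apache itself is written, and the function names split_url, build_request, and parse_request_line are our own inventions. It splits a URL into method, host, and apURL, builds the request shown above, and then splits the request line into HTTP method, URI, and protocol version.

```python
from urllib.parse import urlsplit


def split_url(url):
    """Split a URL into the three parts named in the text: method, host, apURL."""
    parts = urlsplit(url)
    # An empty path means the top directory of the host.
    return parts.scheme, parts.netloc, parts.path or "/"


def build_request(url):
    """Build a minimal HTTP/1.1 GET request for the URL (illustrative only)."""
    _method, host, ap_url = split_url(url)
    # Each header line ends in CRLF; a blank line ends the headers.
    return f"GET {ap_url} HTTP/1.1\r\nHost: {host}\r\n\r\n"


def parse_request_line(request):
    """Return (HTTP method, URI, protocol version) from the first line."""
    first_line = request.split("\r\n", 1)[0]
    http_method, uri, version = first_line.split(" ")
    return http_method, uri, version


if __name__ == "__main__":
    print(split_url("http://www.butterthlies.com/"))
    print(parse_request_line(build_request("http://www.butterthlies.com/")))
```

Note that the URL method (http) and the HTTP method (GET) come out of different functions, which mirrors the distinction the text draws between them.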
It is worth saying here -- and we will say it again -- that the whole business of a web server is to translate a URL either into a filename, and then send that file back over the Internet, or into a program name, and then run that program and send its output back. That is the meat of what it does: all the rest is trimming.
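The file half of that translation can be sketched in a few lines of Python. This is a simplified illustration under our own assumptions, not Apache's algorithm: the names translate_path and serve are hypothetical, the default filename index.html is an assumed convention, and directories, programs, and access rules are ignored entirely. The 404 status returned for a missing file is the standard HTTP "Not Found" code.

```python
import os
import posixpath
from urllib.parse import unquote


def translate_path(uri_path, doc_root, index="index.html"):
    """Map a URI path onto a filename below doc_root.

    Returns None if the path would climb out of doc_root.
    """
    path = unquote(uri_path.split("?", 1)[0])  # drop any query string
    parts = [p for p in posixpath.normpath(path).split("/") if p not in ("", ".")]
    if ".." in parts:
        return None  # refuse to leave the document root
    if not parts:
        parts = [index]  # assumed default file for the top directory
    return os.path.join(doc_root, *parts)


def serve(uri_path, doc_root):
    """Return (status, body) for a request, sending back the file if it exists."""
    filename = translate_path(uri_path, doc_root)
    if filename is None or not os.path.isfile(filename):
        return 404, b"Not Found"
    with open(filename, "rb") as f:
        return 200, f.read()
```

For example, serve("/", root) returns the contents of index.html in root if that file exists, and a 404 otherwise. Everything a real server adds on top (authentication, negotiation, logging) is the trimming.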
The host machine may be a whole cluster of hypercomputers costing an oil sheik's ransom, or a humble PC. In either case, it had better be running a web server, a program that listens to the network and accepts and acts on this sort of message.
What do we want a web server to do? It should:
Run fast, so it can cope with a lot of inquiries using a minimum of hardware.
Be multitasking, so it can deal with more than one inquiry at once.
Be multitasking, so that the person running it can maintain the data it hands out without having to shut the service down. Multitasking is hard to arrange within a program: the only way to do it properly is to run the server on a multitasking operating system. In Apache's case, this is some flavor of Unix (or Unix-like system), Win32, or OS/2.
Authenticate inquirers: some may be entitled to more services than others. When we come to virtual cash, this feature (see Chapter 13, "Security") becomes essential.
Respond to errors in the messages it gets with answers that make sense in the context of what is going on. For instance, if a client requests a page that the server cannot find, the server should respond with a "404" error, which the HTTP specification defines as "Not Found": the server has nothing that matches the requested URL.
Negotiate a style and language of response with the inquirer. For instance, it should -- if the people running the server can rise to the challenge -- be able to respond in the language of the inquirer's choice. This ability, of course, can open up your site to a lot more action. And there are parts of the world where a response in the wrong language can be a bad thing. If you were operating in Canada, where the English/French divide arouses bitter feelings, or in Belgium, where the French/Flemish split is as bad, this feature could make or break your business.
Offer different formats. On a more technical level, a user might want JPEG image files rather than GIF, or TIFF rather than either of the former. He or she might want text in DVI format rather than PostScript.
Run as a proxy server. A proxy server accepts requests for clients, forwards them to the real servers, and then sends the real servers' responses back to the clients. There are two reasons why you might want a proxy server:
The proxy might be running on the far side of a firewall (see Chapter 13, "Security"), giving its users access to the Internet.
The proxy might cache popular pages to save reaccessing them.
Be secure. The Internet world is like the real world, peopled by a lot of lambs and a few wolves.[2] The wolves like to get into the lambs' folds (of which your computer is one) and, when there, raven and tear in the usual wolfish way. The aim of a good server is to prevent this happening. The subject of security is so important that we will come back to it several times before we are through.
[2]We generally follow the convention of calling these people the Bad Guys. This avoids debate about "hackers," which, to many people, simply refers to good programmers, but to some means Bad Guys. We discover from the French edition of this book that in France they are Sales Types -- dirty fellows.
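To make one item on this list concrete, language negotiation can be sketched as follows. A browser may send an Accept-Language header such as "fr-CA, fr;q=0.8, en;q=0.5", where q is a preference weight between 0 and 1. The Python below is a simplified illustration, not Apache's negotiation code: the function names are ours, and the "*" wildcard and other subtleties of the HTTP specification are deliberately ignored.

```python
def parse_accept_language(header):
    """Return the language tags in an Accept-Language header, most preferred first."""
    prefs = []
    for position, item in enumerate(header.split(",")):
        tag, _, params = item.strip().partition(";")
        tag = tag.strip().lower()
        if not tag:
            continue
        quality = 1.0  # a tag with no q value gets full weight
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        if quality <= 0:
            continue  # q=0 means "not acceptable"
        # Sort by descending quality, then by order of appearance.
        prefs.append((-quality, position, tag))
    return [tag for _quality, _position, tag in sorted(prefs)]


def choose_language(header, available, default):
    """Pick the best available language, falling back to default."""
    for tag in parse_accept_language(header):
        if tag in available:
            return tag
        primary = tag.split("-")[0]  # "fr-ca" falls back to "fr"
        if primary in available:
            return primary
    return default
```

With pages available in English and French, choose_language("fr-CA, fr;q=0.8, en;q=0.5", {"en", "fr"}, "en") selects the French version, while a header asking only for German falls back to the default.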
These are services that the developers of Apache think a server should offer. There are people who have other ideas, and, as with all software development, there are lots of features that might be nice -- features someone might use one day, or that might, if put into the code, actually make it work better instead of fouling up something else that has, until then, worked fine. Unless developers are careful, good software attracts so many improvements that it eventually rolls over and sinks like a ship caught in an Arctic ice storm.
Some ideas are in progress: in particular, various proposals for Apache 2.0 are being kicked around. The main features Apache 2.0 is supposed to have are multithreading (on platforms that support it), layered I/O, and a rationalized API.
If you have bugs to report or more ideas for development, look at http://www.apache.org/bug_report.html. You can also try news:comp.infosystems.www.servers.unix, where some of the Apache team lurk, along with many other knowledgeable people, and news:comp.infosystems.www.servers.ms-windows.
Apache is a program that runs under a suitable multitasking operating system. In the examples in this book, the operating systems are Unix and Windows 95/98/NT, which we call Win32. The binary is called httpd under Unix and apache.exe under Win32[3] and normally runs in the background. Each copy of httpd/apache that is started has its attention directed at a web site, which is, for practical purposes, a directory. For an example, look at site.toddle on the demonstration CD-ROM. Regardless of operating system, a site directory typically contains four subdirectories:
[3]This double name is rather annoying, but it seems that life has progressed too far for anything to be done about it. We will, rather clumsily, refer to httpd/apache and hope that the reader can pick the right one.
conf: Contains the configuration file(s), of which httpd.conf is the most important. It is referred to throughout this book as the Config file.
htdocs: Contains the HTML files to be served up to the site's clients. This directory and those below it, the web space, are accessible to anyone on the Web and therefore pose a severe security risk if used for anything other than public data.
logs: Contains the log data, both of accesses and errors.
cgi-bin: Contains the CGI scripts. These are programs or shell scripts written by or for the webmaster that can be executed by Apache on behalf of its clients. It is most important, for security reasons, that this directory not be in the web space.
In its idling state, Apache does nothing but listen to the IP addresses and TCP port or ports specified in its Config file. When a request appears on a valid port, Apache receives the HTTP request and analyzes the headers. It then applies the rules it finds in the Config file and takes the appropriate action.
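The listen, receive, analyze, act cycle can be sketched in Python. This is a toy under stated assumptions, not a picture of Apache's internals: the address 127.0.0.1, port 8080, and all function names are our own choices, the server handles one connection at a time, malformed requests are not handled, and the "appropriate action" is simply to echo back what was asked for.

```python
import socket


def read_request(conn):
    """Read from the socket until the blank line that ends the headers."""
    data = b""
    while b"\r\n\r\n" not in data:
        chunk = conn.recv(1024)
        if not chunk:
            break  # client closed the connection early
        data += chunk
    return data.decode("iso-8859-1")


def parse_request(text):
    """Split a request into (method, uri, version, headers)."""
    lines = text.split("\r\n\r\n", 1)[0].split("\r\n")
    method, uri, version = lines[0].split(" ")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names ignore case
    return method, uri, version, headers


def handle_connection(conn):
    """Analyze one request and send a plain-text reply describing it."""
    method, uri, _version, headers = parse_request(read_request(conn))
    body = f"{method} {uri} for host {headers.get('host', '?')}\n".encode("ascii")
    head = (
        "HTTP/1.1 200 OK\r\n"
        "Content-Type: text/plain\r\n"
        f"Content-Length: {len(body)}\r\n"
        "Connection: close\r\n\r\n"
    ).encode("ascii")
    conn.sendall(head + body)


def listen(address="127.0.0.1", port=8080):
    """Idle until a connection arrives, then handle it; repeat forever."""
    with socket.create_server((address, port)) as server:
        while True:
            conn, _peer = server.accept()
            with conn:
                handle_connection(conn)
```

Calling listen() and pointing a browser at http://127.0.0.1:8080/ should show the one-line reply. In Apache, the step that this sketch reduces to an echo is where the rules in the Config file are applied.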
The webmaster's main control over Apache is through the Config file. The webmaster has some 150 directives at his or her disposal; most of this book is an account of what these directives do and how to use them to reasonable advantage. The webmaster also has half a dozen flags he or she can use when Apache starts up.
Apache is freeware: the intending user downloads the source code and compiles it (under Unix) or downloads the executable (for Windows) from www.apache.org or a suitable mirror site. You can also load the source code from the demonstration CD-ROM included with this book, although it is not the most recent. Although it sounds like a difficult business to download the source code and configure and compile it, it only takes about 20 minutes and is well worth the trouble.
Under Unix, the webmaster also controls which modules are compiled into Apache. Each module provides the code to execute a number of directives. If there is a group of directives that aren't needed, the appropriate modules can be left out of the binary by commenting their names out in the configuration file[4] that controls the compilation of the Apache sources. Discarding unwanted modules reduces the size of the binary and may improve performance.
[4]It is important to distinguish between the configuration file used at compile time and the Config file used to control the operation of a web site.
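As an illustration of what commenting a module out amounts to, the Python sketch below lists the modules that remain enabled in a compile-time configuration file. It assumes that modules are named on lines of the form "AddModule <path>" and that a leading "#" comments a line out; check this against the file shipped with your own Apache sources. The function name and the four sample lines are ours, made up for the example.

```python
def enabled_modules(configuration_text):
    """Return the module paths on AddModule lines that are not commented out."""
    modules = []
    for line in configuration_text.splitlines():
        words = line.split()
        # A commented-out line starts with "#", so its first word is not "AddModule".
        if len(words) >= 2 and words[0] == "AddModule":
            modules.append(words[1])
    return modules


# Hypothetical excerpt: two modules enabled, two commented out.
SAMPLE = """\
AddModule modules/standard/mod_env.o
# AddModule modules/proxy/libproxy.a
AddModule modules/standard/mod_cgi.o
#AddModule modules/standard/mod_status.o
"""

if __name__ == "__main__":
    for module in enabled_modules(SAMPLE):
        print(module)
```

Run against the sample, this reports only mod_env.o and mod_cgi.o; the proxy and status modules would be left out of the binary.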
Under Windows, Apache is normally precompiled as an executable. The core modules are compiled in, and others are loaded, if needed, as dynamic link libraries (DLLs) at runtime, so control of the executable's size is less urgent. The DLLs supplied in the .../apache/modules subdirectory are as follows:
APACHE~1 DLL      5,120  19/07/98  11:47  ApacheModuleAuthAnon.dll
APACHE~2 DLL      5,632  19/07/98  11:48  ApacheModuleCERNMeta.dll
APACHE~3 DLL      6,656  19/07/98  11:47  ApacheModuleDigest.dll
APACHE~4 DLL      6,144  19/07/98  11:48  ApacheModuleExpires.dll
APACHE~5 DLL      5,120  19/07/98  11:48  ApacheModuleHeaders.dll
APACHE~6 DLL     46,080  19/07/98  11:48  ApacheModuleProxy.dll
APACHE~7 DLL     35,328  19/07/98  11:48  ApacheModuleRewrite.dll
APACHE~8 DLL      6,656  19/07/98  11:48  ApacheModuleSpeling.dll
APACHE~9 DLL     10,752  19/07/98  11:47  ApacheModuleStatus.dll
APACH~10 DLL      6,144  19/07/98  11:48  ApacheModuleUserTrack.dll
What these are and what they do will become more apparent as we proceed. You can add other DLLs from outside suppliers; more will doubtless become available.
It is also possible to download the source code and compile it for Win32 using Microsoft Visual C++ v5.0. We describe this in Section 1.9, "Apache Under Windows", later in this chapter. You might do this if you wanted to write your own module (see Chapter 15, "Writing Apache Modules").
Copyright © 2001 O'Reilly & Associates. All rights reserved.