Book HomeApache: The Definitive GuideSearch this book

Chapter 1. Getting Started

Contents:

How Does Apache Work?
What to Know About TCP/IP
How Does Apache Use TCP/IP?
What the Client Does
What Happens at the Server End?
Which Unix?
Which Apache?
Making Apache Under Unix
Apache Under Windows
Apache Under BS2000/OSD and AS/400

When you connect to the URL of someone's home page -- say the notional http://www.butterthlies.com/ we shall meet later on -- you send a message across the Internet to the machine at that address. That machine, you hope, is up and running, its Internet connection is working, and it is ready to receive and act on your message.

URL stands for Universal Resource Locator. A URL such as http://www.butter-thlies.com/ comes in three parts:

<method>://<host>/<absolute path URL (apURL)>

So, in our example, < method> is http, meaning that the browser should use HTTP (Hypertext Transfer Protocol); <host> is www.butterthlies.com; and <apURL> is "/ ", meaning the top directory of the host. Using HTTP/1.1, your browser might send the following request:

GET / HTTP/1.1
Host: www.butterthlies.com

The request arrives at port 80 (the default HTTP port) on the host www.butterthlies.com. The message is again in three parts: a method (an HTTP method, not a URL method), that in this case is GET, but could equally be PUT, POST, DELETE, or CONNECT; the Uniform Resource Identifier (URI) "/"; and the version of the protocol we are using. It is then up to the web server running on that host to make something of this message.

It is worth saying here -- and we will say it again -- that the whole business of a web server is to translate a URL either into a filename, and then send that file back over the Internet, or into a program name, and then run that program and send its output back. That is the meat of what it does: all the rest is trimming.

The host machine may be a whole cluster of hypercomputers costing an oil sheik's ransom, or a humble PC. In either case, it had better be running a web server, a program that listens to the network and accepts and acts on this sort of message.

What do we want a web server to do? It should:

These are services that the developers of Apache think a server should offer. There are people who have other ideas, and, as with all software development, there are lots of features that might be nice -- features someone might use one day, or that might, if put into the code, actually make it work better instead of fouling up something else that has, until then, worked fine. Unless developers are careful, good software attracts so many improvements that it eventually rolls over and sinks like a ship caught in an Arctic ice storm.

Some ideas are in progress: in particular, various proposals for Apache 2.0 are being kicked around. The main features Apache 2.0 is supposed to have are multithreading (on platforms that support it), layered I/O, and a rationalized API.

If you have bugs to report or more ideas for development, look at http://www.apache.org/bug_report.html. You can also try news:comp.infosystems.www.servers.unix, where some of the Apache team lurk, along with many other knowledgeable people, and news:comp.infosystems.www.servers.ms-windows.

1.1. How Does Apache Work?

Apache is a program that runs under a suitable multitasking operating system. In the examples in this book, the operating systems are Unix and Windows 95/98/NT, which we call Win32. The binary is called httpd under Unix and apache.exe under Win32[3] and normally runs in the background. Each copy of httpd/apache that is started has its attention directed at a web site , which is, for practical purposes, a directory. For an example, look at site.toddle on the demonstration CD-ROM. Regardless of operating system, a site directory typically contains four subdirectories:

[3]This double name is rather annoying, but it seems that life has progressed too far for anything to be done about it. We will, rather clumsily, refer to httpd/apache and hope that the reader can pick the right one.

conf

Contains the configuration file(s), of which httpd.conf is the most important. It is referred to throughout this book as the Config file.

htdocs

Contains the HTML scripts to be served up to the site's clients. This directory and those below it, the web space, are accessible to anyone on the Web and therefore pose a severe security risk if used for anything other than public data.

logs

Contains the log data, both of accesses and errors.

cgi-bin

Contains the CGI scripts. These are programs or shell scripts written by or for the webmaster that can be executed by Apache on behalf of its clients. It is most important, for security reasons, that this directory not be in the web space.

In its idling state, Apache does nothing but listen to the IP addresses and TCP port or ports specified in its Config file. When a request appears on a valid port, Apache receives the HTTP request and analyzes the headers. It then applies the rules it finds in the Config file and takes the appropriate action.

The webmaster's main control over Apache is through the Config file. The webmaster has some 150 directives at his or her disposal; most of this book is an account of what these directives do and how to use them to reasonable advantage. The webmaster also has half a dozen flags he or she can use when Apache starts up. Apache is freeware : the intending user downloads the source code and compiles it (under Unix) or downloads the executable (for Windows) from www.apache.org or a suitable mirror site. You can also load the source code from the demonstration CD-ROM included with this book, although it is not the most recent. Although it sounds like a difficult business to download the source code and configure and compile it, it only takes about 20 minutes and is well worth the trouble.

Figure 1.1 Under Unix, the webmaster also controls which modules are compiled into Apache. Each module provides the code to execute a number of directives. If there is a group of directives that aren't needed, the appropriate modules can be left out of the binary by commenting their names out in the configuration file [4] that controls the compilation of the Apache sources. Discarding unwanted modules reduces the size of the binary and may improve performance.

[4]It is important to distinguish between the configuration file used at compile time and the Config file used to control the operation of a web site.

Figure 1.1 Under Windows, Apache is normally precompiled as an executable. The core modules are compiled in, and others are loaded, if needed, as dynamic link libraries (DLLs) at runtime, so control of the executable's size is less urgent. The DLLs supplied in the .../apache/modules subdirectory are as follows:

APACHE~1 DLL         5,120  19/07/98  11:47 ApacheModuleAuthAnon.dll
APACHE~2 DLL         5,632  19/07/98  11:48 ApacheModuleCERNMeta.dll
APACHE~3 DLL         6,656  19/07/98  11:47 ApacheModuleDigest.dll
APACHE~4 DLL         6,144  19/07/98  11:48 ApacheModuleExpires.dll
APACHE~5 DLL         5,120  19/07/98  11:48 ApacheModuleHeaders.dll
APACHE~6 DLL        46,080  19/07/98  11:48 ApacheModuleProxy.dll
APACHE~7 DLL        35,328  19/07/98  11:48 ApacheModuleRewrite.dll
APACHE~8 DLL         6,656  19/07/98  11:48 ApacheModuleSpeling.dll
APACHE~9 DLL        10,752  19/07/98  11:47 ApacheModuleStatus.dll
APACH~10 DLL         6,144  19/07/98  11:48 ApacheModuleUserTrack.dll

Figure 1.1 What these are and what they do will become more apparent as we proceed. You can add other DLLs from outside suppliers; more will doubtless become available.

Figure 1.1 It is also possible to download the source code and compile it for Win32 using Microsoft Visual C++ v5.0. We describe this in Section 1.9, "Apache Under Windows", later in this chapter. You might do this if you wanted to write your own module (see Chapter 15, "Writing Apache Modules").



Library Navigation Links

Copyright © 2001 O'Reilly & Associates. All rights reserved.