RPMs, mod_perl, apachectl, telnet, mod_status, DSO, apxs, Apache::Status...Scared yet? Mr. Lerner tells us about all of these so we can better manage our web servers.
by Reuven M. Lerner
As longtime users know, Linux, Apache and other free software packages are both more stable and more easily configured than their closed-source counterparts. However, this obviously does not mean that free software advocates are immune from software bugs and configuration problems. And indeed, the complexity of much open-source software can sometimes be frustrating; it's often unclear which option to modify, or even where to begin.
This month, we will look at some of the tools and techniques that webmasters can use when trying to configure, tune and debug their Apache configurations. This cannot be a comprehensive list, simply because so many things can go wrong. However, being able to identify the cause of a problem often makes it relatively easy to come up with a solution, or at least to begin working toward one.
I have been using Red Hat Linux for about five years, since the days when Red Hat was a small company in Connecticut, rather than a well-known start-up whose name even my mother recognizes. During that time, I have become a big fan of RPM, the Red Hat Package Manager. I still remember when I had to modify makefiles by hand in order to compile software downloaded from the Internet; the fact that I can download a binary version of a program, install it with a single command, and then remove it just as easily, continues to amaze me. (I'm told that Debian's packaging system is even more sophisticated, but I have not yet had the opportunity to try it out.)
System administrators who learn to love RPMs are often reluctant to begin compiling programs from source. Not only does this take more time and more knowledge, but compiling and installing a program from its source code makes removal difficult, months or years later. However, there are certain programs that I insist on compiling from source, despite the inherent disadvantages. One of these is Apache, which is normally installed as part of the Red Hat installation.
Apache itself is a relatively small program. The bulk of the functionality resides in individual modules which are normally incorporated into Apache at compile time. So if you want Apache to automatically correct misspelled URLs, you can include mod_speling (and no, that isn't a typo) in your compilation. If you will be running a server without any CGI programs, you can remove mod_cgi. And the list continues, making it possible to customize Apache according to your exact needs.
Thus, I strongly recommend that everyone compile Apache from scratch. This process is normally quite straightforward, and should take no more than a few minutes on a typical modern computer. The first step toward compiling Apache is to remove any RPMs that might already be on the system, to avoid confusing either the RPM database or yourself regarding a file's exact origins.
To remove the Apache RPM from a Red Hat system, use the following command: rpm -e apache.
To get a better idea of what is going on as your disk drive crunches away, you may turn on the ``very verbose'' option: rpm -evv apache.
On most systems, this command will not be sufficient by itself. RPM keeps track of not only which files are installed, but also of the dependencies between packages. Since a typical Red Hat installation includes RPMs for mod_perl (which is often broken) and mod_php, you may need to erase these as well:
    rpm -evv apache mod_perl mod_php
If removing a package will break a dependency, RPM will exit with a fatal error, indicating which dependencies will not be satisfied if you remove the package in question. You should then consider whether the package (e.g., mod_perl) is only useful with another package (e.g., Apache) or if removing it would cause too many other problems.
Once you have removed any existing Apache RPMs, you can begin to compile and install Apache on your system. (Of course, you can always compile Apache while the RPM exists, removing the RPM just before you install the compiled version.) Download the latest version, which at the time of this writing was 1.3.12, from a local mirror of http://www.apache.org/. Once you have done that, you can unpack it with:
    tar -zxvvf apache_1.3.12.tar.gz
The z option uncompresses the tar archive with gunzip before proceeding, and the vv flags ask for ``very verbose'' output. Both of these options work only with GNU tar, which is the standard tar version on Linux systems.
Once you have unpacked the source code, you can compile Apache with the following commands:
    cd apache_1.3.12    # Switch into the Apache directory
    ./configure         # Get a default configuration
    make                # Compile the source code
    make install        # Install Apache under /usr/local/apache/
The above four steps will place the Apache-related programs in /usr/local/apache/bin, logfiles in /usr/local/apache/logs, HTML files in /usr/local/apache/htdocs, and CGI programs in /usr/local/apache/cgi-bin. The ``make install'' command must be performed while logged in as root.
To run Apache, use the apachectl program which is installed by default in /usr/local/apache/bin. apachectl is a shell script which makes it relatively easy to start or stop Apache, ensuring that only one Apache process will run on a system at a given time. To start Apache, use:
    /usr/local/apache/bin/apachectl start
You may need to insert this line in one of your startup files, such as /etc/rc.d/rc.local, to ensure that Apache starts up when your computer is booted. This is especially true if you removed the RPM version of Apache using the above instructions, since the RPM comes with a startup file that is automatically placed (and invoked) inside of /etc/rc.d/init.d. apachectl can also be used to shut down the server:
    /usr/local/apache/bin/apachectl stop
One of the first things that Apache does when it starts up is to look through its configuration file, traditionally called httpd.conf and located (by default) in /usr/local/apache/conf. This file contains a number of commands, known as ``directives'', followed by one or more values. We can thus set the name of the server explicitly with the directive:
    ServerName www.lerner.co.il
Each Apache module is allowed to define its own directives which control the program's behavior when invoking that module. For example, mod_userdir installs the UserDir directive, indicating which directory name should be treated specially inside of each user's home directory, for the purposes of creating sites on the Web.
But what happens if mod_userdir is not installed? Then Apache will not know what to do with the UserDir directive, and will exit with a fatal error. The solution is to run apachectl configtest, which reads httpd.conf and checks it to ensure that all of the directives are known and have legal values. If all of the directives have legal definitions, apachectl will respond with ``OK''.
Another partial solution to this issue is to put all module-specific directives inside of an <IfModule> section. <IfModule> tells Apache that it should pay attention to one or more directives only if a particular module is actually loaded. Thus, we can always define our UserDir directive, assuming it is wrapped in the following section:
    <IfModule mod_userdir.c>
        UserDir public_html
    </IfModule>
<IfModule> is particularly useful when working with modules compiled as DSOs, because of the flexibility it adds.
Once your Apache server is compiled, installed, configured and running, you might still encounter some problems. apachectl configtest is a good way to ensure that all of the directives are legal--but this does not ensure that they will work, nor that they are the values you really want.
A good tool for checking that a server is working in the right way is telnet. Telnet is probably familiar to most Linux users as a way to log into one computer while sitting at another. However, telnet can open a TCP connection on any port, not just the default port 23 used for the ``telnet'' protocol. You can thus use telnet for SMTP (port 25), POP (port 110) and even HTTP (port 80). This is a great way to test web servers to see if they're running, as well as run some basic tests.
The key to this technique is understanding that every TCP/IP service is associated with an IP address and port on both the origin and destination computers. Thus, a telnet connection between two computers will almost certainly include two IP addresses, an arbitrary port on the client, and the well-known port 23 on the server. Similarly, an SMTP connection carrying e-mail between two computers will normally include a connection from an arbitrary port on the client to port 25 on the server. For a list of well-known ports, including many that should not be changed without good reason (such as FTP, SMTP, telnet), look at the standard file /etc/services on any computer running Linux.
To use the telnet program to connect to an arbitrary port, simply indicate the port number (or name, if one appears in /etc/services). For example, to connect to the HTTP server running on port 80 of the computer www.lerner.co.il, simply say telnet www.lerner.co.il 80.
If the server is running on a different port, as defined by the Apache Listen and Port directives, simply specify a different number. For example, if the server is running on port 8080, we can connect to it with telnet www.lerner.co.il 8080.
Of course, a single Apache process can handle input on two or more port numbers, so long as they are specified correctly in httpd.conf.
If no server is running on the specified port, then telnet will exit with a fatal ``connection refused'' error. In such a case, consult the Apache error logs, normally placed in /usr/local/apache/logs/error_log, to determine the source of the problem.
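The ``connection refused'' check can also be scripted. The following is a minimal sketch in Python, not part of the original toolset described here; the function name port_is_open and the three-second timeout are assumptions made for illustration. It performs the same TCP connection attempt that telnet would, and reports only whether it succeeded.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds, False otherwise."""
    try:
        # create_connection performs the same TCP handshake that telnet would.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts and name-lookup failures.
        return False
```

Calling port_is_open("www.lerner.co.il", 80) should return True when a server is listening there. A False result says nothing about the cause, so the Apache error log remains the place to look.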
If an HTTP server is indeed running on the specified port, then telnet will display a message indicating how to exit back into the telnet command prompt (usually with CONTROL-]), and will then connect you. What prompt you see depends on the type of server to which you connect; while SMTP, FTP and POP all greet an incoming user with the name of the server computer, HTTP is unusually silent. The assumption is that you know the name of the server to which you have connected, and that the connection was obviously successful.
At this point, you can issue an HTTP request. The simplest form of request is GET /, followed by a single newline character. This is an HTTP request of the most primitive sort, which is no longer used but is built into HTTP servers in order to ensure backward compatibility with older clients. Soon after pressing ENTER, you should see the contents of the root (or ``index'') page of the site to which you connected. It may be difficult to read the unparsed HTML at first, but remember that this is a simple debugging method.
A more sophisticated request uses HTTP/1.0, the first version of HTTP to include headers in the request and response. The request and response are each preceded by one or more ``header'' lines, containing a header name, a colon and a text string. The headers and request/response body are separated by a single blank line.
Our simple request can thus be rewritten as GET / HTTP/1.0, with the latter part tacked on to indicate that our client understands a more sophisticated version of HTTP. We then have to press ENTER twice--once to indicate the end of this line, and a second time to indicate that we don't want to send any headers between the command line and the body of the HTTP request.
You can also submit name-value pairs to the server with telnet, separating each name from its corresponding value with ``='', and each pair with an ampersand. Normally, such arguments are passed in a GET request to a CGI program or other dynamic content producer. For example:
    GET /cgi-bin/foo.pl?name1=val1&name2=val2 HTTP/1.0
After pressing ENTER twice, you should see a set of response headers, followed by whatever output is produced by the CGI program foo.pl. You can also pass headers along with the request. After pressing ENTER following the initial GET line, type one or more headers, with a new line after each one. For example:
    GET /cgi-bin/foo.pl?name1=val1&name2=val2 HTTP/1.0
    Accept: text/html
    Accept-Language: en
Following the final header, press ENTER twice--once to finish the header, and again to indicate that the request is complete.
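The same request can be assembled and sent from a short script instead of being typed into telnet. This is a rough sketch under stated assumptions: build_request and send_request are hypothetical helper names, and the script ends each line with a carriage return and line feed, which is what the HTTP specification asks for even though most servers also accept the bare newline that telnet sends.

```python
import socket
from urllib.parse import urlencode


def build_request(path, params=None, headers=None, method="GET"):
    """Build the bytes of an HTTP/1.0 request, as one would type it in telnet."""
    if params:
        # urlencode joins pairs as name=value&name=value and escapes unsafe characters.
        path = path + "?" + urlencode(params)
    lines = [f"{method} {path} HTTP/1.0"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # Every line ends in CRLF; the extra empty line marks the end of the headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def send_request(host, port, request, timeout=10.0):
    """Send raw request bytes and return everything the server sends back."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(request)
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # an HTTP/1.0 server closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks)
```

For example, send_request("www.lerner.co.il", 80, build_request("/", headers={"Accept": "text/html"})) would return the response headers and the index page as one byte string.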
Even moderately popular web sites tend to run more than one copy of Apache at a time. Older servers would wait until a new connection came in before deciding whether to ``fork'' off a new copy of the existing server process. The authors of Apache abandoned this method in favor of ``pre-forking'', meaning that Apache creates a large number of child processes immediately upon startup.
Each child process handles only one HTTP connection at a given time, meaning that there must always be at least as many Apache processes running as the number of simultaneous connections a site should handle. This maximum number is set with the MaxClients directive, which defaults to 150. If MaxClients is set too low, then one or more users may end up waiting until an Apache process finishes handling the previous connection, and is available to service a new one.
Apache dynamically changes the number of servers available depending on the number of requests that it gets, based on hints in httpd.conf. The MinSpareServers and MaxSpareServers directives tell Apache how many extra servers to keep around in preparation for incoming requests. If the number of spare servers ever goes below MinSpareServers, Apache spawns several new servers. By contrast, if there are more unused servers than defined in MaxSpareServers, Apache will kill off the extra ones.
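The two thresholds can be pictured with a toy model. The sketch below is only an illustration of the rule described above, not Apache's actual maintenance loop, which adds children gradually rather than all at once; the function name and the sample numbers are assumptions.

```python
def spare_server_action(idle, total, min_spare, max_spare, max_clients):
    """Return how many children to add (positive) or remove (negative).

    idle  -- children currently waiting for a connection
    total -- all running children, busy or idle
    """
    if idle < min_spare:
        wanted = min_spare - idle
        # Never grow past MaxClients, the ceiling on child processes.
        return min(wanted, max(0, max_clients - total))
    if idle > max_spare:
        return -(idle - max_spare)
    return 0
```

With MinSpareServers set to 5 and MaxSpareServers set to 10, two idle children out of ten would lead the model to add three more, while twelve idle children would lead it to remove two.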
If the server starts and responds successfully, but is taking a long time to accept connections, it could be that you have not told Apache to start enough servers. Try increasing either MaxSpareServers or MaxClients so that fewer people will have to wait for a free server to handle their request.
Of course, adding new servers is a potentially large drain on the computer's resources, consuming more CPU time and memory. mod_perl is a particularly large user of memory; so you should be more conservative when adding new Apache processes that include mod_perl. Use the standard Linux free command, which displays the amount of available physical and virtual memory, to get a better understanding of where the memory is going. I also like to use top, which displays, among other things, the amount of CPU and memory each process is consuming.
Because web servers need to respond to requests as quickly as possible, and because virtual memory is far slower than physical RAM, you should pay particularly close attention to virtual memory usage and minimize its use.
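One rough way to turn the output of free and top into a MaxClients value is simple division. The following sketch is a rule of thumb and an assumption on my part rather than anything Apache computes: it takes the physical RAM, subtracts whatever should be left for the rest of the system, and divides by the memory one Apache child is observed to use. All numbers in the example are hypothetical.

```python
def max_clients_for_memory(total_ram_mb, reserved_mb, per_child_mb):
    """Estimate how many Apache children fit in physical RAM without swapping."""
    if per_child_mb <= 0:
        raise ValueError("per_child_mb must be positive")
    available = total_ram_mb - reserved_mb
    # Round down: a partial process still needs its full share of memory.
    return max(0, int(available // per_child_mb))
```

On a machine with 512MB of RAM, 128MB set aside for other programs, and mod_perl-enabled children of about 20MB each, the estimate is 19 processes, far fewer than the default MaxClients of 150.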
Small web sites that run one or more database servers (such as MySQL or PostgreSQL) on the same computer as a web server can find themselves in an unenviable bind. As the number of visitors to a dynamically generated site increases, the number of Apache processes must also increase. But in order to service all of these visitors, the number of database connections must also increase. At a certain point, a site becomes a victim of its own success, with the database and web site competing for system resources. For this reason, most popular database-backed sites separate the two functions onto at least two computers, with one or more database servers connected to one or more HTTP servers.
One nice way to get a snapshot of the current Apache status is with the mod_status module. mod_status describes the current state of every HTTP server, whether it is waiting for a new connection, reading the request, handling the request or writing a response.
mod_status is compiled into Apache by default, meaning that all you need to do in order to activate it is to set the appropriate directives, and set the default handler, or request-handling subroutine, to be ``server-status''. Any URL defined to have a handler of ``server-status'' then produces a status listing, ignoring the rest of the user's request.
It is thus most common for mod_status to be activated for only one URL. For example, we can create the virtual ``/server-status'' URL on our web server, such that anyone visiting /server-status will be shown the output from mod_status. We also indicate that Apache should always produce a full status listing, rather than the simple version. Here is one such simple configuration:
    <Location /server-status>
        SetHandler server-status
    </Location>
    ExtendedStatus On
Once I put those four lines inside of httpd.conf and restart Apache--or send it a HUP signal--I get the following output from the /server-status URL:
    Server Version: Apache/1.3.12 (UNIX) mod_perl/1.24
    Server Built: Mar 29 2000 12:25:42
    Current Time: Friday, 21-Jul-2000 16:02:51 IDT
    Restart Time: Friday, 21-Jul-2000 16:02:48 IDT
    Parent Server Generation: 2
    Server uptime: 3 seconds
    Total accesses: 0 - Total Traffic: 0 kB
    CPU Usage: u0 s0 cu0 cs0
    0 requests/sec - 0 B/second -
    1 requests currently being processed, 4 idle servers
The status information begins with a fair amount of text indicating how long the server has been running, and how many times people have accessed the server. It also indicates just how many bytes are being served by this web process and how many servers are sitting idle. mod_status thus provides a nice window into the world of the Apache server, allowing us to see whether we have defined MaxSpareServers in the most resource-efficient manner.
mod_status then produces output in the following format, which can seem cryptic at first:
    W____.............................................
    .................................................
    .................................................
    .................................................
Each ``.'' character represents a potential Apache server process which is currently not running. Those that are waiting for a new connection are represented by ``_''; those that are reading input from the user's HTTP request are represented by ``R''; and those that are writing their output to the user's browser are represented by ``W''. Not all letters may be visible at a given time; the current Apache status changes dynamically, and the output you see from mod_status will change to reflect that.
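Counting these characters by eye gets tedious on a busy server. As a small sketch, the following Python tallies a scoreboard string using only the four symbols described above; any other character is lumped together as ``other''. The function names are my own and are not part of mod_status.

```python
from collections import Counter

# Only the symbols explained in the text; anything else is counted as "other".
STATE_NAMES = {"_": "waiting", "R": "reading", "W": "writing", ".": "open slot"}


def summarize_scoreboard(scoreboard):
    """Count how many process slots are in each state."""
    counts = Counter()
    for ch in scoreboard:
        if ch.isspace():
            continue  # line breaks in the status page are not slots
        counts[STATE_NAMES.get(ch, "other")] += 1
    return dict(counts)


def writing_fraction(scoreboard):
    """Fraction of running processes that are currently writing a response."""
    summary = summarize_scoreboard(scoreboard)
    running = sum(n for state, n in summary.items() if state != "open slot")
    return summary.get("writing", 0) / running if running else 0.0
```

A writing fraction that creeps toward 1.0 over several hours, while traffic stays flat, suggests that processes are getting stuck rather than finishing their responses.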
Following this display, we get a play-by-play view of what each active process is doing. We can see which connections are taking a long time to be processed, which connections are the most popular each month, and nearly any other facet.
Of course, it is normally a bad idea to open up your status information to the entire world. Luckily, you can use the ``Order'', ``Deny'' and ``Allow'' directives to restrict access to a set of IP addresses or to an entire domain. For example:
    <Location /server-status>
        SetHandler server-status
        Order deny,allow
        Deny from all
        Allow from .lerner.co.il
    </Location>
With the above configuration, mod_status will display results only for IP addresses in my domain. Requests coming from another domain will get an HTTP response indicating that access is forbidden to them.
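The way ``Order deny,allow'' combines the two lists can be modeled in a few lines. This is a simplified sketch that works on host names only; the real server also matches IP addresses and network ranges and performs DNS lookups, none of which is modeled here. The function names are hypothetical.

```python
def host_matches(host, pattern):
    """Match a host name against 'all', a '.domain' suffix, or an exact name."""
    if pattern == "all":
        return True
    host, pattern = host.lower(), pattern.lower()
    if pattern.startswith("."):
        return host.endswith(pattern)
    return host == pattern


def allowed_deny_allow(host, deny, allow):
    """Model 'Order deny,allow': Deny is applied first, then Allow can override."""
    permitted = True  # with this ordering, unmatched hosts are let in
    if any(host_matches(host, p) for p in deny):
        permitted = False
    if any(host_matches(host, p) for p in allow):
        permitted = True
    return permitted
```

With deny set to ["all"] and allow set to [".lerner.co.il"], a request from www.lerner.co.il is permitted and one from any other domain is refused, which matches the behavior described above.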
Until now, we have assumed our Apache installation is statically compiled, with all of the modules placed inside of Apache at compile time. This is the traditional way to compile Apache, and the default if you simply perform a ``./configure''.
The problem with the above configuration is that it is relatively inflexible. What happens if you discover that you forgot to configure Apache to include a particular module? You will have to recompile the entire program, specifying which modules you do and don't want to include when running configure. This does not seem bad at first--after all, how often will you want to add a new module to the system?
However, the problem is deeper than that, in at least two ways. Why should Apache consume memory for modules that might not be used? In addition, why should I have to recompile Apache every time a new version of just one module is released?
In order to solve this problem, the Apache developers now support a system known as DSO, or ``dynamic shared objects''. DSOs make it possible to compile Apache with only two statically linked modules, http_core (which provides core functionality) and mod_so (which handles the loading of DSOs). All other modules can be built such that they are loaded only when necessary. Because modules are not an inherent part of Apache, they can also be upgraded without having to recompile the web server itself. As we will see below, this functionality can come in handy if you need to upgrade or debug an already-running and configured Apache server.
To compile Apache with DSOs, you will need to decide which modules you wish to compile statically, and which you will compile as DSOs. My preference is to make everything a DSO, but to compile only the modules that Apache installs by default. We can do this by changing our invocation of ``configure'':
    ./configure --enable-shared=max
This will automatically enable mod_so, and will compile Apache with all of the default modules. After compiling Apache with make and installing it with make install, Apache will continue to work as before. The only difference is that it can now load modules dynamically, adding new modules to the already-compiled HTTP server as they are needed.
The default httpd.conf created by an Apache server compiled with --enable-shared looks slightly different from that created for a statically compiled Apache server. For one, most of the directives are hidden inside <IfModule> sections, making it possible for Apache to load even if some of its modules have not been loaded yet. In addition, each module must first be loaded with the LoadModule directive, and then enabled with the AddModule directive. For example:
    LoadModule perl_module libexec/libperl.so
    AddModule mod_perl.c
LoadModule takes two arguments: the name of the module and the path to its .so file. The name must match the name with which the DSO module was compiled, while the path is interpreted relative to the server root, /usr/local/apache. In the above example, and by default, DSO modules are placed in /usr/local/apache/libexec.
In contrast with most of the other directives in httpd.conf, LoadModule and AddModule are sensitive to the order in which they are placed. LoadModule must come before AddModule, and each module must be loaded and added before its directives will work. In addition, some modules must be loaded before others, and will not work otherwise. If it is possible to let an automatic configuration tool take care of the insertion of LoadModule and AddModule, let it do so. This may save your web server from hard-to-track-down configuration problems.
Once Apache is compiled with DSO support, new modules can be added at any time. However, these modules must be compiled with the same configuration information that was present when the Apache server itself was compiled. This is handled automatically by apxs, the ``Apache Extension'' program, written by Ralf S. Engelschall. apxs makes it possible to compile an Apache module into a DSO (.so file), and then to install it into the appropriate Apache configuration. Unfortunately, there seems to be little or no documentation for apxs, meaning that it can be difficult to understand exactly what this program does or how it works.
For example, let us assume we have already compiled and installed Apache with DSO support. Several weeks later, we notice that many of the hits on our site are resulting in ``file not found'' errors, because users cannot spell the odd names we have used in our URLs. One solution is obviously to revamp the site such that users will be able to spell things more easily. But an easier solution is to install mod_speling, so capitalization and spelling are largely ignored.
To compile mod_speling as a DSO, type:
    /usr/local/apache/bin/apxs -c mod_speling.c
apxs will invoke gcc, compiling mod_speling. The result will not be directly executable, but rather a library that can be invoked by Apache. If the compilation was successful, then we can install mod_speling with the following command:
    /usr/local/apache/bin/apxs -i -a -n speling mod_speling.so
Using apachectl configtest is particularly useful when installing new DSO modules. It ensures that the module we have added is indeed there and working, and that Apache now understands any new directives we added outside of <IfModule> sections.
mod_perl, the Apache module which makes it possible to write new modules in Perl and to configure existing ones using the popular language, can be compiled as a DSO. This requires the use of apxs, as in the case of mod_speling. However, mod_perl is much more complicated than a single module, and depends on many outside pieces of information for its compilation. mod_perl is thus configured and compiled similarly to stand-alone Perl modules, with perl Makefile.PL, followed by a make, make test and make install.
In order to compile a new version of mod_perl into an existing version of Apache, issue the following command:
    perl Makefile.PL \
        USE_APXS=1 WITH_APXS=/usr/local/apache/bin/apxs
If you want to enable mod_perl for all of the different Apache handlers, rather than just the default PerlHandler (for creating dynamic content), turn on the EVERYTHING switch. For example:
    perl Makefile.PL \
        USE_APXS=1 WITH_APXS=/usr/local/apache/bin/apxs \
        EVERYTHING=1
Once you have created the Makefile in this way, you can compile and install mod_perl into the existing Apache server with:
    make
    make test
    make install
This method works not only for installing a new copy of mod_perl into Apache, but also for upgrading an existing copy. You can then test to see if mod_perl has been compiled into the server by telneting to the server's address and port number, and issuing the command:
    HEAD / HTTP/1.0
This will return the HTTP headers associated with the / document on the server. Among other things, there should be a ``Server'' header indicating what kind of server is running. mod_perl adds a tag to this output string, so you should see output like the following:
    Server: Apache/1.3.12 (UNIX) mod_perl/1.24
Because mod_perl updates come out at different times than Apache updates, I have found it to be extremely useful to install and upgrade mod_perl in this way.
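The ``Server'' header check described above is easy to automate once the response text is in hand. The sketch below assumes the raw response has already been captured as a string, for instance by copying it out of a telnet session; server_products and has_mod_perl are hypothetical names chosen for this example.

```python
import re


def server_products(raw_response):
    """Return the product tokens from the Server header of a raw HTTP response."""
    head = raw_response.split("\r\n\r\n", 1)[0]
    for line in head.splitlines()[1:]:  # skip the status line
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "server":
            # Drop parenthesized comments such as "(UNIX)", keep name/version tokens.
            return re.sub(r"\([^)]*\)", " ", value).split()
    return []


def has_mod_perl(raw_response):
    """True if the server advertises a mod_perl tag in its Server header."""
    return any(tok.lower().startswith("mod_perl/") for tok in server_products(raw_response))
```

Keep in mind that a server can be configured to advertise less about itself, so a missing tag is a hint rather than proof that mod_perl is absent.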
While mod_status can tell us what is happening with each Apache process, it cannot tell us what is happening inside a particular module. In general, this is not such a bad thing; do I really care what is happening inside mod_mime or mod_speling?
But in the case of mod_perl, where so many complex things are going on, it would be nice to be able to find out what is happening. The Apache::Status Perl module, which works with mod_perl, provides this information.
In order to activate Apache::Status, we need to insert another section into httpd.conf. As with mod_status, we will create a new Location section which associates a particular handler with a virtual URL, traditionally known as ``/perl-status'':
    PerlModule Apache::Status
    <Location /perl-status>
        SetHandler perl-script
        PerlHandler Apache::Status
    </Location>
Once we have restarted the server or sent it a HUP signal, requesting the URL /perl-status will produce a menu of options, such as ``Environment'' and ``Inheritance tree''. Other Perl modules for mod_perl, such as HTML::Mason, can install their own hooks for Apache::Status, making it possible to look through their environments. For example, Mason provides a simple interface for viewing the current configuration, as well as a list of components that have been compiled and cached.
Over the days preceding my writing of this column, I used all of the above techniques to track down a problem with my installation of HTML::Mason. The colocated server that I help run was having some problems handling a growing load. Every few hours, all of the Mason-based sites on the server would fail, followed one or two hours later by the non-Mason sites. Actually, ``fail'' is too strong a word--the browser would send a request to the server, but would time out (after ten minutes or so) of waiting to receive a connection. What was going on, and how was I going to fix it?
My first reaction was to think that our server was running out of RAM. I used ``top'' and ``free'' to inspect the system, but did not see anything out of the ordinary. On the one hand, this was a relief. At the same time, this meant we were somehow running out of available servers, even though I had configured the system for a maximum of 150 simultaneous clients. My server might be popular, but 150 simultaneous accesses is highly unusual, even for me. Something else was obviously going on here.
There was clearly some connection to Mason, and perhaps to mod_perl. I decided it was about time to upgrade mod_perl to the latest version (1.24), using the technique I described above. I also upgraded the copy of HTML::Mason and several other related modules, restarted Apache with apachectl, and hoped the problem would go away.
Unfortunately, the simple upgrade did not do the trick. The system still stopped responding to requests after several hours of activity. I used Apache::Status to look at the state of mod_perl, and nothing seemed to be out of the ordinary.
I looked at the system status with mod_status, and discovered that over time, many of the Apache processes were getting stuck in the ``write'' state. In other words, the list of Apache processes was slowly, but surely, being transformed from a list of ``.'' (unallocated) processes to a list of ``W'' (writing an HTTP response) processes. Every invocation of a Mason component on the system was using up another Apache process, and never letting it go! It was no surprise, then, that the system was locking up after only a few hours; if the Mason-related parts of the site had been more popular, the system would have gone down even more quickly.
I looked through the Mason configuration file and determined that my use of the Apache::Session module, which allows mod_perl programs to track a user's movements and actions (as we saw in my article, ``Session Management with Mason'' in the August Linux Journal) was failing to return. So each time a Mason component was invoked, everything would work fine--until the component needed to return, at which point the executing program would simply spin its wheels, waiting to connect to the MySQL database.
My solution was not particularly elegant, but did the trick: I decided to stop using the MySQL version of Apache::Session (known as Apache::Session::MySQL) and start using the simple file-based version (known as Apache::Session::File). I restarted the server, and was delighted to discover that everything was working as it should.
Apache is a wonderful, robust and easily configurable HTTP server. However, it is also a complex piece of software that requires some experience in order to tune correctly. When mod_perl is added to the mix, the complexity increases even more. Luckily, it is possible to install Apache such that installing and upgrading modules can be easy and painless.
Moreover, a number of modules and tools (such as mod_status and Apache::Status) make it possible to view the internals of Apache as it is executing. By taking advantage of these tools, we can find and fix problems with our servers, spending our time on more interesting issues than trying to figure out why a server does not run.
Reuven M. Lerner owns a consulting firm specializing in web and Internet technologies, based in Modi'in, Israel. As you read this, he should (finally) be done writing Core Perl, published by Prentice-Hall. You can reach him via e-mail at reuven@lerner.co.il, or at the ATF home page, http://www.lerner.co.il/atf/.