return to first page linux journal archive
keywordscontents

At the Forge

Learning to use the httpd error log to debug CGI programs

Reuven M. Lerner

As discussed in my last column (LJ, Issue 34, WWW section), debugging CGI programs is often more difficult than debugging their non-CGI counterparts, if only because our programs aren't interacting directly with the user's terminal. We have only one chance to get input from the user--when the program is invoked--and only one chance to send a response back to the user--just before the program terminates, in the HTML that our program creates.

This difficulty is complicated further by the fact that running CGI programs correctly requires that a number of other items be in place. For example, you must set the permission bits correctly, the web server must be configured to serve CGI programs from your directory, and the correct version of Perl (if you are using Perl for your CGI tasks) must be installed. None of this is very difficult, and it's particularly uncomplicated when compared with the full administration of a Unix system, but it's not quite as simple as compiling and executing other, non-CGI programs.

In the last article in this series, we looked at techniques that you can use when things go wrong. But what happens when all of the permissions and directories are set correctly and you still end up getting a mysterious error on your browser?

The answer is that while web servers send a generic error message back to the user's web browser, they also keep an ``error log,'' a file into which information about each error is placed. This month, we will examine the error log to see what it contains, look through several errors and how they appear in the error log, and finally consider some ways in which we can use the error log to make debugging easier.

Why an Error Log?

As Unix programmers are well aware, programs typically have at their disposal three basic file descriptors: standard input (stdin), standard output (stdout), and standard error (stderr).

Standard input is typically associated with the user's keyboard; when you are entering text in Emacs or similar program, the program is probably collecting your keystrokes via stdin. Standard output, by contrast, typically goes back to your console--either the window into which you are typing or the entire screen.

Standard error is normally sent to the user's console, and thus is often confused with stdout. But the two are completely separate and can be redirected to different places. You can see this in action by trying the command:

ls *zz* > zz-files
which, on my system, produces the following output on the screen:

ls: *zz*: No such file or directory
Since there aren't any files containing zz in their names, the file zz-files is empty. This error message is sent to stderr, which is kept on the screen, while the (empty) list of files is sent to stdout, which we redirected to the file named zz-files.

A CGI program still has access to stdin, stdout and stderr, but these files are used somewhat differently as the program isn't running on the user's console. For a CGI program, stdin reflects the contents of an HTML form as submitted via the POST method. Stdout, not surprisingly, is used to send a response, which is typically not in HTML format, to the user. Finally, stderr is redirected to the HTTP server's error log, which is usually kept far away from visitors to the web site.

Therefore, if our program dies unexpectedly, error messages are directed to the httpd error log rather than to our browser. On the one hand, this is probably a good thing--after all, you probably don't want the world at large to know the number and kind of bugs found in your code. On the other hand, this means that you need to look at a separate file, and that you need to log on to the server computer before you can know exactly what has happened to your program.

If you are renting space on a web server run by a web space provider or Internet service provider--or more importantly, if you are looking to rent space from such a provider--make sure that you have access to these logs. You can certainly write and debug CGI programs without access to the error log, but your life will be infinitely more pleasant if you can get to it.

Looking at the Log

Each HTTP server logs information in a slightly different way. In this column, I will describe the logs produced by Apache, a server derived from NCSA httpd, which was written and distributed by the National Center for Supercomputing Applications (NCSA). You can get a copy of Apache, as well as documentation relating to this excellent server, at http://www.apache.org/.

In my copy of Apache, the error messages are placed in the file with one line per error. Error messages are in the format:

[DayOfWeek Month Date Time Year] Message
Thus one possible entry in the error log would be:

[Mon Dec 2 16:05:50 1996] Coke machine needs refilling
As you might have guessed, this means that the server recorded the error at 4:05:50 P.M. on Monday, December 2, 1996.

Your httpd probably records all sorts of things in the error log that you probably wouldn't classify as errors. For example, my system's error log contains the following error message:

[Tue Nov 5 17:03:48 1996] access to \
  /home/httpd/cgi-bin failed for ahad-haam, reason: \

  script not found or unable to stat
The above message means that the server wasn't able to find a particular file on my web server. That's right--every time someone points his web browser to your system and requests a document that doesn't exist, an error message is recorded. This is actually a useful thing to know, both for security reasons (since it might be nice to learn if people are trying to grab sensitive files via your web server) and in order to help users who often but mistakenly request particular files. For example, if a particular file is often requested under an incorrectly spelled name, you can set up an alias or symbolic link on your system.

As interesting and useful as an error log might be for system administrators, network administrators and web masters, we are going to concentrate on the error log's role when trying to debug CGI programs. For example, try the following:

#!/usr/local/bin/perl5 -w

  use strict;      # Check our syntax strictly
  use diagnostics; # Tell us how to fix mistakes
  use CGI;         # Import the CGI module

  # Create an instance of CGI
  my $query = new CGI;

  # Notice that we are *not* sending a MIME
  # header, which we should always remember
  # to do.

  # Begin the HTML
  print $query->start_html(-title =>
	"Hello, world!)";

  # Print something fairly uninteresting
  print "<P>Hello, World Wide Web!</P>\n";

  # End the HTML
  print $query->end_html();
The above program will work just fine if run from the command line, since the only error is a missing MIME header, which is necessary only for CGI programs. But if you try to run it from a browser, you get the dreaded message:

Server Error. The server encountered an internal
error or misconfiguration and was unable to complete your request.
Not very useful for debugging your program, particularly since running it from the command line revealed no obvious errors. How can you discover what has gone wrong? By looking at the error log.

On my system, the error log contains the following entry:

[Mon Dec 2 17:32:04 1996] access to
/home/reuven/Text/Websmith/test-8.pl failed
for ahad-haam, reason: malformed header from script
Not only does Apache tell us that there was an error, and that the program died, but it even presents the reason, which is listed as ``malformed header from script''. Not the most obvious message for newcomers, but it tells you what you need to know, namely that the header sent wasn't in the proper format and thus caused an error.

We can now fix the program and remove the error message by sending a MIME header before anything else:

#!/usr/local/bin/perl5 -w

  use strict;      # Check our syntax strictly
  use diagnostics; # Tell us how to fix mistakes
  use CGI;         # Import the CGI module

  # Create an instance of CGI
  my $query = new CGI;

  # Send the MIME header
  print $query->header("text/html");

  # Begin the HTML
  print $query->start_html(-title =>
	"Hello, world!");

  # Print something fairly uninteresting
  print "<P>Hello, World Wide Web!</P>\n";

  # End the HTML
  print $query->end_html();
Sure enough, the fixed program doesn't produce any new messages in the error log.

Premature Program Termination

A typical idiom in Perl might be:

open(FILE, "$filename") ||
	die "Cannot open $filename";
The above line means that $filename should be opened for reading, and that the opened file should be accessed via the descriptor named FILE. If the program cannot open the file for some reason the program should terminate, producing a message. Let's look at how that works within a CGI program:

#!/usr/local/bin/perl5 -w

  use strict;      # Check our syntax strictly
  use diagnostics; # Tell us how to fix mistakes
  use CGI;         # Import the CGI module

  # Create an instance of CGI
  my $query = new CGI;

  # Set this to a file 
  my $filename = "/foo/bar/blah.txt";

  # Send the MIME header
  print $query->header("text/html");

  # Begin the HTML
  print $query->start_html(-title =>
	"Hello, world!");

  # Print the contents of $filename
  open (FILE, "$filename") ||
	die "Cannot open $filename ";
  print while (<FILE>);
  close (FILE);

  # End the HTML
  print $query->end_html();
Looks good, right? Well, the above will indeed work just fine, assuming that $filename exists. But if and when it doesn't, or if the program doesn't have permission to read the file, the second half of the statement is executed--and the program terminates, producing an error message.

If $filename doesn't exist, and you run the program from the command line, you get the following output:

Content-type: text/html

<HTML><HEAD><TITLE>Hello, world!</TITLE>
</HEAD><BODY>Uncaught exception from user code:
Cannot open /tmp/blah.txt at ./test-8.pl line 20.
But if you run it from your web browser, you get nothing at all--no error message, and no response, because the error message ``Cannot open /tmp/blah.txt'' was sent to stderr, rather than stdout. The error log, however, contains the following:

Uncaught exception from user code:
Cannot open /tmp/blah.txt at /home/reuven/Text/Websmith/test-8.pl line 20.
Note that the error message is inserted into the log without date or time information, since the text was sent directly by the Perl program (or by the Perl binary), rather than the web server. The web server automatically stamps its own error messages with the current time and date, but your programs don't enjoy that feature--unless, of course, you include the Carp library included with CGI.pm, which redefines the die routine, as well as several others, such that error messages are preceded by the relevant time and date. Thus, if you include the line:

use CGI::Carp;
at the top of the above program, the program still fails as expected, but your error log now contains the following entry:

[Tue Dec 3 11:55:14 1996] test-8.pl:
Cannot open /tmp/blah.txt at
/home/reuven/Text/Websmith/test-8.pl line 21.
Now we know not only what went wrong, but also when it happened. The Carp library can be very useful when debugging CGI programs, especially if you find yourself running the same program repeatedly. This way, there isn't any question whether the error occurred the first or fifth time that you ran the program, since the time at which the program ran is written right next to the error message.

Once you have installed Carp, you can use it for recording non-fatal errors, as well. For example, if you are trying to keep track of the value of a particular variable, you can include the line:

carp 'variablename = "'. $variablename . '"';
Because it includes the time and date information, this is better than:

print STDERR 'variablename = "', $variablename,
	'"';
Although using Perl's built-in syntax checkers and warnings (with the -w flag and the strict and diagnostics modules) is a great idea, realize that loading the CGI::Carp module will set off warning bells because it redefines several subroutines. You can ignore these warnings, but don't be surprised when they appear.

What About the User?

All this time, we have ignored our users, who are really supposed to be the focus of our efforts. It's all well and good that we are improving error messages for the purpose of debugging programs, but none of the ideas that we have looked at until now have done much for what the user sees.

The problem is that if you use die to output a message to the error log, the program dies off before it has a chance to produce any sort of ``whoops'' message for the user. If you're building an interactive web site, the last thing you want to do is present a user with a blank results page; at the very least, you should tell him that something has gone wrong, and that you are trying to fix the problem.

One easy way to do this is to replace the line:

open (FILE, "$filename") ||
	die "Cannot open $filename ";
with the following:

open (FILE, "$filename") ||
	&log_and_die ("Cannot open $filename ");
This new version opens the file as necessary, but invokes a subroutine named log_and_die, if the file cannot be opened. You can then define the subroutine as follows:

sub log_and_die
  {
	# Get arguments
	my $string = shift;

	# Begin the HTML
	print $query->start_html(-title => "Error");

	# Give a message
	print "<P>Sorry, but this program ";
	print "has encountered an error. The system ";
	print "administrator has been ";
	print "informed.</P>\n";

	print "<P>Would you like to return to the;
	print "<a href=\"/\">home page</a>?\n";

	# End the HTML
	print $query->end_html();

    # Now die with the error message
    die $string;
  }
The above routine is obviously a bit Spartan for most tastes, but the point is obvious: By wrapping errors into a separate routine, you can have our cake and eat it, too. The user gets an error message directing him or her to another site, and apologizing profusely for the program's errors. And the system administrator or web master gets a note in the error log describing exactly what went wrong, making it possible to fix the program more easily.

You could obviously add to this routine. For starters, a good idea might be to e-mail the system administrator or programmer to let him or her know that an error had occurred. If the program were part of a critical system, then perhaps it would even be a good idea to send e-mail to a system tied into an alphanumeric paging system.

Sending MIME Headers First

In all of the above examples, we call die only after having already sent the MIME header to the browser (with the built-in CGI.pm ``header'' method). Failing to do so can create all sorts of odd problems, and results in the dreaded message:

Server Error. The server encountered an internal
error or misconfiguration and was unable to complete your request.
For example, running the following CGI program:

#!/usr/local/bin/perl5 -w

  use strict;      # Check our syntax strictly
  use diagnostics; # Tell us how to fix mistakes
  use CGI;         # Import the CGI module
  use CGI::Carp;

  # Create an instance of CGI
  my $query = new CGI;

  # Don't do very much
  die "This program crashed on purpose ";
The above program dies upon reaching the last line, and inserts an appropriate message into the httpd error log. What will the user see on his or her browser? In the above example, the user will get the dreaded:

The server encountered an internal error or
misconfiguration and was unable to complete your request.
You can alleviate this problem by sending a MIME header before performing any serious computations:

#!/usr/local/bin/perl5 -w

  use strict;      # Check our syntax strictly
  use diagnostics; # Tell us how to fix mistakes
  use CGI;         # Import the CGI module
  use CGI::Carp;

  # Create an instance of CGI
  my $query = new CGI;

  print $query->header("text/html");

  # Don't do very much
  die "This program crashed on purpose ";
Note: while the documentation for CGI::Carp claims to alleviate this problem, experiments with the version installed on my system (from early September 1996) didn't seem to work quite this way. I thus suggest that you output MIME headers as close to the beginning of your programs as possible, simply to alleviate such problems.

Despite the colorful language that might occur to you when your program produces an error message, good warnings and logs are extremely helpful when writing and debugging code. This is especially true for CGI programs, where much of the work takes place behind the scenes and the only output to the user comes via a web browser. Learn to use the httpd error log effectively, and while your programming problems certainly won't disappear, your mistakes should be easier to identify and correct.

Reuven M. Lerner has been playing with the Web since early 1993, when it seemed more like a fun toy than the world's Next Great Medium. He currently works from his apartment in Haifa, Israel as an independent Internet and Web consultant. When not working on the Web or informally volunteering with school-age children, he enjoys reading (on just about any subject, but especially computers, politics, and philosophy-- separately and together), cooking, solving crossword puzzles, and hiking. You can reach him at reuven@the-tech.mit.edu or reuven@netvision.net.il.