return to first page linux journal archive
keywordscontents

At the Forge

Sending Mail via the Web

Mr. Lerner continues his look at building a simple, integrated mail system that can be accessed using a web browser.

by Reuven M. Lerner

Last month, we looked at a simple CGI program (read-mail.pl) that allows us to view our e-mail from within a web browser. This program takes advantage of the fact that most e-mail today is delivered to a ``post office'' server, from which it is downloaded by a user's mail-reading program. Mail clients use the POP3 protocol to retrieve messages from a post office server, which means our program can retrieve a user's mail by connecting to a server and retrieving one or more messages.

read-mail.pl is good enough for basic purposes, in that it makes it possible to retrieve mail from any web browser in the world. However, while it makes it possible to read e-mail messages, it does not provide any mail-sending capabilities. True, many web browsers include such a capability--but many, such as Netscape Navigator, do not.

This month, then, we will take a look at how to send mail via the web. Between read-mail.pl (from last month) and send-mail.pl (in this article), we will have a simple, integrated mail system that allows users to perform all rudimentary tasks from any web browser.

Sending mail based on the contents of an HTML form was one of the first uses to which CGI programs were put, back in 1993 when CGI and HTML forms first arrived on the scene. As we will see, sending e-mail is not particularly difficult from within a CGI program. However, we will look at issues related to security as well as what we would need to do to turn these simple programs into a full-fledged Hotmail competitor.

Basic Mail Sending

Sending e-mail from within a program is normally quite straightforward, particularly if you are using Perl. You open a pipe to a mail-sending program, and send it the headers and data for the message you want to send. For instance, here is a simple program that sends a short ``hello, world'' message to my address, reuven@lerner.co.il:

#!/usr/bin/perl -w
use strict;
use diagnostics;
my $mailprog = '/usr/lib/sendmail';
my $recipient = 'reuven@lerner.co.il';
open (MAIL, "|$mailprog $recipient") 
die "Cannot open $mailprog: $! ";
print MAIL "From: nobody\n";
print MAIL "To: $recipient\n";
print "\n";
print MAIL "Hello there!\n";
close MAIL;
There are several things to notice about this program. First of all, we set $mailprog to ``/usr/lib/sendmail'', the default name and location of the mail transfer agent (MTA) on Linux systems. If your copy of sendmail is in another location or if you are using a program other than sendmail, you will need to change the value of $mailprog.

Similarly, mail is sent to a single address, what is defined in $recipient. We will discuss the issue of recipients later, when we look at the issue of program security. Keep in mind that restricting the number of recipients to which the program will send e-mail reduces the possibility that your program will be turned into a mail gateway by spammers or others interested in sending anonymous mail.

We open a connection to $mailprog using Perl's open function, which allows us to write to a program's standard input (STDIN) by treating the program name as a file name, and by prefacing the program's name with a | character. Anything we print or write to that file handle will be treated as if it were sent to the program's STDIN. Any output from the program is ignored.

Finally, notice how we insert a new line character between the final header and the message body. As with HTTP, SMTP (the ``simple mail transfer protocol'' used for most mail transfer on the Internet) requires a blank line between the header and any data. This allows the receiving program to identify which lines are headers and which are in the body.

Mail::Sendmail

Those of us who have been sending mail from within Perl programs were delighted when the Mail::Sendmail module, written by Milivoj Ivkovic, was released to CPAN (the Comprehensive Perl Archive Network, based at http://www.cpan.org/). This module provides a portable method for sending mail from within a Perl program, but also provides a layer of abstraction between your program and the underlying mail system.

It is important to understand how the mail-sending mechanism works on your computer, particularly when it comes time to debug problems with sending or receiving e-mail. However, being able to send mail with three or four lines of Perl from a package maintained and updated independently of your program makes it possible to write shorter, more reliable programs. I have begun using Mail::Sendmail in all of my programs that send e-mail, and I suggest you do so as well, unless you have good reason to stick with the old system described above. One possible reason not to use Mail::Sendmail is if your program will be installed on a system without this package and on which you could not expect it to be installed. Given the ease with which packages can be downloaded and installed from CPAN, however, this should not deter you in a serious way.

The Mail::Sendmail module, like all other modules, must be imported at the top of any program that uses it with a use statement:

use Mail::Sendmail;
If you have not installed the module, or if it is not installed in one of the directories named in @INC (the path through which Perl searches when importing modules), Perl will fail with a fatal error.

After Mail::Sendmail is imported, sending a message becomes a two-step process. In the first stage, define a hash in which the keys are the headers and contents of the message. Specify the message body with the Message (or Body or Text) key. For example:

my %mail = (To => $recipient, From => $sender,
  Message => "Hello, there!");
You can then send the mail using the statement:

sendmail %mail;
The sendmail function is imported into the current name space automatically with the use Mail::Sendmail instruction. Mail::Sendmail defines many other functions as well, but none of these are imported into the default name space unless you explicitly request it.

As you can see, the above code is significantly shorter and easier to understand than what we did first. The fact that it is more portable and easier to maintain are nice additional benefits.

Moving to the Web

Now that we have seen how to send mail from within our program, we can concentrate on how to create a simple mail-sending facility from within a CGI program. Listing 1 shows an initial stab at send-mail.pl, which is a CGI wrapper around the above functionality.

Listing 1.

As you can see from the top of the program, send-mail.pl imports a large number of modules before it gets down to business. It uses strict and diagnostics to ensure our variables are lexicals (i.e., temporary variables defined with my), only hard references are used, and barewords are not considered subroutine calls. (Bareword is a Perl term for a word in which its use in a program is unclear. Originally, any such words were simply prohibited. Now that subroutines can be called without a leading &, barewords are interpreted as subroutines. This can confuse programmers and lead to buggy programs, so it is usually best to avoid them.)

Then, because this is a CGI program, we import the CGI.pm, a module which provides us with all the CGI functionality we could imagine, useful for receiving user input and sending output to a web browser. We also import CGI::Carp, which provides us with improved messages in the web server's error log. By importing the fatalsToBrowser symbol from CGI::Carp, we also ensure that fatal error messages are sent to the user's browser, as well as the error log. Normally, a fatal error in a CGI program results in an incomprehensible numeric error message on the user's browser. While the output from fatalsToBrowser might not seem much more useful or comprehensible to a non-programmer, it is not as scary as a set of numeric codes. Also, it makes the program much easier to debug than it would be otherwise.

Finally, we import Mail::Sendmail as described previously.

Other than retrieving three HTML form parameters (sender, recipient and message) and using them in the invocation of Mail::Sendmail::sendmail, this program contains little you have not seen before. We do want to ensure the mail is sent before reporting it has been, so we use die to exit with a fatal error; it will end the program after printing an error message to the user's browser and the error log.

We can determine if the mail was sent by checking the return value from the ``sendmail'' subroutine. If it returns true, we know the mail was sent. If it returns false, the program stopped before it was sent. Here is one simple way to accomplish this:

if (sendmail %mail) 
{
# Print a message for success
}
else
{
die "Error sending mail: $Mail::Sendmail::error \n";
}
The variable $Mail::Sendmail::error (i.e., the variable $error inside of the package Mail::Sendmail) contains a detailed description of why the mail was not sent. Since the sendmail subroutine returns true when it succeeds and false when it fails, the above construct tells Perl, ``try to send the mail contained in %mail--and if you cannot, exit and print a message describing why it failed.''

If the mail is sent successfully, the user is returned a message indicating the program performed its task. It also prints the contents of the mail. Giving the user detailed feedback of this sort is always better than printing a simple ``success'' message, since the user might not be sure which e-mail message is being referenced.

Creating the Form

Now that we have a CGI program capable of sending mail, we need some way to invoke it. We could pass parameters as name-value pairs in the URL, but that is difficult and not very user friendly. We will thus send the name-value pairs using POST, which sends them to the program's standard input (STDIN). POST input to a program is generally sent from an HTML form. Here is a sample form that invokes send-mail.pl:

<HTML>
<Head>
<Title>Send e-mail!</Title>
</Head>
<Body>
<H1>Send e-mail!</H1>
<Form method="POST"
  action="/cgi-bin/send-mail.pl">
<P>Sender: <input type="text" name="sender"></P>
<P>Recipient: <input type="text"
  name="recipient"></P>
<P>Message:</P>
<textarea cols="60" rows="20" 
  name="message"></textarea>
<P><input type="submit"></P>
</Form>
</Body>
</HTML>
This form has three elements, named sender, recipient and message. These are the same elements we retrieved with the param method in send-mail.pl. If you modify the names of the parameters in the HTML form, make sure to modify the program as well, or the form elements will not be picked up.

All HTML form elements are sent as name-value pairs in which the value is a text string. The CGI program receiving and interpreting the data does not know, and furthermore does not have to know, whether the input field was a text field, a text area, a check box, a radio button or a pull-down menu.

Indeed, we can even substitute a hidden field--which does not appear on the web browser and cannot be changed by the user--for a text field, which comes in handy if we want to hard-code a value, such as that of the recipient. Simply replace the recipient line with

<input type="hidden" name="recipient"
value="reuven@lerner.co.il">
and all mail will be sent to my address.

Similarly, if you want to allow people to send mail to a number of addresses, but still restrict them somewhat, you can use a selection list:

<select name="recipient">
<option value="reuven@lerner.co.il">Reuven
<option value="eviltwin@lerner.co.il">Reuven's evil twin
<option value="ljeditor@ssc.com">LJ editor
</select>
Changing our HTML form in any of these ways requires no changes to our CGI program. Once again, send-mail.pl expects to receive a name-value pair in which the name is recipient and the value is a valid e-mail address.

With the above form and CGI program in place, we should be able to send mail to any e-mail address on the Internet.

Preventing Spam

The problem with the above form is it truly allows anyone to send mail to any address on the Internet. Furthermore, it allows the sender to pretend to be any address on the Internet. This is precisely the sort of tool spammers love to exploit. If you were to put our original version of send-mail.pl on your site, you would eventually discover someone was using your server and bandwidth to send their spam.

Several possible ways can be used to prevent this. One is to remove the possibility of sending mail to users or domains outside of a selected list. For instance, we can define a hash where the keys are approved e-mail addresses:

my %approved_recipient = ('reuven@lerner.co.il' => 1,
  'ljeditor@ssc.com' => 1);
Using a hash allows us to check the status of any e-mail address in a constant time interval, regardless of the number of addresses. If we were to use an array, for example, we would potentially have to search through the entire array before we could be sure of an address's status, meaning that the time necessary to perform such a test would grow in proportion to the number of elements in the array.

We can thus check to see if an address is approved by inserting the following code:

if (!$approved_recipient{$recipient})
{
die "Unapproved address \"$recipient\": Mail" .
 " was not sent.\n";
}
A version of send-mail.pl with the above code can be found in Listing 2 in the archive file (ftp://ftp.ssc.com/pub/lj/listings/issue62/3449.tgz).

We can similarly allow mail to be sent to any address within a particular domain by putting all of the approved domains inside an array:

my @approved_domains = ('lerner.co.il'
       'ssc.com');
We then create a variable, $match_found, which defaults to 0:

my $match_found = 0;
$match_found will be set to 1 only if one of the approved domains matches the domain in $recipient. We check this with a short loop:

foreach my $domain (@approved_domains)
{
if ($recipient =~ m/$domain$/)
{
$match_found = 1;
last;
}
}
We use last to break out of the loop when we find a match, in order to save some time. If you know certain domains will receive mail more often than others, you should put them at the beginning of @approved_domains, since the earlier an item appears in that array, the sooner the match will be found.

We then send mail only if $match_found is true (i.e., non-zero). If $match_found is 0, we print an error message:

# If the domain was not approved
else
{
die "Mail was not sent: The recipient's domain " .
 "is not approved.\n";
}
The version of send-mail.pl in Listing 3 in the archive has these additions.

Checking for Errors

If we want our program to be robust, we must do more than check for security violations. We must check for input from the user that might not affect security, but might lead to bugs or other unpleasant surprises.

For instance, if we invoke send-mail.pl directly from a URL, for example

http://www.lerner.co.il/cgi-bin/send-mail.pl
the program will report that the mail was sent with a blank sender, recipient and message. This is bad for two reasons. First, no mail was sent, since necessary headers were not assigned any values, so the program is providing us with incorrect information. Second, we should never get to the point where blank data from the user is accepted as input for mail.

We can prevent this situation by ensuring send-mail is always invoked with POST, and that $sender, $recipient and $message are non-blank. If any of these is equivalent to the empty string, we exit prematurely from the program, telling the user each must have a non-blank value. Once again, using die is better in debugging environments than in production code, simply because of the style of error message it produces. There is no reason why you could not forward the user to an error message page, or print a nicely designed page describing what was missing, rather than simply dying.

Competing with Hotmail

Between send-mail.pl and read-mail.pl, we have created a small system to send and receive e-mail. Is this enough to compete with Hotmail creating our own small web-based mail service? The short answer is ``no'', although the longer answer is that it is probably enough to suit most purposes.

Part of the problem is these two programs are run using CGI. While CGI is portable across platforms and languages, it is inherently slow, requiring the web server to create a new process each time the program is invoked. While this is more than adequate for lightly loaded machines, it quickly becomes a performance drain as the number of hits increases.

Each HTTP server has developed its own native interface that allows you to attach your program to the server process. Since Apache is free software, several such interfaces have been developed for it, including mod_perl and mod_php. The former allows you to write CGI-like programs in Perl, attaching them to the server process. This means your functionality becomes a subroutine within the server program, rather than an external program that must be invoked separately. The speed difference between a program running under mod_perl and the same functionality in a CGI program is staggering and should convince just about any die-hard CGI user to switch to mod_perl.

A site wishing to compete with Hotmail would probably want to use mod_perl or a similar server-specific API in order to get the maximum performance out of its hardware.

Aside from performance, another issue is where the mail is to be stored. The programs we have discussed, read-mail.pl and send-mail.pl, expect the user's mail to be stored on a POP server elsewhere on the Internet. Hotmail and similar services have their own POP servers for incoming mail, as well as their own MTAs (usually sendmail, although other MTAs are apparently better for high-volume mail servers) running on their systems.

However, Hotmail will allow you to retrieve mail only from their own POP servers, while read-mail.pl will allow you to retrieve mail from any POP server, including one that would normally not have a web interface. Whether you restrict mail checking by users to your own system, a number of servers within your organization or anywhere else is up to you.

Finally, services such as Hotmail survive due to advertising, and one of the most popular ways to advertise is to add a short note to the bottom of each message indicating which web-based mail service was used to send it. We can easily do that by concatenating our own footer to the message the user sends with these instructions:

my $footer = "-\nBrought to you by ReuvenMail\n";
my $message = $query->param("message");
$message .= $footer;
my %mail = (To => $recipient, From => $sender,
  Message => $message);
Now everyone will know which mail service you were using when you sent mail from your web-based system. This functionality is included in the final version of the program, Listing 3 in the archive file.

Finally, Hotmail has millions of members, which means it is relying on more than a single computer running Linux for mail delivery. Operating a single system for sending and receiving mail is not nearly as hard as creating a large, scalable system. If you are interested in truly competing with Hotmail, you will need capital investment and a good knowledge of networking protocols, in addition to Linux, Apache and the above programs.

Conclusion

Sending mail from an HTML form is one of the oldest uses of the CGI standard. Many such programs have been created, although they have not always been careful to restrict spam or other abuses. As we have seen, it is possible to get around most of these problems, but it is important to think about them before putting your system on-line.

By combining a simple mail-reading program with a simple mail-sending program, we can create a basic web-based mail service that is as open or closed as we desire. Perhaps these programs do not scale as well as Hotmail, but they do give us some insight into how that service (and similar ones) work, as well as what we might need to do with our own programs in order to make them as useful as possible.

Resources

Reuven M. Lerner is an Internet and Web consultant living in Haifa, Israel, who has been using the Web since early 1993. His book ``Core Perl'' will be published by Prentice-Hall in the spring. Reuven can be reached at reuven@lerner.co.il. The ATF home page including archives and discussion forums, is at http://www.lerner.co.il/atf/.