Getting Rid of Spam

Here's a way to filter unwanted mail using procmail.

by Brandon M. Browning

If you regularly receive a high volume of mail (e.g., you subscribe to several mailing lists) or are the target of unsolicited bulk e-mail (UBE), mail filtering may have crossed your mind. procmail is a flexible tool that allows you to process incoming e-mail and perform user-definable actions, such as filtering, prioritizing or informing you of new mail. procmail is part of a larger toolset that also includes formail, a program that can handle tasks such as recognizing duplicate messages, digest bursting, header extraction/addition and the generation of auto-replies.

Setting It Up

If you do not have procmail on your system already, you can pick up the latest copy at ftp://ftp.informatik.rwth-aachen.de/pub/packages/procmail/procmail.tar.gz. The current version as of this writing is 3.11pre7. (Don't let the ``pre'' scare you; this version is very stable.) The package compiles out-of-the-box for Linux, but you may want to change a few compilation parameters (like installing into /usr/local instead of /usr). Read the INSTALL file included in the archive to get the complete installation process.

Once you have compiled and installed procmail, you can start using it. Since it is designed to process incoming mail, you need to modify your ~/.forward file (or create one if you do not have one) to include the following line that will automatically invoke procmail:

"|IFS=' '&&/usr/local/bin/procmail -f-||exit 75 #

Include all quotes and do change YOUR_LOGIN_NAME. The reason for doing this is described in the FAQ that is bundled with the archive. Next, you will need to create a recipe file that procmail will use to filter your mail.

procmail Usage

When executed, one of the things procmail does is look for a file called .procmailrc in your home directory. This file holds all of the commands, called recipes, that tell procmail what to do with the incoming mail. A recipe has three parts: the start (:0, followed by optional flags and local lockfile), the optional conditions and the action to be taken if the conditions evaluate out to true. Each part of a recipe is on a separate line.

The flags tell procmail which part of the message to look at (headers, body or both) and whether the recipe is a delivering (terminating) or non-delivering recipe. The local lockfile ensures that concurrently running procmail processes do not interfere with each other when writing out to a mailbox and should be used only for delivering recipes. The optional conditions start with an * (asterisk). There can be more than one condition or no conditions as you feel necessary. Everything proceeding the * is passed to an internal regular expression (regex) engine, which is compatible with egrep. All conditions are logically ANDed together. If you choose to have no conditions, procmail defaults to a true result (which is what you would expect). The action line tells procmail what action to take if the all the conditions match. The action line can start with a ! (to forward the message), a | (to pass the message to a program), or a { (to start a nested block). Anything else is taken to be a mailbox name.

Here is an example of a recipe that will filter all mail coming from the Whitehouse into a mailbox of the same name:

:0:
  * ^From: .*whitehouse\.gov
  whitehouse

Let's analyze this one line at a time. The first line tells procmail that we have started a recipe (:0) and that we want procmail to determine the local lockfile (the second :). The next line is the condition that procmail must match in order for this recipe to be true. By default, procmail scans only the headers, which is what we want. The last line is the action, which tells procmail to write the message to a file called whitehouse.

For contrast, here is a non-delivering recipe:

:0 f
  * ^X-Face:
  | formail "I X-Face"

This one uses formail to strip out an unwanted X-Face header. Notice the lack of a local lockfile. Since this is a filter, a lockfile is unnecessary. procmail will still work if you place one there, but it will complain.

Unsolicited Mail

At the beginning of this article, I mentioned that procmail is useful to filter unwanted mail. UBE (or spam as it is more commonly known) has become an annoying trend and a nuisance. The volume of spam is believed to have increased substantially on Usenet, where people excessively post the same message to various newsgroups. Usenet spam is perceived to be ``cancellable'', meaning a posted message can be deleted by the moderator before being read by too many people. To get around this type of cancellation, someone got the idea to send the message to you directly rather than posting it to Usenet where it might be deleted before you read it. Hence, UBE started to infiltrate users' in-boxes. Pioneers of this form of marketing quickly found out that many users disliked spam in any form, and often found their own mailboxes full of flames. They started to obscure headers to make it hard to find out where the message really came from.

Why is spam considered the bane of the Internet at large? Unlike the junk mail you receive in your postal box at home, spammers rarely pay as much to send the spam as the recipient does to receive it. The fact that they pay less than you is called ``cost-shifting''. Another form of this shifting is the use of third-party computer resources by the spammer for sending their bulk e-mail without permission. By doing so, the spammer is costing that innocent company both time and money spent to clean up after the spammer. Another tactic widely used is the munging of the headers in such a way that uneducated recipients may waste the time of innocent third parties who had nothing to do with the spam in the first place. This type of deceitfulness can also be considered cost-shifting and has been ruled illegal in the U.S. Courts.

Cost-shifting is not the only argument against spam. There is no single removal point as each spammer generally runs their own list. To that end, they are not required to honor a removal list. As more and more people send spam, you will never be able to remove your address from every single list. Why should you have to if you didn't ask for it in the first place? Finally, a great deal of the spam is or could be considered illegal, such as pyramid schemes, multi-level marketing schemes and lotteries.

Dealing with UBE

Rather than having all that garbage clog up your in-box and make it unusable for real work, you can now use procmail to filter it out. Earlier I mentioned that spammers try to obscure headers to make it hard to trace. By doing so, they sometimes give inadvertent ``signatures'' that you can tell procmail to filter on. For instance, a popular bulk e-mailer, the Stealth Mailer, inserts a false Received: line to deter flames. However, both versions generate the wrong time zone. Armed with this knowledge, you can now filter out a great deal of spam. I have yet to see a false positive on this one.

# Filter spam that used the Stealth Mailer Classic
  :0
  * ^Received:.*id GAA.*-0600 \(EST\)$
  spam

Another great spam filter looks for a ``Comments: Authenticated sender is'' header. Unfortunately, filtering on that alone does not do the trick because Pegasus Mail (a popular mail client for the Windows operating system) uses this header legitimately. Fortunately, Pegasus adds an X-Mailer: header in addition to the Comments: header. If both the Comments: and the X-Mailer: exist, then a Pegasus Mail user sent the message (and is probably legitimate); otherwise, it is a bulk mailer. The following recipe will filter this situation. (Note that there is a space and a tab between the square brackets. Unfortunately, procmail does not have a whitespace escape sequence as Perl does.)

# Only Pegasus Mail for the WinOS generates a
# valid "Comments: Authenticated sender is ..."
# header. If this is present and the X-Mailer is 
# not; then the message in the question is almost 
# certainly spam.
:0
* ^Comments:[ 	]*Authenticated sender
* !^X-Mailer: Pegasus Mail
spam

These two recipes alone filter out a majority of my spam. You can quickly see that a list of these recipes strung together would be beneficial. This is exactly what several free packages have done. My personal choice came down to Alcor's filters (http://alcor.concordia.ca/topics/email/auto/procmail/spam), which I found to be non-intrusive, easy to understand and quite flexible. Alcor's filters work by applying over 1300 filters to the message. If a filter is matched, the message is tagged with a special header. Then, all you have to do is take whatever action (e.g., delete, write sender, etc.) you deem appropriate for these messages with the special headers. I personally avoid ``reply'' because I dislike using auto-responders, and ``delete'' because I believe in checking for false positives (of which you will unavoidably get a few).

I recommend downloading all of the tag recipes (use the ``save as source'' [not text] feature on your browser). I placed the filters in a new directory, cleverly called ~/.procmail. You will most likely need to edit the file tag-radical in order to comment out (using a # at the beginning of the line) or change the three uncommented INCLUDERC lines. Otherwise, you will see annoying ``Couldn't read xxxx'' errors in your $LOGFILE each time you process a message. Once that is done, add the following recipe in your ~/.procmailrc at the point you wish to check the incoming message for spam. I check mine at the very top before I do any kind of filtering and have found that this works well.

# This enables Alcor's tagging filters
INCDIR=$HOME/.procmail
INCLUDERC=$INCDIR/tag
INCLUDERC=$INCDIR/tag-agis
INCLUDERC=$INCDIR/tag-aol
INCLUDERC=$INCDIR/tag-contents
INCLUDERC=$INCDIR/tag-jdfalk-cyberpromo
INCLUDERC=$INCDIR/tag-jdfalk-llv
INCLUDERC=$INCDIR/tag-jdfalk-nancynet
INCLUDERC=$INCDIR/tag-panix
INCLUDERC=$INCDIR/tag-radical

  :0:
  * $ ^$special_header
  spam

That is all. If everything goes well, you should notice that most (if not all) of your spam now goes into your mailbox named spam. You can test it by sending a message to yourself that contains content that these filters will catch (try sending yourself a message with -- Headers -- somewhere in the body).

I've Filtered. Now What?

Alcor's tagging system might catch legitimate mail, so I do not recommend deleting anything before you look at it. Once you have verified that it is spam, you have two options: complain or delete. If you want to fight spam, I recommend you to read the SPAM-L FAQ (http://www.ot.com/~dmuth/spam-l/) and possibly join the mailing list. Instructions on how to do so are in the FAQ.

Conclusions

This article is only the tip of the iceberg on using procmail and its accompanying programs. If you are interested in the continued use of procmail to filter your e-mail, I recommend the procmail mailing list. The regulars there are knowledgable and willing to help. You may also want to search out other procmail solutions. To put these filters through a stress test and to help further develop them, I have subscribed to a special mailing list that sends nothing but spam that is forwarded through it, which takes special care to try and filter duplicates. At the time of this writing, 83% of mail I have received from this list was properly filtered.

Resources

Acknowledgements

Brandon M. Browning is a Software Engineer for NorthWestNet, Inc., an ISP located in Bellevue, Washington. When he is not hacking Perl or fighting spam, he can often be found pursuing his other interests: The Tick, Babylon 5, Star Wars and on occasion sleeping. He can be reached by e-mail at brandon@powertie.org.