
Marcel's Linux Walkabout: But I Don't Like Spam!

Are you drowning in a sea of spam (unsolicited email)? Hoping somebody somewhere will throw you a life preserver? Never fear, rescue is at hand. Join Marcel Gagne on his Linux Walkabout, as he introduces you to your Linux system's new best friend: the SpamAssassin.

But I don't like Spam!

To those of you who, upon reading the title of this article, share a mix of anger and Monty Python nostalgia all rolled into one, count me among your numbers. For all those others who have never seen the famous Monty Python Spam sketch, or have never eaten the spiced ham luncheon meat, Spam for you is simply unsolicited email. Incidentally, the term Spam, when referring to unsolicited email, was actually coined from the Monty Python sketch rather than from Hormel's meat product.

More and more, Spam is robbing us of our productivity, forcing us to wade through increasingly large numbers of unwanted junk in order to deal with the messages that are truly important. I am quite certain that in my zeal to delete my junk email, I have more than once accidentally deleted a valid message. The noise-to-signal ratio is getting far too high.

Just how much Spam are we getting, anyway? Well, let me give you a frightening quote. "Predicted number of spam e-mails per inbox per year by 2006: 1,500." That quote comes from the August 2002 issue of Linux Journal. And according to some figures, Spam already accounts for 36% of all the email we receive. These figures should be enough to make you consider taking drastic measures. Contemplation of just how drastic those measures might be probably had something to do with how the package featured in today's Walkabout got its name. Justin Mason's SpamAssassin is like saying "NO" to Spam in a big way.

Aside from a great name that kind of sums up how many of us feel about Spam, just what is SpamAssassin? Simply put, it is a mail filter that attempts to identify spam using text analysis and several Internet-based realtime blacklists. SpamAssassin doesn't actually delete mail. Instead, it marks each suspect message for easy identification so that it can be filtered into a special folder (you don't want to automatically delete messages that might be genuine). When you have some free time, look through the collected messages and delete what you don't need.

I've been running SpamAssassin on my system for several weeks now, and I must say that I am very impressed. The project's website claims 99.94% accuracy in identifying Spam. I'm not sure if it is quite that high, but I would certainly agree to 95% accuracy. To start getting some relief from Spam, take a little walkabout of your own to http://www.spamassassin.org, where you'll find the latest source distribution. Since this is all Perl code, building the software does require that you have Perl installed on your system. All we have to do now is build SpamAssassin, which, thankfully, is frightfully easy.

   tar -xzvf Mail-SpamAssassin-2.31.tar.gz
   cd Mail-SpamAssassin-2.31
   perl Makefile.PL
   make
   su -c "make install"

Because SpamAssassin is written in Perl, it may require some prerequisite modules. The most significant of these is the Net::DNS set of modules. On my system, I also found that I needed to install the Time::HiRes module. Unfortunately, those packages may have prerequisites of their own. The easiest way to deal with this mess is to use Perl's CPAN shell.

  perl -MCPAN -e shell

Upon issuing this command (you should be doing this as root, by the way), you'll be at a cpan> prompt. One by one, enter the following commands. After each line, the cpan> prompt will return, waiting for you to enter the next command.

  o conf prerequisites_policy ask
  install Net::DNS
  quit

Some of you may have already wondered whether you could just do the same thing with the whole SpamAssassin install since it, too, is a Perl module. Well done! Go to the front of the class. Just remember that to get the latest and greatest, you should still visit the SpamAssassin website. At the cpan> prompt, you can just type these commands.

  o conf prerequisites_policy ask
  install Mail::SpamAssassin
  quit

With the installation complete, I needed to test it with a spam email message. A quick look at my /var/spool/mail/marcel file (my inbox) showed that one had just arrived. Well, well, well...I copied the message (your basic Nigerian billions of dollars scam spam) to a temp file.

  # spam-message.txt is a placeholder for wherever you saved the message
  cp spam-message.txt /tmp/spam.test

Then, I ran SpamAssassin against it.

  spamassassin -t < /tmp/spam.test > /tmp/spam.out

The -t tells SpamAssassin to run in test mode. That means it won't do anything to your mailbox; it simply writes the processed message to standard output, which the redirect sends to a file. You can also choose to leave out the redirect ("> /tmp/spam.out") and the whole thing will be displayed on your screen.

Have a look at the output and you'll discover some new and interesting mail headers.

  X-Spam-Status: Yes, hits=13.8 required=5.0
    tests=DONT_DELETE,DOUBLE_CAPSWORD,
          BILLION_DOLLARS,US_DOLLARS_3,
          US_DOLLARS,US_DOLLARS_2,LINES_OF_YELLING,
          SUBJ_ALL_CAPS,MISSING_HEADERS
    version=2.31
  X-Spam-Flag: YES
  X-Spam-Level: *************
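
If you want to pick the numbers out of that X-Spam-Status header from a script, something along these lines might do. It is a sketch only: the sample header is copied from the output above, and spam_score and spam_required are hypothetical helper names of my own, not SpamAssassin commands.

```shell
#!/bin/sh
# Sketch: pull the score and the threshold out of an X-Spam-Status header.
# spam_score and spam_required are hypothetical helper names.

sample='X-Spam-Status: Yes, hits=13.8 required=5.0
X-Spam-Flag: YES'

# Print the value that follows "hits=" on the X-Spam-Status line
spam_score() {
    sed -n 's/^X-Spam-Status:.*hits=\([0-9.-]*\).*/\1/p'
}

# Print the value that follows "required=" on the same line
spam_required() {
    sed -n 's/^X-Spam-Status:.*required=\([0-9.-]*\).*/\1/p'
}

score=$(printf '%s\n' "$sample" | spam_score)
required=$(printf '%s\n' "$sample" | spam_required)
echo "score=$score required=$required"
```

Both helpers read standard input, so they can be pointed at the test output with a redirect such as spam_score < /tmp/spam.out.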

If you don't happen to have a handy piece of spam, you can use the sample-spam.txt file included in the distribution. Now that you are comfortable with what is happening, how do you get SpamAssassin to start protecting your sanity? In other words, how do we make this real? Dealing with messages on a one-off basis is more trouble than it's worth. You want your whole network protected.

On a number of Linux distributions with a standard Sendmail install, you will find that procmail has been installed as well. Its job is to make it possible to pre-process mail coming into the system or into individual mailboxes. The system-wide procmail configuration file is /etc/procmailrc. Each user's home directory ($HOME) may also have its own configuration file, called .procmailrc. The file is read and the rules within it are applied as messages come in. These rules may be a set of instructions to deliver mail to pre-defined folders, redirect it to another mailbox based on the subject line, or any number of other things. With no rules in place, mail winds up in your inbox unaltered. The procmail configuration file is where all this magic takes place.

To process incoming mail through SpamAssassin, we need to add a couple of simple lines to our procmailrc file (or the local ~/.procmailrc).

  :0fw
  | /usr/bin/spamassassin -P

That is all there is to it. When messages come in, they are tagged and identified with a highly noticeable *****SPAM***** before the message. You can then use your email client's filter rules to automatically move these messages into a folder for later analysis. On my Kmail setup, I have a rule that automatically moves Spam into a folder called caughtspam (see Figure 1). Every few days, I take a rapid-fire tour through the folder, happily scanning for valid messages (there almost never are any) and deleting the offenders.

Figure 1 Creating an email filter rule with Kmail.
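
If you would rather have procmail do the filing instead of your mail client, a recipe along these lines is one option. Treat it as a sketch with a few assumptions: that it is placed after the SpamAssassin recipe in the same procmailrc, that tagged messages carry the X-Spam-Flag: YES header seen in the test output, and that an mbox-style folder named caughtspam (the name borrowed from my Kmail setup) in procmail's mail directory is where you want the messages to land.

```
# File anything SpamAssassin has flagged into the caughtspam folder.
# The trailing colon on :0: asks procmail to use a lockfile for the folder.
:0:
* ^X-Spam-Flag: YES
caughtspam
```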

Of course, some things that look like spam may not be. There are some things that you asked for and that you wanted to get. For instance, I get daily news updates from a variety of tech info sites. Along with the stories, these emails do contain some product advertisements. Since I still want to see them despite their commercial content, I must tell SpamAssassin to ignore them. In the /etc/mail/spamassassin directory, you'll find a file called local.cf, where you can do just that. By default, there is nothing but a couple of comments in the file. To flag an address as being from a "good" site as opposed to a "bad" site, use the whitelist_from parameter.

  # Add your own customisations to this file.
  # See 'man Mail::SpamAssassin::Conf'
  # for details of what can be tweaked.
  #
  whitelist_from     @protected_news_site.com
  whitelist_from     someuser@good_news_site.com

When SpamAssassin goes through your mail to decide whether something is Spam or not, it assigns a score to each item it finds and adds up the total. The default is to declare something as Spam if it hits a score of 5. I found this to be a little low (too much non-Spam being caught), and changed mine to 7. This is something you may wish to experiment with. This is the required_hits parameter, also from the local.cf file you modified above.
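
As a sketch, the threshold change described above comes down to a single line in local.cf. The value 7 is simply the one I settled on for my own mail, so take it as a starting point for your experiments rather than a recommendation.

```
# Raise the Spam threshold from the default of 5
required_hits 7
```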

Within a few days, you will have pretty much taken care of all the messages that you really do want coming through. You can then relax and once again get used to that wonderful feeling of an inbox filled with messages from people you actually wanted to hear from. It is a great feeling.

Next time on the Walkabout, I'm going to take you into the hinterlands; the backwaters; the deepest, darkest corners of the Linux universe. While it may not be quite that scary, it's best not to take any chances. Better stock up on that programmer food, just in case.

Until next time, I bid you great Linux adventures!
