[conspire] The Answer Gang have moved in next door

Tue May 24 14:01:33 PDT 2005

Of possible interest partly because of the better (than my earlier
attempt) explanation of SPF.

The context is that Linux Gazette's (http://linuxgazette.net/) mailing
lists have moved over to my mail server, which has begun virtual-hosting
them as "lists.linuxgazette.net" -- partly because of my machine's
spam-rejection abilities.  The LG mailing list for "The Answer Gang"
(the TAG list) had been overwhelmed by spam to the point of uselessness
at its old host.

Partly, the message below aims to remind readers of the (relocated)
mailing list that I never promised perfection -- especially since the
TAG list, which has to be maximally new-user friendly, can't be set to
hold mail from non-subscribed posting addresses (as this one does).

The latter is one major reason why my mailing lists are generally
spam-free, as y'all will probably have noticed.

Invitation:  Please consider joining The Answer Gang, if only to look in
on people's problems being solved.  ALSO:  Please invite people to send
difficult Linux questions to "tag at lists.linuxgazette.net", and The
Answer Gang will help!

----- Forwarded message from Rick Moen <rick at linuxmafia.com> -----

Date: Tue, 24 May 2005 13:44:26 -0700
To: tag at lists.linuxgazette.net
From: Rick Moen <rick at linuxmafia.com>
Reply-To: The Answer Gang <tag at lists.linuxgazette.net>
Subject: Re: [TAG] Gouranga

Quoting Neateye (nitaigouranga at aol.com):

[three lines of spam]

Just a reminder:  I never promised that the new lists host's MTA setup
was spam-proof, especially given that this mailing list accepts mail
from non-subscribed addresses, for Answer Gang-policy reasons.  The MTA
just refuses _most_ junkmail (spam, virus mail...) on grounds of either
RFC violation or high "spamicity" as measured by SpamAssassin during the
SMTP transaction.

This particular mail got through because it was fully RFC-compliant
_and_ (mostly because of being very short) had low spamicity (SA score
of about 3.6).  Being "RFC-compliant" includes the traits -- very rare 
among spammers -- of using a deliverable sender address, accepting DSNs
and mail to the postmaster & abuse accounts, etc. 
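
As a rough sketch of that accept/reject logic (not the actual Exim4 ACLs):
the function name, the boolean flag, and the 5.0 cutoff are all assumptions
made for illustration.  The cutoff simply reuses SpamAssassin's stock
required_score; the real server's threshold isn't stated here.

```python
# Hypothetical SMTP-time decision.  Names and threshold are illustrative
# and are not taken from the real Exim4/Eximconfig setup.
REJECT_SCORE = 5.0  # assumed cutoff (SpamAssassin's stock required_score)


def smtp_verdict(rfc_compliant, sa_score, reject_score=REJECT_SCORE):
    """Return 'reject' or 'accept' for one incoming message."""
    if not rfc_compliant:
        return "reject"   # e.g. undeliverable sender, refuses DSNs
    if sa_score >= reject_score:
        return "reject"   # high spamicity
    return "accept"


# The three-line spam: fully RFC-compliant, SA score of about 3.6.
print(smtp_verdict(True, 3.6))
```

Under those assumptions, the short spam falls through both tests, which is
the behaviour described above.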

There are three things I can reasonably do (to my knowledge) to improve
matters further:

o   Keep SA on the cutting edge.
o   Keep training SA's Bayesian filters.
o   Try (again) to introduce checking of SPF records.

Improving SA:
------------

The machine's SpamAssassin setup is pretty current:

~ $ dpkg -l spamassassin
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Installed/Config-files/Unpacked/Failed-config/Half-installed
|/ Err?=(none)/Hold/Reinst-required/X=both-problems (Status,Err: uppercase=bad)
||/ Name           Version        Description
+++-==============-==============-============================================
ii  spamassassin   3.0.2-1        Perl-based spam filter using text analysis

The cutting-edge Debian package (Debian-unstable branch) appears to be 
spamassassin 3.0.3, so there's not much to be gained there.

The SpamAssassin Wiki houses a potpourri of drop-in filter
"enhancements" people have written: http://wiki.apache.org/spamassassin/ 
(and in particular http://wiki.apache.org/spamassassin/CustomRulesets).
The drawback is that a serious case of "caveat emptor" applies:  You
can't merely assume that J. Random PerlMonkey's filter even works and
avoids blowing up, let alone that he's attempting something sensible.

I may (cautiously) try some of those rules, but I don't like playing
around with a production mail server more than necessary.

I also have not enabled some of the optional checks in my
Exim4/Eximconfig setup (see:  http://www.jcdigita.com/eximconfig/).  
In particular:

o  greylisting (the drawback being the delaying of mail)
o  Flood protection / duplicate message detection / repeat failed
   delivery detection (requires storing and checking hashes of recent
   messages in MySQL)
o  Checking escaped and BASE64 content for spamicity (requires embedded
   Perl support and MySQL)
o  Scanning specifically for viruses (requires ClamAV)
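
For anyone unfamiliar with the first item, here is a minimal sketch of
greylisting as a generic technique.  It is not how Eximconfig implements
it: the in-memory dict, the 300-second delay, and the function name are
assumptions, and the addresses are placeholders.

```python
# Toy greylist.  An in-memory dict stands in for a real database, and
# the delay value is an assumption chosen for illustration.
GREYLIST_DELAY = 300  # seconds

first_seen = {}  # (client IP, envelope sender, recipient) -> first attempt


def greylist(ip, sender, rcpt, now):
    """Return 'defer' (temporary SMTP failure) or 'accept'."""
    key = (ip, sender, rcpt)
    if key not in first_seen:
        first_seen[key] = now
        return "defer"            # never seen before: ask it to retry
    if now - first_seen[key] < GREYLIST_DELAY:
        return "defer"            # retried too quickly
    return "accept"               # a real MTA came back after the delay


triplet = ("192.0.2.10", "alice@example.org", "tag@lists.example.net")
print(greylist(*triplet, now=0))     # first attempt
print(greylist(*triplet, now=60))    # retry inside the delay window
print(greylist(*triplet, now=900))   # retry after the delay
```

The first two attempts are deferred and only the third is accepted, which
is exactly the mail delay named above as the drawback.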

I haven't implemented those because I don't like system complexity to
spiral out of control, and am wary of diminishing returns.

Training the Bayesian Filters:
-----------------------------

SA's Bayesian recogniser scores message text statistically, according to
how much it resembles prior spam.  To keep this on track, you're supposed
to occasionally feed it mails (or mboxes) you tell it consist solely of
spam, and if possible also mails or mboxes that you tell it consist solely
of non-spam ("ham").

I've done a bunch of that in the past, but not lately.  Again, one
imagines diminishing returns apply.

(Our example three-line spam would probably have been hard to match,
even with well-trained filters.  There just wasn't much of it.)
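
To make the training idea concrete, here is a toy token-frequency
classifier.  It is emphatically not SpamAssassin's actual algorithm, just a
small naive-Bayes-style sketch; the class name, method names, and training
strings are all invented for illustration.

```python
import math
from collections import Counter


class TinyBayes:
    """Toy token-frequency classifier; NOT SpamAssassin's real algorithm."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}

    def learn(self, label, text):
        """Feed one message known to be 'spam' or 'ham'."""
        self.counts[label].update(text.lower().split())

    def spam_log_odds(self, text):
        """Positive: looks more like spam.  Negative: more like ham."""
        vocab = len(set(self.counts["spam"]) | set(self.counts["ham"])) or 1
        totals = {k: sum(c.values()) for k, c in self.counts.items()}
        score = 0.0
        for tok in text.lower().split():
            # Add-one smoothing, so unseen tokens do not zero things out.
            p_spam = (self.counts["spam"][tok] + 1) / (totals["spam"] + vocab)
            p_ham = (self.counts["ham"][tok] + 1) / (totals["ham"] + vocab)
            score += math.log(p_spam / p_ham)
        return score


bayes = TinyBayes()
bayes.learn("spam", "cheap pills buy now")
bayes.learn("spam", "buy cheap watches now")
bayes.learn("ham", "exim config question about routers")
bayes.learn("ham", "kernel module question")

print(bayes.spam_log_odds("buy cheap pills"))   # leans spam
print(bayes.spam_log_odds("kernel question"))   # leans ham
print(bayes.spam_log_odds("gouranga"))          # nothing to go on
```

A very short message made of tokens the filter has never seen gives it
almost no evidence either way, which is the problem with the three-line
spam.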

[Re-] Implementing Checks of SPF Records:
----------------------------------------

If SPF-checking worked right, it _would_ have rejected this spam.  This
was the string in the prior Received header:

   ps.189.53.236.dial.global.net.uk ([80.189.53.236]:3305)
                                      ^^^^^^^^^^^^^

That IP address is the actual delivering MTA host, and the hostname to
the left is what my machine got back from a reverse DNS lookup on that
address.  So, the claim that the mail came from
"nitaigouranga at aol.com" was fraudulent (because it didn't come from an
AOL IP address) -- and that's _exactly_ the situation SPF is intended to
fix.
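
A small sketch of pulling those pieces out of the quoted fragment.  The
regex and the function name are mine, written only against the format shown
above; other MTAs lay out Received headers differently.

```python
import re

# Matches "hostname ([ip]:port)" as it appears in the quoted fragment.
RECEIVED_PEER = re.compile(
    r"(?P<rdns>\S+)\s+"
    r"\(\[(?P<ip>\d{1,3}(?:\.\d{1,3}){3})\]:(?P<port>\d+)\)"
)


def parse_peer(fragment):
    """Return (reverse-DNS name, IP, port), or None if nothing matches."""
    m = RECEIVED_PEER.search(fragment)
    if m is None:
        return None
    return m.group("rdns"), m.group("ip"), int(m.group("port"))


print(parse_peer("ps.189.53.236.dial.global.net.uk ([80.189.53.236]:3305)"))
```

Only the IP address is trustworthy here, since it comes from the TCP
connection itself.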

To review, Sender Policy Framework (http://spf.pobox.com) is an addition
to your DNS records in which you specify which IP addresses are
_solely_ authorised to be outbound MXes for (deliver mail from) your DNS
domain.  In SMTP terms, the receiving MTA checks the connecting-socket IP
against the domain declared in the connecting SMTP process's MAIL FROM line
(which in this case would have been "aol.com" as the DNS domain declared in
"MAIL FROM: Neateye <nitaigouranga at aol.com>").  Forging an innocent
party's address as the sender in this way is called a "Joe Job", for
reasons detailed in the relevant entries on http://linuxmafia.com/kb/Mail/ .

Domain aol.com publishes such records (so does my linuxmafia.com domain).
You can bet that "80.189.53.236" is _not_ on AOL's authorised list.
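
A minimal sketch of the core comparison, assuming a record that lists its
senders with plain ip4:/ip6: mechanisms.  The record below is made up and
uses documentation address space; it is NOT aol.com's real record.  A real
SPF evaluator must also handle include:, a, mx, redirect= and qualifiers,
all of which this sketch ignores.

```python
import ipaddress


def ip_authorised(client_ip, spf_record):
    """True if client_ip falls inside any ip4:/ip6: term of spf_record.

    Deliberately ignores include:, a, mx, redirect= and qualifiers.
    """
    addr = ipaddress.ip_address(client_ip)
    for term in spf_record.split():
        if term.startswith(("ip4:", "ip6:")):
            net = ipaddress.ip_network(term[4:], strict=False)
            if addr.version == net.version and addr in net:
                return True
    return False


# Hypothetical record for illustration only.
EXAMPLE_SPF = "v=spf1 ip4:192.0.2.0/24 ip4:198.51.100.17 -all"

print(ip_authorised("80.189.53.236", EXAMPLE_SPF))  # the dial-up sender
print(ip_authorised("192.0.2.55", EXAMPLE_SPF))     # a listed address
```

With a "-all" at the end of the record, a connecting IP that matches
nothing is meant to be rejected outright.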

There's a hook in Exim4 to have it call out to invoke an SPF-checker Perl
script, during incoming SMTP sessions, and then (configurably) accept,
reject, or accept-but-mark-suspect the incoming mail based on the
results of that check.

Unfortunately, the last time I enabled SPF-checking, there were some
brain-damaged, embarrassing results:  At least at that time, the
SPF-checking Perl script (Debian package libmail-spf-query-perl)
_seems_ to have checked the connecting IP against the domain in the
message-internal "From:" header, rather than that in the MAIL FROM (aka
the "return path" address).

E.g., Heather Stern <star at starshine.org> sent mail to the
blw at baylisa.org mailing list that we're both on, which then tried to
send a copy of her mail to my mail server.  My mail server then rejected
the baylisa.org machine's delivery attempt on grounds that that machine
is not an authorised MX IP in the _starshine.org_ domain's SPF records.

No!  Bad SPF-checker.  No biscuit!  (It should have checked the
connecting IP against _baylisa.org_ SPF records, if any.  Wrong domain,
dammit; wrong header to check, entirely.)
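
To spell out the distinction, a sketch of choosing the domain to check.
The envelope sender "blw-bounces@baylisa.org" is a hypothetical stand-in
for whatever return path the list actually used, and the addresses are
written with "@" rather than the archive's " at ".

```python
from email.utils import parseaddr


def spf_domain(envelope_sender):
    """Domain to evaluate SPF against: the one in the MAIL FROM address."""
    _, addr = parseaddr(envelope_sender)
    return addr.rpartition("@")[2].lower()


# What the list host presents in the SMTP envelope (hypothetical address)...
mail_from = "blw-bounces@baylisa.org"
# ...versus the header inside the message, which the list leaves alone.
header_from = "Heather Stern <star@starshine.org>"

print(spf_domain(mail_from))     # the right domain to check
print(spf_domain(header_from))   # what the broken check effectively used
```

Feeding the "From:" header into the check yields starshine.org, and the
list host's IP will of course fail against that domain's records.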

That was such awful breakage that I didn't even trust myself to inquire
with the Perl script's maintainer, lest I say something rude.  And I've
been reluctant to try again.

I've actually just inserted a regex into the TAG list's spam headers to
discard all future mail matching ".*nitaigouranga at aol\.com" as sender --
but that's really kinda hopeless, since the "From:" address was
completely fraudulent and could have been anything on earth.
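
A sketch of what such a filter does, and why it is so easily dodged.  The
pattern is written with "@" on the assumption that the " at " above is
just archive obfuscation; the function name and the second test address
are invented.

```python
import re

# Assumed live form of the pattern quoted above ("@" in place of " at ").
DISCARD = re.compile(r".*nitaigouranga@aol\.com")


def should_discard(from_header):
    """True if the header value matches the discard pattern."""
    return DISCARD.search(from_header) is not None


print(should_discard("Neateye <nitaigouranga@aol.com>"))
print(should_discard("Neateye <some.other.forgery@example.com>"))
```

The second call shows the weakness: the same spam with any other forged
sender sails straight past the pattern.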

+-+--------------------------------------------------------------------+-+
You've asked a question of The Answer Gang, so you've been sent the reply
directly as a courtesy.  The TAG list has also been copied.  Please send
all replies to tag at lists.linuxgazette.net, so that we can help our other
readers by publishing the exchange in our monthly Web magazine:
              Linux Gazette (http://linuxgazette.net/)
+-+--------------------------------------------------------------------+-+
_______________________________________________
TAG mailing list
TAG at lists.linuxgazette.net
http://lists.linuxgazette.net/mailman/listinfo/tag

----- End forwarded message -----