[conspire] Mailing list servers and the spam problem

Tue Feb 24 02:04:55 PST 2015

Scott DuBois, I note with interest and approval your project at EBLUG to 
construct a mailing list server for that group.  That included a working
meeting in Fremont this past Wednesday:  I'd have come over, except 
Deirdre and I were driving down to Los Angeles for SCALE that evening.

You (EBLUG) are using the Postfix MTA, which is a very good piece of
software.  I strongly suspect you will also be using GNU Mailman.

Aside:  Deirdre's made an interesting and worthwhile point for user
groups that are _not_ prepared (or no longer prepared) to host their own
mailing lists:  If you must outsource mailing lists, there's an
alternative that's _much_ better than Google Groups in most ways:
Mailchimp.

o  no advertising
o  free of charge for up to 1000 subscribers
o  slick administrative control panel includes useful functions
   like advance-scheduled mailings (e.g., for meeting reminders).
o  can have multiple admins with access optionally validated by 
   two-factor authentication.

Google Groups broadcasts 'targeted advertising' at subscribers as
additions to the body text of people's mailing list postings, with a net
effect that can only be described as tacky at best.

Also, to the extent I'm willing to help Internet companies spy on me, I
at least want to spread information around thinly.  Google, Inc. needs
to get less tracking information, not more.

On the arguably-mostly-minus side:

o  Mailchimp does a number of things to track subscribers.  All 
   postings are sent as Multipart-Alternative, with the HTML portion
   getting 1x1 pixel 'Web bugs', and all URLs in transmitted messages 
   get transformed, Constant Contact-style, into ones with hash strings
   individual to each recipient.  The (entirely overt) intent is to 
   help the mailing list admin with copious statistics about how many
   (and which) subscribers opened the mails, whether they clicked on
   URLs in them, etc.  Mailchimp is a very effective tool for 
   relatively small marketing campaigns.

The Mailchimp v. Google Groups comparison came up because, in revising
BALE, I've noticed a _huge_ trend of Bay Area technical groups ceasing
to operate mailing lists, and that's been mostly a rush to Google Groups
(except where groups outsource _everything_ to Meetup, Inc., instead).

I've already put a footnote on BALE for every reference to Meetup,
linking to this information page:
http://linuxmafia.com/faq/Essays/meetup.html  I hope to soon have a
similar one for Google Groups that includes how to join one without
being pushed into using Gmail for it.

End of digression.

Scott, getting back to the EBLUG project, as I warned you in advance, 
antispam is the only actually difficult part.  If you'd used Exim4, you 
could have used the Eximconfig docs as a detailed punchlist.  Since
you're using Postfix, that's not possible, but you _can_ rely heavily on
those docs as a general guide.

I would recommend you do that.

In this posting, I wanted to put the problem in proper context.  Mailing
List Manager (MLM) software is a specialised type of SMTP forwarder.
Other examples of SMTP forwarder:

o  Backup MXes (mail exchangers)
o  Mail aliases implemented using /etc/aliases or similar
o  Autoresponder mailbots
o  SMTP relay hosts

All varieties of SMTP forwarders have taken a pounding, i.e., collateral
damage, from the spam wars.  Practically anything that receives and
retransmits SMTP traffic has either been deliberately attacked, cast
under suspicion as a spam source (and downmarked by automated spam
checkers), or both.

It is now much, much rarer than previously to have mail relayed by
intermediate hosts on the way to the destination domain -- the one major
exception being mailing lists.  And they're feeling the pinch.  One
reason is header forgery.  All current MLMs rely on a type of header
forgery to work at all.  Consider as an example these selected headers
from a posting transmitted by the main SVLUG list:

 From svlug-bounces+rick=linuxmafia.com at lists.svlug.org Sun Feb 15 06:18:07 2015
 Envelope-to: rick at linuxmafia.com
 Return-path: <svlug-bounces+rick=linuxmafia.com at lists.svlug.org>
 Delivery-date: Sun, 15 Feb 2015 06:18:07 -0800
 X-SA-Exim-Connect-IP: 157.22.20.227 
 X-SA-Exim-Mail-From: svlug-bounces+rick=linuxmafia.com at lists.svlug.org 
 Received: from mail.svlug.org
        ([157.22.20.227] helo=svlug.org ident=Debian-exim)
        by linuxmafia.com with esmtp (Exim 4.72)
        (envelope-from <svlug-bounces+rick=linuxmafia.com at lists.svlug.org>)
        id 1YN01K-00023y-9D
        for rick at linuxmafia.com; Sun, 15 Feb 2015 06:18:07 -0800
 Received: from localhost ([127.0.0.1]:50589 helo=svlug.org)
        by svlug.org with esmtp (Exim 4.44 #1)
        id 1YMzn4-0003h0-87
        for <rick at linuxmafia.com>; Sun, 15 Feb 2015 06:03:18 -0800
 Received: from linuxmafia.com ([198.144.195.186]:39079)
        by svlug.org with esmtps
        (Cipher TLS-1.0:RSA_AES_256_CBC_SHA:32) (Exim 4.44 #1)
        id 1YMzmY-0003SI-30
        for <svlug at lists.svlug.org>; Sun, 15 Feb 2015 06:02:50 -0800
 Received: from rick by linuxmafia.com with local (Exim 4.72)
        (envelope-from <rick at linuxmafia.com>) id 1YN00m-00023i-KB
        for svlug at lists.svlug.org; Sun, 15 Feb 2015 06:17:28 -0800

 Date: Sun, 15 Feb 2015 06:17:28 -0800
 From: Rick Moen <rick at linuxmafia.com>
 To: svlug at lists.svlug.org
 Subject: Re: [svlug] John Goerzen on Linux losing its way

I put a gap line between the first group, which are in a way
pseudo-headers (in that they are not part of the message but rather
external commentary about its handling), and the rest.  Of those, the
first is called the 'envelope header', that being where the receiving
SMTP host stores metadata recorded during the delivering SMTP
conversation.  In this case, the SVLUG MTA told my MTA during the
delivery conversation that the mail was from user
'svlug-bounces+rick=linuxmafia.com at lists.svlug.org'.  This so-called
'From' sender is distinct from the 'From:' sender (note colon) specified
in the _real_ SMTP headers, those shown below the gap.
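
The envelope sender above is not arbitrary:  its local part embeds the
subscriber's own address, so that any bounce identifies which subscriber
failed.  (This technique is commonly called VERP.)  The following is a
minimal Python sketch of decoding it, not Mailman's own code; the
function name decode_verp is hypothetical, and the ' at ' handling only
exists to cope with the way this archive obscures addresses.

```python
def decode_verp(envelope_sender):
    """Split a per-recipient bounce address into
    (list bounce mailbox, subscriber address), or return None."""
    # The archive writes '@' as ' at '; undo that before parsing.
    address = envelope_sender.replace(" at ", "@")
    local, _, list_domain = address.rpartition("@")
    # The local part looks like 'svlug-bounces+rick=linuxmafia.com'.
    bounce_box, plus, encoded = local.partition("+")
    if not plus or "=" not in encoded:
        return None  # not a per-recipient bounce address
    # The subscriber's '@' was replaced with '=' by the list software.
    user, _, domain = encoded.rpartition("=")
    return f"{bounce_box}@{list_domain}", f"{user}@{domain}"


decoded = decode_verp("svlug-bounces+rick=linuxmafia.com at lists.svlug.org")
print(decoded)
```

Run against the envelope sender shown in the example, this separates
the list's bounce mailbox from the subscriber the copy was sent to.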

'Envelope-To' is yet another pseudo-header inserted by my receiving MTA,
and again that is the recipient the SVLUG MTA specified to my MTA during the
delivery conversation.  The 'Delivery-Date' pseudo-header tells the exact
timestamp that conversation took place.  The two 'X' headers are just 
tracking information recorded by my MTA, again, about the SMTP delivery 
conversation.  And the 'Received' pseudo-headers are hop-to-hop tracking
information.  These are supposed to be chronological from bottom to top.

My point is that this mail-handling looks suspect to battle-hardened
antispam software.  The internal 'From:' header claims it's sent by
linuxmafia.com, which is IP 198.144.195.186 -- but my MTA received it
from lists.svlug.org (the forwarding host).  Comes across as a forgery.
Forgeries are usually spam; thus, anything that seems like one gets the
hairy eyeball treatment.
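
To make the mismatch concrete, here is a rough Python sketch that pulls
the two sender identities out of the example headers and compares their
domains.  It is a simplification:  real antispam software compares the
delivering IP against what the claimed domain publishes, not merely two
domain strings.  The helper name domain_of is hypothetical, and the
' at ' handling is only there because of how this archive obscures
addresses.

```python
from email import message_from_string
from email.utils import parseaddr


def domain_of(address):
    """Return the lower-cased domain part of an address header value."""
    address = address.replace(" at ", "@")  # undo archive obfuscation
    return parseaddr(address)[1].rpartition("@")[2].lower()


# A subset of the headers from the SVLUG example, body omitted.
RAW = """\
Return-path: <svlug-bounces+rick=linuxmafia.com at lists.svlug.org>
From: Rick Moen <rick at linuxmafia.com>
To: svlug at lists.svlug.org
Subject: Re: [svlug] John Goerzen on Linux losing its way

(body omitted)
"""

msg = message_from_string(RAW)
envelope_domain = domain_of(msg["Return-path"])  # who the relay says sent it
header_domain = domain_of(msg["From"])           # who the message claims wrote it
looks_forged = envelope_domain != header_domain

print(envelope_domain, header_domain, looks_forged)
```

Every legitimate mailing list post will trip a naive check like this
one, which is exactly the problem.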

Acceptance Policy

In setting up your mailing list host's MTA, you want to be as picky as
possible in what it'll accept vs. what it will say '551 Die spammer die'
to.  (The 5xx SMTP result codes are permfail, 4xx are tempfail, 2xx are 
success, and 3xx are 'OK so far'; 1xx is defined as a class but is not
used by standard SMTP.  There are also enhanced status codes like 5.2.1,
designed to clarify the generic three-digit ones by being more specific.)
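
As a small illustration of reading those result codes, the Python
sketch below sorts a reply line into its class by first digit and picks
out the optional dotted status code if one is present.  The function
name classify_reply and the '550 5.2.1' sample line are made up for the
example; only the '551 Die spammer die' text comes from above.

```python
import re

# Reply classes by first digit, as summarised in the text above.
REPLY_CLASSES = {
    "2": "success",
    "3": "OK so far (more input expected)",
    "4": "tempfail (sender should retry later)",
    "5": "permfail (sender should not retry)",
}

# Three-digit code, optional dotted status code, optional free text.
_REPLY = re.compile(r"^(\d{3})(?:[ -](\d\.\d{1,3}\.\d{1,3}))?(?:[ -](.*))?$")


def classify_reply(line):
    """Return (code, class description, dotted status code or None)."""
    match = _REPLY.match(line.strip())
    if match is None:
        raise ValueError(f"not an SMTP reply line: {line!r}")
    code, dotted, _text = match.groups()
    return int(code), REPLY_CLASSES.get(code[0], "unrecognised class"), dotted


print(classify_reply("551 Die spammer die"))
print(classify_reply("550 5.2.1 Mailbox disabled"))  # hypothetical reply
```

The practical point for a listadmin is the 4xx/5xx split:  a tempfail
invites the sender to try again, while a permfail generates a bounce.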

Let's say you are reading the Eximconfig docs and read that you can
reject an awesome amount of spam with little system burden by making
your MTA immediately reject mail being delivered by an IP address that
refuses to accept return mail to either the null sender or to
postmaster.  This heuristic works really well _because_ spammers
characteristically ignore RFC rules governing what all SMTP hosts are
required to do.  They instead rely on the cheap-retailer's mantra:
volume, volume, volume.  The malware they infest unwary Windows users'
desktops with to build their botnets doesn't accept return mail to the
null sender or postmaster because, if your SMTP port rejects the
rapid-fire blast of spam from botnet hosts, there'll be a thousand other
SMTP ports that will stupidly accept it.

So, you configure Postfix to do a 'callout' check of the delivering IP
to verify that it accepts return mail to null sender and postmaster --
and received spam takes a huge dive, but then there's _one guy_ on your
mailing list whose sending mail servers violate the RFCs and complains
about his mail being rejected.
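
For a feel of what such a callout does on the wire, here is a hedged
Python sketch using the standard smtplib module.  It is not how Postfix
or Exim implement the check, and it is only one reading of the test:
connect to the delivering host, offer a message from the null sender
(MAIL FROM:<>) addressed to postmaster, see whether the recipient is
accepted, then hang up without sending anything.  The function name,
the smtp_factory parameter (there so the logic can be exercised without
a network), and the host names in the usage comment are all
assumptions.

```python
import smtplib


def accepts_return_mail(host, domain, helo_name,
                        smtp_factory=smtplib.SMTP, timeout=30):
    """Return True if `host` accepts mail from the null sender
    addressed to postmaster@`domain`.  No message is ever sent."""
    client = smtp_factory(host, 25, timeout=timeout)
    try:
        client.helo(helo_name)
        code, _ = client.mail("")  # empty sender becomes MAIL FROM:<>
        if code >= 400:
            return False           # refuses the null sender outright
        code, _ = client.rcpt(f"postmaster@{domain}")
        return 200 <= code < 300   # only a 2xx reply counts as acceptance
    finally:
        try:
            client.quit()          # end the session before any DATA
        except smtplib.SMTPException:
            pass


# Usage (hypothetical host names, needs network access):
#   accepts_return_mail("mx.example.org", "example.org", "lists.example.net")
```

A production callout would also cache results and treat timeouts and
4xx replies as 'try again later' rather than as a failed check, which
this sketch does not attempt.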

This is a real ongoing concern at SVLUG, where Mark Weisler's ISP is one
whose outbound MTA hosts are RFC-noncompliant.  I do my best to collect
lists of IP addresses from him to add them to the 'callout' whitelist,
but the ISP keeps changing IP addresses, so, most months, Mark cannot
send the svlug-announce message, because lists.svlug.org rejects his
mail as too spam-suspect.

Anyway, you will be caught in the middle of this.  There is no easy
answer.

In point of fact, I actually had to patch J.P. Boggis's rulesets for
Exim4 because they were _so_ overzealous about detecting sender forgery
that they considered every single mailing list post to fail SPF
validation.

SPF (Sender Policy Framework) is one of several competing schemes to 
programmatically detect forgeries.  In its case, it provides a means for
a receiving SMTP host to verify that the envelope information shows a 
source IP that is a real SMTP issuer for that domain instead of a
third-party IP forging the domain.  However, Boggis included in
Eximconfig some Exim4 canned rules that check _not just_ the envelope
'From' header but also the internal SMTP 'From:' header.  (The way this
scheme works is that a domain owner, like me with linuxmafia.com,
publishes specially formatted DNS records advising the world that
so-and-so IP addresses and no others should be considered legitimate
senders of mail claiming to be from linuxmafia.com -- that mail from any
other IP can fairly be considered a forgery.)
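
A toy Python sketch of that lookup logic follows.  The record shown is
hypothetical, NOT the real published record for linuxmafia.com, and the
evaluator understands only 'ip4:' and 'all' terms, where real SPF has
many more mechanisms and several result states.  Anything short of an
explicit pass is treated here as 'not permitted'.

```python
import ipaddress

# Hypothetical record for illustration only.
EXAMPLE_RECORD = "v=spf1 ip4:198.144.195.186 -all"


def spf_permits(record, sender_ip):
    """Return True only if `sender_ip` matches a passing term in `record`.
    Handles just the 'ip4:' and 'all' mechanisms."""
    ip = ipaddress.ip_address(sender_ip)
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        raise ValueError("not an SPF record")
    for term in terms[1:]:
        # A leading +, -, ~ or ? is the qualifier; '+' (pass) is the default.
        qualifier = term[0] if term[0] in "+-~?" else "+"
        mechanism = term.lstrip("+-~?").lower()
        if mechanism.startswith("ip4:"):
            if ip in ipaddress.ip_network(mechanism[4:], strict=False):
                return qualifier == "+"
        elif mechanism == "all":
            return qualifier == "+"
    return False  # nothing matched; real SPF would call this 'neutral'


print(spf_permits(EXAMPLE_RECORD, "198.144.195.186"))  # the domain's own host
print(spf_permits(EXAMPLE_RECORD, "157.22.20.227"))    # the list relay's IP
```

The second call uses the IP from which the example posting actually
arrived.  Testing the 'From:' domain against that IP is the
overzealous check that makes list mail fail.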

As you will notice from the example headers, the 'From:' header is _not_
where the mailing list is sending the mail from, but rather where the
message composer sent it _to_ the mailing list from -- which is why I
said all mailing list traffic looks a little suspect.

I informed J.P. that his Exim4 rule for vetting headers against SPF 
DNS records is too strict -- that he should _not_ test the 'From:' 
domain.  He argued back that mailing lists shouldn't work that way --
which I guess means he doesn't know that all mailing lists _do_ work
that way and probably always will.

Users Shooting You in the Foot

On most mailing lists, several times a year, some user of one of the
free webmail services (Gmail and Yahoo predominating, but it's mostly
Yahoo) suffers some spammer stealing the credentials of his/her webmail
account and sending out financial-fraud mail (419 spam) and/or regular 
spam, that purports to be from the user and thus will tend to
successfully go out to subscribers (as it's not from an unsubscribed
address).  This is often a consequence of Windows malware infestation.
Those subscribers' antispam nanny software will then complain back to
you, the listadmin.  (If you're really dumb and make the error of
Reply-To munging, those complaints may go to the list address.)

And, in addition to the nanny software, you will also get complaints
from the actual subscribers, who in some cases think you have God-like
ability to prevent spam completely.  The best you can do, of course, is 
to either unsubscribe or set the 'moderated' flag on the user, and try
to politely get his/her attention to the compromise of his/her security.

There was also the interesting incident that happened immediately after
balug.org was moved to Dreamhost.  (This was _before_ Michael Paoli was
involved.)  I noticed that 100% of mail processed through balug.org's
mailing lists was suddenly testing as RFC-noncompliant, specifically
because the domain's sending IP address (at Dreamhost) suddenly was no
longer willing to accept return mail to postmaster at lists.balug.org.  
(As mentioned, key RFCs defining SMTP require that any domain sending
SMTP e-mail accept return mail to several required mailboxes.  These
include postmaster, the null sender, and abuse.)  

The customer who _then_ was paying Dreamhost for balug.org's hosting
replied:  He claimed that there was no way to make Dreamhost accept mail
to postmaster, and thus requested that I whitelist the domain.  I did
so, not wishing in any way to make his life difficult.

About a year later, it emerged that turning on acceptability of mail to
the postmaster@ mailbox was a checkbox item in the customer control
panel at Dreamhost, and the guy just hadn't clicked it.

A lot of this problem space is like that.  People swear up and down to
things that are actually just not true.  They're overwhelmingly not
trying to pull something.  They're just misinformed.

The Blizzard of Other Spam

By contrast, almost all _other_ spam is from forged addresses with 
no attempt whatsoever to make the forged sender be a valid sending
subscriber on your mailing list(s).  So, _if_ incoming spam successfully
evades your MTA filters and gets to GNU Mailman, addressed to the
advertised mailing list address, it'll land in the Mailman admin queue
and stay there until it ages out.

Obviously, the higher a percentage of spam you're able to
programmatically reject 'ab initio' right at the time of receipt, the
less survives to clog Mailman's admin queue.  Nonetheless, expect a 24x7
ongoing slow blizzard of spam in your queues.  Deal with this by setting
a retention period of maybe 5 days.  Do _not_ leave Mailman at its
per-list default setting of infinitely long queue retention.  That way
lies madness.
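
The aging-out behaviour is simple enough to sketch in a few lines of
Python.  This is an illustration of the idea, not Mailman's code.  The
parameter name max_days_to_hold is borrowed from what I understand the
Mailman 2.1 list option to be called (with 0 meaning 'never discard'),
so verify that against your installed version.  The sample queue
entries are invented.

```python
from datetime import datetime, timedelta


def age_out(held, now, max_days_to_hold=5):
    """Split held messages into (kept, discarded) by age.
    `held` is a list of (received_at, sender, subject) tuples.
    A value of 0 disables discarding, i.e. infinite retention."""
    if max_days_to_hold <= 0:
        return list(held), []
    cutoff = now - timedelta(days=max_days_to_hold)
    kept = [m for m in held if m[0] >= cutoff]
    discarded = [m for m in held if m[0] < cutoff]
    return kept, discarded


# Invented sample queue: one recent entry, one stale entry.
now = datetime(2015, 2, 24, 2, 0)
queue = [
    (datetime(2015, 2, 23, 9, 30), "someone@example.com", "Meeting question"),
    (datetime(2015, 2, 10, 4, 15), "promo@example.net", "You have won"),
]
kept, discarded = age_out(queue, now)
print(len(kept), len(discarded))
```

With a five-day retention, the two-week-old entry is dropped without
anyone having to look at it.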

You will also quickly pick up the knack of recognising 99.999% certain
spam by its (forged) senders and Subject headers in Mailman's daily
'admin action needed' summary-of-held-mail reminder mails.  You'll skim
those mails, see nothing that has more than the faintest prayer of being
non-spam, and not even be tempted to visit Mailman's administrative
WebUI to view the held messages' body text to be sure.  You will learn
to be sure just from the (forged) sender and Subject header -- and be
right every time.

Don't bother clearing spam out of the held queues, unless you're
visiting the queue for some other reason.  It's a waste of time.  Just
let it age out and disappear.  (Someone who didn't understand this once 
volunteered to help listadmin SVLUG, panicked after 1 day, and quit. 
She seriously thought she was obliged to visit the WebUI and check
mails all the time.)

Do _not_ bother adding _any_ (forged) spam senders to Mailman's roster
of 'List of non-member addresses whose postings will be immediately held
for moderation.'  Sometimes I check mailing lists third parties
administer on linuxmafia.com and find that roster clogged with entries
like 'candy1436543922 at marketing-triumphs.com'.  Folks, that was a 
_forged sender_.   As in, made up by some script for the occasion.
There's no reason -- zero -- to think that any additional spam would
ever be arriving that claimed to be from that sender, as there's a
near-infinite choice for them to use.  Don't waste your time, and don't
waste my machine's CPU cycles.  

(When I find that sort of junk in Mailman's rules for any list, I
retroactively remove it.)

Here's an amusing bit:  My MTA's rejection of spam is good enough that
most Mailman summary-of-held-mail reminder mails _don't reach me_ -- as
the summaries trigger filters, seeming way too spammy.  About once every
two weeks, SVLUG's Mailman instance decides my 'bounce score' for those
reminder mails is so high that it suspends delivery and advises me I'll
cease getting them in 14 weeks if I don't reconfirm that my mailbox is
still deliverable -- because it perceives each of my MTA's '551 Die
spammer die' rejections as a 'bounce'.

I could prevent this by whitelisting Mailman's sender at the level of
my MTA, but instead just re-enable delivery occasionally.

Anyway, summary:  You as the designer and operator of a machine not only
doing full-service SMTP but also running an MLM are going to get caught
in the crossfire of the spam war.  Expect it.  Be prepared to be
flexible in some cases (whitelisting) but in other cases to draw the
line and say 'no'.  And know that all antispam tricks have both costs
and limits in their effectiveness.

Welcome to the trenches.  Try not to get shot by either side.