[conspire] SpamAssassin and BAYES_99

Sun Jan 14 08:06:51 PST 2018

On 14Jan2018 06:21AM (-0800), Don Marti wrote:
> https://lwn.net/Articles/173910/
> "SpamAssassin, out of the box, assigns 3.5 points to BAYES_99. Since five
> points are required, by default, to condemn a message, the bayesian filter
> can never do that on its own. 
[...]
> I can sort of see why SpamAssassin would ship with a cautious score for this
> -- you don't know how well the users are going to train the filter.  I don't
> have a problem with training, so I'm thinking I should increase the score
> for BAYES_99 to at least 4.5, which would make the difference on a bunch of
> my current false negatives that made it to the inbox.
> 
> Any other free-range mail server postmasters have data on this?

I took the same rough theory and chucked BAYES_99 up to 9.0 at some
point and never looked back.  But I also have a sort of
communally trained Bayesian db shared across my users:

    1. The act of replying to or saving a message within mutt on
       frotz.zork.net trains the message as non-spam.
    2. Disused or deliberately spam-trappy addresses go into an inbox
       that is automatically trained as spam.
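
A minimal sketch of those two training paths, assuming they end up as
sa-learn calls.  The event names and the function are hypothetical;
the actual hooks on frotz.zork.net are not shown in this message.  Only
the sa-learn --ham and --spam options are real.

```python
# Hypothetical event names, made up for illustration only.
HAM_EVENTS = {"reply", "save"}      # user replied to or saved the message
SPAM_EVENTS = {"trap_delivery"}     # message arrived at a spam-trap address


def sa_learn_command(event, message_path):
    """Return the sa-learn argv for a training event, or None to skip."""
    if event in HAM_EVENTS:
        return ["sa-learn", "--ham", message_path]
    if event in SPAM_EVENTS:
        return ["sa-learn", "--spam", message_path]
    return None  # anything else leaves the Bayes db untouched
```

The function only builds the command line; running it (for example via
subprocess.run) is left to whatever glue calls it.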

It's kind of funny that I originally had to argue users out of ever
replying to spam, since a reply trains the message as non-spam.  It's
been a while, but over a decade ago it was still some people's instinct
to try appealing to spammers to stop sending them things.

The auto-training addresses only get better with time.  Every time I see
a bunch of backscatter horror coming from an address on my mail server,
I make it an alias to the "train on this as spam" bucket.  Usually
within a day or so I see my personal spam filtering improve measurably.

If you ever hosted mailman mailing lists on your domain, you may find
lots of -owner and -bounces addresses being abused by spammers who found
your archives and think they're the cleverest bits of code on the net.
Thank them silently for their contribution!
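
To put numbers on the score question from the top of the thread, here
is a rough sketch of the arithmetic.  It assumes a message is flagged
once the summed scores of the rules it hits reach the required score
(5.0 by default, per the quoted article).  OTHER_RULE and its 1.0 score
are made up for illustration.

```python
REQUIRED_SCORE = 5.0  # default threshold quoted from the LWN article


def is_spam(hit_rules, scores, required=REQUIRED_SCORE):
    """Flag a message when the scores of the rules it hit reach `required`."""
    total = sum(scores[rule] for rule in hit_rules)
    return total >= required


# OTHER_RULE is a hypothetical 1.0-point rule, not a real SpamAssassin rule.
stock = {"BAYES_99": 3.5, "OTHER_RULE": 1.0}      # out-of-the-box score
proposed = {"BAYES_99": 4.5, "OTHER_RULE": 1.0}   # Don's suggested minimum
bumped = {"BAYES_99": 9.0, "OTHER_RULE": 1.0}     # the 9.0 setting

hits = ["BAYES_99", "OTHER_RULE"]
print(is_spam(hits, stock))            # 3.5 + 1.0 = 4.5, below 5.0
print(is_spam(hits, proposed))         # 4.5 + 1.0 = 5.5, reaches 5.0
print(is_spam(["BAYES_99"], bumped))   # 9.0 alone clears the threshold
```

At 4.5, BAYES_99 still needs one other small hit to condemn a message;
at 9.0 it does so on its own, which is only sensible if you trust the
training.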