[conspire] Antispam heuristics and MTA sites (SMTP servers)

Fri Feb 26 14:08:39 PST 2021

This was in response to an offlist inquiry from someone on the Skeptic
mailing list, but I figured it might be of general interest.

Date: Fri, 26 Feb 2021 10:30:12 -0800
From: Rick Moen <rick at linuxmafia.com>
To: [a subscriber]
Subject: Re: [skeptic] Admin note to GMail users: you missed recent postings

Quoting [the subscriber]:

> Thank you, and holy crap, why is Comcast so damn picky about IPv6?  
> Or am I stumbling into a hornet's nest of stuff?
> 
> Don't worry about responding, I don't want to waste your time with
> under-informed questions...

No worries.  I had a good night's sleep (finally), and don't mind
explaining.

For decades and increasingly, the biggest problem affecting SMTP mail is
the spam one.  And, in consequence, gradually loose practices by SMTP
server administrators that were formerly tolerated have been either
officially or unofficially deemed no longer OK.  The new _lack_ of
tolerance gets implemented by most of your SMTP server peers configuring
their servers to reject your server's mail unless it toes the line on
technical best practices.  That is, you'll find that, when your server
opens an SMTP socket to that other guy's server on port 25/TCP and
carries out an SMTP-protocol conversation to hand your mail over to
the other guy's, during the middle of the SMTP conversation the far end
spends some time checking your compliance with best practices, and, if
you fail any of them, gives SMTP response '550 Reject' followed by
somewhat less terse justification for the refusal.  
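
To illustrate the response codes just mentioned, here is a minimal
Python sketch, not taken from any real mail server.  The function name
classify_smtp_reply is made up for this example.  It relies only on the
SMTP convention that the first digit of the three-digit reply code says
whether the command succeeded, failed temporarily, or failed for good.

```python
def classify_smtp_reply(line: str) -> str:
    """Classify an SMTP reply line by the first digit of its code.

    Hypothetical helper for illustration only.
    """
    code = line[:3]
    if len(code) != 3 or not (code.isascii() and code.isdigit()):
        raise ValueError(f"not an SMTP reply line: {line!r}")
    kinds = {
        "2": "success",            # e.g. 250: requested action completed
        "3": "intermediate",       # e.g. 354: go ahead, send the message body
        "4": "temporary failure",  # sender should retry later
        "5": "permanent failure",  # e.g. 550: refused, do not retry as-is
    }
    if code[0] not in kinds:
        raise ValueError(f"unexpected SMTP reply code: {code}")
    return kinds[code[0]]


if __name__ == "__main__":
    for reply in ("250 OK", "451 Try again later", "550 Reject"):
        print(reply, "->", classify_smtp_reply(reply))
```

A 5xx reply such as '550 Reject' is the far end saying the refusal is
deliberate, which is why fixing the underlying best-practice failure is
the only real cure.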

With me so far?  OK, the next step is what are the best practices in
question, and why.  

(1) Your server must not be a public relay.  Back through around 1995, 
it was still a common courtesy to the general public that _my_ server
could offload mail from rick at linuxmafia.com onto almost any other SMTP
server, like, say, UC Berkeley's big 'ucbvax' machine, addressed to
imazer at cruzio.com, and ucbvax would be kind enough to carry out the
onwards delivery to cruzio.com -- called 'relaying'.  This was
super-useful in the old days, e.g., because your machine ('host') might
not have been a full Internet peer able to reach out to anywhere, but
knew how to reach ucbvax, so you could get ucbvax to be your deliveryman
for free.

Spammers loved open relays, for obvious reasons, so this courtesy
service had to die.  After about 1995, if your server was an open relay,
sites running "DNS blocklist" services would discover that and list your
IP as deemed bad for that reason, and more and more other servers
checked the DNS blocklists against all arriving attempts to deliver
mail.  Your IP having a bad blocklist reputation meant you got lots of
"550 Reject" and fewer "250 OK" responses.

(2) Your server must not be on an IP address range reserved for dial-up 
PPP / SLIP connections (in the old days) or other _dynamic_ IP
assignment.  No actual RFC (the Internet standard-definition documents)
ever prohibited siting an SMTP server on such an IP, _but_ the problem
of virus-infected MS-Windows boxes cranking out huge amounts of spam to
remote 25/TCP ports became such a problem that DNS blocklists arose to
catalogue all IP ranges specified by ISPs for dynamic IP assignment, and
advising SMTP servers consulting the blocklist that those IPs have bad
spam reputation and should be refused delivery.  So, refusing mail from
dynamic IP netblocks is one example of an unofficial antispam heuristic.

(3) Your server must accept return mail to the postmaster@ user, return
mail to the abuse@ user, and mail from the null sender.  These
requirements are all stated in the RFCs, but it's characteristic of
spam-broadcasting software (like on those virus-infected Windows
machines) that it doesn't give a damn about RFC requirements and tries
to cut corners, so checking in real time for RFC compliance is a very 
useful antispam heuristic.  And, yes, there is a DNS blocklist, called
rfc-ignorant, that catalogues IP addresses known to be bad actors in
this area.

(4) Likewise, your server must meet various other RFC requirements imposed on
all SMTP hosts.  There's a variety of other sleazes that the spammers
try to pull, like invalid 'pipelining' of many SMTP commands in a single
network packet rather than waiting for error responses and synchronising
the SMTP conversation at certain points.  Many SMTP servers detect this 
bad behaviour and cut it off with "550 Reject".  Likewise, the RFCs
require ending the conversation with an explicit "QUIT" directive rather
than just dropping the connection.  SMTP hosts that just drop it tend to get
their mail dropped or regarded with greater scrutiny and maybe the IP 
added to a list to be autorejected in the future.

Likewise, the RFCs require that delivering SMTP hosts wait for the
receiving system's welcome message before trying to offload mail.  Any
host that doesn't is probably a spammer, so will again be treated with
suspicion or outright refused.

(5) Last, it is extremely common for the receiving system to check 
that the delivering system's IP address reverse-resolves in the public
DNS to some valid fully-qualified domain name.  This is an example of
an unofficial best-practices standard, since no RFC _requires_ that 
SMTP hosts have reverse DNS.  However, because spammers often cannot be
arsed to have one (on, say, the virus-infected home or office Windows
machines they cause to mass-output spam), refusing such mail is a valid
antispam heuristic.
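
Two of the receiving-side checks above, the DNS blocklist lookup and the
reverse DNS check, can be sketched in a few lines of Python.  This is
only a rough illustration under stated assumptions.  The zone name
"dnsbl.example.org" is a placeholder, not a real blocklist, and the
function names are invented here.  The sketch assumes the usual DNS
blocklist convention for IPv4: reverse the octets, append the
blocklist's zone, and treat any answer as "listed".  Real servers also
distinguish DNS timeouts from "not listed", which this sketch does not.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to look up for an IPv4 blocklist check."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on a bad address
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str, resolve=socket.gethostbyname) -> bool:
    """Return True if the blocklist zone returns any answer for this IP.

    Simplification: every lookup failure counts as 'not listed'.
    """
    try:
        resolve(dnsbl_query_name(ip, zone))
    except OSError:  # socket.gaierror and socket.herror are subclasses
        return False
    return True


def has_reverse_dns(ip: str, lookup=socket.gethostbyaddr) -> bool:
    """Return True if the IP reverse-resolves to some hostname."""
    try:
        hostname, _aliases, _addresses = lookup(ip)
    except OSError:
        return False
    return bool(hostname)


if __name__ == "__main__":
    # "dnsbl.example.org" is a placeholder zone, not a real blocklist.
    print(dnsbl_query_name("96.95.217.99", "dnsbl.example.org"))
    print(has_reverse_dns("96.95.217.99"))  # needs working DNS to answer
```

The resolve and lookup parameters exist only so the functions can be
exercised without touching the network.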

_Now_, as I posted in my Conspire post, I pointedly do _not_ have an
IPv6 address on my linuxmafia.com server, only an old-school IPv4
address, 96.95.217.99.  In December 2019, before my cherished ADSL
provider Raw Bandwidth Communications shut off service and I was forced
to move my server over to the house's other uplink, Comcast Business, 
I opened a trouble ticket with Comcast Business requesting that they
add this reverse DNS entry to the DNS for their IP space:

99.217.95.96.in-addr.arpa  86400  IN  PTR   linuxmafia.com.

That says: 'The IP address 96.95.217.99 shall be deemed to
reverse-resolve to "linuxmafia.com.", and this information may be
validly held in DNS cache for up to a full day (86400 seconds).'
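
The owner name of a PTR record is just the IP address with its octets
reversed, under in-addr.arpa (or, for IPv6, its hex digits reversed
under ip6.arpa).  As a small sketch, Python's standard ipaddress module
can derive it, which is a handy cross-check before filing a ticket.  The
function names ptr_owner_name and ptr_record are made up for this
example.

```python
import ipaddress


def ptr_owner_name(ip: str) -> str:
    """Return the reverse-DNS owner name for an IPv4 or IPv6 address."""
    return ipaddress.ip_address(ip).reverse_pointer


def ptr_record(ip: str, hostname: str, ttl: int = 86400) -> str:
    """Format a zone-file style PTR line for the given IP and hostname."""
    # The trailing dot marks the hostname as fully qualified.
    return f"{ptr_owner_name(ip)}  {ttl}  IN  PTR   {hostname.rstrip('.')}."


if __name__ == "__main__":
    print(ptr_record("96.95.217.99", "linuxmafia.com"))
```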

It takes about 24-48 hours for Comcast Business to act on any such
requests, and they have no concept of 'Do this on a priority basis
because my outbound mail will be failing until you do.'  So, I made a
point of getting that PTR (reverse DNS) record in place before the Raw
Bandwidth cutoff date when I would be forced to switch over to Comcast
Business.  Apparently, 'PTR' is from the word 'pointer'.  It is the 
reverse counterpart to the most basic DNS record type, the A record,
which defines forward lookup from a fully qualified hostname to an IP
address.  'A' is doubtless from the word 'address'.

What I did _not_ do, at that time, was ask Comcast Business to create 
a similar PTR record for an IPv6 address on my server, for the simple
reason that I deliberately did not -have- any IPv6 address on my server.

So, it came as a very unpleasant surprise on Wednesday when I found that
GMail had been refusing all mail outbound from my server for about 24
hours on grounds that its IPv6 address failed to reverse-resolve in the
public DNS.  I was a bit stunned, like, _what_ IPv6 address?  I was
quite blown away to check my server and find that it suddenly had one
(following the first restart since December 2019).  There is nothing 
in the server configuration assigning an IPv6 address to ethernet port
eth2 (or to anything else).  There _is_ an explicit directive to bind
96.95.217.99 to eth2 at startup time.

Friends have belatedly told me on the Conspire mailing list that the
numbnuts who designed IPv6 made it autoconfiguring by default.  To 
prevent this from happening requires explicit intervention by the
administrator to say 'No, I don't want my network interfaces
automagically assigned IPv6 addresses, either.'

For the time being, I have taken a more drastic measure that was my
first resort upon discovering the problem:  I explicitly disabled all
IPv6 in the network stack.  As I like to joke, this is applying my
"Facebook remedy":  Friends ask me how I deal with various Facebook
problems, and my ha-ha-only-serious answer is "Simple:  No Facebook; no
Facebook problems."  Likewise:  No IPv6; no IPv6 problems.

My friends advise against keeping the big hammer in place in the longer
term, and I'll get around to following their advice:  They point out
that it's possible to leave IPv6 functionality enabled in the host's
network stack, but switch off _only_ IPv6 autoconfiguration.  

This is of course smarter.  At the time, I reached for the big hammer 
because it made that problem go away (and I wasn't aware that IPv6
autoconfiguration was even a thing).
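
For the curious, here is a sketch of what the two approaches look like
on a Linux host.  This is an assumption about a typical setup, not a
record of what was done on linuxmafia.com.  The kernel exposes
per-interface sysctl keys named net.ipv6.conf.<interface>.autoconf and
.accept_ra, plus .disable_ipv6 for the big hammer.  The Python helpers
below (hypothetical names) merely generate the lines one would put in a
sysctl configuration file.  Whether those settings suffice depends on
the distribution, because some network management daemons handle router
advertisements on their own.

```python
from typing import List


def ipv6_autoconf_off_settings(interface: str) -> List[str]:
    """Sysctl lines to stop IPv6 autoconfiguration on one interface.

    Assumes Linux sysctl key names; IPv6 itself stays enabled.
    """
    prefix = f"net.ipv6.conf.{interface}"
    return [
        f"{prefix}.autoconf = 0",   # no automatic addresses from router prefixes
        f"{prefix}.accept_ra = 0",  # ignore router advertisements entirely
    ]


def ipv6_disable_settings(interface: str = "all") -> List[str]:
    """Sysctl line for the 'big hammer': switch IPv6 off altogether."""
    return [f"net.ipv6.conf.{interface}.disable_ipv6 = 1"]


if __name__ == "__main__":
    # eth2 is the interface named in the text above.
    print("\n".join(ipv6_autoconf_off_settings("eth2")))
```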