Challenge-Response Anti-Spam Systems Considered Harmful

You're probably receiving this because I've received a challenge-response (C-R) message from your mail system. If you're receiving this, that is....

Spam is a growing, heck, exploding problem. No doubt. Regardless, C-R is a flawed tactic, for the following reasons.

0. Weak, and trivially abused, verification basis.

Even where used, C-R systems are readily bypassed by spammers.

The 'From:' header of e-mail can be, and routinely is, spoofed. It offers no degree of authentication or evidence of identity.

C-R uses the "From:" header (with implementation-specific variations) as an authentication key. While a given key is going to have a relatively low likelihood of being cleared by a given user, there are keys that will have a high likelihood of being cleared. Off the top of my head, @microsoft.com, @aol.com, @ebay.com, @*.gov, and other major commercial, financial, and governmental institutions, would be likely to be cleared by a large number of users. Similar "social engineering" tactics are already used by spammers.

C-R puts you back at square one: SMTP can't authenticate e-mail headers. At the very least, contextual analysis of headers (as Alan admits) is necessary. If you're already taking this step, heuristic and Bayesian methods are a low-overhead next step, and they have proven to be highly effective and accurate.

By contrast, systems that utilize multiple metrics — sender, header integrity, content, context, Bayesian analysis — provide a broader, deeper, richer set of metrics on which to gauge spam. While such filters may incorporate the 'From:' header, they do so in context of additional data for stronger validation.
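As a rough illustration of the contrast above, the sketch below sets a check keyed only on the 'From:' header against a score built from several signals. The signal names, weights, and threshold are all invented for illustration; they are not taken from SpamAssassin or any other real filter.

```python
from dataclasses import dataclass

# Hypothetical list of cleared senders, keyed on the address alone (as C-R does).
CLEARED_SENDERS = {"billing@ebay.com"}


@dataclass
class Message:
    from_addr: str            # claimed sender; trivially spoofed
    headers_consistent: bool  # e.g. relay headers agree with the claimed domain
    content_spam_prob: float  # 0.0 to 1.0, e.g. from a Bayesian content classifier


def from_only_check(msg: Message) -> bool:
    """Accept if the claimed sender has been cleared; nothing else is examined."""
    return msg.from_addr in CLEARED_SENDERS


def multi_signal_score(msg: Message) -> float:
    """Combine several signals; higher means more spam-like. Weights are made up."""
    score = 0.0
    if msg.from_addr in CLEARED_SENDERS:
        score -= 1.0  # a known sender helps, but does not decide the matter alone
    if not msg.headers_consistent:
        score += 2.0  # forged-looking headers count against the message
    score += 3.0 * msg.content_spam_prob
    return score


def multi_signal_accept(msg: Message, threshold: float = 1.5) -> bool:
    """Accept only if the combined score stays under an (arbitrary) threshold."""
    return multi_signal_score(msg) < threshold


# Same claimed sender, very different messages.
spoofed = Message("billing@ebay.com", headers_consistent=False, content_spam_prob=0.95)
genuine = Message("billing@ebay.com", headers_consistent=True, content_spam_prob=0.05)

for label, msg in (("spoofed", spoofed), ("genuine", genuine)):
    print(label, from_only_check(msg), multi_signal_accept(msg))
```

The 'From:'-only check cannot tell the two messages apart, since the only thing it looks at is identical in both.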

1. Mistaken interpretation of anti-spam goals

The intent of a practical anti-spam system is not to ensure, at any cost, that no spam ever darkens the reader's inbox. If that's the goal, then unplugging your computer is the simplest fix.

At a practical level, the goal is to minimize the amount of spam received, while ensuring that no legitimate mail (or the very minimum) is lost. Inconveniencing spammers is a plus. It is currently possible to get down to a very small handful of spam messages per week via a mix of whitelisting and content-filtering systems, with Bayesian filters attaining very high accuracy.

C-R systems in practice achieve an unacceptably high false-positive rate (non-spam treated as spam), and may in fact be highly susceptible to false-negatives (spam treated as non-spam) via spoofing.

2. Misplaced burden.

Effective spam management tools should place the burden either on the spammer, or, at the very least, on the person receiving the benefits of the filtering (the mail recipient). Instead, challenge-response puts the burden on, at best, a person not directly benefitting, and quite likely (read on) a completely innocent party. The one party who should be inconvenienced by spam consequences — the spammer — isn't affected at all.

Worse: C-R may place the burden on third parties either inadvertently (via spoofed sender spam or virus mail), or deliberately (see Joe Job, below). Such intrusions may even result in subversion of the C-R system out of annoyance. Many recent e-mail viruses spoof the e-mail sender, including Klez, Sobig variants, and others.

There is a positive side to this. C-R system users who blindly send challenges to all incoming mail without validating headers may find themselves added to spamlists. See:

Welcome to spamcop!

To: tmda-users@tmda.net
Subject: Welcome to spamcop!
From: Lou Hevly <soc@visca.com>
Date: Tue, 26 Aug 2003 03:25:04 +0100

Hi folks:

Here's a good one; my TMDA challenge is being used as proof that I spam! I got the following virus-accompanied message sent to a TMDA-protected account:

[...]

And apparently webmaster@glendaleaz.com was not amused, because he reported me to spamcop and now I'm on their list! With the TMDA challenge as proof of my spamming!!

SpamCop reports that this user was submitted not once, but twice. And that's counting only the people who bothered (or were able) to report him.

3. Privacy violation.

A record of our correspondence is being maintained by a third party who has no business knowing of the transaction. Many people will refuse to respond to C-R requests for this reason.

Virtually all C-R systems must be implemented on the mail server — putting them effectively out of the immediate reach of the casual home e-mail user, and putting critical information on the e-mail habits of both yourself and your correspondents in the hands of a third party.

Most of the general discussion (that is, outside this mailing list) has concerned service-model offerings in which C-R is provided and hosted by a third party. That party then acquires a rather interesting database of communications patterns, which must be maintained on a persistent basis. Not the sort of thing I'd like to have available to an arbitrary subpoena request.

4. Less effective at greater burden than receiver-side whitelisting.

A C-R system is essentially an outsourced whitelist system. The difference between a C-R system and a self-maintained whitelist is that the latter:

Is maintained and controlled by the mail recipient, rather than a third-party service provider.
Is the responsibility of the mail recipient, rather than the sender.
Places the burden of adding new addresses to allow/deny lists on the recipient.

I might add that I myself use a mix of whitelisting and spam filtering (via SpamAssassin) to filter my own mail with a very high level of accuracy, in terms of true positives, true negatives, false positives, and false negatives. Namely: better than 98% true positive (filtered spam), less than 2% false negative (unfiltered spam), 99.98% true negative (unfiltered non-spam), and less than 0.02% false positive (filtered non-spam). While some C-R proponents claim filtering doesn't work, it clearly does.
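For readers who want to check their own filter the same way, the four rates are simple ratios over a hand-sorted sample of mail. The sketch below shows the arithmetic; the message counts are invented for illustration (they are not my actual mail volumes) and are chosen only to land in the same range as the figures above.

```python
def filter_rates(spam_filtered, spam_delivered, ham_delivered, ham_filtered):
    """Return the four rates as fractions of the spam and non-spam totals."""
    total_spam = spam_filtered + spam_delivered
    total_ham = ham_delivered + ham_filtered
    return {
        "true positive (filtered spam)": spam_filtered / total_spam,
        "false negative (unfiltered spam)": spam_delivered / total_spam,
        "true negative (unfiltered non-spam)": ham_delivered / total_ham,
        "false positive (filtered non-spam)": ham_filtered / total_ham,
    }


# Hypothetical sample: 5,000 spam and 10,000 non-spam messages, sorted by hand.
rates = filter_rates(
    spam_filtered=4910,
    spam_delivered=90,
    ham_delivered=9999,
    ham_filtered=1,
)

for name, value in rates.items():
    print(f"{name}: {value:.2%}")
```

Note that the spam rates and the non-spam rates are computed against different totals, which is why a filter can miss a couple of percent of spam while losing almost no legitimate mail.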

5. High type II error (beta).

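Because of numerous issues in sender compliance with C-R systems, C-R tends toward a high false-positive rate: legitimate mail held or discarded as spam. Take the null hypothesis that C-R itself adopts, namely that all mail is spam until proven otherwise. Wrongly retaining that hypothesis for a legitimate message is known in statistical testing as a type II error, and its probability is denoted by beta.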

The mechanics of C-R systems lead to a fairly high probability that users of such systems will find themselves missing an unacceptably high rate of non-spam (AKA "ham") mail, possibly with very high significance (e.g.: client, commercial prospect, or family communications).

In a staggering display of transrational behavior, C-R proponents frequently and vociferously blame this failure of C-R on the unwillingness of bystanders to be drawn into the misguided system.

C-R systems assume all mail to be spam until proven otherwise. A rational system assumes mail to be of unknown quality, until determined to be spam or non-spam. If mail processing can't determine the mail's quality, it is treated as "grey". Such "greymail" generally amounts to a small handful of messages daily, even for heavy mail users, and can be readily evaluated, with whitelists and spam filters trivially updated.
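The difference between the two defaults can be sketched in a few lines. This is a minimal illustration, not any particular filter's logic: the function names, the score scale, and both thresholds are assumptions made up for the example.

```python
def triage(sender, spam_score, whitelist, blacklist,
           spam_threshold=5.0, ham_threshold=1.0):
    """Three-way sort: 'ham', 'spam', or 'grey' when the evidence is inconclusive."""
    if sender in whitelist:
        return "ham"
    if sender in blacklist:
        return "spam"
    if spam_score >= spam_threshold:
        return "spam"
    if spam_score <= ham_threshold:
        return "ham"
    return "grey"  # left for the recipient to glance at and classify


def cr_default(sender, whitelist):
    """C-R style: anything from an uncleared sender is held, whatever it contains."""
    return "ham" if sender in whitelist else "held pending challenge"


known = {"friend@example.org"}
print(triage("stranger@example.net", 0.3, known, set()))  # low score, unknown sender
print(triage("stranger@example.net", 3.0, known, set()))  # inconclusive score
print(cr_default("stranger@example.net", known))          # same stranger under C-R
```

Under the three-way sort, an unknown sender with unremarkable mail is simply delivered, and only the inconclusive middle band needs human attention. Under the C-R default, the same stranger is held no matter what the message looks like.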

For a description of statistical type II errors, see:

6. Potential "Joe-job" denial of service.

C-R systems can be used intentionally or otherwise in a denial-of-service or "Joe Job" attack on an innocent third party. In fact, this is likely to start happening shortly, as C-R becomes more widespread.

How? Simply: a spammer spoofs a legitimate sending address (this is already commonplace). C-R systems then send out a challenge to this address. With only 1% penetration of C-R, a spam run of, say, ten million messages deluges the victim of the C-R/spam attack with 100,000 challenge e-mails. This could likely lead to lawsuits or other legal challenges. As an example, one large California university campus e-mail system received over 500,000 copies of Sobig.F, an e-mail-borne virus that spoofs its headers. Had these triggered C-R challenges, the university would effectively have transmitted a half-million spam mails to innocent bystanders spoofed by the Sobig.F virus.
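The arithmetic behind these figures is worth spelling out. In the sketch below, the ten-million-message run is not a measured number; it is simply the run size implied by 100,000 challenges at 1% penetration. The function names are made up for the example.

```python
def challenges_sent(spam_run_size: int, cr_penetration: float) -> int:
    """Challenges aimed at the spoofed address, if every C-R recipient challenges."""
    return round(spam_run_size * cr_penetration)


def implied_run_size(challenges: int, cr_penetration: float) -> int:
    """Work backwards: how large a spam run produces this many challenges?"""
    return round(challenges / cr_penetration)


# 100,000 challenges at 1% penetration implies a ten-million-message spam run.
print(implied_run_size(100_000, 0.01))

# The same run at 5% penetration (a hypothetical figure) is five times worse.
print(challenges_sent(10_000_000, 0.05))

# Sobig.F case: 500,000 copies, each triggering a challenge at a single site.
print(challenges_sent(500_000, 1.0))
```

The victim's load scales linearly with C-R adoption, so the problem gets worse exactly as the technique becomes more popular.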

C-R thus offers unauthorized access to user and system-level accounts, for the purposes of transmitting mail.

Even in its less severe form, the number of C-R challenges received by users from spoofed mail — spam, viruses, and the like — will likely cause C-R challenges to be viewed as a major annoyance.

7. C-R - C-R deadlock

This is almost funny. While it doesn't affect all C-R systems, there are those that are vulnerable.

How do two C-R system users ever start talking to each other?

User A sends mail to user B. While user B's address is then known to A, the address of user B's C-R server is not.
User B's C-R system sends a challenge to A...
...who intercepts the challenge with A's C-R system, which sends a challenge to user B's C-R system...
Rinse, wash, repeat....

No, I didn't think this one up myself; see Ed Felten's "A Challenging Response to Challenge-Response".

Bypassing this deadlock then opens an obvious loophole for spammers to exploit.

Again, while some C-R systems may avoid this particular pitfall, current experience with vacation responders and spam-notification filters provides strong empirical evidence that a significant number of C-R systems will in fact not get this right.
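The loop between users A and B described above can be simulated in a few lines. This is a toy model of a deliberately naive C-R system, assuming that challenges are sent from a separate robot address and that an outgoing message clears only the recipient's personal address. The class, addresses, and hop limit are all hypothetical.

```python
class NaiveCRSystem:
    """A C-R mailbox that challenges every sender it has not already cleared."""

    def __init__(self, user_address):
        self.user_address = user_address
        self.domain = user_address.split("@")[1]
        # Assumption: challenges come from a robot address, not the user's own.
        self.challenge_address = "challenge-bot@" + self.domain
        self.cleared = set()  # senders allowed straight through
        self.held = []        # messages waiting on a challenge reply

    def send_to(self, other, body):
        # Writing to someone clears their personal address only.
        self.cleared.add(other.user_address)
        return (self.user_address, other.user_address, body)

    def receive(self, message):
        """Return a challenge to send back, or None if the message is delivered."""
        sender, _recipient, body = message
        if sender in self.cleared:
            return None
        self.held.append(message)
        return (self.challenge_address, sender, "challenge for: " + body)


def simulate(a, b, max_hops=6):
    """Send one message from a to b and count the automatic replies it sets off."""
    systems = {a.domain: a, b.domain: b}
    message = a.send_to(b, "hello")
    hops = 0
    while message is not None and hops < max_hops:
        recipient_domain = message[1].split("@")[1]
        message = systems[recipient_domain].receive(message)
        hops += 1
    return hops


alice = NaiveCRSystem("alice@a.example")
bob = NaiveCRSystem("bob@b.example")
hops = simulate(alice, bob)
print(hops, "hops;", len(alice.held) + len(bob.held), "messages held, none delivered")
```

The loop only stops because of the hop limit. No human ever sees a challenge, so the original message stays in the held queue indefinitely.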

This and several following issues are often countered with "But a well-designed C-R system won't do that". Unfortunately, there will be, and are, many poorly-designed C-R systems.

One of the early proponents of C-R, Brad Templeton, has written a set of "proper principles" for C-R systems. While one C-R system adheres relatively closely to these, there are many that do not. Such systems will pollute the public awareness by their bad habits. Even so, Templeton fails to consider the issues of Joe Jobs and spoofed 'From:' lines resulting in spurious challenges sent to innocent users.

Even Templeton is aware of C-R's limitations:

C-R may, over time, lose its utility if most spammers try to target it directly. However, it still has several years of life. It can also be combined with other techniques. For example, if you have a good spam filter, you might decide to challenge only messages with high spam scores or other reasons to suspect they are spam, and let through other mail.

Still, Templeton's list does provide a basis for identifying C-R systems that are not merely broken by design, but fail even to offer protections against readily foreseeable errors.

8. Potential integration into spam e-mail harvest systems.

One commonplace piece of advice for avoiding spam is to not respond to opt-out, AKA e-mail validation testing, requests.

C-R spoofing on the part of spammers would simply hijack a presumption that C-R requests are valid to provide spammers with higher-quality mailing lists. See the current rash of identity theft / CC theft scams based on "updating your account information". This isn't an attack on users of C-R systems, per se, but on those who've become habituated to responding to C-R requests.

One likely consequence is that, as C-R becomes more commonplace, its use as a spam-harvesting system will increase, leading to a reduced response rate to C-R mails as people avoid spam harvesters, and find that most C-R challenges come from spammers....

C-R at best promotes bad personal identity protection practices.

9. Likely consequences: C-R messages and users blacklisted or spamfiltered

The C-R user is likely to find their own address added to blocklists by many users and/or mailing list administrators burned by malformed, or simply unwanted, C-R requests. Simply: people who receive such requests are very likely to just add the sending address, or user corresponding to the request, to their own personal blacklists. This is my own current M.O. with C-R requests, and anecdotal evidence suggests it's a common practice.

This factor is entirely outside the bounds of the C-R system; it is a reflection of the independent response of individuals and organizations to receiving C-R challenges. C-R definitionally cannot accommodate this.

Another possibility is that, due to user consensus, spam filters simply tag C-R messages as spam, either with a direct rule or as a result of Bayesian weighted scoring.

Beyond any semiotic arguments of what spam is or isn't, if the operational reality is that SpamAssassin reflects the opinion of SA users and developers and treats C-R transactions as spam, it is, for all intents, spam.

10. Mailing list burden.

C-R systems typically malfunction on mailing lists in one of two ways, neither of which is acceptable:

The C-R sends a challenge to the list for messages received.
The C-R sends a challenge to each individual list member for the first post received.

In both cases, the burden is placed on a party who couldn't care less about the benefits of the C-R system. Several lists of my acquaintance have taken to permanently banning any users who exhibit use of misconfigured C-R systems.

11. Fails to address techno-economic underpinnings of spam.

Spam exists for one reason: it's profitable.

It's profitable because technology allows the costs of sending a large number of mail messages to be lower than the revenues available from doing so.

Any effective spam remedy must attack one or the other side (or both) of this equation: raise the costs or reduce the technological effectiveness, on the one side, or reduce revenues on the other.

C-R, as with most recipient-side filtering systems, imposes negligible incremental overhead on the spammer. A delivery is made, the spam server moves on, and the cost is a single SMTP connection for a fractional second. Collateral costs, by contrast, are high: they fall on legitimate senders, on the owners of spoofed reply addresses, and on mailing lists, and they invite retaliatory actions against the C-R user.

A truly effective spam defense must attack the technical and economic aspects, in as unobtrusive a manner as possible.

The one system that seems to best fit this requirement is the Teergrube — the spam tar-baby. FAQ at:

http://www.iks-jena.de/mitarb/lutz/usenet/teergrube.en.html

A teergrubing mail server costs a spammer multiple SMTP connections, an inherently finite resource, for possibly hours. Workarounds on the part of the spammer are possible, but all result in higher costs, reduced delivery, or both. The net effect is essentially a delivery payment requirement, though the payment is in the form of time and configuration on the part of the spammer. Collateral damage is low: if a teergrube does unintentionally catch a legitimate sender, the only cost is one (or a very small number of) delayed deliveries. This and other issues are covered in the FAQ above; read it before posing hypothetical problems.
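A back-of-the-envelope model shows why tying up connections matters. The numbers below (100 concurrent connections, half a second per normal delivery, an hour per tarpitted one, 1% of targets running a teergrube) are assumptions picked for illustration, not measurements.

```python
def deliveries_per_hour(connections: int, seconds_per_delivery: float) -> float:
    """Throughput when each connection handles one delivery at a time."""
    return connections * 3600.0 / seconds_per_delivery


def mixed_seconds_per_delivery(fast: float, slow: float, tarpit_fraction: float) -> float:
    """Average time per delivery when a fraction of the targets are teergrubes."""
    return (1.0 - tarpit_fraction) * fast + tarpit_fraction * slow


CONNECTIONS = 100      # hypothetical limit on concurrent SMTP connections
FAST_SECONDS = 0.5     # "a fractional second" per ordinary delivery
SLOW_SECONDS = 3600.0  # a teergrube holding the connection for an hour

normal = deliveries_per_hour(CONNECTIONS, FAST_SECONDS)
all_tarpits = deliveries_per_hour(CONNECTIONS, SLOW_SECONDS)
one_percent = deliveries_per_hour(
    CONNECTIONS, mixed_seconds_per_delivery(FAST_SECONDS, SLOW_SECONDS, 0.01)
)

print(f"no teergrubes:  {normal:,.0f} deliveries/hour")
print(f"1% teergrubes:  {one_percent:,.0f} deliveries/hour")
print(f"all teergrubes: {all_tarpits:,.0f} deliveries/hour")
```

Even a small share of teergrubing servers dominates the average, because one stuck connection costs as much time as thousands of ordinary deliveries. A spammer can work around this with timeouts or more connections, but those are exactly the added costs described above.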

Hall of Shame

The following are some C-R systems known to behave poorly. Rules for matching the challenge messages are included.

Active Spam Killer (ASK)

Author: Marco Paganini

Identifier: Contains the header: "X-AskVersion: 2.2 (http://www.paganini.net/ask)". A wildcard match on "X-AskVersion" should be sufficient.

Response: reply w/o modifications.

Faults: Sends challenges to innocent third-parties as a result of spoofed headers.
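A matching rule for the ASK identifier above can be sketched with Python's standard email module. The function name and the sample messages are made up for the example; only the header name comes from the entry above. Since any value of the header should do, the check looks only for its presence.

```python
from email import message_from_string


def is_ask_challenge(raw_message: str) -> bool:
    """True if the message carries an X-AskVersion header, whatever its value."""
    msg = message_from_string(raw_message)
    # Header lookup in the email module is case-insensitive.
    return "X-AskVersion" in msg


# Hypothetical sample messages for demonstration.
challenge = (
    "From: someone@example.org\n"
    "Subject: Please confirm your message\n"
    "X-AskVersion: 2.2 (http://www.paganini.net/ask)\n"
    "\n"
    "Reply to this message to confirm.\n"
)
ordinary = (
    "From: friend@example.org\n"
    "Subject: Lunch?\n"
    "\n"
    "Noon works for me.\n"
)

print(is_ask_challenge(challenge), is_ask_challenge(ordinary))
```

What you do with a match (file it, discard it, or feed it to a filter) is up to you; the sketch only identifies the message.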

[RM adds: Further writings on the matter from third parties:

http://tardigrade.net/challengeresponse.html
http://static.samspade.org/spamarrest.html
http://www.politechbot.com/p-04746.html
http://spamlinks.port5.com/filter-cr.htm ]