Who am I?
Programmer, systems/network administrator, freelance tech writer, Free Software advocate, and general technology wonk. I frequently post to discussions on GNU/Linux; Debian; copyright, patent, trademark, and licensing; data processing and analysis; online rights and privacy issues; and related topics. I'm interested in and work on collaborative discussion tools including email, Usenet, weblogs, Wikis, spam, and filtering methods. I've designed, deployed, documented, or contributed to several of these, with more work in progress.
What I've Been Doing Lately
(January 2006) Um. Time seems to have happened, and I'm back in Palo Alto doing Linuxy stuff for XenSource, enjoying the civilized world, and breaking in new wheels on Skyline and Hwy 1. Need to hit the snow before it's too late, while we're at it.
(14 November 2004) My day job is running a computer lab at a youth center. Admitted GNU/Linux and Free Software bigot that I am, I find that running a bunch of legacy MS Windows boxes grates a bit, not to mention the security and maintenance headaches involved. One of the first changes I made to the systems was to install Mozilla Firefox and encourage its use over MSIE. With more security issues piling up over the summer, I disabled external MSIE access to all but a small number of sites. The result after seven-plus months? A great success: very few access issues, and no browser-based security issues with Firefox.
(19 September 2004) If you found me through the New York Times AdWare article, I've written a technical companion piece addressing a few points that couldn't be worked into the article, titled Spyware, Adware, Windows, GNU/Linux, and Software Culture, which you might enjoy reading.
I used to actively fight and track spam, and report spam statistics. That work is now mostly historical, but one finding stands: ASNs (autonomous system numbers) and CIDR blocks are highly useful for aggregating spam activity. I've posted some plots of spam by ASN and cumulative spam by ASN, as well as monthly ASN spam rankings, periodically updated. Reports online cover the current month to date (updated every few days, if I'm consistent), January 2004, February 2004, March 2004, April 2004, May 2004, June 2004, and September 2004 (which also includes July and August).
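As a minimal sketch of that kind of aggregation, assuming each spam message's originating IP has already been mapped to an ASN (the ASNs in the example are from the reserved documentation range and are made up, not taken from my reports), counting per ASN, sorting by volume, and keeping a running cumulative share gives both the monthly ranking and the cumulative-spam curve:

```python
from collections import Counter


def rank_asns(spam_asns):
    """Return (asn, count, cumulative_fraction) tuples, heaviest spam source first.

    spam_asns is an iterable with one ASN string per spam message.
    """
    counts = Counter(spam_asns)
    total = sum(counts.values())
    ranked = []
    running = 0
    for asn, n in counts.most_common():
        running += n
        # Fraction of all spam accounted for by this ASN and every ASN above it.
        ranked.append((asn, n, running / total))
    return ranked


if __name__ == "__main__":
    # Hypothetical sample: one entry per spam message.
    sample = ["64500", "64500", "64501", "64500", "64502", "64501"]
    for asn, n, cum in rank_asns(sample):
        print(f"AS{asn}: {n} messages, {cum:.0%} cumulative")
```

On a real archive, the cumulative column is where the concentration shows up: a handful of rows typically reach a large share of the total.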
Note that the data here come from one not entirely average Joe's personal ISP-based email address. They reflect my own well-known and long-established address, the fact that I LART aggressively, and my ISP's own spam filtering, which I'm only partially aware of. There are also some mild overcounting issues in my reports. But the trend should be clear: spam is VERY highly concentrated in a VERY small number of VERY poorly managed networks. For more on ASNs, there's the TWikIWeThey Spam by ASN page. I also recommend Joe St. Sauver's University of Oregon website. My recommendation is that you examine your own spam patterns and determine your own ASN breakdowns of spam and ham (non-spam) volumes. The advantage of ASNs is that even a relatively small spam archive yields very clear, actionable patterns. See the links above for methods: a reverse-DNS-style lookup of an IP against asn.routeviews.org (a TXT query on the IP's reversed octets) returns both its ASN and its CIDR block.
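A rough sketch of that lookup, split into building the query name and parsing the answer. Python's standard library has no TXT-record resolver, so the query itself would go through dig (for example, dig +short 10.2.0.192.asn.routeviews.org TXT) or a DNS library. The reply format assumed here, three quoted strings giving ASN, prefix, and prefix length, is an assumption to check against a live query, as is the continued availability of the service; the function names are hypothetical.

```python
import ipaddress


def routeviews_qname(ip):
    """Build the asn.routeviews.org query name for an IPv4 address.

    The octets are reversed, as in in-addr.arpa lookups.
    Raises ValueError if ip is not a valid IPv4 address.
    """
    addr = ipaddress.IPv4Address(ip)
    return ".".join(reversed(str(addr).split("."))) + ".asn.routeviews.org"


def parse_routeviews_txt(txt):
    """Parse a TXT answer ASSUMED to look like: "ASN" "prefix" "length".

    Returns (asn, cidr), e.g. ("64500", "192.0.2.0/24").
    """
    asn, prefix, length = [field.strip('"') for field in txt.split()]
    # ip_network validates the prefix and normalizes it to CIDR notation.
    cidr = str(ipaddress.ip_network(f"{prefix}/{length}"))
    return asn, cidr
```

Keeping the parsing separate from the DNS query makes it easy to swap in whatever resolver is at hand, or to cache answers per CIDR block so repeat offenders don't trigger repeat queries.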
My own spam filtering system consists largely of SpamAssassin, with both remote (network-based) and Bayesian checks enabled, plus a locally maintained whitelist of known correspondents. Together they catch 95-97% or more of my spam, with under 1% false positives.
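To illustrate how a whitelist combines with a score threshold, and one way the catch and false-positive percentages can be computed, here is a small sketch. It is not the actual configuration described above: the whitelist entries are hypothetical, the score is assumed to come from SpamAssassin already, 5.0 is SpamAssassin's default required_score, and "false positive rate" is assumed to mean the share of legitimate mail wrongly flagged as spam.

```python
from email.utils import parseaddr

# Hypothetical whitelist of known correspondents, stored lower-cased.
WHITELIST = {"alice@example.org", "bob@example.net"}
# SpamAssassin's default required_score; tune to taste.
REQUIRED_SCORE = 5.0


def classify(from_header, sa_score, whitelist=WHITELIST, threshold=REQUIRED_SCORE):
    """Known correspondents are always ham; everyone else is judged by score."""
    _, address = parseaddr(from_header)
    if address.lower() in whitelist:
        return "ham"
    return "spam" if sa_score >= threshold else "ham"


def filter_rates(results):
    """Given (true_label, predicted_label) pairs, return
    (share of spam caught, share of ham wrongly flagged as spam)."""
    results = list(results)
    spam_preds = [pred for truth, pred in results if truth == "spam"]
    ham_preds = [pred for truth, pred in results if truth == "ham"]
    catch_rate = spam_preds.count("spam") / len(spam_preds) if spam_preds else 0.0
    false_positive_rate = ham_preds.count("spam") / len(ham_preds) if ham_preds else 0.0
    return catch_rate, false_positive_rate
```

Checking the whitelist first means a known correspondent's mail is never lost to a high score, which is the main lever for keeping false positives low.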
If you want to add an X-ASN header to your mail via a procmail rule, I've written the following recipe, procmail-asn-header. Bayesian classifiers such as SpamAssassin's and others should pick this header up and automatically start learning which ASNs are spammy.
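The recipe itself isn't reproduced here. As a rough Python equivalent, here is a sketch that assumes the relevant relay address is the first bracketed IPv4 address in the topmost Received header (true for many mail setups, but not all), with lookup_asn a hypothetical caller-supplied function, for example one wrapping an asn.routeviews.org query.

```python
import email
import re
from email import policy

# Matches an IPv4 address in square brackets, as MTAs usually record the client IP.
IPV4_IN_BRACKETS = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def first_relay_ip(msg):
    """Return the first bracketed IPv4 address in the topmost Received header, or None."""
    received = msg.get_all("Received") or []
    if not received:
        return None
    match = IPV4_IN_BRACKETS.search(str(received[0]))
    return match.group(1) if match else None


def add_asn_header(raw_message, lookup_asn):
    """Parse raw_message, add an X-ASN header, and return the message object.

    lookup_asn (hypothetical) maps an IP string to an ASN string, or None if unknown.
    """
    msg = email.message_from_string(raw_message, policy=policy.default)
    ip = first_relay_ip(msg)
    asn = lookup_asn(ip) if ip else None
    msg["X-ASN"] = asn if asn else "unknown"
    return msg
```

Hooking this into procmail would mean wrapping it in a small stdin-to-stdout script called from a filter recipe; the usual all-procmail route pipes the message through formail to add the header instead.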
Last updated 2005/08/27 02:40:14