Who am I?
Programmer, systems/network administrator, freelance tech writer, Free Software advocate, and general technology wonk. I frequently post to discussions on GNU/Linux; Debian; copyright, patent, trademark, and licensing; data processing and analysis; online rights and privacy issues; and related topics. I'm interested in and work on collaborative discussion topics including email, Usenet, weblogs, Wikis, spam, and filtering methods. I've designed, deployed, documented, or contributed to several of these, with more work in process.
What I've Been Doing Lately
(January, 2006) Um. Time seems to have happened and I'm back in Palo Alto, doing Linuxy stuff for XenSource, enjoying the civilized world, and breaking in new wheels on Skyline and Hwy 1. Need to hit the snow before it's too late while we're at it.
(14 November, 2004) My day job is running a computer lab at a youth center. Admitted GNU/Linux and Free Software bigot that I am, running a bunch of legacy MS Windows boxes grates a bit, not to mention the security and maintenance headaches involved. One of the first modifications I made to the systems was to install Mozilla Firefox and encourage its use over MSIE. With additional security issues piling up over the summer, I disabled external MSIE access to all but a small number of sites. Experience over seven+ months? A great success, very few access issues, and no browser-based security issues with Firefox.
(19 September 2004) If you found me through the New York Times AdWare article, I've written a technical companion addressing a few points that couldn't be worked into the piece, titled Spyware, Adware, Windows, GNU/Linux, and Software Culture, which you might enjoy reading.
(5 September 2004) Been doing some housecleaning on the site, mostly splitting off bits that got too big. Let me know if it rocks/sucks. Oh, and my desktop's back, happy, happy. Shout out to the good folks at CappuccinoPC for coming through.
I'm getting more involved in GNU/Linux in Education projects (True LIEs I call it), and encourage you to look at Schoolforge for a jumping off point. More materials here soon.
I'm doing consulting for local businesses, nonprofits (NPOs / NGOs), and government agencies on Free Software initiatives. More information to be posted here.
I'm a big fan of Knoppix, the live bootable GNU/Linux desktop, and burn spools for distribution regularly. I've found it's a huge hit at PC user groups (yeah, the legacy MS Windows side of the house -- surprising numbers of restless natives). I've prepared CD labels for Knoppix (PS) suitable for the Fellows / Neato label stock (also glabels source file).
I moderate the Free Software Law Discussion list (aka fsl-discuss), and strongly encourage people to become informed and involved on civil rights issues online and elsewhere, mostly described in my rights page.
Running TWikIWeThey, a TWiki largely focused on technology and Free Software, is another project. Current (June 2003) major interest is the ongoing legal dispute between Caldera (d/b/a SCO) and IBM over GNU/Linux. Updated frequently, contribute if you can. Our interest is broader than just this, and aims to provide general collaboratively-developed technical information on GNU/Linux and Free Software.
I'm actively fighting and tracking spam, and reporting spam statistics. I'm finding that ASNs (autonomous system numbers) and CIDRs are highly useful for aggregating spam activity. I've posted some plots of spam by ASN and cumulative spam by ASN, as well as monthly ASN spam rankings, periodically updated. Online now are current month to date, updated every few days (if I'm consistent), January 2004, February 2004, March 2004, April 2004, May 2004, June 2004, and September 2004 (which also covers July and August).
Note that the data here are for one not entirely average Joe's personal ISP-based email address. This reflects my own well-known and long-established email address, the fact that I LART aggressively, and my ISP's own spam filtering, of which I'm only partially aware. There are also some mild overcounting issues in my reports. But the trends should be clear: spam is VERY highly concentrated in a VERY small number of VERY poorly managed networks. For more on ASNs there's the TWikIWeThey Spam by ASN page. I also recommend Joe St. Sauver's University of Oregon website. My recommendation is that you evaluate your own spam email patterns and determine your own ASN breakdowns for spam/ham (non-spam) volumes. The advantage of ASNs is that even a relatively small spam archive generates very clear and actionable patterns. See the links above for methods -- a reverse-DNS lookup on an IP will give you both ASN and CIDR, using asn.routeviews.org.
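For the curious, here's a rough Python sketch of that kind of ASN breakdown. It is not the method behind my reports, just an illustration: the function names are made up for this example, the lookup name is assumed to be the reversed IPv4 octets followed by asn.routeviews.org, and the TXT answer is assumed to come back as three quoted strings (ASN, prefix, prefix length). The DNS query itself is left out so the sketch runs without network access.

```python
from collections import Counter
import ipaddress


def routeviews_query_name(ip: str) -> str:
    """Build the DNS name to query: reversed IPv4 octets plus the zone.

    Assumption: the zone is queried like a DNSBL, with octets reversed.
    """
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    octets = str(addr).split(".")
    return ".".join(reversed(octets)) + ".asn.routeviews.org"


def parse_txt_answer(txt: str) -> tuple[str, str]:
    """Parse a TXT answer assumed to look like '"ASN" "prefix" "length"'."""
    asn, prefix, length = txt.replace('"', "").split()
    return "AS" + asn, f"{prefix}/{length}"


def rank_by_asn(asns):
    """Return (ASN, message count, cumulative share) rows, largest first."""
    counts = Counter(asns)
    total = sum(counts.values())
    rows, running = [], 0
    for asn, n in counts.most_common():
        running += n
        rows.append((asn, n, running / total))
    return rows
```

Feed rank_by_asn one ASN per spam message and the cumulative share column shows how quickly a handful of networks add up to most of the archive.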
My own spam filtering system consists largely of SpamAssassin, with both remote and Bayesian checks enabled, and a locally maintained whitelist of known correspondents. Combined, I get 95-97% or better filtering of spam, with < 1% false positives.
If you want to add an X-ASN header to your mail via a procmail rule, I've created the following recipe, procmail-asn-header. Bayesian classifiers such as SpamAssassin and others should pick this up and start classifying ASNs by spamminess, automatically.
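If procmail isn't your thing, the same idea can be sketched as a small Python filter. This is not the procmail-asn-header recipe, only an illustration of the approach: add_asn_header and the lookup callable are hypothetical names, and it assumes the topmost Received header carrying a bracketed IPv4 address was written by your own mail server and so names the connecting host.

```python
import email
import re

# Matches a bracketed IPv4 address such as "[192.0.2.10]" in a Received header.
IP_IN_BRACKETS = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def add_asn_header(raw_message: str, lookup) -> str:
    """Return the message with an X-ASN header added, if an IP was found.

    `lookup` is any callable mapping an IP string to an ASN string;
    how it resolves the ASN is up to you.
    """
    msg = email.message_from_string(raw_message)
    for received in msg.get_all("Received", []):
        match = IP_IN_BRACKETS.search(received)
        if match:
            msg["X-ASN"] = lookup(match.group(1))
            break  # only the first header with an address is used
    return msg.as_string()
```

Messages with no usable Received header pass through without an X-ASN header, so the classifier simply sees nothing extra for them.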
I've been investing far too little of my time working on The Gestalt System, an open source framework for analytic and reporting software, true to my geek nature. So look there if you want to be impressed (by my ambition).
mail: karsten@linuxmafia.com
Last updated 2005/08/27 02:40:14

