[conspire] Puzzle: How do you sort IP address lists?

Rick Moen rick at linuxmafia.com
Tue Nov 7 12:16:25 PST 2006


There's a maintenance task I have to do occasionally, that is very much
The Wrong Thing over the long term, but necessary in the short term:
I keep a blocklist of IP addresses that my SMTP server shouldn't accept
mail from.   SVLUG's server, on which I'm interim sysadmin, has a list
just like it.  Since I maintain both lists, it's logical to combine
them, sort the result, and run it through "uniq" (to eliminate
duplicates, which "uniq" catches only on adjacent lines) -- to benefit
both sites.
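
As a rough illustration of the combine-and-deduplicate step only (the
ordering question is the puzzle, and is left alone here), a minimal
Python sketch follows.  The function name merge_blocklists and the way
the sample addresses are split between the two sites are assumptions
made up for the example, not details of either real blocklist.

```python
def merge_blocklists(*blocklists):
    """Combine several blocklists, dropping blank lines and duplicates.

    Entries keep the order in which they are first seen, so no sorting
    is needed for the duplicate check (unlike "uniq", which only spots
    duplicates on adjacent lines).
    """
    seen = set()
    merged = []
    for blocklist in blocklists:
        for entry in blocklist:
            entry = entry.strip()
            if entry and entry not in seen:
                seen.add(entry)
                merged.append(entry)
    return merged


# Hypothetical sample data: which address lives on which list is invented.
linuxmafia_list = ["4.3.76.194", "12.30.72.162", "8.10.33.176"]
svlug_list = ["12.30.72.162", "10.123.189.105", ""]

print(merge_blocklists(linuxmafia_list, svlug_list))
```

The merged list still has to be put into a sensible order afterwards.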

That's where the "puzzle" bit comes in.  But first, why it's The Wrong
Thing:

Security author Marcus J. Ranum has a dictum that "enumerating badness"
is dumb (http://www.ranum.com/security/computer_security/editorials/dumb/):

  Back in the early days of computer security, there were only a
  relatively small number of well-known security holes. That had a lot
  to do with the widespread adoption of "Default Permit" because, when
  there were only 15 well-known ways to hack into a network, it was
  possible to individually examine and think about those 15 attack
  vectors and block them. So security practitioners got into the habit
  of "Enumerating Badness" - listing all the bad things that we know
  about.  Once you list all the badness, then you can put things in
  place to detect it, or block it.

  Why is "Enumerating Badness" a dumb idea? It's a dumb idea because
  sometime around 1992 the amount of Badness in the Internet began to
  vastly outweigh the amount of Goodness. For every harmless,
  legitimate, application, there are dozens or hundreds of pieces of
  malware, worm tests, exploits, or viral code. Examine a typical
  antivirus package and you'll see it knows about 75,000+ viruses that
  might infect your machine. Compare that to the legitimate 30 or so apps
  that I've installed on my machine, and you can see it's rather dumb to
  try to track 75,000 pieces of Badness when even a simpleton could track
  30 pieces of Goodness.  [...]

So, in keeping blocklists of IP addresses that have been zombified and 
used for mass-mailed spam, 419-scammail, etc., I'm aware of doing
something a bit _dumb_.  It's a losing strategy.  I'm doing it on
linuxmafia.com because the site is badly short on RAM and disk space
in the short term (still need to migrate to that VA Linux 2230), and
so software upgrades are deferred.  Similarly, the SVLUG host has a
scarily broken package system, and is therefore to be migrated rather
than worked on in place, as well.  So, we limp by on both machines with
some long-term losing anti-spam methods because they're short-term
palliatives.


Getting back to the puzzle, you'd think that GNU sort would be easily
adaptable to a list like this, right?  Consider this 11-address chunk
of linuxmafia.com's blocklist:

4.3.76.194
8.10.33.176
10.123.189.105
12.30.72.162
12.149.177.21
12.154.4.213
12.159.232.66
12.205.7.190
12.206.142.76
12.214.50.126
12.221.163.162

Just "sort" as a filter with no options does this:

10.123.189.105
12.149.177.21
12.154.4.213
12.159.232.66
12.205.7.190
12.206.142.76
12.214.50.126
12.221.163.162
12.30.72.162
4.3.76.194
8.10.33.176

Hmm, fine up until the last three lines, but then it becomes apparent
that "sort" is comparing the lines character by character, in strict
ASCII order, so "12.30" lands after "12.221" and "4" after "12".  So,
you hit the manpage.
"-n" for "compare according to string numerical value" seems promising, 
as does "-g" for "compare according to general numerical value".  Those 
get you:

4.3.76.194
8.10.33.176
10.123.189.105
12.149.177.21
12.154.4.213
12.159.232.66
12.205.7.190
12.206.142.76
12.214.50.126
12.221.163.162
12.30.72.162

and

4.3.76.194
8.10.33.176
10.123.189.105
12.149.177.21
12.154.4.213
12.159.232.66
12.205.7.190
12.206.142.76
12.214.50.126
12.221.163.162
12.30.72.162

No cigar.
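
A plausible reading of why the numeric options get this close and no
closer: a numeric sort key is the leading number on the line, and
"12.30.72.162" starts with what looks like the decimal fraction 12.30,
which is larger than 12.221.  The Python sketch below imitates that
behaviour under that assumption.  It is not how GNU sort is actually
implemented, and the names leading_number and numeric_prefix_sort are
invented for the illustration.

```python
import re
from decimal import Decimal, InvalidOperation

# Leading numeric prefix: optional sign, digits, at most one decimal point.
_NUMERIC_PREFIX = re.compile(r"\s*(-?\d*\.?\d*)")


def leading_number(line):
    """Return the number a line starts with, or 0 if it has none."""
    text = _NUMERIC_PREFIX.match(line).group(1)
    try:
        return Decimal(text)
    except InvalidOperation:
        return Decimal(0)


def numeric_prefix_sort(lines):
    """Sort by leading decimal number, falling back to the whole line on ties."""
    return sorted(lines, key=lambda line: (leading_number(line), line))


chunk = [
    "4.3.76.194", "8.10.33.176", "10.123.189.105", "12.30.72.162",
    "12.149.177.21", "12.154.4.213", "12.159.232.66", "12.205.7.190",
    "12.206.142.76", "12.214.50.126", "12.221.163.162",
]

# Everything after the second dot is ignored by the key, so 12.30 is
# treated as 12.3 and sorts after 12.221.
for address in numeric_prefix_sort(chunk):
    print(address)
```

Run on the 11-address chunk, this puts 12.30.72.162 last, the same
misplacement as in the two listings above.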

Personally, I played with these things for a while, gave up and switched
to awk, and had the problem mostly solved with a rather ghastly script
when I thought "Wait a second!  That's absurd.  We _should_ be able to
do this using just GNU sort.  If it can't sort IP addresses, what the
hell good is it?"

So, I went back and eventually figured it out -- and I'm wondering if
any other subscriber has either already solved this problem or cares to
take a crack at it.

(I'll also really admire someone's elegant solution in, e.g., Python,
Perl, or Ruby -- but I'm just boggling at how non-obvious my "sort"
solution seems, and want to compare notes.)
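
For comparison, here is one possible Python take.  It is only a sketch,
it assumes well-formed dotted-quad IPv4 addresses, one per line, and it
is not the "sort" answer the puzzle is asking for.  The names ip_key
and sort_ips are made up for the example.

```python
def ip_key(address):
    """Turn '12.30.72.162' into (12, 30, 72, 162) so octets compare as integers."""
    return tuple(int(octet) for octet in address.split("."))


def sort_ips(addresses):
    """Return the unique, non-blank addresses in numeric octet order."""
    unique = {address.strip() for address in addresses if address.strip()}
    return sorted(unique, key=ip_key)


# The chunk in the order that plain "sort" produced, plus one duplicate.
scrambled = [
    "10.123.189.105", "12.149.177.21", "12.154.4.213", "12.159.232.66",
    "12.205.7.190", "12.206.142.76", "12.214.50.126", "12.221.163.162",
    "12.30.72.162", "4.3.76.194", "8.10.33.176", "4.3.76.194",
]

for address in sort_ips(scrambled):
    print(address)
```

Comparing tuples of integers avoids both traps: "4" no longer sorts
after "12", and 30 is compared with 221 as a whole number rather than
as the tail of a decimal fraction.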

-- 
Cheers,
Rick Moen                                    Ita erat quando hic adveni.
rick at linuxmafia.com



