[conspire] Puzzle: How do you sort IP address lists?
Rick Moen
rick at linuxmafia.com
Tue Nov 7 12:16:25 PST 2006
There's a maintenance task I have to do occasionally, that is very much
The Wrong Thing over the long term, but necessary in the short term:
I keep a blocklist of IP addresses that my SMTP server shouldn't accept
mail from. SVLUG's server, on which I'm interim sysadmin, has a list
just like it. Since I maintain both lists, it's logical to combine
them, sort the result, and run it through "uniq" (to eliminate
duplicates, which "uniq" does only for adjacent lines) -- to benefit
both sites.
That's where the "puzzle" bit comes in. But first, why it's The Wrong
Thing:
Security author Marcus J. Ranum has a dictum that "enumerating badness"
is dumb (http://www.ranum.com/security/computer_security/editorials/dumb/):
    Back in the early days of computer security, there were only a
    relatively small number of well-known security holes. That had a lot
    to do with the widespread adoption of "Default Permit" because, when
    there were only 15 well-known ways to hack into a network, it was
    possible to individually examine and think about those 15 attack
    vectors and block them. So security practitioners got into the habit
    of "Enumerating Badness" - listing all the bad things that we know
    about. Once you list all the badness, then you can put things in
    place to detect it, or block it.

    Why is "Enumerating Badness" a dumb idea? It's a dumb idea because
    sometime around 1992 the amount of Badness in the Internet began to
    vastly outweigh the amount of Goodness. For every harmless,
    legitimate, application, there are dozens or hundreds of pieces of
    malware, worm tests, exploits, or viral code. Examine a typical
    antivirus package and you'll see it knows about 75,000+ viruses that
    might infect your machine. Compare that to the legitimate 30 or so apps
    that I've installed on my machine, and you can see it's rather dumb to
    try to track 75,000 pieces of Badness when even a simpleton could track
    30 pieces of Goodness. [...]
So, in keeping blocklists of IP addresses that have been zombified and
used for mass-mailed spam, 419-scammail, etc., I'm aware of doing
something a bit _dumb_. It's a losing strategy. I'm doing it on
linuxmafia.com because the site is badly short on RAM and disk space
in the short term (still need to migrate to that VA Linux 2230), and
so software upgrades are deferred. Similarly, the SVLUG host has a
scarily broken package system, and is therefore to be migrated rather
than worked on in place, as well. So, we limp by on both machines with
some long-term losing anti-spam methods because they're short-term
palliatives.
Getting back to the puzzle, you'd think that GNU sort would be easily
adaptable to a list like this, right? Consider this 11-address chunk
of linuxmafia.com's blocklist:
4.3.76.194
8.10.33.176
10.123.189.105
12.30.72.162
12.149.177.21
12.154.4.213
12.159.232.66
12.205.7.190
12.206.142.76
12.214.50.126
12.221.163.162
Just "sort" as a filter with no options does this:
10.123.189.105
12.149.177.21
12.154.4.213
12.159.232.66
12.205.7.190
12.206.142.76
12.214.50.126
12.221.163.162
12.30.72.162
4.3.76.194
8.10.33.176
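For comparison, the same ordering falls out of sorting the lines as plain
strings, compared character by character. The following is a minimal Python
sketch, not anything from the original post. It assumes plain code-point
comparison, which is how "sort" behaves in the C locale (other locales may
differ). The variable names are illustrative only.

```python
# The 11-address chunk of the blocklist quoted above.
addresses = [
    "4.3.76.194", "8.10.33.176", "10.123.189.105", "12.30.72.162",
    "12.149.177.21", "12.154.4.213", "12.159.232.66", "12.205.7.190",
    "12.206.142.76", "12.214.50.126", "12.221.163.162",
]

# Python compares strings one character at a time, so "12.30..." lands
# after "12.221..." ('3' > '2') and "4..." lands after every "1...".
lexical = sorted(addresses)

for line in lexical:
    print(line)
```

The comparison never looks at an octet as a number. It stops at the first
character that differs.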
Hmm, fine up until the last three lines, but then it becomes apparent
that "sort" is using strict ASCII order. So, you hit the manpage.
"-n" for "compare according to string numerical value" seems promising,
as does "-g" for "compare according to general numerical value". Those
get you:
4.3.76.194
8.10.33.176
10.123.189.105
12.149.177.21
12.154.4.213
12.159.232.66
12.205.7.190
12.206.142.76
12.214.50.126
12.221.163.162
12.30.72.162
and
4.3.76.194
8.10.33.176
10.123.189.105
12.149.177.21
12.154.4.213
12.159.232.66
12.205.7.190
12.206.142.76
12.214.50.126
12.221.163.162
12.30.72.162
No cigar.
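One plausible reading of those two results is that a numeric sort reads the
start of each line as a single decimal number, so "12.30.72.162" is compared
as 12.30 and "12.221.163.162" as 12.221. The Python sketch below imitates
that reading. It is an approximation made for illustration, not the actual
implementation of "sort", and the helper name leading_number is hypothetical.

```python
import re
from decimal import Decimal

addresses = [
    "4.3.76.194", "8.10.33.176", "10.123.189.105", "12.30.72.162",
    "12.149.177.21", "12.154.4.213", "12.159.232.66", "12.205.7.190",
    "12.206.142.76", "12.214.50.126", "12.221.163.162",
]

def leading_number(line):
    """Return the leading 'digits.digits' prefix of the line as a Decimal."""
    match = re.match(r"\d+(?:\.\d+)?", line)
    return Decimal(match.group()) if match else Decimal(0)

# Sort on the decimal prefix alone; ties fall back to the whole line.
numeric = sorted(addresses, key=lambda line: (leading_number(line), line))

for line in numeric:
    print(line)
```

Under that assumption 12.30 is larger than 12.221, which would explain why
12.30.72.162 ends up last.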
Personally, I played with these things for a while, gave up and switched
to awk, and had the problem mostly solved with a rather ghastly script
when I thought "Wait a second! That's absurd. We _should_ be able to
do this using just GNU sort. If it can't sort IP addresses, what the
hell good is it?"
So, I went back and eventually figured it out -- and I'm wondering if
any other subscriber has either already solved this problem or cares to
take a crack at it.
(I'll also really admire someone's elegant solution in, e.g., Python,
Perl, or Ruby -- but I'm just boggling at how non-obvious my "sort"
solution seems, and want to compare notes.)
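As one possible answer on the scripting side, here is a minimal Python
sketch. It is not the "sort" solution alluded to above, which the post does
not reveal. It assumes well-formed dotted-quad IPv4 addresses, one per list
entry. The two short lists and the helper name octets are hypothetical.

```python
def octets(address):
    """Split a dotted-quad string into a tuple of four integers."""
    return tuple(int(part) for part in address.split("."))

# Two hypothetical blocklists with one address in common.
list_a = ["12.30.72.162", "4.3.76.194", "12.149.177.21", "10.123.189.105"]
list_b = ["8.10.33.176", "12.30.72.162", "12.221.163.162"]

# The set union drops duplicates; tuples of ints compare octet by octet.
merged = sorted(set(list_a) | set(list_b), key=octets)

print("\n".join(merged))
```

The key turns each address into four integers, so 30 sorts before 149 as it
should. The standard library's ipaddress.ip_address could serve as the sort
key instead, and it would also reject malformed entries.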
--
Cheers,
Rick Moen Ita erat quando hic adveni.
rick at linuxmafia.com