[sf-lug] [on-list] site up, http[s] down: Re: Weird problems trying to access linuxmafia.com

Rick Moen rick at linuxmafia.com
Tue Dec 11 02:44:40 PST 2018


Quoting Michael Paoli (Michael.Paoli at cal.berkeley.edu):

> And, what the heck, why not, peeking a bit ...

You might find the new-ish addition /usr/local/bin/ls-httpd useful.
Revealed here:

:r /usr/local/bin/ls-httpd


#!/bin/bash
# Usage:
# ls-httpd type count [log_file]
# Eg:
# ls-httpd url 1000
# will find top URLs in the last 1000 access log entries
# ls-httpd ip 1000
# will find top IPs in the last 1000 access log entries
# ls-httpd agent 1000
# will find top user agents in the last 1000 access log entries
# The optional third argument names the log file to read
# (default: /var/log/apache2/access.log).

type=$1
length=$2
log_file="${3:-/var/log/apache2/access.log}"

if [ "$type" = "ip" ]; then
  tail -n "$length" "$log_file" | grep -o "[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}" | sort -n | uniq -c | sort -n
elif [ "$type" = "agent" ]; then
  tail -n "$length" "$log_file" | awk -F\" '{print $6}' | sort -n | uniq -c | sort -n
elif [ "$type" = "url" ]; then
  tail -n "$length" "$log_file" | awk -F\" '{print $2}' | sort -n | uniq -c | sort -n
fi



'ls-httpd ip 1000000 ./access.log.1' implicates several IP addresses
with no rDNS, plus one that reverses to
'37-9-87-228.spider.yandex.com.', one that reverses to
'wl7.bl.semrush.com.' and then a couple of dozen other *.semrush.com
(SemrushBot) hosts, one that reverses to
'ip-213-127-110-27.ip.prioritytelecom.net.', then 'crawl10.exabot.com.',
and so on, in decreasing order of hit count.
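Checking the reverse DNS is nothing fancy; something like the following
quick sketch would do it (the 20-address cut-off is arbitrary):

# List the 20 busiest source IPs from the last million entries of the
# rotated log, each with its PTR record (empty where there's no rDNS).
/usr/local/bin/ls-httpd ip 1000000 ./access.log.1 | tail -n 20 \
  | awk '{print $2}' \
  | while read -r ip; do
      printf '%s\t%s\n' "$ip" "$(dig +short -x "$ip")"
    done
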

Seems like maybe my lowest-hanging fruit is to spank, via iptables
banishment, 46.229.168.0 and 46.229.161.0 (the semrush.com IPs),
37.9.87.228 (the Yandex IP), and -- especially -- the worst-by-far
offenders, the ones with no reverse DNS at all (64.62.252.163,
64.62.252.174, 66.160.140.183, 64.62.252.176, 66.160.140.188).
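In shell terms, that banishment would be something like this sketch --
the no-rDNS hosts and the Yandex address as single IPs, and the
semrush.com traffic as ranges (treating those as /24s is my guess from
the couple of dozen hosts seen, so adjust the masks as appropriate):

# Drop the no-rDNS offenders and the Yandex crawler outright.
for ip in 64.62.252.163 64.62.252.174 66.160.140.183 64.62.252.176 \
          66.160.140.188 37.9.87.228; do
  iptables -A INPUT -s "$ip" -j DROP
done
# SemrushBot arrives from many hosts in these networks; /24 is an assumption.
iptables -A INPUT -s 46.229.168.0/24 -j DROP
iptables -A INPUT -s 46.229.161.0/24 -j DROP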

Longer-term, I need to find some more-automated way of throttling, as
playing whack-a-mole for the world's sociopaths doesn't appeal.
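
One candidate, sketched below rather than deployed, is per-source rate
limiting of new HTTP connections with iptables' 'recent' match, which at
least doesn't require naming each offender (the 20-hits-per-60-seconds
threshold is a placeholder):

# Drop sources that have opened 20+ new connections to port 80 within
# the last 60 seconds; otherwise just record the new connection.
iptables -A INPUT -p tcp --dport 80 -m state --state NEW \
  -m recent --name HTTP --update --seconds 60 --hitcount 20 -j DROP
iptables -A INPUT -p tcp --dport 80 -m state --state NEW \
  -m recent --name HTTP --set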


Some places on the Web claim that SemrushBot is considered a 'good' bot
that observes robots.txt but 'takes up to two weeks to discover changes
you make to robots.txt'.  Eh, somehow I'm underwhelmed, and am inclined
to spank, considering the bot's behaviour indistinguishable from DoSing.
Ditto Yandex and the five IPs with no rDNS.  Screw 'em.
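
(For completeness: the polite route those same pages describe would be a
robots.txt stanza like the one below -- followed, evidently, by a wait of
up to two weeks for the bot to notice it.)

User-agent: SemrushBot
Disallow: /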




