[sf-lug] [on-list] site up, http[s] down: Re: Weird problems trying to access linuxmafia.com
Rick Moen
rick at linuxmafia.com
Tue Dec 11 02:03:26 PST 2018
Quoting Michael Paoli (Michael.Paoli at cal.berkeley.edu):
> Taking it back on-list, because ... well, why not? ;-)
Works for me!
> Yes, ... I noticed something of this on Sunday, when I was at
> BerkeleyLUG. And from what I recalled from being at CABAL on
> Saturday*, just a day earlier, I didn't find this exceedingly
> surprising. First bit I noticed ... "connection refused" - and
> from that, I then thought hmmm, ... ICMP connection refused,
> host up, nothing listening on TCP port 80 (and/or 443 but I think
> I was mostly using and/or first noticed it on 80). I seem to
> recall ping looked fine, port 22 was open, ...
> I have ssh login access, so logged in on TCP port 22.
> A trace of looking around ... what TCP ports are listening - and
> to Internet addressable IPs (either explicitly, or wildcard) ...
> yes, 22, ... 80 and 443 not showing (but 8080 was, but I didn't poke
> to see if that was also web server). And I also noticed 53 (DNS),
> and perhaps others.
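The outside-in half of that check tells "connection refused" (host up, nothing listening) apart from a port that never answers. Here is a minimal Python sketch of it. This is not what Michael actually ran, and the function name `tcp_port_state` is hypothetical. On Linux, "refused" typically covers both a TCP RST and an ICMP port-unreachable sent back by a REJECTing firewall, because both surface as ECONNREFUSED.

```python
import socket


def tcp_port_state(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify a TCP port as 'open', 'refused', 'timeout', or 'error'."""
    try:
        # create_connection resolves the name and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Host (or a firewall in front of it) actively rejected the SYN:
        # the "host up, nothing listening" case.
        return "refused"
    except socket.timeout:
        # No reply at all: dropped packets, filtering, or a very slow path.
        return "timeout"
    except OSError:
        # Anything else, e.g. host or network unreachable, DNS failure.
        return "error"
```

For example, `for port in (22, 53, 80, 443, 8080): print(port, tcp_port_state("linuxmafia.com", port))`. Going by Michael's description, this would presumably have shown 22 as open and 80 as refused while httpd was stopped.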
[cutting to the chase: I have the Web server process deliberately down
while narrowing in on who's sociopathically hogging all of my household
bandwidth by behaving badly and pulling excessive amounts of data from the
linuxmafia.com Web server.]
> *Most notably that from the CABAL Wi-Fi, access to The Internet was,
> uhm, "quite slow" ... sure, it's not a high-bandwidth connection,
> but most notably, latency was very high, but packets weren't being
> dropped (at least for the most part). This is commonly seen on a
> saturated (or nearly saturated) connection - typically queues on the
> sending and/or receiving side of ISP router get filled up, and, though
> throughput is high (or about as high as it can be), latencies are
> very high ... e.g. seeing ping times over 3000ms and to over 5000ms
> (but not up to or over 6000ms).
Indeed. And I've basically lowered the boom on that. *crack*
> >Yeah, you know? I've had Apache httpd stopped recently because
> >extravagantly large levels of Web requests, which I tentatively
> >guesstimate to be from Web-spidering bots and/or extremely inconsiderate
> >individuals recursively fetching everything on the site, have so
> >thoroughly clobbered my aDSL line that almost nothing else can get
> >through.
>
> And that, I don't find surprising.
Ye olde 'tragedy of the commons' ([tm] Garrett Hardin ;-> ).
> And, what the heck, why not, peeking a bit ...
Why thank you!
> First I looked for the largest file in said directory,
> then I stripped it to the IP address and User-Agent string (and with the
> quotes (") around it, as Apache logs it ... just 'cause I was lazy and
> that was simpler and faster:
> :%s/ .*\("[^"]*"\)$/ \1/
> )
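The tally that follows counts identical (IP, User-Agent) pairs. The counting step itself isn't shown in the message. Below is a minimal Python sketch of the same idea. It assumes Apache's "combined" log format, where the User-Agent is the last double-quoted field, and it uses the same regex logic as the vim substitution. The name `top_clients` is hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Leading client IP, then the final double-quoted field on the line
# (the User-Agent in Apache's "combined" format). Quotes are kept, as in vim.
LINE_RE = re.compile(r'^(\S+) .*("[^"]*")$')


def top_clients(lines: Iterable[str], n: int = 20) -> List[Tuple[int, str, str]]:
    """Return up to n (count, ip, user_agent) tuples, busiest first."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m:  # silently skip lines that don't fit the assumed format
            counts[(m.group(1), m.group(2))] += 1
    return [(c, ip, ua) for (ip, ua), c in counts.most_common(n)]
```

Usage would be along the lines of `with open("access.log") as log: for c, ip, ua in top_clients(log): print(c, ip, ua)`, where `access.log` stands in for whichever log file is being examined.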
> 37799 198.144.195.190 "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:60.0)
For the record, this entry is actually Web browsers within my household,
IP address 198.144.195.190 being a WAP inside my house.
> 22845 66.160.140.183 "The Knowledge AI"
> 20277 66.160.140.182 "The Knowledge AI"
> 14982 46.229.168.71 "Mozilla/5.0 (compatible; SemrushBot/2~bl; +http://www.sem
> 11990 64.62.252.164 "The Knowledge AI"
> 4323 64.62.252.163 "The Knowledge AI"
> 3770 52.23.177.140 "MauiBot (crawler.feedback+wc at gmail.com)"
> 3763 34.204.61.93 "MauiBot (crawler.feedback+wc at gmail.com)"
> 2604 141.8.143.129 "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.co
> 2332 64.62.252.169 "The Knowledge AI"
> 2297 46.229.168.75 "Mozilla/5.0 (compatible; SemrushBot/2~bl; +http://www.sem
> 2033 46.229.168.83 "Mozilla/5.0 (compatible; SemrushBot/2~bl; +http://www.sem
> 2028 46.229.168.78 "Mozilla/5.0 (compatible; SemrushBot/2~bl; +http://www.sem
> 1958 46.229.168.80 "Mozilla/5.0 (compatible; SemrushBot/2~bl; +http://www.sem
> 1937 46.229.168.84 "Mozilla/5.0 (compatible; SemrushBot/2~bl; +http://www.sem
> 1937 46.229.168.81 "Mozilla/5.0 (compatible; SemrushBot/2~bl; +http://www.sem
> 1935 46.229.168.79 "Mozilla/5.0 (compatible; SemrushBot/2~bl; +http://www.sem
> 1923 46.229.168.73 "Mozilla/5.0 (compatible; SemrushBot/2~bl; +http://www.sem
> 1915 46.229.168.82 "Mozilla/5.0 (compatible; SemrushBot/2~bl; +http://www.sem
> 1905 46.229.168.85 "Mozilla/5.0 (compatible; SemrushBot/2~bl; +http://www.sem
> 1897 46.229.168.69 "Mozilla/5.0 (compatible; SemrushBot/2~bl; +http://www.sem
> 1886 46.229.168.66 "Mozilla/5.0 (compatible; SemrushBot/2~bl; +http://www.sem
> 1883 46.229.168.74 "Mozilla/5.0 (compatible; SemrushBot/2~bl; +http://www.sem
These asshats are indeed my primary problem, along with some others.
And it's becoming obvious that _some_ of them, at least (predictably), do
not honour robots.txt, which (famously) is advisory only, hence at the
mercy of bots that have no manners. Michael, you _might_ find it
amusing to track my comments in /var/www/robots.txt about the bots I
decide I have to spank by nullrouting them because they are sociopathic.
(I haven't yet started doing that; perhaps tomorrow.)
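Before null-routing, it may help to see how the offending addresses cluster. In the tally above, every SemrushBot hit comes from 46.229.168.0/24, and "The Knowledge AI" comes from a few addresses in 66.160.140.0/24 and 64.62.252.0/24. The sketch below sums hits per network using Python's standard `ipaddress` module. The function name and the /24 default are assumptions for illustration. Bear in mind that blocking a whole /24 can also catch innocent hosts.

```python
import ipaddress
from collections import Counter
from typing import Iterable, Tuple


def hits_by_network(hits: Iterable[Tuple[int, str]], prefix: int = 24):
    """Sum (count, ip) pairs per enclosing network; busiest network first."""
    totals = Counter()
    for count, ip in hits:
        # strict=False masks off host bits: 46.229.168.71/24 -> 46.229.168.0/24
        net = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
        totals[net] += count
    return totals.most_common()
```

On Linux, one common way to act on the result is `ip route add blackhole 46.229.168.0/24`. The message doesn't say which null-routing mechanism will actually be used.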
When I'm done with it, my http://linuxmafia.com/robots.txt file (for
obvious reasons, not currently publicly accessible) might become a minor
classic of vituperation. ;->
(I am not committing to that. Plans are evolving.)
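Because robots.txt is purely advisory, all it can do is tell a cooperative crawler what to skip. Python's standard `urllib.robotparser` shows how a well-mannered bot would read such a file. The rules below are hypothetical and made up for illustration; they are not the contents of the linuxmafia.com file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only (NOT linuxmafia.com's robots.txt).
ROBOTS_TXT = """\
User-agent: SemrushBot
Disallow: /

User-agent: *
Crawl-delay: 10
Disallow: /cgi-bin/
"""


def load_rules(text: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    rp = RobotFileParser()
    rp.parse(text.splitlines())
    return rp
```

With these rules, `load_rules(ROBOTS_TXT).can_fetch("SemrushBot", "http://linuxmafia.com/")` returns False. Nothing forces a bot to ask, though. Note also that this parser matches rules only against the part of the agent string before the first "/". A full "Mozilla/5.0 (compatible; SemrushBot/2~bl; ...)" string is therefore treated as "Mozilla" and falls through to the `*` rules, so pass the bot's product token instead.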