[sf-lug] [on-list] site up, http[s] down: Re: Wierd problems trying to access linuxmafia.com

Rick Moen rick at linuxmafia.com
Tue Dec 11 02:03:26 PST 2018


Quoting Michael Paoli (Michael.Paoli at cal.berkeley.edu):

> Taking it back on-list, because ... well, why not?  ;-)

Works for me!

> Yes, ... I noticed something of this on Sunday, when I was at
> BerkeleyLUG.  And from what I recalled from being at CABAL on
> Saturday*, just a day earlier, I didn't find this exceedingly
> surprising.  First bit I noticed ... "connection refused" - and
> from that, I then thought hmmm, ... ICMP connection refused,
> host up, nothing listening on TCP port 80 (and/or 443 but I think
> I was mostly using and/or first noticed it on 80).  I seem to
> recall ping looked fine, port 22 was open, ...
> I have ssh login access, so logged in on TCP port 22.
> A trace of looking around ... what TCP ports are listening - and
> to Internet addressable IPs (either explicitly, or wildcard) ...
> yes, 22, ... 80 and 443 not showing (but 8080 was, but I didn't poke
> to see if that was also web server).  And I also noticed 53 (DNS),
> and perhaps others.
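
(For anyone wanting to repeat that sort of check from the outside, here
is a rough Python sketch.  A refused TCP connection normally shows up as
a TCP reset from a host that is up but has nothing listening on that
port, which is different from getting no answer at all.  The function
name port_state and the three-second timeout are my own illustrative
choices, nothing more.)

```python
import socket

def port_state(host, port, timeout=3.0):
    """Classify a TCP port as 'open', 'refused', or 'no answer'."""
    try:
        # create_connection() performs the full TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on that port.
        return "refused"
    except OSError:
        # Timeout, unreachable network, failed name lookup, and so on.
        return "no answer"
```

Something like port_state("linuxmafia.com", 80) returning "refused" while
port 22 returns "open" would match the symptoms described above.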

[cutting to the chase:  I have the Web server process deliberately down
while I narrow in on who's sociopathically grabbing all of my household
bandwidth through the linuxmafia.com Web server.]

> *Most notably that from the CABAL Wi-Fi, access to The Internet was,
> uhm, "quite slow" ... sure, it's not a high-bandwidth connection,
> but most notably, latency was very high, but packets weren't being
> dropped (at least for the most part).  This is commonly seen on a
> saturated (or nearly saturated) connection - typically queues on the
> sending and/or receiving side of ISP router get filled up, and, though
> throughput is high (or about as high as it can be), latencies are
> very high ... e.g. seeing ping times over 3000ms and to over 5000ms
> (but not up to or over 6000ms).  

Indeed.  And I've basically lowered the boom on that.  *crack*

> >Yeah, you know?  I've had Apache httpd stopped recently because
> >extravagantly large levels of Web requests, which I tentatively
> >guesstimate to be from Web-spidering bots and/or extremely inconsiderate
> >individuals recursively fetching everything on the site, have so
> >thoroughly clobbered my aDSL line that almost nothing else can get
> >through.
> 
> And that, I don't find surprising.

Ye olde 'tragedy of the commons' ([tm] Garrett Hardin  ;->  ).

> And, what the heck, why not, peeking a bit ...

Why thank you!

> First I looked for the largest file in said directory,
> then I stripped it to the IP address and User-Agent string (and with the
> quotes (") around it, as Apache logs it ... just 'cause I was lazy and
> that was simpler and faster:
> :%s/ .*\("[^"]*"\)$/ \1/
> )
>   37799 198.144.195.190 "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:60.0)

For the record, these are actually Web browsers within my household, IP
address 198.144.195.190 being a WAP inside my house.

>   22845 66.160.140.183 "The Knowledge AI"
>   20277 66.160.140.182 "The Knowledge AI"
>   14982 46.229.168.71 "Mozilla/5.0 (compatible; SemrushBot/2~bl; +http://www.sem
>   11990 64.62.252.164 "The Knowledge AI"
>    4323 64.62.252.163 "The Knowledge AI"
>    3770 52.23.177.140 "MauiBot (crawler.feedback+wc at gmail.com)"
>    3763 34.204.61.93 "MauiBot (crawler.feedback+wc at gmail.com)"
>    2604 141.8.143.129 "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.co
>    2332 64.62.252.169 "The Knowledge AI"
>    2297 46.229.168.75 "Mozilla/5.0 (compatible; SemrushBot/2~bl; +http://www.sem
>    2033 46.229.168.83 "Mozilla/5.0 (compatible; SemrushBot/2~bl; +http://www.sem
>    2028 46.229.168.78 "Mozilla/5.0 (compatible; SemrushBot/2~bl; +http://www.sem
>    1958 46.229.168.80 "Mozilla/5.0 (compatible; SemrushBot/2~bl; +http://www.sem
>    1937 46.229.168.84 "Mozilla/5.0 (compatible; SemrushBot/2~bl; +http://www.sem
>    1937 46.229.168.81 "Mozilla/5.0 (compatible; SemrushBot/2~bl; +http://www.sem
>    1935 46.229.168.79 "Mozilla/5.0 (compatible; SemrushBot/2~bl; +http://www.sem
>    1923 46.229.168.73 "Mozilla/5.0 (compatible; SemrushBot/2~bl; +http://www.sem
>    1915 46.229.168.82 "Mozilla/5.0 (compatible; SemrushBot/2~bl; +http://www.sem
>    1905 46.229.168.85 "Mozilla/5.0 (compatible; SemrushBot/2~bl; +http://www.sem
>    1897 46.229.168.69 "Mozilla/5.0 (compatible; SemrushBot/2~bl; +http://www.sem
>    1886 46.229.168.66 "Mozilla/5.0 (compatible; SemrushBot/2~bl; +http://www.sem
>    1883 46.229.168.74 "Mozilla/5.0 (compatible; SemrushBot/2~bl; +http://www.sem
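
(A tally like the one quoted above can also be produced without the
editor step.  What follows is only a sketch, assuming Apache's
"combined" log format, where the client address is the first field and
the User-Agent is the last double-quoted field on each line.  The names
LINE_RE and tally are made up for illustration, and the sample lines use
documentation-range addresses, not anything from my logs.)

```python
import re
from collections import Counter

# First field is the client address; the last double-quoted field on the
# line is taken to be the User-Agent (kept with its quotes, as logged).
LINE_RE = re.compile(r'^(\S+) .*("[^"]*")$')

def tally(lines):
    """Count requests per (client address, quoted User-Agent) pair."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m:
            counts[(m.group(1), m.group(2))] += 1
    return counts

# Hypothetical sample lines, not real log data.
sample = [
    '192.0.2.10 - - [10/Dec/2018:01:00:00 -0800] "GET / HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
    '192.0.2.10 - - [10/Dec/2018:01:00:01 -0800] "GET /a HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
    '198.51.100.7 - - [10/Dec/2018:01:00:02 -0800] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
    '192.0.2.10 - - [10/Dec/2018:01:00:03 -0800] "GET /b HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
]

# Print the busiest clients first, in the same count/address/agent layout.
for (ip, agent), n in tally(sample).most_common():
    print(f"{n:7d} {ip} {agent}")
```

Feeding it an open log file instead of the sample list should give the
same kind of listing, busiest clients at the top.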

These asshats are indeed my primary problem, along with some others.
And it's becoming obvious that _some_ of them, at least (predictably) do
not honour robots.txt, which (famously) is advisory only, hence at the
mercy of bots that have no manners.  Michael, you _might_ find it
amusing to track my comments in /var/www/robots.txt about the bots I
decide I have to spank by nullrouting them because they are sociopathic.
(I haven't yet started doing that; perhaps tomorrow.)
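
(To make the "advisory only" point concrete: the robots.txt check is
something the _client_ chooses to run before fetching, as in this small
sketch using Python's standard urllib.robotparser module.  The rules
shown are a hypothetical example, not the contents of my actual file,
and polite_fetch_allowed is an illustrative name.  A bot with no manners
simply never makes the call.)

```python
from urllib import robotparser

# Hypothetical rules for illustration only.
RULES = """\
User-agent: SemrushBot
Disallow: /
""".splitlines()

parser = robotparser.RobotFileParser()
parser.parse(RULES)

def polite_fetch_allowed(agent, url):
    """What a well-behaved client asks itself before requesting url."""
    return parser.can_fetch(agent, url)

url = "http://linuxmafia.com/"
print(polite_fetch_allowed("SemrushBot/2~bl", url))   # matched by the Disallow rule
print(polite_fetch_allowed("SomeBrowser/1.0", url))   # no rule applies to this agent
```

Nothing on the server side enforces the answer, which is why the
fallback for persistent offenders has to be something like a nullroute.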

When I'm done with it, my http://linuxmafia.com/robots.txt file (for
obvious reasons, not currently publicly accessible) might become a minor
classic of vituperation.  ;->

(I am not committing to that.  Plans are evolving.)



