[conspire] fine if you aren't noticing symptoms: Re: (forw) [BALUG-Admin] Weekly cron job to check on my domains' nameservers

Michael Paoli michael.paoli at cal.berkeley.edu
Sat Sep 9 10:56:30 PDT 2023


On Fri, Sep 8, 2023 at 11:38 AM Rick Moen <rick at linuxmafia.com> wrote:
> Anyway, that incident reminded me that "Everything must be fine if you
> aren't noticing symptoms" is always a bad idea, and I wrote the

Reminds me of the too often encountered:
Host is dead, or severe I/O problems on filesystem(s).
"Of course" it's production (well, far too often).
Start digging and checking, ah, lovely, all protected with RAID-1 great!
Uhm ... except ... the first drive of the RAID-1 died N months/years
ago, and no monitoring was put in place, or it's been entirely ignored.
And now the only other drive in that RAID-1 pair has failed ...
lovely ... ugh.  Oh well, at least we have backups ... oh ...
nobody ever bothered because ... RAID-1 ... or that started failing
N months/years ago - but nobody's monitoring or the monitoring has been
ignored.
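
For what it's worth, a minimal sketch of the sort of check that would catch
the first dead drive, assuming Linux software RAID (md) reporting status in
/proc/mdstat. Nothing above says which RAID implementation was involved, so
that is purely an assumption, and the function name is made up for
illustration.

```python
import re

def degraded_arrays(mdstat_text):
    """Return names of md arrays whose status shows a missing member.

    Assumes the usual /proc/mdstat layout: an "mdN : ..." line followed
    by a line containing e.g. "[2/1] [_U]", where "_" marks a member
    that is missing or failed.
    """
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\w+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        status = re.search(r"\[\d+/\d+\]\s+\[([U_]+)\]", line)
        if status and current:
            if "_" in status.group(1):
                degraded.append(current)
            current = None
    return degraded
```

Feed it the contents of /proc/mdstat from a cron job and have the job
complain loudly (nonzero exit, mail to a human who actually reads it)
whenever the list is non-empty. The check is the easy part; someone
paying attention to it is the part that keeps getting skipped.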

Heck, it's even a rather common occurrence in, e.g., a company that occasionally
blows up a neighborhood or causes horrific conflagration, goes
bankrupt multiple times, pleads guilty - as a company - to multiple
felony charges, etc.  Oh, but they care about safety ... uhm ... rather
they care about managing their perceived image of safety.  They put
signs around corporate headquarters telling their workers not to
jaywalk.  Oh, but this is the only place they tell their workers not to
jaywalk, because how would that look, right?  Meanwhile, in production,
they've got, e.g. stuff running on pre-Y2K HP hardware that was EOL
probably around 2007 or much earlier, but in 2014 they're still running
it in production ... but hey, they've got RAID-1 ... oh, but that other
drive died years ago, and no, can't get replacements ... can't get any
of that old hardware serviced or fixed or replaced - just can't find
that old stuff anymore and it's totally unsupported, and there is no
redundant system.  Oh, but they manage those perceptions of safety.
Every single meeting they check and make sure there's someone there
that knows CPR, someone who knows where the AED is and knows how to
use it, and someone who will call 911.  Remind me again how that keeps
the millions of customers safer?  Yes, lovely safety and security.  I
find a system, open to The Internet, of course ... email ... it takes
in email, runs a program on that ... yikes, that's very insecure ... no
input validation nor safe handling ... certain email contents from
anywhere on The Internet can do quite arbitrary things with production.
So of course I duly report the issue.  They don't care.  They tell me
essentially "nobody would ever do that", and they don't and won't have
it fixed.  Whee!  And yes, multiple occurrences of last drive in RAID-1
died there ... "oops"!  As I oft said of that company - I've never seen
IT that screwed up on that large a scale.  One of my coworkers,
upon touring one of their data centers for the first time, "Are we
running a data center here, or a museum?"  Whole helluva lot of deep
deep systematic problems there.  Oh, and to give you the nice warm 'n
fuzzy feeling, that company also operates nuclear power plant(s).
Oh, but security - I can't touch that stuff at all, because I don't have
the clearance ... uhm, ... but the staff there, clueless about the
operating system, call me and have me walk them through their major OS upgrade ...
without any validation checks on what I'm telling 'em to do.  Oh, but
anyone goes 1 MPH over the speed limit at that site, they'll get a very
stern talking to.
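
As for that email-fed program: a minimal sketch of what "input validation"
could look like, assuming the handler only ever needs to pick one of a few
fixed actions from the Subject line. The action names and the function name
are hypothetical; I'm not describing what that system actually ran.

```python
from email import message_from_string

# Hypothetical allowlist: the only actions the handler will ever perform.
ALLOWED_ACTIONS = {"status", "ping"}

def requested_action(raw_message):
    """Return an allowlisted action named in the Subject, else None.

    The message content is only ever compared against a fixed set of
    strings; it is never passed to a shell or used to build a command.
    """
    msg = message_from_string(raw_message)
    subject = str(msg.get("Subject", "")).strip().lower()
    return subject if subject in ALLOWED_ACTIONS else None
```

Anything that isn't exactly one of the expected values gets dropped, so
"nobody would ever do that" stops being the security model.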

"Everything must be fine".


