[conspire] 737 MAX story keeps getting more fractally bad

Michael Paoli Michael.Paoli at cal.berkeley.edu
Wed Jul 3 06:05:35 PDT 2019


So, ... how do we "fix" it - or, more generally, these types of
problems?  I tend to look at issues like these as systemic problems,
rather than problems in isolation.  The issues don't just magically pop
out of nowhere.  They're manifested in certain environments and
conditions - and chronically so in some environments/conditions.

It also has me thinking about engineering/development and "blameless"
environments.  I think there are at least some pros and cons there.
"Blameless" can work rather to quite well at encouraging folks to get
the information out there and to openly explore what happened/happens;
it tends to reduce the suppression of information.  On the other hand,
*no* consequences?  Maybe "blameless" can be taken *too* far.  Things
would run quite amok on, e.g., Wall Street with (far too much)
"blameless", eh?  So, what's the optimal approach - one that quite
sufficiently gets the information out there, allows thorough
investigation, and determines "root" causes and relevant contributing
factors, while also providing appropriate incentives (carrot ...
stick?) to appropriately reduce the problems/errors?  I mean, if there
are *absolutely zero consequences*, then what are the motivations to
prevent the problems?  Oops?  So the optimum lies somewhere between the
extremes - but where, exactly, is that optimal point?

E.g., I can think of a $work environment that included a particular
systems administrator - let's call them $Doe.  $Doe wasn't competent.
$Doe regularly and seriously broke things in production, and did damage
well beyond $Doe's capabilities to correct - generally necessitating
dragging in more senior and competent sysadmins to fix the damage done
by $Doe.  $Doe was very repeatedly told how not to break things, and
was well advised, e.g., "You've got a perfectly good $operating_system
workstation on your desk - try it out there first before trying it in
production," and "Again, why did you break it in production yet again?
Why didn't you test that on your workstation first?  The damage
would've been obvious there, and on your workstation it wouldn't have
screwed over production as you've done yet again."  Anyway,
"blameless" ... or way too close to it?  The day $Doe was finally
terminated from this $work (albeit for completely unrelated reasons),
we all breathed a collective sigh of relief and uttered, "Finally ...
about damn time!"

And maybe some parallels can be drawn / questions raised:
White October Events: "Who Destroyed Three Mile Island?" - Nickolas
Means | The Lead Developer Austin 2018
https://www.youtube.com/watch?v=1xQeXOz0Ncs
I don't necessarily agree 100% with the presenter's take/perspective
and arguments, but regardless, it's a good, informative analysis of
decisions and "bad" decisions, the environments/history behind them,
and "blameless" investigation and findings.

So ... how do we better prevent these (in many cases systemic) problems?

references/excerpts:

> From: "Rick Moen" <rick at linuxmafia.com>
> Subject: [conspire] 737 MAX story keeps getting more fractally bad
> Date: Tue, 2 Jul 2019 16:40:57 -0700

http://linuxmafia.com/pipermail/conspire/2019-July/009871.html
http://linuxmafia.com/pipermail/conspire/2019-June/009862.html
http://linuxmafia.com/pipermail/conspire/2019-April/009786.html
etc.



