[conspire] Risks of automation

Thu Aug 16 15:51:57 PDT 2018

I recently dealt with a perplexing diagnostic problem, which I tackled
stupidly for a few days, and then on account of being suddenly cluebatted 
solved it.  Even though this has little to do with Linux, I think the
story may be of interest.  

This is also a cautionary tale about the downside of automation:
Automation when it fails will cheerfully, quietly, and efficiently kick
us in the shins, every time.

For those who haven't been here, my family lives on a large, 1/3 acre
lot in West Menlo Park, my childhood home.  Since moving back in 2006, 
I've reclaimed much of the lot from wildness, have a large vegetable
garden, and am growing many fruit trees.  As California's drought
developed, though, and we were suddenly put under water rationing a
couple of years back, I realised that our haphazard maze of soaker hoses
would never do.  In a huge rush, I replaced those with a drip irrigation
system.  Cutting corners to save time, I put all of the side and back
yards (except the rear lawn) on a single watering run.  The second
watering run is the front yard, and the third (of three) is the back
lawn.  All three watering runs are turned off and on by standard 24VDC
solenoid valves.

The crowning piece of the system was/is an Arduino-based watering
controller with open-source firmware, OpenSprinkler
(https://opensprinkler.com/).  Arduino boards are very basic, very
low-power computers, not intelligent enough to run Linux, but extremely
suitable for controlling analogue devices such as watering systems (and
many more things).  OpenSprinkler packages an Arduino board in a neat
plastic enclosure with LCD display and control buttons, with connectors
for 24VDC to control standard watering systems.  You communicate from a
real computer to your OpenSprinkler's admin WebUI to set up when and how
much to water on each connected run, and to configure optional features
like one where it gets weather reports from commercial site Weather
Underground and uses that information to lengthen or shorten watering
duration.  It's all really cool -- and invites setting and forgetting.
It's been working beautifully for some years, and among other things 
seemed to end the depressing syndrome of plants dying because a member
of my family promised to water them and then flaked out.

The latest fillip added to the system was a rain detector, which is just
a cheap plastic gadget you mount on the edge of your roof that collects
a sample of rainfall when it occurs and alters the state (on or off,
i.e., closed circuit or open) of its long 24VDC wire back to the
controller computer -- signaling either 'it's rained recently' or 'sure
is dry lately'.  This rain detector is a bog-standard Rain Bird unit from 
Home Depot.  Specifically:
https://www.homedepot.com/p/Rain-Bird-Wired-Rain-Sensor-CPRSDBEX/203829203

A rain detector accessory helps prevent the absurdity of automated
watering persisting during the November-April rains, so it's A Good Thing 
-- isn't it?  

The above sets the scene:  Automation: yay.   Because consistent
watering gives better plant growth, less accidental plant murdering.

About the end of July, both my mother-in-law Cheryl and I noticed that
two trees in the back yard were suddenly looking shockingly parched, a
fig tree and a calamansi tree.  This was very puzzling because they were
established, several-year-old trees that had been doing well.  Also, in
the side yard just outside the cross-fence, a hydrangea was suffering
badly, and a nearby fern looked as if it might be dead.

I was also vaguely aware that a plant in my front yard was drooping
badly and one in a pot on my front porch (Helichrysum petiolare aka
licorice plant) appeared to be suddenly dead.  These were on a separate
watering run from the others, which would have been a big, fat clue if
I'd stopped to think clearly.  Also, the spearmint plant on the back
porch seemed to be dying back, which seemed really odd, because hardly
anything can bother a mint plant.

That was a clue because one rule of diagnosis is:  distrust coincidence.
It's much more common for there to be a single underlying cause of a
problem, rather than two different things suddenly going wrong at the
same time.

_Failing_ to apply that rule, I fell back on Drip Irrigation 101,
concentrating on the distressed or dying plants/trees in the back.  
First, I checked that the drip-irrigation emitters weren't clogged,
which one does by running a manual watering session for that run, and
then making sure water is dripping out.  I also started unburying the 
main drip irrigation runs from accumulated leaves and debris, and
checking from end to end, to make sure silt hadn't infiltrated the
watering system and clogged things.  Every time I started manual
watering, water was duly being delivered everywhere.

I also got out a bucket and watered the distressed plants/trees twice a
day, since _something_ was obviously going wrong, and it seemed to
involve shortage of water, maybe, so pouring some around their rootballs
appeared to be a good idea.

Cheryl urged me to double the watering times everywhere.  I resisted
this notion for a couple of reasons.  (1) Cheryl notoriously always
wants to increase the water to any plant in any kind of distress without
bothering to determine whether that is useful.  (2) Finally, I started
applying systems-level logic, which is crucial to diagnosis:  _Why_ would
there be a sudden need for a lot more water?  This summer wasn't hotter
and drier than last year's.  If anything, it was less so.

(Around this time, Cheryl, to give her credit, pointed out that the
rear yard's lawn was looking parched.  I shrugged at the time, replying
that it was normal in California for a not-very-good antique lawn to
look bad in the hot days of summer, and that I didn't want to waste
water on this one looking green and lush, going into August.)

Suddenly, I thought about the larger problem:  If not a lot-wide summer
need for greater water, then what would account for all this?  Any
solution would need to explain problems on two separate watering runs, 
three if you include the lawn.  I wasn't coming up with a qualifying
hypothesis.  The fact that manual watering sessions consistently worked
fine seemed to create a baffling set of facts.  The notion of _manual_ 
watering working great, but _automatic_ watering being disabled in error
didn't occur to me.  After all, automation yay.

I was stumped for several more days.  Every time I tested the watering
system (with a manual session), it worked.  So, it was fine -- wasn't
it?

Then, I was logging into my OpenSprinkler admin webUI yet again, and
this time I looked more closely.  On the main screen was a small
ribbon banner, red, with the text 'Rain detected'.  Um, whoops?  In
early August?  Near Stanford?  Something was very wrong.

I tilted the rain detector gadget down.  Had it somehow, in early
August, gotten clogged with something wet?  Nope.  Dry.

Refreshing my memory about such things, I remembered that rain detector
widgets work in either of two ways:  Detected rain causes them either to
_open_ the 24VDC circuit (no current allowed to flow) or to _close_ the
circuit (current flows).  The widget's package specifies which mode of
operation applies, and when you connect it to the controller (such as my
Arduino-based OpenSprinkler), you configure the latter to know whether
the detector's going to say 'open' for rain or 'closed' for rain.

Before really thinking that through, I experimentally disconnected the
rain detector from its terminal on OpenSprinkler, and rebooted
OpenSprinkler.  From my laptop, I logged back into the admin webUI.
The red 'Rain detected' banner was still there.  WTF?  Oh, right.  This
rain detector was one of the 'set an open circuit to indicate recent
rain' variety.  Therefore, if you simply disconnect the rain detector,
but don't also update OpenSprinkler's settings to say 'You no longer
have a rain detector', naturally OpenSprinkler will think it's raining,
because nothing connected == open circuit.

I disabled the OpenSprinkler setting for 'You have a rain detector'.
The red 'Rain detected' warning banner went away.
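
For anyone who wants the polarity logic spelled out, here is a minimal
Python sketch.  It is an illustration only, not OpenSprinkler's actual
firmware:  the names SensorType, rain_detected, and sensor_enabled are
made up for this example.  It shows why an unplugged 'open means rain'
sensor reads as permanent rain until the sensor option is disabled.

```python
from enum import Enum


class SensorType(Enum):
    """Hypothetical names for the two sensor varieties described above."""
    OPEN_MEANS_RAIN = "open"      # sensor opens the circuit when wet
    CLOSED_MEANS_RAIN = "closed"  # sensor closes the circuit when wet


def rain_detected(circuit_closed, sensor_type, sensor_enabled=True):
    """Interpret the raw circuit state the way a controller might."""
    if not sensor_enabled:
        # 'You have a rain detector' is switched off: ignore the input.
        return False
    if sensor_type is SensorType.OPEN_MEANS_RAIN:
        return not circuit_closed
    return circuit_closed


# An unplugged (or dead-open) sensor looks like an open circuit.
unplugged = False  # circuit_closed == False
print(rain_detected(unplugged, SensorType.OPEN_MEANS_RAIN))
print(rain_detected(unplugged, SensorType.OPEN_MEANS_RAIN,
                    sensor_enabled=False))
```

The first call reports rain, the second does not, which matches what
the banner did before and after I changed the setting.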

Damn, I thought:  Obviously, the rain detector's circuitry failed 
in such a way that it got stuck showing an open circuit, the device's 
corpse inadvertently lying and saying 'It's still raining.'  Wow, that's
the worst possible hardware failure mode.  I should have bought the
other type of rain detector, I thought.  But then, I realised that
_either_ type of detector could die in a way that makes it falsely claim
it's always raining.  The other type could die in a permanently
closed-circuit position.

The problem with reliable operation is that you tend to rely on it.
For a long time, I'd had no reason to log in to the admin webUI:  Why
should I?  The weather-responding automated watering was working
beautifully.  The parts of the system I _did_ spend time worrying about
were the mechanical bits: leaks caused by rodents' gnawing, clogged or
detached drip emitters, places I'd stupidly nicked a water feed pipe
with a shovel, that sort of thing.

If I _had_ looked in the admin webUI, I'd maybe have noticed the red
thin banner at page bottom -- or looked at the device logs and seen that
sessions kept being skipped in late July on account of rain.  But I
hadn't, because automation yay.
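
A sanity check along these lines could have done the looking for me.
The sketch below is hypothetical:  it assumes you have somehow pulled
the dates of rain-skipped sessions out of the controller's logs (how you
get them is left open), and it treats May through October as the dry
season, which is my assumption based on the November-April rains
mentioned above.  The function name and threshold are invented for
illustration.

```python
from datetime import date

# Assumption: rain here falls roughly November-April, so rain skips in
# these months deserve suspicion.
DRY_SEASON_MONTHS = {5, 6, 7, 8, 9, 10}


def implausible_rain_skips(skip_dates, max_allowed=2):
    """Return the dry-season rain skips if there are too many of them.

    skip_dates: iterable of datetime.date, one per session skipped
    'on account of rain'.  An empty list means nothing looks wrong.
    """
    dry_season_skips = sorted(
        d for d in skip_dates if d.month in DRY_SEASON_MONTHS
    )
    if len(dry_season_skips) > max_allowed:
        return dry_season_skips
    return []


# Late-July skips like the ones in my logs would be flagged.
late_july = [date(2018, 7, day) for day in (25, 27, 29, 31)]
print(implausible_rain_skips(late_july))
```

Run from cron and wired to e-mail, something like this would turn a
silent failure into a noisy one.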

The losses: 

o  We're pretty sure the calamansi tree's a goner -- the worst of this
   calamity.
o  Same with the fern.
o  And the licorice plant on the front porch pretty clearly isn't coming
   back, even though that species is drought tolerant.
o  The other plant in the front yard (which is planted in the ground) 
   recovered.
o  The fig tree is showing ongoing small signs of life, e.g., the buds 
   at the ends of branches are active, even though the tree lost all
   leaves (as did the calamansi).
o  The spearmint is springing back.
o  The lawn looks greener.

Rain Bird's $23 + tax junky little plastic rain sensor failing in the
worst possible way at the worst time of year cost me a treasured fruit
tree and a couple of nice plants -- because I relied on it, relied
on the automation, and didn't check.

And I re-learned that, when you're trying to diagnose a problem, you
need to stop and collect all the symptoms, and not be happy with a
candidate explanation unless it can explain them.  All of them.
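
That rule can be sketched as a simple set check.  This is only an
illustration:  the symptom labels, and which hypothesis accounts for
which symptom, are my own rough reading of the story above, and the
function name is invented.

```python
# Every symptom observed, across all three watering runs.
symptoms = {
    "side/back run plants parched",
    "front run plants dying",
    "lawn parched",
    "manual watering works fine",
}

# What each candidate explanation would account for (illustrative).
hypotheses = {
    "clogged emitters or silt on the side/back run": {
        "side/back run plants parched",
    },
    "automatic sessions being skipped": {
        "side/back run plants parched",
        "front run plants dying",
        "lawn parched",
        "manual watering works fine",
    },
}


def explains_everything(hypotheses, symptoms):
    """Keep only the hypotheses that cover every observed symptom."""
    return [
        name for name, explained in hypotheses.items()
        if symptoms <= explained  # subset test: nothing left unexplained
    ]


print(explains_everything(hypotheses, symptoms))
```

Only the 'automatic sessions being skipped' candidate survives, which
is the one I took days to think of.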

As with diagnosis of computer hardware or software problems, expertise
is not actually required.  Observation and carefully keeping track _is_,
as is consistent use of logic.  Those alone can pretty much always reach
the right answer.