[conspire] linuxmafia.com disk drives & (md) RAID-1 ... uh oh ; -}

Rick Moen rick at linuxmafia.com
Thu May 23 21:22:10 PDT 2019


Quoting Michael Paoli (Michael.Paoli at cal.berkeley.edu):

> Hmmm, I thought you(/we/I) had earlier fixed that?

No, I'm kind of caught between the devil and the deep blue sea.  
Here's the dilemma, and feel welcome to tell me that my objective of
caution would be better served by making the opposite decision:

As you may recall, the qty. three single-ended SCSI hard drives were
pulled from my very last VA Linux model 2230 when its very last Intel
L440GX+ 'Lancewood' PIII suddenly died.

Unresolved question:  What killed the motherboard?  What killed some of
its identical predecessors, across a number of different supposedly
identical VA Linux chassis and PSUs?  In fairness, I got extremely
long service out of my leftover VA Linux gear, and probably ought to
just not sweat that question.  But it lingers in the background, as to
the rest of this.

When that happened, before resorting to biting the bullet and doing a
total do-over, you and I tried a thing.  Although I had no more spare
Lancewood, I did have one non-Lancewood PIII box, that old fleabitten
and dubious-looking Rackspace 2U.  So, on a what-the-hell-let's-see
basis, we had the idea of seeing if the Rackspace would boot my SCSI
drives.

The SCSI drives were:
o  one 73GB drive
o  a pair of 18GB drives operated as RAID1 using Linux md (software RAID)

Yes, really.  _That_ small, because we're talking about scrounged
hardware that was already obsolete when I scrounged it (like, at the
time, picking up used server-class SCSI hard drives for $5 each because
you've been in survival mode after getting laid off by VA Software
Corporation right into the opening crescendoes of the Dot-Bomb tech
depression -- thank you, Larry!), and in addition, it's now scary-old.

So, there we were, pondering shotgun-marrying my hard drives to the
dubious server, and I was hairy-eyeballing its particularly dubious
PSU.  And thinking, urp?  What's the nastiest SPoF hardware risk on a
typical server?  Answer: The PSU.  And how do you mitigate that risk?   
Answer:  You assume the PSU is crap and replace it with one that you
know is not crap.

But, for various reasons, this was not done at that time, so caution
led me to think:  Assume a weak PSU.  (Noted in passing:  Notoriously,
when PSUs fail, e.g. because you tried to draw more current than they
are able to reliably serve, they show an uncanny instinct for burning
out any and all attached hard drives as they spike out.)

OK, so weak PSU.  The way you baby along a weak PSU is to not burden it
with any more attached electronics (drawing current) than absolutely
necessary.  I stared at the PSU, I stared at the three drives, and I
thought:  The smallest possible electrical load while still test-booting
all of my filesystems would be achieved by omitting one of the 18GB
RAID1 drives.  So, that's what we did, connecting two of the three
drives, omitting half of the RAID1 pair.

And it booted.

I did a little dance of joy.  The server was back live.  In due course,
I made triple-sure that I absolutely had good and complete backups, and
have continued to do so, periodically, ever since.

At any given time, I _could_ choose to spin down the ratty Rackspace
box, cable in the third drive's data connector and the Molex power
connector, rub my lucky rabbit's foot, take a deep breath, and power the
whole assemblage of string and chewing gum back _up_.

I could do that.  But I'd feel really stupid if, within a day or two,
the unit let its magic smoke out and fried the attached hard drives
because the crap PSU could drive two enterprise-class SCSI drives but
not three of them.

I'm unconvinced that Rackspace ever intended the small El Crappo case to
house three SCSI drives, and therefore uncertain the dubious PSU can
reliably power that many.  I've seen a lot of 2U designs that _are_
intended for three such drives, and this thing doesn't look like it has
the grunt.
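
To put rough numbers on that worry, here is a minimal Python sketch of a
power-budget check.  Every figure in it is a hypothetical placeholder: the
PSU rating, the base load, the per-drive draw, and the 80% derating factor
are assumptions for illustration, not measurements from this box.  Real
values would have to come from the PSU label and the drives' datasheets,
and spin-up draw is typically higher than steady-state draw.

```python
def psu_headroom_watts(rating_w, base_load_w, per_drive_w, n_drives,
                       derate=0.8):
    """Return the watts left over after derating the PSU's label rating.

    All arguments are hypothetical inputs supplied by the caller:
      rating_w     -- label rating of the PSU
      base_load_w  -- motherboard, CPU, fans, etc.
      per_drive_w  -- worst-case draw of one drive (use the spin-up figure)
      n_drives     -- number of drives attached
      derate       -- fraction of the label rating trusted on an old PSU
    A negative result means the load exceeds what we are willing to trust.
    """
    usable_w = rating_w * derate
    load_w = base_load_w + per_drive_w * n_drives
    return usable_w - load_w


if __name__ == "__main__":
    # Made-up example numbers, NOT data about the Rackspace 2U.
    for drives in (2, 3):
        margin = psu_headroom_watts(200, 100, 25, drives)
        print(f"{drives} drives: {margin:+.0f} W headroom")
```

With these made-up numbers, two drives leave a small margin and three do
not.  That illustrates the shape of the concern; it is not evidence about
this particular PSU.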

Through today, I've guesstimated that the risk of that degraded-RAID
hard drive failing is less severe than the risk of three drives
destroying everything by clobbering the PSU and the PSU shooting
everything else in the foot.  Consider:  If the half-RAID drive dies,
that's annoying, but recovery is then easy and obvious:  I remove the
failed drive, replace it with the long-estranged other drive, boot the
system back up, and update the half-RAID partitions' file contents from
backup.  Done.  All I lose is an hour of fiddling and changes since my
most recent backup.  Provided I do frequent backup-updating, the
loss-exposure is very small and not worth worrying about.

I do without RAID1 continuity-of-operation in the face of _some_ hard
drive failures.  What I gain is substantially reduced risk of total
electronics destruction from an overstressed PSU.
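
For keeping an eye on an array that is deliberately running degraded, a
small sketch that parses /proc/mdstat-style text and reports arrays with
fewer active members than configured may help.  It relies on the
"[configured/active]" counter that the md driver prints on each array's
status line (for example "[2/1] [U_]").  The sample text, array names, and
device names below are hypothetical and are not taken from the
linuxmafia.com server.

```python
import re

# Hypothetical sample in the /proc/mdstat layout; not from the real host.
SAMPLE_MDSTAT = """\
Personalities : [raid1]
md0 : active raid1 sda1[0]
      1048512 blocks [2/1] [U_]

md1 : active raid1 sdb2[1] sda2[0]
      16777152 blocks [2/2] [UU]

unused devices: <none>
"""


def degraded_arrays(mdstat_text):
    """Map array name -> (active, configured) for each degraded array."""
    degraded = {}
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\S*)\s*:", line)
        if header:
            # Start of a new array stanza, e.g. "md0 : active raid1 ..."
            current = header.group(1)
            continue
        counts = re.search(r"\[(\d+)/(\d+)\]", line)
        if counts and current:
            configured = int(counts.group(1))
            active = int(counts.group(2))
            if active < configured:
                degraded[current] = (active, configured)
            current = None  # only the first counter per stanza matters
    return degraded


if __name__ == "__main__":
    # On a live Linux system, read the real file instead:
    #   with open("/proc/mdstat") as f: text = f.read()
    for name, (active, configured) in degraded_arrays(SAMPLE_MDSTAT).items():
        print(f"{name}: {active} of {configured} members active")
```

A check like this could run from cron alongside the backups, so that a
second failure on the remaining half of the mirror does not go unnoticed.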

Which choice do _you_ think better?


Personally, I'm disinclined to do hardware changes other than
decommissioning the drives, the box, the whole kit'n'caboodle,
because my watch says it's not 1997.



> Are you in need of drive?

Well, yes and no, but mostly no.

I actually own a lot of unused hard drives.  Many of them are
spectacularly better than those three.  But that's asking the wrong
question, really.

The optimal target (among those in my possession) is the pair of Samsung
128GB SSDs I bought for the CompuLab IntensePC.  I just have to get
there.  Getting there is complicated.  Not a complaint, but time talking
about the matter on mailing lists doesn't make that happen sooner, and
arguably does the opposite.




> Do I recall correctly it uses [P]ATA/IDE?

So close, and so wrong.  ;->

Back then, before the convergence of SAS with SATA, I'd have died of
shame if I'd ever been guilty of building a server on PATA.  I used to
work at VA Linux Systems, not Bob's Toy Shop, Pet Obedience School, and
Pet Taxidermy Service.

 
> Oh, and CABAL, Saturday ... I'll probably be there (thus far intending
> to).

Terrific!  I'm still pondering the what-to-cook thing.  Maybe pizza
again, as that's a sure-fire choice.

> Do also have hardware if you might be interested (even picked up some
> more today I probably need to add to list):
> https://www.wiki.balug.org/wiki/doku.php?id=balug:offered_wanted_hardware_etc
> Holler if there's something offered there I have that you want (and may
> want me to bring to Cabal)

Thanks.  As the kids say, 'I'm good.'

> Anyway, let me know if you need/want assistance, or want me to look
> into the md situation further (and/or correct if feasible).

Probably more useful to spend initial time on the CompuLab.



