[conspire] and more, etc.: disk: one of those "fix" stories

Sat Jan 18 17:23:10 PST 2020

Quoting Michael Paoli (Michael.Paoli at cal.berkeley.edu):

> SMART data - good to look at that once in a while, etc.  It won't
> necessarily tell you when your drive will likely fail soon, but
> sometimes it will effectively indicate that ... sometimes will also tell
> you that hey, despite the drive testing fine end-to-end with repeated
> r/w testing, your drive is headed for serious trouble and likely to hard
> fail on you at about any time and without further advance warning

A few words about that will be helpful for the large majority of
readers who've never heard of SMART or of the smartmontools package
(furnishing the smartctl utility and the smartd daemon).
Quoting Debian's description for smartmontools:

  control and monitor storage systems using S.M.A.R.T.

  The smartmontools package contains two utility programs (smartctl and
  smartd) to control and monitor storage systems using the
  Self-Monitoring, Analysis and Reporting Technology System (S.M.A.R.T.)
  built into most modern ATA and SCSI hard disks. It is derived from the
  smartsuite package, and includes support for ATA/ATAPI-5 disks. It
  should run on any modern Linux system.

Starting in the late 1990s HD manufacturers made the electronics in drives
(HDDs, and now also SSDs and similar) collect, store, and report in a
standard format data about the condition of the underlying physical
drive.  Querying and interpreting that data can help a host
administrator spot patterns of progressive drive problems before they
snowball into drive failure or data corruption.

Not perfectly.  There has been, over the years, some indication of drive
manufacturers making their drives' reporting to the SMART layer be an
unrealistically rosy picture of the drives' health, to make the company
look good relative to competitors.  Be that as it may, SMART data is
massively better than nothing.

https://en.wikipedia.org/wiki/Self-Monitoring,_Analysis,_and_Reporting_Technology

Short of doing ongoing data scrubbing
(https://en.wikipedia.org/wiki/Data_scrubbing), it's your best indicator
of impending drive problems.  _But_ it's important to know that it's up
to you as a computer owner to acquire and run software to check, and
present to you for your interpretation, the SMART data gleaned from your
drives.  Otherwise, it just gets logged to non-volatile storage in the
drive electronics and never brought to your attention for any reason.

And that's where smartmontools comes in.
https://en.wikipedia.org/wiki/Smartmontools
Here's a good 2004 run-through by the maintainer, Prof. Bruce Allen:
https://www.linuxjournal.com/article/6983
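
To make "check, and present for your interpretation" concrete, here is a
minimal Python sketch that parses the attribute table printed by
"smartctl -A" for an ATA drive and flags a few counters that are commonly
treated as warning signs when non-zero.  The watch list, the function
names, and the SAMPLE text are illustrative assumptions of mine, not
output from any real drive and not an official list; attribute names
also vary by vendor, so treat this as a starting point only.

```python
import subprocess

# Illustrative watch list (an assumption, not an official set): counters
# whose non-zero raw value often points at failing media.
WATCHED = {"Reallocated_Sector_Ct", "Current_Pending_Sector",
           "Offline_Uncorrectable"}

def parse_attributes(text):
    """Return {attribute_name: raw_value} from `smartctl -A` ATA output."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the tenth column is RAW_VALUE.
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            attrs[fields[1]] = int(fields[9])
    return attrs

def worrying(attrs):
    """Keep only watched attributes with a non-zero raw value."""
    return {name: raw for name, raw in attrs.items()
            if name in WATCHED and raw > 0}

def read_drive(device):
    """Query a real drive; needs root and smartmontools installed."""
    result = subprocess.run(["smartctl", "-A", device],
                            capture_output=True, text=True)
    return parse_attributes(result.stdout)

# Made-up sample in the usual column layout, for trying the parser offline.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   117   099   006    Pre-fail  Always       -       151234567
  5 Reallocated_Sector_Ct   0x0033   089   089   036    Pre-fail  Always       -       1432
  9 Power_On_Hours          0x0032   071   071   000    Old_age   Always       -       25817
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       8
"""

if __name__ == "__main__":
    print(worrying(parse_attributes(SAMPLE)))
```

In regular use, the smartd daemon does this sort of watching for you and
can send mail when something changes; the sketch only shows what kind of
data is sitting in the drive waiting to be asked for.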

> Someone gave it to me (as a failed drive) ... tests out "perfectly
> fine" ... until I look at the SMART data ... then it's a very scary
> looking drive). ... that drive is thus far still available if someone
> wants it - have a look at:
> https://www.wiki.balug.org/wiki/doku.php?id=balug:offered_wanted_hardware_etc

As a cautionary tale about how free sometimes is too costly?  ;->

You've done due diligence about the risks in the right-hand column for
this Seagate 1TB drive, so anyone who uses it has been warned, but some
hardware sees its best and highest use as landfill, and that's where 
I'd shuffle this one off to, myself.

> So, how 'bout that automagic fixing/remapping of hard (spinning rust)
> drives?

Both good and sometimes disconcerting.  I've been meaning to circle back
to an antique subject, the much-debated matter of
pseudo-low-level-formatting.  Ages ago with long-vanished pre-ATA drive
attachment technology, particularly in the SCSI world, you could use
routines, e.g., ones built into a SCSI controller, to lay down all new
tracks on a hard drive ('low-level formatting' = LLF), sometimes curing
nagging bad-sector problems that weren't really bad sectors at all.  The
notion of doing this persisted in users' minds, even as drive interfaces
and drive-integrated electronics changed / became more complex, and the
addressing of physical tracks became mediated through translation
layers, making it not really possible any more for multiple reasons,
including the 'cylinders, heads, and sectors per track' geometry
presented to the operating system often bearing no resemblance to the
geometry at the drive's physical layer (as you stress elsewhere in your
post).

But because technical users still were interested in getting as close to
that old model as the newer drives supported, many HD manufacturers still put
something _like_ the old LLF routines into their (proprietary,
binary-only, secret-sauce) hard drive diagnostic and repair utilities --
albeit the details about what the hell they actually did became more and
more vague as time passed.

I've been meaning to circle back and investigate the state of such
things, but just haven't had time and opportunity.  Perhaps it's madness
to spend much time and energy on such matters, it being a better value
proposition to just trash suspect / failing drives and move on to newer,
larger, quieter, cooler, faster replacements.

> 5th and subsequent drives/LUNs - DO NOT PARTITION! :-)

In such a use-case, I could also make an argument that such physical
drives ought to remain powered down until you have a use for them 
-- because powered-on drives emit heat (and suck power from the wall),
and (in the case of HDs) suffer wear.

You might be able to send the individual drives instructions to spin 
down, as a halfway measure.  I'd have to investigate how.
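
One avenue worth checking for ATA drives on Linux is hdparm, whose -y
option asks a drive to enter standby immediately and whose -S option
sets an idle spin-down timeout using a coded value.  Below is a minimal
Python sketch of that -S encoding as I read it in the hdparm man page;
the function names and the /dev/sdX device are hypothetical, and whether
a given drive or USB bridge honours these requests is something to
verify on the hardware in question.

```python
import math

def standby_code(seconds):
    """Encode an idle timeout in seconds as an hdparm -S value, rounding up."""
    if seconds <= 0:
        return 0                                 # 0 disables the timeout
    if seconds <= 240 * 5:
        return math.ceil(seconds / 5)            # 1..240: units of 5 seconds
    if seconds <= 11 * 1800:
        return 240 + math.ceil(seconds / 1800)   # 241..251: units of 30 minutes
    raise ValueError("timeout too long for the -S encoding")

def spin_down_commands(device, idle_seconds):
    """Build (but do not run) the hdparm invocations for one drive."""
    return [
        ["hdparm", "-S", str(standby_code(idle_seconds)), device],  # idle timeout
        ["hdparm", "-y", device],                                   # standby now
    ]

if __name__ == "__main__":
    # /dev/sdX is a placeholder; substitute a real device and run as root.
    for cmd in spin_down_commands("/dev/sdX", 600):
        print(" ".join(cmd))
```

The commands are only built and printed, not executed, so that nothing
gets spun down by accident while experimenting.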

> Anyway, some folks don't like complexity.  Well, I'd quite argue
> sometimes, appropriately used / sprinkled about, it's well worth it.
> E.g. in this particular case, it made it quite easy to isolate the
> issue, move the data to elsewhere on disk, and fix the issue in the
> partition.

The LVM abstraction layer facilitated one-stop moving of imperiled data
using pvmove, which was certainly convenient and of some benefit.  The
people who 'don't like complexity' are assigning that complexity an
implicit _cost_, and then it becomes a matter of individual judgement 
as to how high one values the benefit of pvmove, and how high one values
the cost of added system complexity from interposing an LVM abstraction
layer between you and your filesystems.

As a very, very general rule, avoidable system complexity makes me
twitchy and makes me worry about emergent effects and about the
possibility of something bad happening because I didn't understand
something fully or forgot something important or neglected some detail
or something was non-obvious from initial examination by a frazzled and
possibly fatigued sysadmin.  I've come to place pretty high value on 
stark simplicity and clarity, where it does enough of what is wanted.

Equally, I put higher faith in what is not merely simple and clear but
also familiar, especially if I'll be dealing with it in critical
situations under some amount of pressure.  Where integrity of large
numbers of files is at issue, and significant downtime is threatened, 
you want to be very clear on what you're doing, what you've done so far,
and where you're going -- and use highly reliable tools and procedures 
in ways that are self-evidently familiar.

Early in the summer of 2001, the Enron summer of rolling blackouts, 
I advance-planned in my head a migration of my ext2-based Debian server
system in Menlo Park, over a projected four hours of weekend downtime,
to the then-best journaled filesystem, XFS.  (Why?  Because workday power 
glitches were repeatedly leaving my server stuck at a manual fsck prompt,
while I was stuck at work, far away.)

I had to make from source code a 2.4.x kernel in order to get an XFS
filesystem driver, set that up as an alternate kernel in LILO, test-boot
it, success.  That was the first step.  Then, get the xfsprogs package,
which includes mkfs.xfs.  Then, schedule a block of downtime over the
following weekend, and:

Boot to single-user maintenance mode.  Make a temporary XFS filesystem in
previously unallocated space, as a temporary holding area for files.
Use rsync (IIRC) to move the contents of one of my filesystems to the
temporary partition.  Take really good notes.  umount the original ext2
filesystem.  Remake it as XFS.  rsync the data back.  Repeat these steps 
for each data-bearing filesystem.  At the end, adjust /etc/fstab and
/etc/lilo.conf to reflect new reality.  Re-run /sbin/lilo -v .  Reboot
into regular operating mode.
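
The per-filesystem loop above can be sketched as a small Python dry-run
that only prints the commands in order, so the plan can be read (and
pasted into those really good notes) before anything is touched.  The
device names, mount points, holding-area path, and exact rsync options
here are illustrative assumptions, not a record of what was actually
typed in 2001, and the root filesystem needs separate handling, since
it cannot simply be unmounted.

```python
def migration_plan(device, mountpoint, holding="/mnt/holding"):
    """Return, in order, the shell commands for converting one filesystem.

    Assumes the temporary XFS holding area is already mounted at `holding`.
    """
    return [
        # Copy the data out; --delete clears leftovers from the previous pass.
        f"rsync -a --delete {mountpoint}/ {holding}/",
        f"umount {mountpoint}",
        # -f lets mkfs.xfs overwrite the existing ext2 filesystem.
        f"mkfs.xfs -f {device}",
        f"mount -t xfs {device} {mountpoint}",
        # Copy the data back onto the freshly made filesystem.
        f"rsync -a {holding}/ {mountpoint}/",
    ]

if __name__ == "__main__":
    # Hypothetical layout: substitute the real devices and mount points.
    filesystems = [("/dev/hda6", "/home"), ("/dev/hda7", "/var")]
    for device, mountpoint in filesystems:
        print(f"# {mountpoint} on {device}")
        for command in migration_plan(device, mountpoint):
            print(command)
```

Printing rather than executing is deliberate: each step gets read by a
human before it runs, which is most of the point of keeping the
operation conceptually simple.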

There was a little more to it, and I wrote a contemporaneous account
here:  http://linuxmafia.com/faq/Filesystems/xfs-conversion.html
But the point is:  This was a deliberately _conceptually_ simple
operation, iteratively using highly reliable tools in a dirt-simple way
to move file collections around as I rebuilt the underlying
filesystems from ext2 to XFS, and then to adjust the boot-up
instructions to match.  Having virtual-disk abstraction layers can
certainly, as you say, make it quicker to migrate data around,
add/remove LV 'extents', resize PVs, and so on -- but at a cost in
system complexity.  Meanwhile, moving data around the old way, by
actually moving it, still works fine, and also has its advantages albeit
it's obviously slower.

And that 'but I didn't leave enough unallocated space' excuse wears
thin, especially when it's usually so easy to temporarily hang an extra
drive off a system to provide temporary holding areas, even if it has to
be on (ugh) USB, worst-case.

-- 
Cheers,                              Lost my car phone.
Rick Moen                                -- Matt Watson (@biorhythmist)  
rick at linuxmafia.com                 
McQ! (4x80)