[sf-lug] filesystem for a 3TB external USB drive
rick at linuxmafia.com
Mon Jan 2 16:50:43 PST 2012
Quoting Ian Sidle (ian at iansidle.com):
> Indeed, I've had similar incidents myself. I've also seen a few cases
> where the power supply failed and it spiked the voltage which then
> damaged the controller on the hard disk, preventing it from spinning.
> From the forums that I have read...
> ...about people using ZFS and its error-correcting capability, it
> becomes rather apparent when there are hardware problems (bad disk
> controller, bad memory, etc) because it is able to detect data
> inconsistency while traditionally those errors in the data were
> silently processed and saved back to disk.
Just to stress the point one more time, by far the most common cause of
writing of garbage data isn't _bad_ hardware but rather hardware
(especially HBAs and the circuitry in hard drives) that does random and
erratic things during the fraction of a second after losing power.
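
To illustrate the kind of detection being discussed, here is a minimal
Python sketch. It is a toy model, not ZFS code or ZFS's on-disk format:
the names write_block, read_block, read_block_unchecked, and
ChecksumError are hypothetical, and SHA-256 is used only because it is
in the standard library. The idea it assumes is simply that a checksum
is kept apart from the data it covers, so garbage written over a block
is noticed on the next read instead of being handed back silently.

```python
import hashlib


class ChecksumError(Exception):
    """Raised when a block's contents do not match its stored checksum."""


def write_block(store, checksums, block_id, data):
    # Keep the checksum apart from the data (hypothetical layout), so
    # that damage to the data block does not also damage its checksum.
    checksums[block_id] = hashlib.sha256(data).digest()
    store[block_id] = data


def read_block(store, checksums, block_id):
    # Verify on every read; a mismatch means the stored bytes changed.
    data = store[block_id]
    if hashlib.sha256(data).digest() != checksums[block_id]:
        raise ChecksumError(f"block {block_id}: checksum mismatch")
    return data


def read_block_unchecked(store, block_id):
    # Toy stand-in for a filesystem without data checksums:
    # whatever is stored gets returned, garbage or not.
    return store[block_id]


store, checksums = {}, {}
write_block(store, checksums, 0, b"important data")

# Simulate hardware scribbling over the block, e.g. while losing power.
store[0] = b"imp\x00rtant data"

silently_wrong = read_block_unchecked(store, 0)

try:
    read_block(store, checksums, 0)
    detected = False
except ChecksumError:
    detected = True
```

In this sketch the unchecked read returns the damaged bytes without
complaint, while the checked read raises ChecksumError. Note that the
checksum only covers what happens after it is computed; data that is
already wrong at that moment would be checksummed as if it were valid.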
> Ironically, this in a way increases the hardware requirements to use
> ZFS "properly" because you want to use ECC memory, which now requires
> a server/workstation system rather than a mere "desktop" without ECC
> memory. Otherwise, there is the possibility that an error might sneak
> into the RAM, which would get passed to disk and then when the
> information is pulled back up a parity error is detected.
But then there are the times when bad sticks of ECC RAM pass all
conventional tests. Happened to me in 2006 -- two bad sticks of 512MB
ECC RAM out of four, on a high-end server motherboard.
The only obvious sign of the bad RAM was a suspicious pattern of
occasional spontaneous reboots and one 'NMI: Dazed and confused but
struggling to continue' console message. That was suspicious enough for
me to finally run 256 parallel compilation processes of the 2.6.16 Linux
kernel overnight with the console screen blanker disabled. That led to a
freeze-up, and _then_ to intermittent POST errors that I was able to
isolate to two of the four sticks by enabling maximal extended-memory
testing and swapping sticks around.
Anyway, ECC memory, even if defect-free, does nothing to address the most
common hardware cause of filesystem corruption.