[conspire] Backup

Rick Moen rick at linuxmafia.com
Mon Feb 5 12:59:08 PST 2007


Quoting Roger Chrisman (roger at rogerchrisman.com):

> Is it risky to run your backup drive in the same server it backs up? How 
> risky, technically? (Let's pretend these are SCSI drives and without
> RAID.)
[...]
> what is your opinion of a small but growing commercial web host that
> relies on backup drives (SCSI in this case and no RAID) that spin in
> the same server they backup, even if they do say they plan to
> implement rsync'ing those to separate servers in a few months?

To understand risk, you need to plot out threat models.  That is, you
need to think like a sysadmin, and imagine all of the various credible
ways in which things can go wrong.  Then, you figure out your recovery
or mitigation strategy to meet those threats.

What are the risks to your data in the "ThePlanet.com" data centre?
With it being a data centre, your physical risks (theft, fire, flood)
are minimal.  The AC power feed is probably relatively clean, so you're
less at risk from surges and spikes than are most of us.

That leaves three chief threats to the data content:  Hardware failure,
security failure, and you.  (Sysadmin error is a major threat to
systems.)

Data loss from catastrophic hardware failure tends to involve either
outright hard drive failure (they're mechanical, after all), runaway
heat problems, which are themselves often but not always caused by
mechanical failure (fans being mechanical), or PSU failure.  Funny thing:  People spend
huge amounts of money on hard drives or arrays of hard drives, and then
power them with weak, shoddy PSUs -- which are known to often take out
all attached hard drives simultaneously, when the PSUs fail.  Or your
HBA (host bus adapter) circuitry that connects to the drives develops
a short-circuit and fries all attached hard drives as _it_ fails.
That's much less likely, statistically speaking, than the PSU failure
mode (which one sees depressingly often), but has the same consequence:
You're left feeling really foolish having your "safety copy" of the data
clobbered by the same disaster that takes out the primary copy.

Obviously, the same outcome is possible with security compromise or
sysadmin error, for the same reason:  One threat hits both sets of data
through the same method at the same time.
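To make the threat-model exercise concrete, here is a toy sketch (every name in it is hypothetical, invented for illustration, not taken from any real tool).  Each copy of the data lists the failure domains it depends on: drive, PSU, HBA, login credentials, sysadmin shell.  A threat destroys every copy sharing the domain it strikes.  The question is which threats leave no surviving copy.

```python
# Toy threat model (hypothetical names throughout): a threat that strikes a
# failure domain destroys every copy of the data that depends on that domain.
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    name: str
    domains: frozenset  # failure domains this copy depends on


def survivors(copies, threat_domain):
    """Names of the copies that a threat in `threat_domain` leaves intact."""
    return [c.name for c in copies if threat_domain not in c.domains]


def unsurvivable_threats(copies, threats):
    """Threats that destroy every copy at once (common-mode failures)."""
    return [t for t in threats if not survivors(copies, t)]


# Primary plus a backup drive in the same server: separate drives, but the
# same PSU, HBA, credentials, and sysadmin.
same_server = [
    Copy("primary", frozenset({"drive1", "psu", "hba", "creds", "admin"})),
    Copy("backup", frozenset({"drive2", "psu", "hba", "creds", "admin"})),
]

# Same primary plus a replica on a distant machine with its own hardware,
# no shared security tokens, and minimal access for any other purpose.
remote = [
    same_server[0],
    Copy("replica", frozenset({"drive3", "psu2", "hba2", "creds2"})),
]

threats = ["drive1", "drive2", "psu", "hba", "creds", "admin"]
print(unsurvivable_threats(same_server, threats))  # ['psu', 'hba', 'creds', 'admin']
print(unsurvivable_threats(remote, threats))       # []
```

The same-server backup protects only against a single drive dying.  Every other listed threat takes out both copies together.  The model is deliberately crude.  It ignores, for instance, a sysadmin error that gets copied to the replica on its next sync.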

This is why periodic replicas to a distant machine, with no common
security tokens and minimal access for any other purpose, are a popular
safeguard.  This is also why tape backup remains ubiquitous despite many
detractors:  You have a testably valid, complete, catalogued data set
that is physically stable over a period of many years into the future,
exists offline, can be bought in quantity, cheaply, and can be
physically write-protected.

Back in the 1980s when I worked at Blyth Software, I ran two sets of the
weekly full backups:  One went out for a pre-planned stay in offsite
storage at Datasafe in San Francisco.  The other went with a complete
installable set of the backup software under the driver's seat of my
car, which I parked in the shade just far enough from the office
building that, if the building fell over, it wouldn't hit my car.

(Yes, we did actually have plans for how to bring the entire company
back into operation in one day, if the building were somehow destroyed
overnight.)

So, the tapes have the attraction of being physically protectable,
distant in space from threats to the original, and distant in time from
threats to the original (i.e., are periodic snapshots rather than
something kept updated in real time).  That lets you potentially recover
from even a threat that wipes out all of your files that are online at 
some particular moment.  Ideally, you should try to gain some of the
same advantages through whatever data-protection and
recovery regime you adopt.
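The "distant in time" point is easy to miss, so here is a small simulation to illustrate it.  It is a hypothetical sketch, not any particular backup tool.  One file is written per day.  A mirror copies the primary as soon as it changes, while a weekly snapshot is kept with a fixed number retained.  On the wipe day, everything online is destroyed, whether by an intruder or by a sysadmin error.

```python
# Hypothetical simulation: a real-time mirror versus periodic snapshots
# with rotation, when something wipes all files that are online.


def simulate(days, wipe_day, snapshot_every=7, keep=4):
    primary = {}
    mirror = {}
    snapshots = []  # list of (day, copy of primary as of that day)
    for day in range(1, days + 1):
        primary[f"file{day}"] = f"contents written on day {day}"
        if day == wipe_day:
            primary.clear()  # intruder or sysadmin error wipes everything online
        mirror = dict(primary)  # the mirror follows the primary immediately
        if day % snapshot_every == 0:
            snapshots.append((day, dict(primary)))
            snapshots = snapshots[-keep:]  # rotate out the oldest snapshots
    return mirror, snapshots


mirror, snapshots = simulate(days=17, wipe_day=17)
print(len(mirror))                                   # 0
print([(day, len(files)) for day, files in snapshots])  # [(7, 7), (14, 14)]
```

The mirror dutifully copies the wipe, so it holds nothing.  The day-14 snapshot still has everything up to that point.  The protection lasts only as long as the retention window, though.  With weekly snapshots and `keep=4`, damage that goes unnoticed for about four weeks has rotated into every retained copy.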

> My bull---t bell clangs, "Why weren't they doing that originally?" 

I have a different question:  Why have you been relying on your hosting
service to do your data backups for you?  I'd personally not be able to
justify that, for any filesets I care about.

If they screw up at performing that task, you might be able to sue them
for your losses (depending on what they formally promised) -- but that
won't get your data back.
