Linux Backups mini-FAQ
Hardware
There are several alternatives for backup hardware. In roughly descending order of my preference, they are:
- DDS tape (a/k/a SCSI DAT tape)
- CD-Recordable (CD-R) or CD-Rewritable (CD-RW) media
- DLT and other high-performance tape solutions
- Magneto-Optical
- QiC / Travan tape
- Networked storage
- Removable storage (120 MB Superfloppy, Zip, Jaz, Floppy)
- Ancillary storage (Additional onboard HD storage)
DDS tape
I purchased an HP SureStore 2000 DDS2 drive in October of 1997. It's been used for semi-regular backups of my home system for the past three years. No problems; I can strongly recommend HP SureStore SCSI storage.
In comparing the costs of SCSI DAT with other tape drive types, you'll want to take into account both drive and media costs. At about ten media units (tapes), SCSI became the cheaper option for me: ~$9 per 4 GB tape versus ~$35 for Travan/QIC cartridges. Pricewatch currently lists HP media at $3/unit, while Travan 8 GB media run $20-24. DAT is solid, dependable, proven technology, and the media are cheap and reusable. Just what you're looking for in a backup.
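The break-even point above is simple arithmetic: the pricier drive pays for itself once the per-tape savings cover the difference in drive prices. Here is a minimal bash sketch. The $400 drive and the $9/$35 media prices come from this FAQ, but the $140 Travan drive price is a made-up figure chosen only for illustration, and `breakeven_tapes` is a hypothetical helper name.

```bash
#!/bin/bash
# breakeven_tapes DRIVE_A MEDIA_A DRIVE_B MEDIA_B
# Prints the smallest number of tapes at which option A (dearer drive,
# cheaper media) costs no more in total than option B. Whole dollars only.
breakeven_tapes() {
  local drive_a=$1 media_a=$2 drive_b=$3 media_b=$4
  if (( drive_a <= drive_b )); then
    echo 0                 # A is no dearer before buying a single tape
  elif (( media_a >= media_b )); then
    echo never             # A costs more up front and saves nothing per tape
  else
    local gap=$(( drive_a - drive_b )) saving=$(( media_b - media_a ))
    echo $(( (gap + saving - 1) / saving ))   # ceiling(gap / saving)
  fi
}

# Hypothetical inputs: $400 DDS drive with $9 tapes versus a Travan drive
# ASSUMED (for illustration only) to cost $140, with $35 cartridges.
breakeven_tapes 400 9 140 35
```

With these inputs it prints 10: at ten tapes both setups total $490, and every tape after that favors DAT.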
What's nice about SCSI tape is that in three years of admittedly light use -- say 1-8 times/month -- I've never had a write error. While backups take some time, you only need to start them; they run unattended. My current backup script hits key system files, announces (via wall) its progress, then rewinds and verifies the tape, and finally rewinds and ejects it. Minimal fuss. I've been backing up more frequently in the past six months or so -- every couple of days if I can help it.
The downside is that tape capacity is limited relative to today's drive sizes. The system, which cost me about $400 new, provides ~4 GB of compressed storage. That works for me, but you'll have to look at higher-capacity tape drives for the 9-40 GB disks out now. A comparably priced drive today would have about 10-20 GB capacity, which should suffice for most common hard disk sizes.
The following pricing data is woefully out of date (try Froogle or NexTag for more current comparisons), but the relative pricing should be pretty close.
Comparative pricing of 4mm SCSI DAT units
---------------------------------------------
Vendor  Capacity (GB)    Drive cost  Media cost
        raw/compressed
---------------------------------------------
HP       2/4             $120        $3
HP       4/8             $185        $6
HP      12/24            $603        $11 *
HP      20/40            $810        $27 **
---------------------------------------------
Notes:
 *  Other vendor pricing starts at ~$320.
 ** Other vendor pricing lower.
Source: www.pricewatch.com, October, 2000
---------------------------------------------
I'm citing Hewlett-Packard largely because they have a good reputation for quality, both anecdotally and in my direct personal experience at work and at home. Sony's 12/24 and 20/40 drives are about half the cost of the equivalent HP drives. At the upper end of the range, tape changers start appearing, with combined capacities into the hundreds of GB.
A brief note: DDS (Digital Data Storage) is frequently referred to as DAT (Digital Audio Tape), a close but not identical cousin. Both are nearly always accessed via a SCSI interface, so you'll hear the term "SCSI tape" as well, though this can refer to other media (DDR, and even Travan). These terminology imprecisions have even been known to apply to me.... ;-)
CDR / CDRW
Of the remaining alternatives, CDR/CDRW and QiC/Travan are probably the most popular in current systems, and might be considered a necessity. CD-RW drives start at about $90, with good branded drives running $100-$160. See Pricing Information, below, for sources of current prices.
While I don't have specific experience with CDs, my understanding is that burning is sensitive to buffer underruns, so it's often helpful to create an image file on disk first and then write it to media, which requires additional disk space. Media size, at 650 MB, is significantly limited relative to tape -- it would take 31 CDs to match the capacity of one 20 GB tape. There are also reliability issues: CD-RWs must be tested before they're considered good, and may not function in all drives.
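The capacity comparison above is a ceiling division, and if you stage an image file on disk before burning, the same arithmetic tells you how many discs that image will span. A minimal bash sketch: `discs_needed` and `image_size_mb` are hypothetical helpers, and sizes are decimal megabytes, which is how the 31-disc figure works out.

```bash
#!/bin/bash
# discs_needed SIZE_MB [DISC_MB]
# Number of discs needed to hold SIZE_MB megabytes, rounding up.
# Defaults to 650 MB discs; all sizes are decimal megabytes.
discs_needed() {
  local size_mb=$1 disc_mb=${2:-650}
  echo $(( (size_mb + disc_mb - 1) / disc_mb ))
}

# image_size_mb FILE
# Size of a staged image file in decimal MB, rounded up.
image_size_mb() {
  local bytes
  bytes=$(wc -c < "$1")
  echo $(( (bytes + 999999) / 1000000 ))
}

# One 20 GB tape's worth of data, in 650 MB CDs:
discs_needed 20000        # 31

# For a staged image (hypothetical path):
#   discs_needed "$(image_size_mb /tmp/backup.tar)"
```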
My recommendation would be to use CDR/CDRW if you have it and are satisfied, but to explore a tape solution if your needs aren't met.
Other Media
DLT and other high-performance tape solutions are more likely to be found in professional or commercial settings. While generally reliable, flexible, and fast, they're beyond the reach of the typical home user.
Magneto-Optical has had a rough life, though it's a fundamentally solid technology. For removable, random-access, rewritable storage, it's strongly recommended, though it is expensive, slower than pure magnetic media, and typically offers less immediately accessible storage -- a 1 GB MO disk has two 512 MB sides.
QiC / Travan tape, as indicated above, is not cost-effective when media counts exceed 10-15 units; SCSI DAT is recommended instead. If you already have such a drive, you don't need to replace it, though you may want to evaluate your storage needs and decide whether doing so would be more effective.
Networked storage is the practice of saving local files on other systems in your local (or remote) network. This can be an effective solution, though you may lack the flexibility and redundancy possible with inexpensive removable media backups.
Removable storage media, typically magnetic, random-access (as opposed to serial-access, as with tape) media, are generally strongly discouraged for backup of all but the smallest or most sensitive data, on the basis of cost, reliability, and convenience. Traditional 3.5" 1.44 MB floppy disks are actually one of the most expensive storage formats available, in $/MB. LS-120 superfloppy and Zip disks are reasonably good ways to store mid-sized archives of ~100 MB, though reliability may be an issue. For full-system backups, they are simply not an option. I cannot recommend Jaz disks. The only question in their use has been when, not if, both disks and drives fail. When they do so, you lose data in expensive multiples of 2 GB. I went through three drives and ten disks in about 18 months of use before I threw in the towel. Winchester storage and removable media are mutually exclusive concepts.
Ancillary storage. Additional onboard hard disk storage is actually one of the cheapest ways of storing data -- at current IDE disk prices, storage is about $4 per gigabyte. The solution is fast, flexible, and convenient. What it is not is reliable: you are limited to a single storage unit, and if data loss results from an event directly affecting the system (including loss, theft, or physical or electronic damage), you have no backups. I'd recommend instead a RAID or mirroring solution for additional redundancy, combined with an offline storage alternative. Removable drives (disk caddies) are another solution that may provide a good mix of cost-effectiveness, reliability, and speed. While more expensive than tape over multiple storage units, their ease of access and transfer rate are attractive.
Pricing Information:
As tech hardware prices are constantly in flux (usually downward), current information is hard to provide in a static document. See a site such as Froogle, NexTag, or Pricewatch for the latest information, or auction/community sites such as Craigslist or eBay.
Software
Unless you have specific requirements to meet (e.g., management can't keep from mucking with a technical decision), I'd choose the simplest backup methods possible. My own local backup "system" is just a short shell script that tars and verifies a list of directories.
tar isn't the sexiest thing out there (honey is <g>), but damned if it doesn't work, and if the tools for accessing archives aren't available on every flavor of Unix, and most lesser operating systems, not to mention boot, rescue, and minimal installations of Linux. You will be able to get at your data.
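In the spirit of that tar-and-verify approach, here is a minimal sketch that writes to an ordinary file rather than tape, so you can try it safely. `backup_and_verify` is a hypothetical helper. The `tar -tf` listing check mirrors what the sample script at the end of this FAQ does; `tar -d` (`--compare`) is an assumed GNU tar feature that additionally compares archived contents against the disk, so expect it to flag files that changed after they were archived, such as active logs.

```bash
#!/bin/bash
# backup_and_verify ARCHIVE BASEDIR DIR...
# Archives DIRs (given relative to BASEDIR) into one tar file, then checks:
#   tar -tf : the archive can be read end to end
#   tar -d  : GNU tar --compare; archived contents still match the disk
# Returns 0 only if the archive was written and both checks pass.
backup_and_verify() {
  local archive=$1 base=$2
  shift 2
  tar -cf "$archive" -C "$base" "$@" || return 1
  tar -tf "$archive" > /dev/null || return 1
  tar -C "$base" -df "$archive" || return 1
}

# Example (hypothetical path, run as root):
#   backup_and_verify /tmp/system.tar / etc home usr/local
```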
Other general recommendations: dump, cpio, afio, and rsync. I'd generally avoid using an integrated backup management solution -- it's far less portable, and you may not be able to get at your data unless you are part of a large and well-supported organization. You get some pluses, usually a searchable index or other log of what was archived, but it costs you in terms of flexibility.
There are some advanced alternatives as well. Among these are push mirroring, the system the Debian project uses to maintain its global mirror network, and snapshotting, which can be done when using the Logical Volume Manager (LVM).
For an archive of various backup utilities, see: http://linuxmafia.com/pub/linux/backup/
The advantages of various alternatives:
- dump: operating on whole filesystems, it records a directory of the archive's contents, which you can navigate with restore's interactive mode (restore -i) when recovering from backup. Advantages include built-in dump levels: you can create level 0 (full) or level 1-9 incremental dumps, allowing fine-grained control of archive generation. At a given dump level, all files modified since the most recent dump of a lower level are archived. Downsides: some filesystem formats aren't supported, and you may not be able to access your archives from another system. The filesystem-oriented backup mode may not meet particular needs -- you're sacrificing flexibility for convenience.
- cpio: greater internal consistency and integrity controls than tar, and backward compatibility with tar and other formats. It may handle types of files that tar doesn't support. Downside: I have to read the man page and Linux in a Nutshell every time I want to use it. If you thought tar was nonintuitive, try cpio.
- afio: yet another advanced archive manipulation utility, similar to cpio. Downsides: I know even less of afio than of cpio, and nonstandard is not good with respect to backups.
- rsync: less an archival system of itself than a means of duplicating data across directories, filesystems, or networked computers. You'll still want to store data in one of the archival formats listed above, unless you're merely mirroring data. Part of a solution, not a solution of itself.
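The dump-level rule in the list above (a level-N dump archives everything changed since the most recent dump of a lower level) can be modeled in a few lines. `reference_dump` is a hypothetical helper that only does the bookkeeping, to show which earlier dump a new one is taken relative to; real dump tracks this history itself, traditionally in /etc/dumpdates when run with -u.

```bash
#!/bin/bash
# reference_dump NEW_LEVEL PRIOR_LEVEL...
# PRIOR_LEVELs are the levels of earlier dumps, oldest first. Prints the
# 1-based position of the most recent earlier dump whose level is strictly
# lower than NEW_LEVEL, or "none" if there isn't one (as for level 0), in
# which case everything is dumped.
reference_dump() {
  local new=$1
  shift
  local levels=("$@") i
  for (( i = ${#levels[@]} - 1; i >= 0; i-- )); do
    if (( levels[i] < new )); then
      echo $(( i + 1 ))
      return
    fi
  done
  echo none
}

# History: a level 0 full dump, then levels 1, 2, 1 (positions 1..4).
reference_dump 2 0 1 2 1   # 4: the level-1 dump just before it
reference_dump 1 0 1 2 1   # 1: the original level-0 full dump
reference_dump 0 0 1 2 1   # none: level 0 is always a full dump
```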
Note on Arkeia Network Backup Security Backdoor
On February 21, 2004, Slashdot noted: "Anyone able to connect to TCP port 617 can gain read/write access to the filesystem of any host running the Arkeia agent software. This appears to be an intentional design decision on the part of the Arkeia developers." A full description of the exploit is available at Metasploit.
This sort of issue is unfortunately all too common with proprietary software, and should serve as a cautionary tale for those who would prefer a "professional" solution to "that free stuff".
What to Back Up
The general rules are these:
- You want to back up that which you can't readily restore from other sources.
- You don't want to back up that which you can readily restore from other sources. More importantly, some things should be restored from other sources to assure their integrity (e.g., following a system security exploit).
- You don't want to back up that which you aren't interested in preserving.
My own backup script (/usr/local/sbin/system-backups), which I run weekly (or weakly), is reproduced under Sample backup script.
Generally speaking, you're not interested in:
- /tmp
- /usr (except for /usr/local)
- bits and pieces of /var
You absolutely want:
- /home
- /etc
- /usr/local
You probably want:
- Bits and pieces of /var
- (probably) /root (the sample script below now includes it).
- (possibly) /boot
- Other local filesystems outside the FHS (Filesystem Hierarchy Standard).
...the philosophy being that you can reconstruct your distribution from package information (and would probably benefit from an upgrade anyway). You can't recover localized data and system configurations from a generic image, CD, or net archive.
Protect what's valuable to you.
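One way to keep only the useful bits and pieces of /var, as an alternative to listing its subdirectories one by one as the sample script does, is GNU tar's --exclude option. A minimal sketch: the exclude list (var/cache, var/tmp) is an illustrative assumption rather than a recommendation for every system, and `backup_var` is a hypothetical helper.

```bash
#!/bin/bash
# Example exclusions: regenerable caches and scratch space under /var.
# Adjust to what you actually need to preserve.
var_excludes=(
  --exclude=var/cache
  --exclude=var/tmp
)

# backup_var ARCHIVE ROOT
# Archives ROOT/var into ARCHIVE, minus the excluded subtrees. ROOT is
# normally /, but any directory works for testing. The --exclude options
# must come before the "var" operand for GNU tar to apply them.
backup_var() {
  tar -cf "$1" -C "$2" "${var_excludes[@]}" var
}

# Example (hypothetical path, run as root):
#   backup_var /tmp/var.tar /
```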
It might also make sense to save a record of your disk partition layout (fdisk -l /dev/your-device-here) and related hardware information.
I like tar because of its universal access -- I can retrieve these archives from any system, anywhere. Not just Linux, not just Unix. Other backup/recovery tools offer greater functionality, but generally reduce the flexibility of access.
Other Backup Issues: Compression, Encryption, and Security
There are three fundamental problems with backups: they're time-intensive, they're big, and they're a security risk. Time can be addressed through incremental backups; see the references cited below for several such schemes. I strongly advise against using compression or encryption on entire archives.
Why? Because backups are intended to serve as your system's emergency restore capability. Anything you do to your backups that reduces the viability or probability of a successful restore directly reduces the value of the backups themselves. Yes, your backup tapes are a security liability, and should be subject to appropriate physical security measures. You may want to encrypt individual files on disk or in the archive. However, either encrypting or compressing an archive as a whole greatly increases the probability that a very small data glitch leads to the failure of the entire archive.
Some tools have added safeguards against this; these are principal benefits of cpio and afio, for example. Most suitable tape drives offer hardware compression. If you need to address archive size and security, do so in a way that does not directly compromise the value of the archives themselves.
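The failure mode described above is easy to demonstrate. This sketch builds the same three sample files two ways, as one gzip-compressed tar stream and as three individually gzipped files (which you would then archive with plain tar), flips one byte in the middle of each case, and asks `gzip -t` what survived. Everything here (file names, sizes, the `flip_byte` and `damage_middle` helpers) is made up for the demonstration. You should see the whole archive reported as damaged and two of the three per-file streams still passing; with whole-archive compression, data before the damaged point can sometimes be salvaged, but what follows it is typically lost to standard tools.

```bash
#!/bin/bash
# flip_byte FILE OFFSET: invert every bit of the byte at OFFSET, in place.
flip_byte() {
  local file=$1 offset=$2 old new
  old=$(dd if="$file" bs=1 skip="$offset" count=1 2>/dev/null | od -An -tu1 | tr -d ' \n')
  new=$(( old ^ 255 ))
  printf "\\$(printf '%03o' "$new")" |
    dd of="$file" bs=1 seek="$offset" conv=notrunc 2>/dev/null
}

# damage_middle FILE: flip the byte halfway through FILE.
damage_middle() {
  local size
  size=$(wc -c < "$1")
  flip_byte "$1" $(( size / 2 ))
}

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir "$work/data" "$work/perfile"
for i in 1 2 3; do
  seq 1 20000 > "$work/data/file$i.txt"    # compressible sample data
done

# Whole-archive compression: one gzip stream holding all three files.
tar -cf - -C "$work" data | gzip -c > "$work/whole.tar.gz"
# Per-file compression: each file gets its own gzip stream.
for f in "$work"/data/*.txt; do
  gzip -c "$f" > "$work/perfile/${f##*/}.gz"
done

# One damaged byte in each case.
damage_middle "$work/whole.tar.gz"
damage_middle "$work/perfile/file2.txt.gz"

if gzip -t "$work/whole.tar.gz" 2>/dev/null; then
  whole=intact
else
  whole=damaged
fi
good=0
for f in "$work"/perfile/*.gz; do
  gzip -t "$f" 2>/dev/null && good=$(( good + 1 ))
done
echo "whole archive: $whole"
echo "per-file: $good of 3 still pass gzip -t"
```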
A few notes on security. Backups are a possible risk: tapes, removable media, and networked storage all offer additional access to data, and unsecured physical access to backups is equivalent to full access to all system data. There's also the more subtle issue of backups and data deletion: in the course of restoring files from archives, it's quite possible to unintentionally restore files that had been deleted or moved elsewhere. Because your backup system can't know whether files were deleted or moved intentionally or unintentionally, it cannot take the effects of rm and mv commands into account. Doing so would require full logging of all filesystem actions, integrated with the backup system. Not likely.
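No backup system can know which deletions were intentional, but you can at least see what a blanket restore would bring back before you run it. A minimal sketch: the hypothetical helper `deleted_since_backup` compares an archive's listing against the live filesystem and prints members that no longer exist. It assumes ordinary file names; names containing newlines or characters that tar escapes in its listings will not be handled correctly.

```bash
#!/bin/bash
# deleted_since_backup ARCHIVE ROOT
# Lists non-directory members of tar ARCHIVE that no longer exist under ROOT.
# These are candidates for files deleted or moved on purpose since the
# backup; review them before restoring the whole archive.
deleted_since_backup() {
  local archive=$1 root=$2 member
  tar -tf "$archive" | while IFS= read -r member; do
    case $member in
      */) continue ;;                  # skip directory entries
    esac
    if [ ! -e "$root/$member" ] && [ ! -L "$root/$member" ]; then
      printf '%s\n' "$member"
    fi
  done
}

# Example (hypothetical archive name; members are stored without a
# leading slash, so ROOT is /):
#   deleted_since_backup /tmp/home.tar /
```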
When to back it up
Early and often.
There are complex "Tower of Hanoi" backup schedules designed to provide maximum backup coverage while minimizing use of tape and time involved in backups. You can find these documented in a good system administration text (see below for recommendations). For a typical single-user system, periodic full archives on a set of rotated tapes should be reasonably sufficient. My own schedule is to perform full backups once or twice a week.
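For the curious, the tape-rotation form of the Tower of Hanoi scheme is easy to compute: on backup session n, use tape k, where k is the number of times 2 divides n, capped at the last tape. With four tapes this gives the familiar A B A C A B A D sequence: tape A is used every other session, B every fourth, and so on, so older backups survive on less frequently reused tapes. A minimal bash sketch; `hanoi_tape` is a hypothetical helper, and the schemes in the texts below (including dump-level variants) differ in detail.

```bash
#!/bin/bash
# hanoi_tape SESSION TAPES
# Prints the tape letter (A, B, C, ...) to use for backup SESSION (1-based)
# in a Tower of Hanoi rotation over TAPES tapes.
hanoi_tape() {
  local n=$1 tapes=$2 idx=0
  local letters=ABCDEFGHIJKLMNOPQRSTUVWXYZ
  if (( n < 1 || tapes < 1 || tapes > 26 )); then
    echo "usage: hanoi_tape SESSION(>=1) TAPES(1-26)" >&2
    return 1
  fi
  # Count how many times 2 divides the session number, stopping at the last tape.
  while (( n % 2 == 0 && idx < tapes - 1 )); do
    n=$(( n / 2 ))
    idx=$(( idx + 1 ))
  done
  echo "${letters:idx:1}"
}

# Sessions 1-16 with four tapes:
for session in $(seq 1 16); do
  printf '%s ' "$(hanoi_tape "$session" 4)"
done
echo
```

The loop prints A B A C A B A D A B A C A B A D.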
Restoring from backups
Over the years this FAQ has been up, I'd thought it should be pretty clear, but I do get the occasional "how do I restore from backups?" email.
Simple: your backup is a tape with one or more tar archives. You untar the appropriate archive. Use 'mt' to scan to the appropriate file, and either untar the full file, or a list of files you wish to recover from it.
It's at this point that this method is a bit more cumbersome than a managed backup solution. You don't have an automated index to your backups (though logging the creation output can help), so you may have to hunt for the specific file you want.
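Logging the creation output gives you a crude but serviceable index. A minimal sketch: `write_indexed` and `find_in_index` are hypothetical helpers, and the convention of naming each index file after the archive's position on tape (01.txt, 02.txt, ...) is an assumption, chosen so the number tells you where to scan with mt (skip one file less than that number). The lookup is a plain substring search, since tar may log names with or without the leading slash.

```bash
#!/bin/bash
# write_indexed ARCHIVE INDEXFILE DIR...
# Archives DIRs into ARCHIVE and saves tar's verbose file list to INDEXFILE.
# (With -v, GNU tar lists members on stdout when the archive isn't stdout.)
write_indexed() {
  local archive=$1 index=$2
  shift 2
  tar -cvf "$archive" "$@" > "$index"
}

# find_in_index INDEXDIR PATH
# Prints the index files whose listings contain PATH.
find_in_index() {
  grep -lF -- "$2" "$1"/*.txt
}

# Examples (hypothetical paths), writing the fourth archive on the tape:
#   write_indexed /dev/nst0 /root/tape-index/04.txt /home
#   find_in_index /root/tape-index home/karsten/.bashrc
```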
If, say, you want to recover the file "/home/karsten/.bashrc" from the fourth file on tape, you'd run something like:
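mt -f /dev/nst0 rewind
# Remember: scan forward one file less than the one you want
mt -f /dev/nst0 fsf 3
# tar extracts relative to the current directory, so run this from a
# scratch "restore" directory. Remember that tar strips leading slashes:
tar xvf /dev/nst0 home/karsten/.bashrc
mt -f /dev/nst0 rewoffl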
If necessary, reacquaint yourself with the tar man and/or info pages.
Further Reading
Evi Nemeth, Garth Snyder, Trent R. Hein, UNIX System Administration Handbook, Third Edition, Prentice Hall, (c) 2000, ISBN 0-13-020601-6
Æleen Frisch, Essential System Administration, O'Reilly & Associates, (c) 1995, ISBN 1-56592-127-5 http://www.ora.com/catalog/esa2/
M. Carling, Stephen Degler, James Dennis, Linux System Administration, New Riders Press, (c) 2000, ISBN 1-56205-934-3
Curtis W. Preston, Unix Backup and Recovery, O'Reilly & Associates, (c) 1999, ISBN 1-56592-642-0 http://www.ora.com/catalog/unixbr/
VA Linux Systems: "How do I copy directory trees / partitions?" (Ref. #010228-0000).
A discussion of various filesystem/directory archival and data transfer tools.
Sample backup script
My system backup script follows. It backs up a series of directories using tar, verifies the archives, and shouts frequently to all open terminals what's going on.
I'm not saying it's the pinnacle of backup scripts, but it works for me. I typically submit it to 'batch' a couple of times a week, and it runs unattended for several hours.
The recovery strategy for use with this script depends on what you're recovering from. To recover a single file or small set of files, load your backup tape, scan to the appropriate archive (file mark), and untar the files, typically into a "restore" directory rather than directly into the target directory.
For a full system restore, you'll want to install a base Debian system, pull your package list from archive and restore packages to the current versions over a network or from source media, then restore your archived partitions from backups.
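On Debian, the "pull your package list from archive" step might look like the following sketch. Because the sample script archives /var/lib, the restored archive contains dpkg's status file (var/lib/dpkg/status). `installed_packages` is a hypothetical helper that turns that file into a selection list. Newer dpkg versions may ignore selections for packages missing from their available-package database, so you may need to populate it first (for example with apt-cache dumpavail | dpkg --merge-avail); check your dpkg documentation.

```bash
#!/bin/bash
# installed_packages STATUS_FILE
# Reads a dpkg status file and prints "package<TAB>install" for every
# package whose status is "installed", the format dpkg --set-selections reads.
installed_packages() {
  awk '
    function emit() {
      if (pkg != "" && state == "installed") print pkg "\tinstall"
      pkg = ""; state = ""
    }
    /^Package:/ { pkg = $2 }
    /^Status:/  { state = $4 }    # Status: <want> <flag> <status>
    /^$/        { emit() }        # blank line ends a stanza
    END         { emit() }
  ' "$1"
}

# On the freshly installed base system, as root (hypothetical restore path):
#   installed_packages restore/var/lib/dpkg/status > selections.txt
#   dpkg --set-selections < selections.txt
#   apt-get dselect-upgrade
```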
You might also want to, say, test for existence of media with "mt status", testing exit value, and bailing if you've got no tape.
#!/bin/bash
# Create backups of /etc, /home, /usr/local, and a few other directories
# on tape, verify them, and eject the tape.
PATH=/bin:/usr/bin
tape=/dev/nst0      # non-rewinding device: each tar archive becomes one tape file
backupdirs="/etc /root /boot /home /usr/local /var/backups /var/lib \
    /var/log /var/www"
status=0

mt -f $tape rewind
for path in $backupdirs
do
    echo "System backup on $path" | wall
    if ! tar cf $tape $path 1>/dev/null; then
        echo "$path: error(s) in backup" 1>&2
        status=1
    fi
    sleep 2
done
echo "System backups complete, status: $status" | wall

echo "Now verifying system backups" | wall
mt -f $tape rewind
for path in $backupdirs
do
    echo "Verifying $path...." | wall
    if tar tf $tape 1>/dev/null; then
        echo "$path: verified" | wall
        echo "$path: verified" 1>&2
    else
        echo "$path: error(s) in verify" 1>&2
        echo "$path: errors in verify" | wall
    fi
    mt -f $tape fsf 1
done
mt -f $tape rewoffl
echo "Please remove backup tape" | wall
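The media check suggested just before the script could be sketched as a small function. It assumes, as that suggestion implies, that mt exits non-zero when no tape is loaded; that behavior varies with drive and mt version, so test it against your own hardware before relying on it. `tape_ready` is a hypothetical helper name.

```bash
#!/bin/bash
# tape_ready DEVICE: succeed if "mt status" succeeds on DEVICE.
# ASSUMPTION: mt exits non-zero when no tape is loaded (verify on your drive).
tape_ready() {
  mt -f "$1" status > /dev/null 2>&1
}

# Near the top of the backup script, before the first "mt rewind":
#   if ! tape_ready /dev/nst0; then
#       echo "No tape in /dev/nst0; backup aborted" | wall
#       exit 1
#   fi
```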
Acknowledgements
Thanks to Michael C. Toren and Michael Zawrotny for pointing out issues with scripts.
© 2000-2006 Karsten M. Self (kmself@ix.netcom.com)
Written: Saturday October 7, 2000
Last updated 2006/02/03 22:19:00
Distribution terms: To be determined, but leaning generally toward the GNU GPL or the GNU Free Documentation License. Most likely not the Open Publication License. Input welcomed.