[conspire] Partitioning revisited briefly

Rick Moen rick at linuxmafia.com
Fri Oct 15 17:17:32 PDT 2010

I'm flattered by Mark claiming my earlier post was useful, but that's
frankly an aspect where I thought it fell short.  I mentioned (rather
vaguely) some of the concerns people sometimes try to address through
partitioning, and gave four brief examples of tidbits from my server --
but I fear that it might have puzzled people more than it enlightened.
Let's come at the problem from a slightly different angle.

Here are /etc/fstab's entries for mass storage, rearranged in disk & cylinder order.

# <file system> <mount point>   <type>  <options>       <dump>  <pass>

## sda is (obviously) the boot drive.  73 GB SCSI.
/dev/sda1       /boot           ext2    defaults        0       2
/dev/sda5       none            swap    sw              0       0
/dev/sda6       /var            ext2    noatime,nodev,nosuid 0       2
/dev/sda7       /               ext3    defaults,errors=remount-ro 0       1
/dev/sda8       /recovery       ext3    defaults        0       2
/dev/sda9       /usr            ext2    nodev,ro        0       2

## sdb and sdc are RAID1 mirrored, except for swap.  Each is 18 GB SCSI.
/dev/md0        /var/www        ext3    nodev,nosuid    0       2   #sdb5,sdc5
/dev/md1        /var/lib        ext3    nodev           0       2   #sdb6,sdc6
/dev/md2        /var/spool      ext3    defaults        0       2   #sdb7,sdc7
/dev/sdb8       none            swap    sw              0       0
/dev/sdc8       none            swap    sw              0       0
/dev/md3        /home           ext3    defaults        0       2   #sdb9,sdc9
/dev/md4        /usr/local      ext3    defaults        0       2   #sdb10,sdc10

Let's talk about the mirror pairs.  Linux's software RAID ('md' =
Multiple Devices driver) is a wonderful thing that's become extremely
reliable over the years.  md's RAID1 mirror pairs have extreme speed,
simplicity, and reliability going for them.

RAID saves you from having to revert to your last backup if a
data-bearing hard drive fails.  This does _not_ of course protect from
all loss modes, which is one reason why redundancy and backup are
different things.  E.g., the lightning-caused power surge that destroyed 
my 1998-era VA Research model 500 in March 2008 fried everything
including both 9 GB SCSI drives.  If those had been mirrored, it's dead
certain everything still would have been fried.

When I assembled the replacement system, starting around 2006, I was
still pretty poor from dot-bomb underemployment, so I scrounged
dirt-cheap hard drives rather than buying any at realistic prices.  
And, at that time, I had only one spare 73 GB drive (the largest I had),
_but_ I wanted all important data trees to reside on RAID1 filesystems,
so I could enjoy MD redundancy for the first time.

So, I scrounged some more, and my next-largest drives were a pair of 18
GB SCSI drives.  

(Back then, I had little money but lots of free time.  Now, I have
spare funds, but little free time to get things done in.[1]  You can't win.)

It's nice if you can have your entire system RAIDed, but I lacked
sufficient space on the pair of 18 GB drives; thus the 73 GB boot drive.

There used to be a time when having a RAIDed boot drive was a problem, 
and possibly it still is.  My motherboard's BIOS would still try to 
boot from /dev/sda (set first in boot order), and might never get to
trying its mirrored cousin.  And, even if everything's the same on both
drives _and_ the BIOS intelligently fails over from the regular boot
device to its mirror, you still have to ensure that the secondary drive 
has a suitable bootloader configuration that gets updated correctly
every time you update the primary drive's bootloader.

However, even at that, you still win (when one drive of the mirror pair
fails), in that you don't lose everything after your latest backup, 
even if you might suffer a little downtime.
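
A quick way to notice that a mirror pair has lost a member is to read
/proc/mdstat, where a healthy two-drive array shows [UU] and a degraded
one shows an underscore in place of the missing drive.  A minimal
sketch follows; degraded_arrays is a made-up helper name, and it reads
from standard input so it can be tried on saved output:

```shell
# Print the name of every md array that is missing a member.
# Input: text in /proc/mdstat format on stdin.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/     { name = $1 }     # header line, e.g. "md0 : active raid1 ..."
        /\[[U_]*_[U_]*\]/ { print name }    # status such as [U_] means a member is gone
    '
}

# On a live Linux system:
#   degraded_arrays < /proc/mdstat
```

No output means every array it saw still has all its members.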

Let's say you have a pair of drives you want to use for RAID1
filesystems, like my sdb/sdc pair.  It's possible that there's more
than one approach, but the one I chose was to use /sbin/fdisk to make
'Linux raid autodetect'-type partitions (partition type 'fd' hex) for
each slice of sdb and sdc I wanted to RAID.  Then, separately, I used
the mdadm (MD administration) tool to pair-up those partitions, thus
creating md0, md1, md2, md3, and md4.
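
The pairing step might look roughly like the following.  This is a
sketch, not a record of the commands actually run on this server:
mirror_cmd is a hypothetical helper, the md-to-partition mapping is
taken from the comments in the fstab listing, and the commands are only
printed, because running them for real needs root and overwrites
whatever is on the partitions named.

```shell
# Dry run: print the mdadm commands instead of executing them.
mirror_cmd() {
    # $1 = md array number, $2 = partition number shared by sdb and sdc
    echo "mdadm --create /dev/md$1 --level=1 --raid-devices=2 /dev/sdb$2 /dev/sdc$2"
}

n=0
for part in 5 6 7 9 10; do      # partition 8 on each drive is swap, so it is skipped
    mirror_cmd "$n" "$part"
    n=$((n + 1))
done
```

Each printed line pairs one partition on sdb with its twin on sdc as a
two-member RAID1 array.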

You'll notice that I carefully did _not_ make sdb8 or sdc8, which I 
intended to use for swap, be of type 'Linux raid autodetect'.  Instead, 
I created those in /sbin/fdisk as ordinary type-82 hex swap partitions.
Why?  Because mirroring one's swap space doesn't really make sense.
It's a significant amount of overhead for no gain, and also prevents the
kernel from optimising swap access performance between those stretches of
swap.
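
One way the kernel can spread swap traffic across two drives is when
both swap areas carry the same priority: it then allocates pages from
them round-robin.  The fstab entries shown earlier use plain 'sw', so
whether that was set up on this server is not stated.  The sketch
below, with a hypothetical helper name, only shows what adding pri=1 to
the swap entries would look like:

```shell
# Append pri=1 to every swap entry in fstab-format text read from stdin.
# Note: awk rebuilds the edited lines with single spaces between fields.
equalise_swap_priority() {
    awk '$3 == "swap" && $4 !~ /pri=/ { $4 = $4 ",pri=1" } { print }'
}

printf '%s\n' \
    '/dev/sdb8       none            swap    sw              0       0' \
    '/dev/sdc8       none            swap    sw              0       0' |
    equalise_swap_priority
```

The output is meant for review before anything is copied into the real
/etc/fstab.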

Notice what is and isn't on the RAIDed space:

RAIDed:    /var/www   System HTML and ftp trees
           /var/lib   All data files for Mailman, MySQL, some others
           /var/spool System SMTP spools, NNTP spools, SpamAssassin files
           /home      Users' home directories
           /usr/local The only part of /usr that isn't packaged software
NonRAIDed: /boot      Obsolete.  See below.
           /var       The rest of /var, which is replaceable.
           /          Everything not in one of the other subtrees.
           /recovery  A maintenance partition with a small Debian install.
           /usr       Installed files from software packages
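
The same split can be read mechanically out of an fstab, which is handy
for checking that nothing irreplaceable has ended up on a non-mirrored
device.  A minimal sketch, assuming (as on this server) that mirrored
filesystems are exactly those mounted from /dev/mdN; raid_report is a
hypothetical name:

```shell
# Classify fstab entries by whether they are mounted from an md device.
# Input: fstab-format text on stdin.  Comments, blank lines and swap are skipped.
raid_report() {
    awk '
        NF == 0 || $1 ~ /^#/      { next }
        $3 == "swap"              { next }
        $1 ~ /^\/dev\/md[0-9]+$/  { print "RAIDed:    " $2; next }
                                  { print "NonRAIDed: " $2 }
    '
}

# Usage:
#   raid_report < /etc/fstab
```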

The basic objective is to ensure that you will not lose data upon
failure of any single hard drive.  By 'data', here, I mean in particular
data you will miss, i.e., cannot replace easily.  Consider, for example,
the worst-case scenario:  sudden failure of boot drive sda.

Lost are the root directory, the root user's /root tree, the installed
bootloader program, a bunch of dispensable logs, etc., in /var, and all
installed software.  There are also a few other miscellaneous things that
get lost along the way, but they're things that change infrequently, so
I make a point of snapshotting them occasionally, and backing them up:

1.  sda's partition table.  To snapshot:  
    fdisk -l /dev/sda > partitions-sda-$(date +%F) 
2.  Package selections.  Technically, these are safely inside
    /var/lib/dpkg/status, but I find it convenient to write out a 
    list in a more-useful format occasionally from the running system.
    To snapshot:
    dpkg --get-selections "*" > selections-$(date +%F)  
3.  Contents of /etc.  Really vital.  To snapshot:
    tar cvzf etc-$(date +%F).tar.gz /etc
4.  CGIs for Apache HTTPd that get, perversely, dumped into /usr/lib/cgi-bin .
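
The snapshot commands above could be gathered into one script along
these lines.  This is a sketch under assumptions: the function names
and the destination directory are hypothetical, and the sfdisk -d dump
is an addition not mentioned above (unlike fdisk -l output, it can be
fed back to sfdisk to recreate the partition table).  Nothing runs
until snapshot_all is called, which needs root on a Debian system.

```shell
snapshot_tree() {
    # $1 = directory to archive, $2 = label for the file name, $3 = destination dir
    tar czf "$3/$2-$(date +%F).tar.gz" -C "$(dirname "$1")" "$(basename "$1")"
}

snapshot_all() {
    # $1 = destination directory, ideally somewhere that is not on sda
    fdisk -l /dev/sda         > "$1/partitions-sda-$(date +%F)"
    sfdisk -d /dev/sda        > "$1/partitions-sda-$(date +%F).sfdisk"  # assumption: extra dump
    dpkg --get-selections "*" > "$1/selections-$(date +%F)"
    snapshot_tree /etc etc "$1"
    snapshot_tree /usr/lib/cgi-bin cgi-bin "$1"
}

# Usage (as root; the path is only an example):
#   snapshot_all /home/snapshots
```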

To rebuild the system from loss of sda:

1.  Replace failed hard drive with a blank one.
2.  Perform minimal Debian installation from trusted media.  Mount the
    leftover RAID partitions.
3.  Fix /opt:

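    cd /
    mkdir -p /usr/local/opt   # In this case, not needed because it's on md4
    rmdir /opt                # Only if the fresh install created an empty /opt
    ln -s ./usr/local/opt opt # Make /opt a relative symlink to /usr/local/opt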

4.  Reinstall previous packages:

    dpkg --set-selections < selections-[date string]
    apt-get dselect-upgrade

5.  Copy back the /etc snapshots.   
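
Steps 2 and 5 might be sketched as follows.  The details are
assumptions rather than a record of what was done here:
unpack_snapshot and the staging path are hypothetical, and unpacking
beside the fresh /etc and comparing first (rather than extracting
straight over it) is one cautious option, since a new install can
legitimately differ in files such as fstab.

```shell
unpack_snapshot() {
    # $1 = snapshot tarball, $2 = staging directory (created if missing)
    mkdir -p "$2" && tar xzpf "$1" -C "$2"
}

# Step 2, as root on the rebuilt system (left as comments on purpose):
#   mdadm --assemble --scan        # find and start the surviving arrays
#   mount /dev/md4 /usr/local      # then mount each array where it used to live
#
# Step 5, compare before copying anything over the fresh /etc:
#   unpack_snapshot etc-[date string].tar.gz /root/etc-restore
#   diff -ru /etc /root/etc-restore/etc
```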

I keep seeing people implement system-backup schemes that include
laboriously backing up copies of the gigs of files under /usr.  What a
waste of time.  Except for /usr/local and /usr/lib/cgi-bin (if applicable),
that whole vast stretch of files is available from software packages,
and thus eminently dispensable.

I mentioned that /boot as a separate filesystem is an anachronism.  That
habit dates back to when BIOSes were incapable of booting from data
files physically located outside the first 1024 logical cylinders of the
boot hard drive.  Making /boot be a separate filesystem at the very
first part of the drive (and thus on the outer tracks) ensured that
nothing within it was outside the 1024-cylinder horizon.  That
constraint is now long gone, and thus so is the only compelling reason
for a separate /boot partition.
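
To put a number on that horizon: the reachable space is 1024 cylinders
times heads times sectors per track times the sector size.  The two
geometries below are commonly quoted ones and are assumptions here,
since heads and sectors are not given above; chs_limit is a made-up
helper name.

```shell
# Bytes addressable within the first 1024 cylinders.
chs_limit() {
    # $1 = heads, $2 = sectors per track; 512-byte sectors assumed
    echo $((1024 * $1 * $2 * 512))
}

chs_limit 16 63     # 528482304   (about 528 MB)
chs_limit 255 63    # 8422686720  (about 8.4 GB)
```

Either way, a small /boot at the very start of the drive sat safely
inside the limit.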

[1] Like, for example, currently the physical drive serving up /dev/sdc 
has failed completely, which is not surprising as it was a cheap 18 GB
remaindered drive that Deirdre picked up for $5.  However, I've not
yet had time to rebuild the system and implement replacements.
