[conspire] Partitioning revisited briefly

Rick Moen rick at linuxmafia.com
Fri Oct 15 21:19:47 PDT 2010


A brief followup in which I address some afterthoughts, and then rag on
underappreciated volunteers.  ;->  I wrote:

> Note that one must be careful not to shoot legitimate software in the
> foot, in planning these flags.  Notice, for example, that /var/lib
> lacks the 'nosuid' mount flag.  Doubtless, this is because I tried
> that and noticed something that broke.  (Oops.)  But it's difficult to
> know until you try.

At one point in the past, I remember having /tmp as a separate
filesystem (mostly in order to use ext2 for it for performance's
sake).  As part of the process of tightening down the system and adding
mount flags to prohibit file types with no legitimate reason to exist 
in particular subtrees, I tried something like this for /tmp:

   nodev,nosuid

I figured there's no way anything other than a bit of malware would
ever want special device nodes in /tmp, or need to have the SUID bit set
on anything in /tmp.  Right?

Wrong.  I believe there was some Debian package that turned out to have a
SUID binary in /tmp as part of the package-installation routines, or
something like that.  I can't remember the exact details, but it was
some surprising breakage, that turned out to be because of a
_legitimate_ need for SUID executables within /tmp.
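
One cheap precaution before adding 'nosuid' to a mount is to list any
set-UID files already living in that tree.  What follows is only a
minimal sketch, assuming GNU find; the function name list_suid is made
up for illustration.  It cannot catch files that appear only briefly
(such as during package installation, the case described above), so it
narrows the guesswork rather than eliminating it.

```shell
# Hypothetical helper: list regular files under $1 that carry the
# set-UID bit, without crossing into other mounted filesystems (-xdev).
# Errors from unreadable directories are discarded.
list_suid() {
    find "$1" -xdev -type f -perm -4000 -print 2>/dev/null
}

# e.g., before mounting /var/lib with 'nosuid':
#   list_suid /var/lib
```

An empty result suggests (but does not prove) that 'nosuid' is safe for
that tree.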

Now, as Tony Godshall recently reminded us, it's currently very common
to use 'tmpfs' on Linux systems as the filesystem type for /tmp.  tmpfs
is another excellent innovation, introduced with the 2.4 kernel series,
and previously called 'shmfs' for shared memory filesystem.

The idea behind tmpfs is that RAM rather than disk is used as the
backing storage, but (unlike with a RAMdisk) the RAM is allocated
dynamically (up to the limit you declare in the related 'size=' option[1])
and, when necessary to free up RAM, gets swapped out onto
virtual memory, i.e., swap space.  Thus, you get the speed advantages of
RAM for your tmpfs filesystems, but without needing to actually
_dedicate_ RAM to the purpose, as you do with RAMdisks.

If making a new Linux system, therefore, what you generally do is make
sure there's a generous allocation of swap storage (a good idea
generally anyway, unless you're desperately short of mass storage) and
then specify tmpfs in /etc/fstab as the filesystem type for /tmp.  (If
the idea of making deliberate entries in /etc/fstab seems startling,
then probably you've been the beneficiary of good distro installers that 
do a decent job of writing that file automatically for you.  Nothing
wrong with that, though it's still a fine idea to revisit that file 
and possibly improve its contents -- not to mention understanding them.)
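
For concreteness, here is roughly what such an /etc/fstab entry could
look like.  This is a sketch, not a recommendation: the helper name
tmpfs_fstab_line, the 512m size cap, and mode=1777 (the usual sticky,
world-writable permissions for /tmp) are all assumptions chosen for
illustration.  The 'nodev' and 'nosuid' flags are left out on purpose,
given the breakage described earlier.

```shell
# Hypothetical helper: print a tab-separated fstab entry for a tmpfs mount.
#   $1 = mount point
#   $2 = size cap, passed to the tmpfs 'size=' option
tmpfs_fstab_line() {
    printf 'tmpfs\t%s\ttmpfs\tsize=%s,mode=1777\t0\t0\n' "$1" "$2"
}

# Print the entry one might add to /etc/fstab for /tmp.
tmpfs_fstab_line /tmp 512m

# To try the same thing on a live system without rebooting (as root; note
# that this hides whatever is already in /tmp until it is unmounted):
#   mount -t tmpfs -o size=512m,mode=1777 tmpfs /tmp
```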

tmpfs is also used for the /dev/shm virtual filesystem that the kernel 
uses to track POSIX shared memory for the benefit of GNU libc.
http://en.wikipedia.org/wiki/Shared_memory says:

  Recent 2.6 Linux kernel builds have started to offer /dev/shm as
  shared memory in the form of a RAM disk, more specifically as a
  world-writable directory that is stored in memory
  with a defined limit in /etc/default/tmpfs. /dev/shm support is
  completely optional within the kernel configuration file. It is included
  by default in both Fedora and Ubuntu distributions.

I could easily implement /tmp as a tmpfs filesystem just by running a
simple mount command and adding the matching line in /etc/fstab, which
is what Tony suggested -- and it's a perfectly fine suggestion.  As I
said at the time, the reason I'm not doing that, for now, is that I long
ago got into the habit of using /tmp as temporary working storage.
E.g., if I start editing a text file for some project I expect to 
complete within a week or two, I will most often do it in /tmp.  

Why?  Long experience shows that, if you use your home directory as
scratch storage, you will build up, over the years, a fearsome
collection of junk files in it.  You'll tell yourself you'll eventually
get around to cleaning up your home directory, but it'll never happen.

What's distinctive about /tmp is that, traditionally on Linux systems,
it's periodically tidied up by a cronjob called a tmpreaper script
(or, for better security, a compiled binary dedicated to the job).
Debian systems, by default, have such a utility built in, that cleans
out excessively old and untouched /tmp files at every bootup.  Retention
period gets set in system configuration file /etc/default/rcS.  
Quoting from 'man 5 rcS':

       TMPTIME
         On  boot the files in /tmp will be deleted if their
         modification time is more than TMPTIME days ago.  A 
         value of 0 means that files are removed regardless of age.  
         If you don't want the system to clean /tmp then set  
         TMPTIME  to  a  negative value (e.g., -1) or to the word 
         infinite.

Debian defaults to setting TMPTIME to zero, but as a matter of local
policy I have:

  ## Time files in /tmp are kept in days.
  TMPTIME=30

So, _even if my system reboots_, I and all other local users can keep
our temporary working files in /tmp and know that they'll not get
automatically cleaned out under any circumstances as long as we've
touched them within the last 30 days.
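
To preview which files such a policy would remove, something like the
following can help.  It is only an approximation of the rule quoted
from the man page, not Debian's actual boot script, and it assumes GNU
find; the name list_stale is made up.  It lists and never deletes.
Note that find's '-mtime +30' counts whole 24-hour periods, so a file
must be at least 31 days old to show up.

```shell
# Hypothetical helper: list regular files under $1 whose modification
# time is more than $2 days in the past.  Lists only; deletes nothing.
list_stale() {
    find "$1" -xdev -type f -mtime "+$2" -print
}

# e.g., with TMPTIME=30:
#   list_stale /tmp 30
```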

The point is that, if you use tmpfs for /tmp, then the effect is 
exactly like setting TMPTIME to zero:  The tree gets cleaned out 
unconditionally at every boot, because it is volatile storage 
rather than persistent storage.

But, for people _used_ to /tmp getting unconditionally cleared out at
boot time, you really might as well use tmpfs and get improved
performance.


Now for the ragging on underappreciated volunteers bit.  Let's start
with the Debian project, which like most recent distributions has gone
whole-hog for UUID labels for partitions.  Compare my old-school
/etc/fstab snippet...


# <file system> <mount point>   <type>  <options>       <dump>  <pass>

## sda is (obviously) the boot drive.  73 GB SCSI.
/dev/sda1       /boot           ext2    defaults        0       2
/dev/sda5       none            swap    sw              0       0
/dev/sda6       /var            ext2    noatime,nodev,nosuid 0       2
/dev/sda7       /               ext3    defaults,errors=remount-ro 0       1
/dev/sda8       /recovery       ext3    defaults        0       2
/dev/sda9       /usr            ext2    nodev,ro        0       2

## sdb and sdc are RAID1 mirrored, except for swap.  Each is 18 GB SCSI.
/dev/md0        /var/www        ext3    nodev,nosuid    0       2   #sdb5,sdc5
/dev/md1        /var/lib        ext3    nodev           0       2   #sdb6,sdc6
/dev/md2        /var/spool      ext3    defaults        0       2   #sdb7,sdc7
/dev/sdb8       none            swap    sw              0       0
/dev/sdc8       none            swap    sw              0       0
/dev/md3        /home           ext3    defaults        0       2   #sdb9,sdc9
/dev/md4        /usr/local      ext3    defaults        0       2   #sdb10,sdc10


...with the same thing the way Debian recently wants to rewrite my /etc/fstab:


# <file system> <mount point>   <type>  <options>       <dump>  <pass>

## sda is (obviously) the boot drive.  73 GB SCSI.
UUID=96ce6da4-7b55-409e-a078-3572612b61c1       /boot           ext2    defaults        0       2
UUID=dded258a-f6ce-4a4c-9ee7-5d8076648080       none            swap    sw              0       0
UUID=728b737e-8420-437c-a43a-5d5a8f60fba5       /var            ext2    noatime,nodev,nosuid 0       2
UUID=0dc3dcff-3d63-4830-893f-6f9afd811875       /               ext3    defaults,errors=remount-ro 0       1
UUID=ed5d4608-6db8-40c0-9405-ba091b5f8a77       /recovery       ext3    defaults        0       2
UUID=5f702fae-9386-436e-86d2-90323b7f0857       /usr            ext2    nodev,ro        0       2

## sdb and sdc are RAID1 mirrored, except for swap.  Each is 18 GB SCSI.
/dev/md0        /var/www        ext3    nodev,nosuid    0       2   #sdb5,sdc5
/dev/md1        /var/lib        ext3    nodev           0       2   #sdb6,sdc6
/dev/md2        /var/spool      ext3    defaults        0       2   #sdb7,sdc7
UUID=336b0268-44ba-4302-834f-d51b55e53a6b       none            swap    sw              0       0
UUID=46b8d092-234f-4241-969b-a54fff38445d       none            swap    sw              0       0
/dev/md3        /home           ext3    defaults        0       2   #sdb9,sdc9
/dev/md4        /usr/local      ext3    defaults        0       2   #sdb10,sdc10



Yeah, like UUID=96ce6da4-7b55-409e-a078-3572612b61c1 is a _huge_ improvement 
on /dev/sda1.  Right.  This horrific ugliness gets introduced right into 
a file that I rely on being readable in order to understand and administer
my system.

There are functional advantages to Universally Unique IDentifier
partition identifiers.  The whole matter's been discussed to death, and
I feel no obligation to recap the discussion.  It's just... well... ugh.
That's a cure that makes the related disease look attractive by
comparison.  (I might feel different if my main storage were in whole or 
in part on USB drives, such that it might be _credible_ to talk about 
my filesystem unexpectedly changing from /dev/sda1 to /dev/sdc1.)
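
For anyone stuck reading a UUID-laden fstab, the mapping back to device
names can at least be recovered: 'findfs UUID=...' (from util-linux)
prints the matching device, and 'ls -l /dev/disk/by-uuid' shows all the
mappings as symlinks.  The sketch below wraps findfs to annotate an
fstab; the function name annotate_fstab and the '(unresolved)' marker
are made up, and it assumes findfs is installed.

```shell
# Hypothetical helper: read fstab-format text on stdin and print, per entry,
# "mountpoint<TAB>spec<TAB>device".  UUID= specs are resolved with findfs
# where possible; anything else is passed through unchanged.
annotate_fstab() {
    while read -r spec mnt rest; do
        case "$spec" in
            ''|'#'*)    # skip blank lines and comments
                continue ;;
            UUID=*)
                dev=$(findfs "$spec" 2>/dev/null) || dev='(unresolved)'
                printf '%s\t%s\t%s\n' "$mnt" "$spec" "$dev" ;;
            *)
                printf '%s\t%s\t%s\n' "$mnt" "$spec" "$spec" ;;
        esac
    done
}

# e.g.:
#   annotate_fstab < /etc/fstab
```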



And now, I'm going to rag on the Filesystem Hierarchy Standard people.
The FHS (http://www.pathname.com/fhs/) is a sort-of standards document
that was maintained for many years by Dan Quinlan.  It aimed to coax all
the diverse Unix communities into standardising where different types of
files and subtrees go in the overall Unix filesystem tree, and
articulate the reasons why particular parts of the tree exist and what
should (and should not) go there.  In part, FHS was a reaction to
lingering proprietary-Unix madness where, e.g., main binary executables
could end up being any damned where, including /usr/lib and /etc -- both
of those being places where you might find, of all things, the Sendmail
binary on old-Unix systems.

I figure the reason FHS reads as a somewhat cagily and bureaucratically
worded document is that Quinlan was aware he had little to wield other than
moral 'suasion.  He couldn't tell the BSD communities 'Hey, stop being
stupid and putting main system binaries under /usr/lib !'  He had to 
cite reasons why it was deprecated and suggest better places.

In recent years, Quinlan has been joined by Rusty Russell and Christopher
Yeoh as maintainers, and first the 'Free Standards Group' and then the
Linux Foundation (when it emborged the FSG) became the effort's
sponsoring umbrella body.  

And, to my way of thinking -- and I may just be getting cranky and set
in my ways -- they've gone just a little goofy in the last revision, v.
2.3.  

My main and possibly sole (I'd have to think about that) complaint
concerns two rather gratuitous additions to the root-directory 
top-level trees:

/media : Mount point for removeable media
/srv : Data for services provided by this system

One general rule for Unix systems is that you don't screw around with
the root directory.  You keep it clean, and you don't create gratuitous
top-level directories.  FHS itself says:

  There are several reasons why creating a new subdirectory of the root
  filesystem is prohibited:

    * It demands space on a root partition which the system
      administrator may want kept small and simple for either
      performance or security reasons.
    * It evades whatever discipline the system administrator may have
      set up for distributing standard file hierarchies across mountable
      volumes.

  Distributions should not create new directories in the root hierarchy
  without extremely careful consideration of the consequences including
  for application portability.

The rationales cited for /media and /srv have been, to my mind,
unconvincing.  We're told that /mnt should be reserved for a
'temporarily mounted filesystem', whereas /media is for 'removable
media'.  Huh?  Why not use /mnt?  FHS claims this is the answer:

  Although the use of subdirectories in /mnt as a mount point has
  recently been common, it conflicts with a much older tradition of using
  /mnt directly as a temporary mount point.

Well, Don't Do That, Then.  Or, at least, it's only going to be the
sysadmin/root user who does it, which means he/she is well aware that
mounting /dev/[something] directly onto /mnt is going to prevent 
mounting anything else there simultaneously, and, oops, my bad, I guess 
I'll just 'umount /mnt; mkdir /mnt/1; mount /dev/[something] /mnt/1'.
Done.  So, why are we making new trees in the root?

/srv is a lot more defensible but not enough, in my personal view.  It's
to serve as a dedicated home for things like systemwide http, ftp,
rsync, and VCS trees offered to the public, and related things like
systemwide CGIs.  The justification is that nowhere else was ever quite
suitable.  E.g.,

o  /home/httpd was a bit dumb (and I'm talking to you, Red Hat Software,
   Inc. of old) because there wasn't a real human user called httpd,
   and it wasn't really a homedir tree.
o  /var/www or /usr/www were both inappropriate since the HTTPd tree
   didn't really meet the criteria for creating a second-level 
   subtree in either place.
o  /www was difficult to justify as a top-level Unix dir for reasons cited.

Debianistas Joey Hess or Sean Perry (I forget which) once gave me a
pretty good answer when I challenged the people on the SVLUG list to
tell me where in FHS terms Apache HTTPd's top-level index.html should
rationally go.  I think it was Joey, and that he said something like
/usr/share/apache/index.html .  He said that sub-parts of the system
HTML tree should go wherever in the Unix system tree the nature of their
contents and ownership dictated, and that they should be integrated into
the Apache document tree via Location directives.

I like that.  It was logical, and actually bothered to pay attention to
the reasons why FHS puts particular things in particular places.

_However_, this is where, IMO, we exceed the limits of FHS's rational
application to real-world system management.  In the real world, it's
the system administrator who rationally should decide where in the
filesystem the system HTTPd document root should go, and it should be
whatever suits the local administrator's convenience and working style,
along with that of others who maintain the system-wide files.  In 
short, it should be wherever is most natural for the Web people in 
question.

If I were working with thousands of different people's HTTPd servers, I
might get tired of having to look at each company's HTTPd conffile to see
where the local sysadmins prefer their document root, but, let's face
it, they're going to put it where they want it anyway.


Oddly enough, two top-level directories now exist commonly on Linux
systems but are _not_ described in FHS, and thus by omission are
deprecated by it:

/sys
/selinux

The '/sys' tree is yet another abstract volatile filesystem for
internal-system recordkeeping.  The reason for its invention is a bit
comical:  /proc had become such an unruly mess that it really didn't fit
in there despite being fundamentally very proc-like, and /dev wasn't
quite right because the new data contained system data about the
mappings between devices and drivers (for the benefit of software such
as udev and HAL), instead of just device nodes.

As an abstract filesystem, /sys gets created at boot time as -- yes --
yet another filesystem type, sysfs (which was closely derived from
ramfs, mentioned below).
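
On a running system, the filesystem type behind any of these mount
points (/sys, /dev/shm, /tmp, and so on) can be read out of
/proc/mounts.  A minimal sketch, with the helper name fstype_of made up
for illustration:

```shell
# Hypothetical helper: print the filesystem type mounted at $1, reading a
# mounts table in /proc/mounts format ($2, defaulting to /proc/mounts).
# If several entries share the mount point, the last one listed wins.
# Prints nothing if the mount point is not found.
fstype_of() {
    awk -v mp="$1" '$2 == mp { t = $3 } END { if (t != "") print t }' \
        "${2:-/proc/mounts}"
}

# e.g.:
#   fstype_of /sys
#   fstype_of /dev/shm
```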


[1] There's also a related and very similar filesystem type, called
'ramfs', that's exactly the same as tmpfs except that it _does_ grow
dynamically without a pre-declared limit, and uses only RAM as backing
storage and never swap.




