[sf-lug] sticky bit; directories

Michael Paoli Michael.Paoli at cal.berkeley.edu
Sun Apr 4 11:34:30 PDT 2010


> Date: Fri, 2 Apr 2010 12:11:25 +0100
> From: Grant Bowman <grantbow at gmail.com>
> Subject: [sf-lug] What is the sticky bit?
> To: SF-LUG <sf-lug at linuxmafia.com>
>
> What is the sticky bit?  This question came up and started an
> interesting discussion, prompting me to take a closer look at exactly
> how ext2 does it's work.  A sticky bit is well described in a
> paragraph of the chmod man page. [1]

Well, it's not at all specific to ext2.  The sticky bit has quite a
long history, going back over 30 years into UNIX.  POSIX/SUS probably
gives the best and most definitive current coverage of what the sticky
bit is - or, per the standards, should be.  Linux (e.g. LSB) heavily
leverages the POSIX/SUS standards.

> So what is a directory?  It is a special type of inode.  Each block

A directory is "just" a file of type directory.  Historically, it just
contained the inode number and filename of each of the files it
contained.  It still does, though the precise structure, and where/how
that's stored, may now be somewhat to quite different, depending upon
the specific filesystem type.  The operating system takes care of
handling/interpreting different file types differently.  E.g. the
operating system handles reads from, and writes to, a named pipe or
device special file differently than reads from, and writes to, an
ordinary file.

Once upon a time ... better yet, simulated :-) here:
$ unix-v7

PDP-11 simulator V3.3-2
Disabling XQ
@boot
New Boot, known devices are hp ht rk rl rp tm vt
: rl(0,0)rl2unix
mem = 177856
# mkdir /tmp && mkdir /tmp/d && cd /tmp/d
# >f
# od -c .
0000000 237 007   .  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
0000020 240 007   .   .  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
0000040 236 007   f  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
0000060
# bc
ibase=8
7*400+237
1951
7*400+240
1952
7*400+236
1950
quit
# ls -afi .
  1951 .
  1952 ..
  1950 f
# rm f && od -c .
0000000 237 007   .  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
0000020 240 007   .   .  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
0000040  \0  \0   f  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
0000060
#

So, ... explanation of the above simulation ...
UNIX 7th Edition (circa 1979) - apparently about a 1988 image - run on a
PDP simulation.
So, I create directories /tmp (not present on that image) and then
/tmp/d and cd into /tmp/d
I then create an empty file named f (by using >f).
I then do a dump (od(1) - Octal Dump) of that directory.  With the -c
option, ASCII characters are written as the character itself or as a C
style escape sequence.  Non-ASCII will still be shown in octal.
We then do a little math, using bc(1) - Basic Calculator.  We set the
input base to be 8 (octal), and do a bit of math from our preceding od
-c output.  In that historic directory format, the first two bytes were
the inode number little-endian, so we convert those (400 octal=4*2^6=256)
to decimal.
We then do ls -afi - the -a option gives us all entries (includes those
starting with . - also redundant with -f but doesn't hurt), -f gives us
all entries, in directory order (the order they're written in the
directory itself), and -i shows us the inode numbers.
Note that our calculated inode values match those shown from the -i
option of ls.
Look again at the od output. 16 bytes per directory entry - two bytes
for inode, 14 for filename.  At the time, inode numbers were limited to
two bytes - thus a bit under (2^8)^2=65536 total files (of any type,
including directories) could be on any one filesystem.  At
the time, filenames were also limited to a maximum length of 14
characters.  In the directory, they were null padded to 14 bytes.
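
As a rough sketch of that layout (an illustration only, not anything
from 7th Edition itself), here's a bit of Python that rebuilds the 48
bytes from the od -c output above and splits them back into 16-byte
entries.  The names v7_entry and parse_v7_dir are made up for this
example; the byte values are transcribed from the dump.

```python
import struct

def v7_entry(lo, hi, name):
    """Build one 16-byte entry: 2-byte little-endian inode + 14-byte NUL-padded name."""
    return bytes([lo, hi]) + name.ljust(14, b"\0")

# The three entries as shown (in octal) by od -c before the rm.
before_rm = (v7_entry(0o237, 0o007, b".")
             + v7_entry(0o240, 0o007, b"..")
             + v7_entry(0o236, 0o007, b"f"))

def parse_v7_dir(raw):
    """Return (inode, name) pairs from a raw directory in that 16-byte format."""
    entries = []
    for off in range(0, len(raw), 16):
        # "<H14s": little-endian unsigned 16-bit inode, then 14 bytes of name
        ino, name = struct.unpack_from("<H14s", raw, off)
        entries.append((ino, name.rstrip(b"\0").decode("ascii")))
    return entries

entries = parse_v7_dir(before_rm)
for ino, name in entries:
    print(ino, name)
# prints:
# 1951 .
# 1952 ..
# 1950 f
```

The unpacked inode numbers are the same values the bc arithmetic gave.
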
We then remove the file, and repeat our od -c .
Note that the only difference is the inode number data for that f file
has changed to 0.  Well, there is no inode zero - that's used in
directories to indicate unallocated.  Note that the operating system
doesn't squash the filename entry in the directory - it doesn't need to,
it's already deallocated and freed that inode - but it does mean the
name of the file that was in that directory is still in the directory.
Note also that the directory didn't shrink.  It would have been 32 bytes
with just . and .. (two 16-byte entries), it's 48 with file f added, and
it's still 48 after "removing" (unlinking) file f.  It's interesting to
note also that the system call isn't remove, it is unlink(2).  It
unlinks the file from the directory, so it's no longer present by that
name in that directory.  If that was the last link, and the file is no
longer open, then its inode and any associated allocated blocks are
freed.  So the file wasn't really removed, but more like deallocated -
but of course
all the data bits that it was may get squashed by any subsequent
filesystem writes, and there's nothing on the filesystem that explicitly
says where that file's data or metadata was (of course in our case with
empty file, there were no data blocks - but there was allocated inode
data).  Anyway, that's how it was, once upon a time.  Things are quite
a bit different now, but there's also quite a bit that is more-or-less
the same.  Note that we did od directly on the directory - could have
used cat, or cat . | od -c, but that won't typically work on current
UNIX/LINUX/BSD/etc.  On non-ancient systems, one uses opendir(3) and
readdir(3) and also lstat(2) and perhaps readlink(2) and stat(2).
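
The unlink(2) semantics described above are much the same on current
systems.  Here's a minimal sketch in Python, assuming a filesystem that
supports hard links and allows unlinking an open file (typical for
Linux filesystems); the file names a and b are arbitrary.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    a = os.path.join(d, "a")
    b = os.path.join(d, "b")
    with open(a, "w") as fh:
        fh.write("data")

    os.link(a, b)                     # second name for the same inode
    links_two = os.stat(a).st_nlink   # link count with both names present

    os.unlink(a)                      # removes the name "a", not the file
    links_one = os.stat(b).st_nlink   # link count with only "b" left

    fh = open(b)
    os.unlink(b)                      # last link gone, but file is still open
    remaining = os.listdir(d)         # no names left in the directory
    still_readable = fh.read()        # data is still there via the open file
    fh.close()                        # only now can inode and blocks be freed

print(links_two, links_one, remaining, still_readable)
```

The data stays reachable until both the last link and the last open
file descriptor are gone.
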

For something not nearly so ancient (e.g. non-ancient Linux):
$ mkdir d && cd d && >f
$ strace -fv -eall -s2048 -o ../.strace.out ls -1afi .
129068 .
116864 ..
129069 f
$

And trimming down our strace(1) output, we find we have:

open(".", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 3
getdents64(3, {{d_ino=129068, d_off=2, d_type=DT_UNKNOWN, d_reclen=24,  
d_name="."} {d_ino=116864, d_off=17920, d_type=DT_UNKNOWN,  
d_reclen=24, d_name=".."} {d_ino=129069, d_off=17921,  
d_type=DT_UNKNOWN, d_reclen=24, d_name="f"}}, 4096) = 72
lstat64(".", {st_dev=makedev(58, 2), st_ino=129068,  
st_mode=S_IFDIR|0700, st_nlink=2, st_uid=1003, st_gid=100,  
st_blksize=4096, st_blocks=1, st_size=72,  
st_atime=2010/04/04-02:19:31, st_mtime=2010/04/04-02:18:55,  
st_ctime=2010/04/04-02:18:55}) = 0
lstat64("..", {st_dev=makedev(58, 2), st_ino=116864,  
st_mode=S_IFDIR|0700, st_nlink=6, st_uid=1003, st_gid=100,  
st_blksize=4096, st_blocks=1, st_size=368,  
st_atime=2010/04/04-01:45:54, st_mtime=2010/04/04-02:19:17,  
st_ctime=2010/04/04-02:19:17}) = 0
lstat64("f", {st_dev=makedev(58, 2), st_ino=129069,  
st_mode=S_IFREG|0600, st_nlink=1, st_uid=1003, st_gid=100,  
st_blksize=4096, st_blocks=0, st_size=0, st_atime=2010/04/04-02:18:55,  
st_mtime=2010/04/04-02:18:55, st_ctime=2010/04/04-02:18:55}) = 0

That's a bit lower-level than we might prefer to see, but ls(1) uses
opendir(3), readdir(3), and lstat(2), and the libraries make some
lower-level system (2) calls to satisfy those standard library (3) calls.
Note that getdents64(2) does happen to include inode numbers, and
readdir(3) on Linux passes them along as d_ino in struct dirent - but
this ls(1) evidently doesn't rely on those, and instead uses lstat(2)
to get the inode numbers which we requested with the -i option (if we
omitted the -i option, ls(1) may have then skipped those lstat(2)
calls).
Essentially, though the form and means of storage and access may have
changed, the directory is still a file of type directory, it somehow
stores the inodes and names of the files it contains, and it requires
filesystem space (at least a bit, directly or indirectly somewhere) to
store that directory data.
Note also that it's still the case that for many(/most) LINUX filesystem
types (e.g. ext3, ext2), directories never shrink - but there are some
exceptions, e.g. tmpfs and reiserfs.
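
For a higher-level view of the same thing, here's a minimal sketch in
Python that reads the names from a directory and then uses lstat(2) on
each to get the inode number, roughly like the ls -1afi above.  The
function name ls_afi is made up for this example, and os.listdir()
leaves out . and .., so they're added back by hand.

```python
import os
import tempfile

def ls_afi(path):
    """Return (inode, name) pairs for path, roughly like `ls -1afi`."""
    names = [".", ".."] + os.listdir(path)   # listdir omits "." and ".."
    return [(os.lstat(os.path.join(path, n)).st_ino, n) for n in names]

with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, "f"), "w").close()  # same as >f in the shell
    rows = ls_afi(d)
    for ino, name in rows:
        print(ino, name)
```

The inode numbers will of course differ from system to system.
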

> Date: Fri, 2 Apr 2010 09:15:57 -0700
> From: Rick Moen <rick at linuxmafia.com>
> Subject: Re: [sf-lug] What is the sticky bit?
> To: sf-lug at linuxmafia.com
>
> Quoting Grant Bowman (grantbow at gmail.com):
> > What is the sticky bit?
>
> Ah, one of my standard interview questions.  ;->

Me too[1] :-)

> > So what is a directory?  It is a special type of inode.  Each block
>
> Tip:  Play around with the "stat" command, and you'll learn some
> rather interesting things about what data are stored in inodes for files

Essentially:

a (plain/ordinary) file's data blocks contain the actual data stored in
the file (sparse files may also "contain" nulls for skipped blocks).

a directory contains (directly or indirectly) the inode numbers and
names of the files it contains; a directory is "just" a different type
of file

other file types may do and/or store things a bit differently (e.g.
pipe, block or character special device, symbolic link).

various filesystem types may store/organize things a bit differently

various filesystem types may also have some additional data or metadata
for the filesystem itself and/or its files, e.g. ACLs, finer resolution
[acm]time data, UUID, LABEL, where mounted or last mounted, etc.

Other than the above, the inode contains all (or nearly all) of the
other (meta)data about a file (of any type).  Namely: device, inode,
type & permissions, link count, uid, gid, device type, logical size,
filesystem block size, number of blocks allocated, atime, mtime, ctime
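
Those items map fairly directly onto the fields that lstat(2) returns.
A minimal sketch in Python, assuming a UNIX-like system (st_rdev,
st_blksize and st_blocks aren't available everywhere); the helper name
describe and the "type+perms" key are made up for this example.

```python
import os
import stat
import tempfile

# Field names in the same order as the list above.
FIELDS = ["st_dev", "st_ino", "st_mode", "st_nlink", "st_uid", "st_gid",
          "st_rdev", "st_size", "st_blksize", "st_blocks",
          "st_atime", "st_mtime", "st_ctime"]

def describe(path):
    """Return the lstat(2) fields for path, plus an ls -l style mode string."""
    st = os.lstat(path)
    info = {name: getattr(st, name) for name in FIELDS}
    info["type+perms"] = stat.filemode(st.st_mode)
    return info

with tempfile.TemporaryDirectory() as d:
    p = os.path.join(d, "f")
    open(p, "w").close()
    os.chmod(p, 0o600)
    file_info = describe(p)
    dir_info = describe(d)

for key, value in file_info.items():
    print(key, value)
```

Running describe() on a directory, a symbolic link, or a device file
shows the same set of fields with a different type in the mode.
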

1. It's on the outline of stuff I form questions from.  I don't
    necessarily always ask, or always ask in the same way, but it's
    rather typical I'll ask at least some question(s) having to do with
    sticky bit (or where sticky bit may be the solution or answer to the
    problem scenario presented).




