Date: Mon, 13 Mar 2000 22:27:22 -0700 (MST)
From: Miles Nordin (carton@Ivy.NET)
Subject: [CrackMonkey]softdep urls

The Monkey Master has read more papers on this stuff than I have, so what I have to contribute is more of a practical summary than actual reliable academic info. NetBSD has both FFS with soft updates and LFS, so it's a good community in which to do a comparison, decide which one is right for your job, u.s.w.--unlike some LUSERS around here, the NetBSD folk have actually used both technologies. HAHAHAHA!! so, all this rot is old news to us. FEH.

If you want to skip my poorly-informed ramblings, there's a bibliography at the end.

FFS-softdep and LFS are similar enough that they're not orthogonal. However, they're not exactly the same either.

The not-orthogonal part is that, even if you could add softdep to an LFS, there is no point, because the problem is already solved. An LFS inherently orders writes in a crashproof way, so it can write everything asynchronously with no consistency-danger. softdep is a rather complicated way of reordering the same pattern of writes that an ordinary FFS would make to achieve the same ends. So, you don't need both.

Interesting question: If your hard disk does write caching and write reordering, do FFS-softdep's assumptions still hold? How about LFS's?

If, in the Linux tradition, you're not so much interested in finding appropriate code for your particular job, as in developing The Ultimate Ueberfilesystem, well then the ueberfilesystem is LFS with a file-coalescing cleaner. See Thor's explanation on the various shortcomings of each filesystem below.

This is getting a little confusing though. FFS with softdep is not faster than FFS with async writes. The closest analogy to ext2fs in the BSD world is FFS with the '-o async' mount option, which turns off synchronous metadata writes. Sync-metadata is turned on by default in BSD; ext2fs has no such default, which is why Linux tends to have such catastrophic filesystem damage if it crashes while it's hitting the disk hard. BSD doesn't do that, but the price is that it's slower. '-o async' is one way to make a BSD filesystem as fast (and as fragile) as a Linux FS. softdep is another way to get the speed--it's only slightly slower than '-o async', and is at least as robust as normal FFS.

If you use Linux, you're SOL. You have two options: async, or sync. BSD gives you three: async, sync-metadata, or sync-everything. sync-metadata is the default. And the booby prize, for flagrantly ignoring all the relevant literature, and reinventing the square wheel, goes to: Remy Card, Linus Torvalds, and the Linux community! woo-hoo.

LFS is even faster than FFS '-o async', for writing at least. But there's another speed problem. well, see Thor's email. He explains it a lot better than I would.

The consistency guarantees in normal FFS or FFS-softdep are slightly different from those of an LFS (or, what Linux is promising you, modeled not surprisingly after what NTFS does: a cheap "JFS" where the "log" gets slapped into a file on the disk).

LFS will, upon crash recovery, deliver to you a filesystem that matches exactly the state that the filesystem "should have been in" at some point in the past, say, 46.2 seconds before the crash. An FFS with softdep will give you (guarantee you) a fairly sane recovered filesystem. So will ordinary BSD FFS. But neither is as precise as the LFS. To illustrate, say you did the following, in this order:

1. Created A
2. Renamed B
3. Deleted C
4. Appended to D

you might end up with, say, A created, C deleted, D appended, but B still has its old name. An LFS is more predictable. All your changes, 1 through n, will be committed in the recovered filesystem. And nothing else. Perhaps n != 4, but LFS enforces an ordering. If you're just mucking around with a bunch of text files, this is irrelevant. But, if you have some kind of filesystem-based locking protocol or something, where there are "sane" locking states and "broken" locking states, the LFS guarantees might make your life as a programmer a lot happier. If you are, for example, trying to recover a CVS repository. :) With FFS, you have to write a "fix" tool. With LFS, you merely have to check for a vanished lock holder, because you know for sure the filesystem is in a state you voluntarily put it in, some time in the past.
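
To make the difference concrete, here is a toy Python model of the four operations above. It is a sketch under loud assumptions, not a description of any real filesystem: the "LFS" side is idealized as recovering some prefix of the operations, and the "unordered" side pretends each operation survives or vanishes independently, which is more pessimistic than what FFS or softdep actually promise. The function names are made up for illustration.

```python
from itertools import combinations

# The four operations from the example above, in the order they were issued.
OPS = ["create A", "rename B", "delete C", "append D"]


def lfs_recoverable_states(ops):
    """States an idealized LFS could recover: ops 1 through n, for some n."""
    return [tuple(ops[:n]) for n in range(len(ops) + 1)]


def unordered_recoverable_states(ops):
    """States if each op may independently survive or be lost.

    ASSUMPTION: deliberately pessimistic toy model; real FFS and softdep
    constrain the possible outcomes more than this.
    """
    states = []
    for k in range(len(ops) + 1):
        # combinations() keeps the original issue order within each subset.
        states.extend(combinations(ops, k))
    return states


if __name__ == "__main__":
    odd_one = ("create A", "delete C", "append D")  # B kept its old name
    print(len(lfs_recoverable_states(OPS)))
    print(len(unordered_recoverable_states(OPS)))
    print(odd_one in lfs_recoverable_states(OPS))
    print(odd_one in unordered_recoverable_states(OPS))
```

In this model the prefix rule leaves 5 outcomes to reason about, against 16 for the unordered case, and the "A, C, D but not B" outcome only appears in the second set. That gap is what makes a "fix" tool necessary in one case and a simple stale-lock check sufficient in the other.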

Now, delivering these consistency guarantees may require having the transaction roll-forward piece that Thor talks about--not sure 'bout that. NetBSD doesn't have said piece yet, so it may or may not deliver this promise. The point: in theory, an LFS can do this, and an FFS with softdep cannot.
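
For readers who have not met the term, roll-forward can be sketched roughly as follows. This is a toy model, not NetBSD's LFS code: the record layout, the CRC check, and the names `make_record`, `is_intact` and `recover` are all assumptions made up for illustration. The idea is that recovery starts from the last checkpoint and then replays any later log records that are intact and in sequence, stopping at the first gap or torn record.

```python
import zlib


def _crc(seq, payload):
    # Hypothetical checksum over a record's sequence number and payload.
    return zlib.crc32(f"{seq}:{payload}".encode())


def make_record(seq, payload):
    """Build a toy log record (ASSUMPTION: not a real on-disk format)."""
    return {"seq": seq, "payload": payload, "crc": _crc(seq, payload)}


def is_intact(rec):
    """True if the record's checksum still matches its contents."""
    return rec["crc"] == _crc(rec["seq"], rec["payload"])


def recover(log, checkpoint_seq, roll_forward=True):
    """Return the payloads treated as committed after a crash.

    `log` is assumed to be ordered by sequence number. Everything up to
    `checkpoint_seq` is taken as safely on disk. With roll_forward=True,
    later records are replayed until the first missing or torn one.
    """
    committed = [r["payload"] for r in log if r["seq"] <= checkpoint_seq]
    if roll_forward:
        expected = checkpoint_seq + 1
        for rec in log:
            if rec["seq"] < expected:
                continue  # already covered by the checkpoint
            if rec["seq"] != expected or not is_intact(rec):
                break  # gap or torn write: stop replaying here
            committed.append(rec["payload"])
            expected += 1
    return committed


if __name__ == "__main__":
    ops = ["create A", "rename B", "delete C", "append D"]
    log = [make_record(i + 1, op) for i, op in enumerate(ops)]
    log[3]["crc"] ^= 1  # simulate a torn write of the last record
    print(recover(log, 2, roll_forward=False))
    print(recover(log, 2))
```

In this toy, recovery with and without roll-forward both return a prefix of the operations; roll-forward just gets you a later one. Whether a real LFS without the roll-forward piece behaves that way is exactly the open question above.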

Along the same lines, LFS's and JFS's can have a "cheap copy" or "snapshot" feature, which is perfect for making backups. It's extra work to implement it, but it can be done. FFS's can't do that, ever, with or without softdep.

Another feature--LFS's in particular are supposed to have some optimization for striped RAID arrays that intentionally spreads traffic over all the disks in the array, instead of preferentially hitting one disk. Anyway, it can supposedly do this more evenly than statistics alone would suggest, as long as you tell your LFS how big the stripe size is when you make it.
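
A rough way to see why knowing the stripe size helps: on a plain striped (RAID-0 style) layout, a large sequential write that is aligned to, and a multiple of, the full stripe lands equally on every disk, while small scattered writes land wherever they happen to fall. The sketch below only does that generic arithmetic. It is not how any particular LFS implements the optimization, and the function names and sizes are assumptions picked for illustration.

```python
from collections import Counter


def disk_for_offset(offset, stripe_unit, ndisks):
    """Disk index holding byte `offset` in a simple RAID-0 style layout."""
    return (offset // stripe_unit) % ndisks


def bytes_per_disk(start, length, stripe_unit, ndisks):
    """Count how many bytes of a write [start, start+length) hit each disk."""
    counts = Counter()
    pos, end = start, start + length
    while pos < end:
        # End of the stripe unit that contains `pos`.
        unit_end = (pos // stripe_unit + 1) * stripe_unit
        chunk = min(end, unit_end) - pos
        counts[disk_for_offset(pos, stripe_unit, ndisks)] += chunk
        pos += chunk
    return counts


if __name__ == "__main__":
    unit, disks = 64 * 1024, 4  # hypothetical stripe unit and array width
    # One log segment sized to a full stripe: every disk gets the same share.
    print(bytes_per_disk(0, unit * disks, unit, disks))
    # One small in-place update: a single disk takes the whole hit.
    print(bytes_per_disk(0, 8192, unit, disks))
```

Since an LFS writes its log in big sequential segments, sizing a segment to a whole number of full stripes is one plausible way to get the even spread described above.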

NetApp's WAFL ("Write Anywhere File Layout") is heavily based on the LFS research.

FFS-softdep is in NetBSD, FreeBSD, and BSDI, since about half a year ago, and it's fairly stable and complete.

LFS is working in NetBSD, and it's finally getting to be fairly stable. Not sure who else has this--I think BSDI might, and FreeBSD doesn't. But, some pieces are missing. I think the fsck may still have some limitations. I mean, bad limitations, like, it cannot fix any given filesystem. Not to mention lack of the roll-forward agent. And of course we don't have a file-coalescing cleaner. perseant's still working on LFS, and will hopefully add some of this stuff. They tend to use the Wizard-in-the-Cave development model, as opposed to the Thousand-Monkeys model Linux likes to use. Consequently, things happen slowly, but they actually work when they're done.

The nice thing about FFS-softdep, which is why it's a win for most people, is that it works right now. Also, you can turn it on or off with tunefs. There's no need to backup and restore the filesystem. And it retains the spatial-locality properties of FFS, which tend to be right more often for high-bandwidth low-tech jobs (web server, mail server, ...), basically anything except a build.

IMHO, LFS will win eventually, after perseant finishes implementing the rest of it. :) FreeBSD will adopt it and then add a bunch of features like snapshots and ACL's and support for Microsoft FTL that make the NetBSD folk gag in disgust. (can you say, NETGRAPH! yech.)

Linux will probably end up using NTFS after Microsoft open-sources it, with an extension for symlinks. Probably Linux will abandon Unix file permissions altogether and adopt NT/Posix ACL's, thus maintaining better synchronization with the authoritative Microsoft codebase. That'll be really cool--it'll make it a lot easier to dual-boot, and solve their horrible filesystem crash-corruption problems at the same time.

As for writing softdep's for ext3, why not contribute to the NetBSD or FreeBSD efforts rather than reinventing round wheels and slapping them onto a wooden cart? I think if you look at the respective Linux and BSD code, you'll see very obviously that one decision leads to Double Happiness and the other to Fiery Catastrophic Mental Insanity. What with NTFS, SGI XFS, IBM JFS, and ReiserFS all on the horizon, I think contributions to BSD would have a smaller learning curve, better longevity, and greater immunity to corporate architectural corruption. But hey, that's just me, speaking as a happy NetBSD user who's had Linux eat up tons of data and several pointless months of my life.

oh yeah, you get IPsec and IPv6 that actually works as part of the bargain. none of this patched-patch revision spaghetti. itojun is one of us now. mwuhuhuhuuhuaha!

-- Konrad Schroeder's LFS work-in-progress page for NetBSD
-- Thor's description of performance differences between FFS-softdep and LFS
-- Two ref's to the original soft updates papers, both broken but perhaps recoverable. friggin' Usenix can't seem to stick with a URL.
-- Margo Seltzer's papers page, which has a few LFS papers
-- John Heidemann's papers page, which is about stackable filesystems in BSD (which makes it less likely relevant--but this is, FYI, another of the big gripes BSD folk have with Linux: no stackable filesystem support.)

Also, as I was going through the NetBSD archives, a lot of people talked about some papers about LFS mentioned in the mount_lfs man page, but of course Linux users won't have that man page, so here they are:

Ousterhout and Douglis, "Beating the I/O Bottleneck: A Case for Log-structured File Systems", Operating Systems Review, No. 1, Vol. 23, pp. 11-27, 1989, also available as Technical Report UCB/CSD 88/467.

Rosenblum and Ousterhout, "The Design and Implementation of a Log-Structured File System", ACM SIGOPS Operating Systems Review, No. 5, Vol. 25, 1991.

Seltzer, "File System Performance and Transaction Support", PhD Thesis, University of California, Berkeley, 1992, also available as Technical Report UCB/ERL M92.

Seltzer, Bostic, McKusick and Staelin, "An Implementation of a Log-Structured File System for UNIX", Proc. of the Winter 1993 USENIX Conf., pp. 315-331, 1993.

Miles Nordin / v:+1 720 841-8308 fax:+1 530 579-8680
555 Bryant Street PMB 182 / Palo Alto, CA 94301-1700 / US