[sf-lug] filesystem for a 3TB external USB drive

Ian Sidle ian at iansidle.com
Sat Dec 31 01:07:14 PST 2011


> Let me apologize first if the following seems flip, but based on my
> personal and practical experience with large filesystems, you were
> given pretty bad advice.
Based on the phrasing, I get the impression that you feel every statement I made in my email was incorrect.

I must admit that to many, filesystems are as much personal preference as they are "science", and it is very easy to wander into flame wars over the subject, so I will attempt to tread carefully.

First, let's start with the original request

> Any recommendations on a 3TB Western Digital external USB drive? Came
> natively with a NTFS. Access will be via Linux only and will be used
> for backup.

The drive is to be used for backup, and the request is for a filesystem appropriate to this particular situation (as opposed to a generalized discussion of filesystems).

Sameer Verma did not answer any of the (IMHO) fairly reasonable questions that jim at systemateka.com posted asking for further information about the data to be stored and what features were desired (the number of files, their size, journaling, whether performance was a priority, cross-platform accessibility, etc.).

Lacking any of this information, I operated from what I feel was a fairly conservative set of criteria for my recommendation.
Since he did not mention anything about a "business" or some other type of organization, I am going to assume this is for personal use and that this 3TB drive will be the sole backup location.


#1 High data stability

#2 How many methods are available to recover information in the event of a double catastrophe (i.e. where both the original computer and the "backup" drive are stored in the same room and a fire/flood/other disaster affects BOTH copies of the data). True, it is better to have off-site copies of the data in multiple geographically and physically secure locations, but *very* rarely is that done in practice by individuals at home.

#3 Compatibility with other operating systems. True, the original poster said it only needed to work with Linux, but situations change, and one might someday need to get at the data on a platform different from the one where it started.


Next, let's go over the parts where we BOTH agreed.

> "BTRFS This means that it is currently possible to corrupt a btrfs filesystem..  Isn't that special?"
Swimming through your sarcasm, I got the impression you did not recommend the use of BTRFS. 
In my previous email, I said exactly "it's going to be quite a while before it is release quality".
I believe most people would consider calling something NOT release quality the opposite of a recommendation for its use.


> Reiser:
> Excuse the pun, but Reiser is effectively dead
I completely agree, and I felt the phrase "I don't expect there to be much support in the future" conveyed a sentiment similar to what you were expressing, coupled with the phrase "I wouldn't trust it for much of anything" as a recommendation against its use.

> I don't have a whole lot of experience with EXT4.  So the only thing I can say about it is that it's new yet to be proven in production
Glad you agree with me again, since I feel most people would consider saying it had "some reliability issues" a suggestion AGAINST its use...


Now to where there was some dispute.


> ZFS (http://zfsonlinux.org/):
> "Please keep in mind the current 0.5.2 stable release does not yet
> support a mountable filesystem." - You've gotta ask yourself a
> question: "Do I feel lucky?"  If you want ZFS, do it on Solaris.

I did NOT recommend using the kernel-based patch for native ZFS support; instead I said you should use the "zfs-fuse" implementation, which some people believe to be a stable (albeit not very fast) implementation that works on Linux.

"zfs-fuse is mature and stable enough to trust data to it"
http://www.virag.si/2010/06/zfs-fuse-0-6-9-on-ubuntu-lucid-lynx/

"From this, I conclude that ZFS-Fuse is pretty stable"
http://mindinthewater.blogspot.com/2010/06/zfs-fuse-reliability-report.html

"I've been using ZFS-FUSE for several weeks with many terabytes of data and I believe it is stable and ready for prime time.  "
https://bbs.archlinux.org/viewtopic.php?id=109536

As for recovery potential, no doubt one could use the free software available in OpenSolaris, Nexenta or even Oracle Solaris Express for data recovery by running it off a bootable CD-ROM. I wouldn't be surprised if there are companies that provide data recovery services, since there is considerable corporate backing.

Now, if this were a situation where the filesystem was to be used on a server, I would have recommended something else (or at least a Solaris derivative).

In the end, the choice of ZFS on Linux is at best a compromise, and I certainly admit that. The big feature for this use case is that it automatically checksums the contents of data blocks, and therefore of file contents. This provides, to some extent, a "self-healing" capability, though I must admit it isn't particularly useful unless you are using RAID-Z or the copies=2/3 feature (which stores multiple copies of each block in the filesystem).
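For anyone curious, setting that up with zfs-fuse is only a few commands. A minimal sketch, assuming a Debian/Ubuntu system and that the USB drive shows up as /dev/sdb (zpool create will destroy whatever is already on the disk, so double-check the device name):

    # install the userspace ZFS daemon
    sudo apt-get install zfs-fuse
    # create a single-disk pool named "backup" on the USB drive
    sudo zpool create backup /dev/sdb
    # keep two copies of every block so checksum failures can self-heal
    sudo zfs set copies=2 backup
    # periodically verify every checksum in the pool, then check results
    sudo zpool scrub backup
    sudo zpool status backup

With copies=2 you give up half the capacity, but a bad sector only costs you one redundant copy of a block instead of the file.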


> EXT2/3:
> "Are you crazy? The fall will probably kill you."
> 
> All else aside, just the fsck time alone on a large volume rules this
> out
True, fsck can be rather inconvenient with EXT2, but again this isn't a drive that requires high performance, high uptime or other specialized needs.
Reliability of the filesystem and data consistency matter far more for a backup drive than anything else. *If* the drive is not unmounted cleanly, consistency can sometimes be an issue, but generally you don't want to keep your backup drive plugged in all the time anyway, in case of power surges, etc. Even then, the vast majority of the time it can be fixed with fsck; it merely takes some time, which isn't a big deal given the criteria above.
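For what it's worth, setting the drive up and doing the occasional manual check is straightforward. A sketch, assuming the partition is /dev/sdb1 (mkfs will erase it):

    # create the filesystem; -m 0 skips the root-reserved blocks,
    # which you don't need on a pure backup disk
    sudo mkfs.ext2 -m 0 -L backup /dev/sdb1
    # disable the mount-count and interval based automatic fsck,
    # so checks only happen when you ask for them
    sudo tune2fs -c 0 -i 0 /dev/sdb1
    # force a full check by hand (with the drive unmounted)
    sudo fsck.ext2 -f /dev/sdb1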

There are many programs that provide data recovery for ext2, as well as readers for nearly every platform in existence.


> The only filesystem that I personally for "large" volumes is JFS.
At least at one point (a number of years ago now, I must admit), most of the articles I read about JFS described it as a half-complete project abandoned by IBM. At the time its performance was slightly better than XFS in some regards and worse in others, but reliability was a major issue since it was still labeled "beta".

According to Wikipedia, XFS was included in the stable, mainline Linux kernel tree as a stable feature with version 2.4.0 [January 1, 2001], while JFS was included in kernel version 2.4.18 [February 25, 2002] and was appropriately labeled an "experimental" feature. For political reasons, pre-stable and early 1.0-ish releases were not part of the mainline kernel and therefore required manually patching the kernel until later.

Depending on where you look, it's clear that JFS is not exactly as flawless as you seem to imply:

"We have found that the jfs filesystem is not reliable in Linux 2.6.24, however in 2.6.21 it is extremely robust."
http://www.embeddedarm.com/about/resource.php?item=459

Wait... so it  suddenly becomes significantly less stable with a /newer/ version of the kernel???

"I have a server with a JFS filesystem on it that's gotten corrupted."
http://linux.derkeiler.com/Mailing-Lists/Ubuntu/2009-04/msg00156.html

"I recently encountered JFS filesystem corruption on a system"
http://old.nabble.com/JFS-corruption-%22Stale-NFS-filehandle%22-td26287390.html

"JFS Corruption? Can't recover filesystem"
http://www.mail-archive.com/jfs-discussion@lists.sourceforge.net/msg01491.html

"JFS: long fsck time on large filesystem?" - 12+ hour fsck on a 7TB JFS filesystem
http://serverfault.com/questions/154736/jfs-long-fsck-time-on-large-filesystem

I'm not trying to say JFS is garbage (I am surprised to see how many positive reviews of it there are now), but it sounds like it is far from perfect, and as I have learned, one-line statements rarely convey the entirety of any given situation.

-----
> 
> XFS:
> XFS on Linux is not XFS on SGI.  My personal experience with XFS on
> Linux is that it's basically unusable.  

"In my personal experience, XFS is very unreliable".  That is the problem with personal experience - it can vary widely and  it's hard to prove anything without hard facts. 

I've run XFS on desktops, laptops and servers - some were always shut down cleanly, others were lab computers that were frequently unplugged, leaving very dirty filesystems. In my personal experience I never had any trouble with XFS. XFS got a bad rap because it can silently lose data from files that were being written at the moment of a dirty unmount (i.e. a power outage or crash), since it keeps a significant amount of write cache in memory before committing it to disk. However, this is a design choice rather than a faulty design.


> I had a few years ago tried it again just to see, and there was silent data corruption on disk.
Silent corruption of data on a disk is impossible to eliminate entirely without a filesystem that supports block-level checksums and repair/parity data - period. JFS does not support this feature, so nobody can claim JFS is impervious to it. You could argue that one filesystem has a higher probability of silent data corruption than another, but that wasn't part of your argument. The only OSS filesystems that provide block-level checksums are ZFS and BTRFS, and since BTRFS is still in the design stages, ZFS is the remaining option.
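On filesystems without block-level checksums you can at least approximate the *detection* half (not the repair half) at the file level. A rough sketch, assuming the backup is mounted at /mnt/backup:

    # record a checksum for every file in the backup
    find /mnt/backup -type f -exec md5sum {} + > ~/backup.md5
    # on a later run, report any file whose contents have silently changed
    md5sum -c --quiet ~/backup.md5

It won't tell you whether the original or the backup copy is the good one, but at least the corruption is no longer silent.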

For the record, XFS made a series of design choices that many people thought were bugs.

First, the filesystem works on the assumption that it is running on a server in an enterprise environment, with ample memory and backup sources of power, so it keeps large amounts of write data cached and flushes it at much longer intervals than other filesystems do. If the power goes out in the middle of a lot of large operations, you lose whatever data was still in the RAM cache. To an extent this is true of all filesystems and is always a risk, but XFS chose to be more aggressive than most in order to gain additional performance.

Secondly - when there is a possibility of data corruption after a hard power-off, XFS will intentionally write zeros over the blocks it is suspicious of. With most filesystems you end up with a file that may contain corruption, so the whole file is suspect and there is no way to know which portion of the data is correct; with XFS you instead have blocks you know are bad (because they are all zeros), so you know it is time to recover those blocks/files from backup. This is either "too aggressive" or "too conservative" depending on your personal philosophy. It's certainly /different/ from most of the other filesystems out there.

This was a design *choice*: the programmers made a lot of assumptions about the environment the filesystem would run in and the priorities of the people running it, and optimized for those requirements. That's probably why your experience did not change when you tried it again later - it was still operating "as designed".

Now, stepping back and re-evaluating everything I've said, maybe XFS wasn't the best choice for this particular situation. If I could assume the drive would be turned on, mounted, used and then cleanly unmounted right after each use, I feel it would still be an OK choice; but if the person was going to be rough with it (regularly unplugging it without properly unmounting), then I would have recommended something else.
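In that mount-use-unmount pattern, a disciplined routine keeps XFS's aggressive caching from ever being an issue. A sketch, assuming the partition is /dev/sdb1 (mkfs.xfs will wipe it):

    # one time: create the filesystem
    sudo mkfs.xfs -L backup /dev/sdb1
    # each backup run: mount, copy, flush, unmount
    sudo mount /dev/sdb1 /mnt/backup
    rsync -a ~/important/ /mnt/backup/
    sync                       # push everything still in the write cache to disk
    sudo umount /mnt/backup    # a clean unmount flushes the journal too

As long as the drive is only unplugged after the umount returns, the delayed-write behavior never gets a chance to bite.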


> ..you might have to fsck an volume, then your data is
> effectively offline for hours upon hours.  I had tried an 8TB volume
> with EXT3 once and had to migrate the data off (the write performance
> had mysterious hit a cliff when we went above a certain number of
> files/directories).
Indeed. I was not recommending EXT3 for a server with a multi-terabyte filesystem where downtime would be a problem...

> [EXT4] is unlikely ever to
> make it into the mainline kernel.
According to Wikipedia, EXT4 was merged into the mainline kernel on 11 October 2008, in version 2.6.28...
I know a number of distributions (including Ubuntu and CentOS) have used it as their default filesystem for some time...

However, even the developers of EXT4 admit it is a "stop gap" solution, and I know (at least initially) there were a lot of problems with Ubuntu installations when it first came out because of filesystem bugs.


> 
> NTFS-3g:
> "NTFS-3G supports partial NTFS journaling, so if an unexpected
> computer failure leaves the file system in an inconsistent state, the
> volume can be repaired." - from http://en.wikipedia.org/wiki/NTFS-3G.
> Again, how comfortable are you with regard to possibly losing your
> backup data?

I've used NTFS-3G for years, on both Linux and Mac OS X machines, primarily for accessing a ~300GB USB drive that I use to recover and transfer different users' data between computers and operating systems. It's never been particularly fast, I must admit, but I've never had any major reliability problems with it. Every once in a while I have to run a disk check, but that could just as easily be the fault of the Windows NTFS implementation as of NTFS-3G. That's just my personal, anecdotal experience, though.
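For reference, using it is about as simple as it gets. A sketch, assuming the partition is /dev/sdb1:

    # mount read-write through the FUSE driver
    sudo mount -t ntfs-3g /dev/sdb1 /mnt/backup
    # after an unclean unplug, clear the dirty flag and fix common
    # inconsistencies (ntfsfix ships with ntfsprogs; for a full
    # repair you still want chkdsk on an actual Windows box)
    sudo ntfsfix /dev/sdb1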

It's pretty clear that the makers of NTFS-3G are confident in its ability to repair the filesystem on the fly:
"Historically NTFS-3G had very rigid sanity checks and this won't change in the future"
And even though they released an offline recovery tool, they implied in the announcement that they felt it would rarely be necessary:
http://article.gmane.org/gmane.comp.file-systems.ntfs-3g.devel/678

In the FAQ, they mention a couple of bugs in the /Microsoft/ implementation that are a problem on Windows but do not affect NTFS-3G:
http://www.tuxera.com/community/ntfs-3g-faq/#questions

"NTFS-3g is mature enough to rely on these days."
http://forums.whirlpool.net.au/archive/1211829

"use NTFS-3G...[this] project [is] free, open-source and mature."
http://superuser.com/questions/45130/cross-platform-file-system

"I’ve been using NTFS-3G in Linux for many years and haven’t had any problems with it.."
http://blog.thewheatfield.org/tag/osx/

"NTFS-3G develops, quality tests and supports a trustable, feature rich and high performance solution for hardware platforms and operating systems whose users need to reliably interoperate with NTFS."

"The driver is used by millions of computers, consumer electronics devices for reliable data exchange, and referenced in more than 30 computer books. Please see our test methods and testimonials on the driver quality page at www.ntfs-3g.org."

True, this might be just a lot of marketing speak, but I know a ton of people who use it and it has been fairly stable for them.


There are countless recovery programs for the filesystem, and every "drive savers" type of company known to mankind supports it. That was the primary reason I even suggested it as an option. If the disk platter on a drive running JFS/XFS failed, the pickings of recovery software and companies would be pretty slim (if any at all).
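As one concrete example of that ecosystem: TestDisk and PhotoRec both understand NTFS, and PhotoRec will carve files off almost any media regardless of the filesystem. Assuming the failing drive is /dev/sdb:

    # interactive partition and filesystem recovery
    sudo testdisk /dev/sdb
    # or carve recoverable files straight off the platters
    sudo photorec /dev/sdb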

I also specifically said it was an "OK" choice, implying it wasn't my *first* choice, and I clearly stated performance "isn't awesome" with it.


Again, I wouldn't use NTFS for a production server, and I felt the context of the email conversation made it clear this was for personal, archival purposes only.


My original email was a quick five-minute response that skipped a lot of details, because I was hoping to avoid spending a few hours describing things in detail (as I am now doing). I honestly thought a lot more people would pipe up in the conversation, providing more opportunity to discuss the finer details - in particular Sameer, who could have supplied more information about his particular needs so a deeper conversation could be had. I now realize that was a pretty big assumption to make.

I didn't think I was writing an email that would be the "end-all, be-all" guide to filesystems; I felt I was taking part in a discussion, not writing a definitive guide.

I hope at least some of this information is useful to those of you out there, and that it at least explains the thought process behind the "pretty bad advice" I gave.

No single data storage implementation is EVER 100% safe. The best you can do is take precautions to decrease the odds of data loss/corruption (backups, error correction codes at the file/software/database/filesystem level, quality hardware, ECC memory on ALL components/cards/controllers/buffers/caches, conditioned power lines, an RF-shielded building, etc., etc.), but freak accidents can and DO happen.

In the end you have to decide how important your information is and take steps to bring the probability of losing it within a level (and cost) you are comfortable with.

thanks,
Ian


