[conspire] external storage recommendation

Rick Moen rick at linuxmafia.com
Sat Sep 25 13:40:18 PDT 2021


Quoting Paul Zander (paulz at ieee.org):

> Raid 1 uses 2 drives with all data written to both.   Protection of
> loss of 1 drive, but only get storage area of 1 drive.  For RAID to be
> "efficient" 4 or more drives are needed.    Did I get that right?

Head of the class, sir!  (Technically, RAID5 requires at least _3_
drives.)

The significant advantages of RAID1 mirroring over RAID5 and similar
schemes are simplicity and lower computational overhead (especially,
but not exclusively, during a rebuild cycle following failure and
replacement of a drive).  So, in many scenarios, those RAID1 advantages
are compelling, and make the 50% "loss" of capacity of little
consequence and easy to justify.

With (say) RAID5, the "loss" of capacity is roughly the capacity of one
constituent drive: four 4 TB drives in RAID5 yield about 12 TB of
usable space.
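
To see why the parity math works out that way, here's a minimal Python
sketch (purely illustrative, with made-up 4-byte "blocks") of the
single-parity idea behind RAID5: one drive's worth of space per stripe
holds the XOR of the rest, so any single lost drive can be rebuilt from
the survivors.

  from functools import reduce

  def xor_blocks(blocks):
      # XOR equal-length byte strings together, as RAID5 parity does.
      return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)),
                    blocks)

  # One stripe across a hypothetical 4-drive RAID5: three data blocks
  # plus one parity block, so usable capacity is 3 drives out of 4.
  data = [b"AAAA", b"BBBB", b"CCCC"]
  parity = xor_blocks(data)

  # A RAID1 mirror, by contrast, just writes the same block twice: no
  # math at all.  Now suppose the drive holding data[1] dies.  Its
  # contents equal the XOR of every surviving block -- which is why a
  # RAID5 rebuild must read all remaining drives, and why it's costly.
  rebuilt = xor_blocks([data[0], data[2], parity])
  assert rebuilt == data[1]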

Here is a FAQ:  https://www.vantagetech.com/faq/raid-5-recovery-faq.html


I alluded earlier to ZFS (and its slightly lackluster Linux-native
imitator, btrfs) as being in a different category relative to regular
*ix filesystems like ext4, in ways that make it much more desirable for
servers.  I should elaborate.

ext4 is an excellent, high-performing, conservatively designed *ix
filesystem.  There is nothing wrong with it.  Things it doesn't do, and
doesn't aspire to do, include:

o  background fsck and auto-repair / self-healing during normal system operation
o  checksumming and vetting of every byte written, data and metadata
o  data snapshots and replication (see the toy sketch after this list)
o  native volume management
o  native handling of RAID
o  automatic rollback, in the event of detected error or inconsistency
o  native data compression and de-duplication
o  a lot more; see https://en.wikipedia.org/wiki/ZFS#Summary
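
To give a flavor of two of those bullets (snapshots and rollback), here
is a toy Python sketch of the copy-on-write idea that makes ZFS
snapshots nearly free.  It's an in-memory stand-in, nothing like ZFS's
actual on-disk layout:

  class CowStore:
      # Toy copy-on-write key/value store.  Real ZFS works per block at
      # the storage layer and shares unchanged blocks between versions;
      # this toy copies a whole dict, purely to show the idea.
      def __init__(self):
          self.current = {}
          self.snapshots = {}

      def write(self, key, value):
          new_version = dict(self.current)     # old version stays intact
          new_version[key] = value
          self.current = new_version

      def snapshot(self, name):
          self.snapshots[name] = self.current  # cheap: keep a reference

      def rollback(self, name):
          self.current = self.snapshots[name]

  store = CowStore()
  store.write("motd", "all is well")
  store.snapshot("before-upgrade")
  store.write("motd", "scrambled by a bad upgrade")
  store.rollback("before-upgrade")
  assert store.current["motd"] == "all is well"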

A regular *ix filesystem, even a really good one like ext4, has none of
those things built in.  As filesystem sizes and data collections
(e.g., A/V files) get bigger and bigger, the risk of bitrot and data
corruption silently occurring and accumulating over time becomes more
worrisome -- as does the risk of multiple days of system downtime to
correct filesystem errors during a reboot-driven fsck, just because you
had 10 terabytes of files.
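
As a rough illustration of what the checksumming bullet above buys you,
here's a small Python sketch of the scrub idea: keep a checksum with
every block and periodically re-verify, so a silently flipped bit gets
caught instead of quietly accumulating.  (ZFS really does store a
checksum in each block pointer; everything else here is simplified.)

  import hashlib

  def write_block(data):
      # Keep a checksum alongside every block written.
      return (hashlib.sha256(data).digest(), data)

  def scrub(blocks):
      # Re-verify every block, as a background scrub does, and report
      # any that no longer match their checksum (silent bitrot).
      return [i for i, (digest, data) in enumerate(blocks)
              if hashlib.sha256(data).digest() != digest]

  blocks = [write_block(b"payroll records"),
            write_block(b"family photos")]

  # Simulate one flipped bit on disk -- corruption ext4 would never
  # notice on its own.
  digest, data = blocks[1]
  blocks[1] = (digest, bytes([data[0] ^ 0x01]) + data[1:])

  print(scrub(blocks))    # [1] -- block 1 failed verification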
  
The whole point of a file _server_ on your network is for it to be a
_reliable_ place to house your files, which is where the first two ZFS
advantages listed really shine.  On the minus side, ZFS is fairly
RAM-thirsty -- but that overhead is arguably well worth it.

ZFS was one of several ground-breaking Sun Microsystems projects that
gave Solaris and several compatibly licensed BSDs a functionality edge
over Linux.  The others were DTrace (a dynamic tracing framework for
live kernel and application instrumentation), Solaris Containers aka
Zones (a better implementation of chroot jails), a Kernel-based Virtual
Machine, and OpenSolaris Network Virtualization and Resource Control
aka Crossbow (a set of features providing internal network
virtualization and quality of service).  All of those are C-coded in
the Solaris kernel under CDDL terms, which clash with Linux's GPLv2
terms; hence the code, even if ported to the Linux kernel, cannot
lawfully be distributed in binary form, as that would be a copyright
violation.[1]

btrfs is an independent attempt to implement most of ZFS's features in
Linux.  However, even now, 14 years after its debut, it still has some
problems.  This runthrough explains the situation better than I could:
https://arstechnica.com/gadgets/2021/09/examining-btrfs-linuxs-perpetually-half-finished-filesystem/




