[conspire] rsync for backups?
Michael Paoli
Michael.Paoli at cal.berkeley.edu
Mon May 27 09:57:04 PDT 2019
rsync for backups? (query/conversation came up recently)
Sure, it can be used for that,
however, I generally tell folks to carefully review the options;
notably as there are some default behaviors of rsync I don't like,
and I don't consider the *default* behaviors acceptable/sufficient.
However, with careful consideration and use of relevant options,
rsync can be quite lovely (and often very efficient) for
backups ... or more/most specifically, updating a target
to match a source.
What hazards by default? Consider the following:
$ echo foo > foo && echo bar > bar && touch -m -r foo bar
$ rsync -a foo bar
$ cmp foo bar
foo bar differ: char 1, line 1
$ rsync -a --ignore-times foo bar
$ cmp foo bar
$
So, notice initially, even with the -a (--archive) option,
in our example, rsync still fails to replicate foo to bar.
Why? By default, rsync takes some shortcuts - and in my opinion
excessively so. What shortcut(s) specifically?
Well, if source and target are the same size and have the same
mtime (modification time), and they're the same relative
pathnames (I could've created different parent directories on source
and target, and a same-named file within each, to make it a bit
clearer, but in any case ...), it will presume the file contents are
the same, and neither read nor update the target. That could be very
hazardous for backups, especially if one really wants to properly and
accurately back up "everything".
The solution for that little bit is --ignore-times.
That option tells rsync not to skip files just because size and
modification time match, but in all cases to process the file and
update the target where the contents don't match.
So ... without using such option, by default, newer data could end
up not being backed up by rsync. Even a malicious/crafty user could
avoid newer data being picked up by rsync, by keeping pathname, size,
and (user settable) mtime of a file the same.
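To make that shortcut concrete, here is a rough approximation of it in
plain shell. This is only a sketch of the idea (same size, same mtime),
not rsync's actual code; the function name same_size_and_mtime is made
up for illustration, and the -nt test operator, while widely supported,
is not strictly POSIX.

```shell
#!/bin/sh
# Rough approximation of the quick check described above. This is NOT
# rsync's code, only the same idea: equal size and equal mtime.
# same_size_and_mtime is a hypothetical helper name.
same_size_and_mtime() {
    # Equal size in bytes?
    [ "$(wc -c < "$1")" -eq "$(wc -c < "$2")" ] || return 1
    # Equal mtime: neither file is newer than the other.
    [ "$1" -nt "$2" ] && return 1
    [ "$2" -nt "$1" ] && return 1
    return 0
}

# Recreate the foo/bar example in a scratch directory.
tmp=$(mktemp -d) && cd "$tmp" || exit 1
echo foo > foo && echo bar > bar && touch -m -r foo bar

# Size and mtime agree, yet cmp reports different contents.
if same_size_and_mtime foo bar && ! cmp -s foo bar; then
    echo "size and mtime match, but contents differ"
fi
```

Any pair of files for which this returns true while cmp disagrees is a
pair the default behavior would leave alone.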
And, for the curious, what do my typical rsync options look like when
I'm doing a backup? Let's see ... from some such programs I have
lying around (and use semi-regularly) ...
rsync \
--archive \
--acls \
--xattrs \
--hard-links \
--numeric-ids \
--relative \
--sparse \
--checksum \
--partial \
--one-file-system \
--delete-excluded \
--ignore-times \
--quiet \
{ non-option arguments ... }
The above is for a local-to-local backup (e.g. I physically attach a
backup drive). I've also got --one-file-system in there, because I
don't want to cross filesystem boundaries (mount points). Be sure to
omit that option if you do want to traverse filesystem boundaries.
One also might want to omit --quiet ... or not.
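If it helps, those two choices can be made explicit in a small wrapper.
The following is just a sketch: the function name print_backup_cmd and
the CROSS_FS and VERBOSE switches are invented for illustration, and it
only prints the resulting command line rather than running anything.

```shell
#!/bin/sh
# Print (not run) an rsync command line built from the option list above.
# print_backup_cmd, CROSS_FS and VERBOSE are hypothetical names invented
# for this sketch; they are not part of rsync.
print_backup_cmd() {
    src=$1 dst=$2
    set -- --archive --acls --xattrs --hard-links --numeric-ids \
        --relative --sparse --checksum --partial --delete-excluded \
        --ignore-times
    # Stay on one filesystem unless explicitly told to cross mount points.
    [ "${CROSS_FS:-no}" = yes ] || set -- "$@" --one-file-system
    # Stay quiet unless output is wanted.
    [ "${VERBOSE:-no}" = yes ] || set -- "$@" --quiet
    echo rsync "$@" "$src" "$dst"
}
```

E.g. CROSS_FS=yes print_backup_cmd /some/source/ /some/target/ shows
the command without --one-file-system, so one can eyeball it before
committing to it.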
Let's see ... another example ...
rsync \
--ipv4 \
--archive \
--acls \
--xattrs \
--hard-links \
--numeric-ids \
--relative \
--sparse \
--rsh='ssh -aTx -o BatchMode=yes '"$SSH_OPTs" \
--checksum \
--partial \
--one-file-system \
--delete-excluded \
--ignore-times \
--compress-level=9 \
[ optionally filter expression(s) to include/exclude as desired ] \
--quiet \
{ non-option arguments ... }
The above example does a remote to local over ssh. Again, my
options may not be fully suitable for you. :-)
I may also optionally have some stuff set in SSH_OPTs in the
environment that I may want to pass along to ssh as options.
Whether to use compression, and how aggressively,
depends on where the bottleneck is or may be.
E.g. slow link, reasonably fast CPU, generally high compression good.
Fast link & drives, slow CPU: little to no compression likely faster.
When in doubt, test. :-)
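One cheap way to get a feel for it, before involving the network at
all, is to see how well a sample of the data compresses at different
levels. The sketch below uses gzip as a stand-in, on the assumption
that rsync's compression behaves similarly (it has traditionally been
zlib-based; newer versions may negotiate other algorithms). The
function name compare_levels is made up for illustration.

```shell
#!/bin/sh
# compare_levels FILE: print the compressed size of FILE at gzip
# levels 1, 6 and 9. gzip is only a proxy here (an assumption of this
# sketch); it does not measure rsync itself.
compare_levels() {
    for level in 1 6 9; do
        # Arithmetic expansion strips any padding wc may emit.
        size=$(( $(gzip -c "-$level" < "$1" | wc -c) ))
        printf 'level %s: %s bytes\n' "$level" "$size"
    done
}
```

Run it on a representative file, and where the shell provides it,
prefix the gzip runs with time to see what the higher levels cost in
CPU. That is still no substitute for timing the real transfer.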
And don't forget to reasonably test, at least on occasion,
even when you're damn sure - lest you get bitten hard and unexpectedly.
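For one simple form of such a test, something like the following
sketch can byte-compare a source tree against its backup. The function
name verify_tree is invented for illustration, and it checks only the
contents of regular files, not ownership, permissions, ACLs, xattrs,
symlinks, or extra files on the target.

```shell
#!/bin/sh
# verify_tree SRC DST: byte-compare every regular file under SRC with
# the same relative path under DST. Prints a line per missing or
# differing file. Returns 0 if all match, 1 on mismatch, 2 on error.
# verify_tree is a hypothetical helper; contents only, no metadata.
verify_tree() {
    src=$1
    # Make DST absolute, since we cd into SRC below.
    dst=$(cd "$2" && pwd) || return 2
    out=$(cd "$src" && find . -type f -exec sh -c '
        dst=$1; shift
        for f do
            cmp -s "$f" "$dst/$f" || printf "MISMATCH: %s\n" "$f"
        done
    ' sh "$dst" {} +) || return 2
    if [ -n "$out" ]; then
        printf '%s\n' "$out"
        return 1
    fi
    return 0
}
```

Note that with --relative the source path gets reproduced under the
target, so DST here would be the corresponding subdirectory of the
backup, not its top.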