[sf-lug] Example of an rsync(1) based backup scheme/script

Tue Feb 5 09:43:21 PST 2013


    Amazing and useful. Many thanks. I'll have to 
reread this a few times, but wow! (And thanks for 
the  expand  example, too.) 


On Tue, 2013-02-05 at 01:33 -0800, Michael Paoli wrote:
> So, I've occasionally mentioned before[1], and also elsewhere, e.g.
> BUUG[2], some bits about rsync(1), and caveats about using it for
> backup purposes, and also a bit more generally about backups, e.g.
> [1][3].  Anyway, the latest case I had for doing an rsync based
> script/program was rather recent.  I do, at least occasionally,
> backup vicki[4], most notably bits there of interest to BALUG[5], and at
> least sometimes also SF-LUG.  Anyway, I thought it was time I created
> script/program(s) for that, to make it not only more convenient,
> but at least typically quite a bit faster.  And rsync - at least with
> suitable invocation - is well suited to the task.
> 
> In this particular scenario, as is highly typical when I want to use
> rsync for backup purposes - and where I or others might possibly end up
> quite dependent upon that backup data, I want to be quite sure that the
> backups are good, accurate, and complete.  The rsync program, at least
> by default, plays a bit loosey-goosey with that, leaning more towards
> speed and efficiency over integrity.  It strikes a good balance for
> many of its more typical usages, but not exactly the balance I prefer
> where I want an at least somewhat higher degree of integrity on
> backups.  Fortunately rsync has lots of options, so one can generally
> tweak that suitably to one's liking, and I do find its options quite
> sufficient for doing so, and to suit my purposes.  One downside/caveat
> I've found with that, though - the options tend to change significantly
> among various (major) version differences of rsync releases.  So,
> unfortunately, it does sometimes need some careful review and
> tweaking/updating (e.g. I've written fairly similar scripts before, and
> have had to significantly change them due to updated/different versions
> of rsync).
> 
> Anyway, I give example script[6], and example usage below.  I'm not going
> to fully explain rsync (or shell, or ssh, or Perl, or ...) below, or
> even anywhere close, but I'll point out at least some of the more
> interesting points and bits, and some of the bits that might not be so
> obvious or intuitive.
> 
> First of all, rsync can be very fast, most notably where the target
> already looks rather similar, or quite similar, to the source - and
> that is, in fact, what
> rsync very much excels at, and why I chose rsync as key component of
> backup in this case (backups happening over Internet, and with at least
> one link not or not guaranteed to be especially fast - e.g. not suitable
> for regularly streaming full backup of all the source system's data -
> but quite fine for grabbing the incremental/differential data - e.g. via
> rsync).
> 
> Anyway, a bit further below, I first show some example runs of the
> script/program (which I called rsync_host2dir[6]), along with timing.  I
> don't show the "very first run" - that took on the order of several
> hours, as the target was quite out-of-date (had lots of the "user data"
> and applications data, but was quite out-of-date and/or relatively
> incomplete on most of the rest of the operating system (OS) bits).
> However, subsequent backups have generally been comparatively very fast
> - e.g. under 6.5 minutes total to fully backup (rsync) 3 hosts (one
> physical host, two virtual guest hosts).  The first timing run shown,
> was a bit longer (bit under 38 minutes for one of the hosts), but in
> that case the host had somewhat over 380 MiB of additional new data.
> All three hosts, combined, have roughly around 8 MiB total of data of
> interest (actual data on filesystems, excluding: easily reobtained OS ISO
> images, some FHS[7] volatile contents such as on/under /tmp,
> virtual/pseudo filesystems such as /proc and /sys, purely redundant
> filesystems such as a bind mount, and one rather ancient no longer very
> important backup of a much older predecessor host).
> 
> Anyway, examples.  I show some basic command line usage, in this case,
> bit of "one liner" (or so) invocation to backup 3 hosts.  I've
> reformatted the input command line a bit as it would look if I did it a
> bit differently to make it more readable - otherwise it's the
> same/equivalent.  In each case, I prepared target, by first copying
> (most) recent earlier backup to the target location(s) (a moderately
> large, but fast local copy), and then using the rsync script to update
> the targets.
> 
> The script takes two arguments, a source host, and a local target
> directory.  In the examples below, that's seen on the lines:
> > do time rsync_host2dir "$tmp" \
> and the following line giving the target.
> In the examples, the shell substitutes in for "$tmp" the host arguments
> that I've given to the shell's for loop.  The leading "> " on each line
> is the shell's PS2 prompt (it's essentially prompting that it needs
> additional input to complete the command's syntax).
> The do and time are part of how I invoked it under shell - do part of
> the for loop syntax, and time, a built-in to the shell that gives us
> some timing information - most notably what it reports as "real" is
> total elapsed "real" time according to the system's clock - useful for a
> gross overall "how long did it take?".  The not-so-obvious "host" (DNS)
> names under the .balug.org. domain are for the BALUG host of interest,
> and for host vicki (the physical host which contains the BALUG and
> SF-LUG virtual hosts).
> 
> Somewhat more detailed description of the rsync_host2dir script/program
> follows these example invocations.  The actual code for rsync_host2dir
> is also shown further down in the references and also available at [6].
> 
> $ (for tmp in \
> > sf-lug.com. \
> > balug-sf-lug-v2.balug.org. \
> > balug-sf-lug-v2.console.balug.org.
> > do time rsync_host2dir "$tmp" \
> > /home/r/root/tmp/mnt/balug/2013-01-28_BALUG/"$tmp"/root
> > done)
> 
> real    1m24.543s
> user    0m7.368s
> sys     0m3.156s
> 
> real    3m52.409s
> user    0m21.633s
> sys     0m7.648s
> 
> real    37m3.104s
> user    0m27.754s
> sys     0m15.449s
> $
> 
> $ (for tmp in \
> > sf-lug.com. \
> > balug-sf-lug-v2.balug.org. \
> > balug-sf-lug-v2.console.balug.org.
> > do time rsync_host2dir "$tmp" \
> > /home/r/root/tmp/mnt/balug/2013-01-29_BALUG/"$tmp"/root
> > done)
> 
> real    1m1.408s
> user    0m5.908s
> sys     0m1.356s
> 
> real    3m27.008s
> user    0m19.945s
> sys     0m6.656s
> 
> real    1m50.894s
> user    0m7.040s
> sys     0m2.916s
> $
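
As a quick check of the "under 6.5 minutes total" figure, the three "real" times from the second run above can be summed. This is only a small sketch using awk; the variable name total is mine, not something from the script:

```sh
# Sum the "real" times reported by time for the second run.
# Each value looks like 1m1.408s; splitting on m and s gives minutes, seconds.
total=$(printf '%s\n' 1m1.408s 3m27.008s 1m50.894s |
    awk -F'[ms]' '
        { t += $1 * 60 + $2 }
        END {
            mins = int(t / 60)
            printf "%dm%.3fs\n", mins, t - 60 * mins
        }')
printf '%s\n' "$total"
```

That comes to 6m19.310s for all three hosts.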
> 
> So, some comments about the rsync_host2dir script/program.  First of
> all, it includes some bits that, if it's not executed as superuser,
> reinvoke it as superuser via sudo.  Although in some other contexts
> I might write script/program to do similar (or go from root to some
> application ID via su if not invoked as the application ID), this is a
> bit atypical compared to my more common rsync scripts.  But in this
> case, it's (thus far) fired up manually on an ad hoc (but hopefully
> fairly regular) basis.  Were it intended to be, e.g. driven by a cron
> job, that would probably be a bit different.  Here also, in that sudo
> use, it essentially leverages (presumed) user's ssh key access via
> ssh-agent and passes that along.  Again, a bit atypical compared to
> such scripts I've more commonly done - but particularly
> handy/useful/convenient in this case - especially since those ssh keys
> are passphrase protected and generally only used via ssh-agent, and
> generally "only" by that invoking user.  Note also that it doesn't need
> the ssh keys for very long.  It does two ssh connections to host, first
> to gather mount information, and second - quite shortly thereafter - to
> do the rsync based backup.  The keys are only needed when making the
> ssh connections, not after they're already established, so, e.g., the
> key(s) can be made available only a quite short time (e.g. a minute or
> two or less) via ssh-agent, and still work quite fine, even if the
> backup takes much longer.  Scheduled production uses would typically
> have a somewhat different setup regarding key(s) and ID(s) and such.
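
A minimal sketch of that short-lived-key idea, assuming OpenSSH's ssh-add (its -t option limits how long the agent keeps the key). The function name, its argument order, and the key path and lifetime in the usage line are illustrative only and not part of rsync_host2dir:

```sh
# Hypothetical wrapper: add a key to ssh-agent for a short time, then back up.
# usage: backup_with_short_lived_key SECONDS KEYFILE HOST DIRECTORY
backup_with_short_lived_key() {
    lifetime=$1 key=$2 host=$3 dir=$4
    case $lifetime in
        ''|*[!0-9]*)
            echo "lifetime must be a whole number of seconds" 1>&2
            return 2
            ;;
    esac
    # The agent drops the key after $lifetime seconds; ssh connections
    # that are already established by then are not affected.
    ssh-add -t "$lifetime" "$key" || return
    rsync_host2dir "$host" "$dir"
}
```

It might be used as: backup_with_short_lived_key 120 ~/.ssh/id_rsa sf-lug.com. /some/target/root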
> 
> In the program's first ssh connection to source host, it runs the mount
> command, and then parses the output of that.  It does so to determine
> filesystem(s) to back up, and also the order in which we want them
> backed up.  It looks at filesystem type, and mount point, only selecting
> filesystems that are of a desired type and also excluding mount point
> patterns we wish to skip.  They're then sorted in a priority order -
> this ordering is based upon a probable restore order in "worst case
> scenario" where we need to restore "everything".  In such cases, where
> there are separate filesystems, we will generally need things restored
> in this order of priority:
> /boot
> /
> /usr
> /var
> /home
> And then anything/everything else in sorted order (so filesystems
> containing mountpoints of other filesystems are restored
> before those other filesystems).  We're also not horribly picky about
> the order of these latter filesystems, other than that caveat, so we use
> a basic sort to cover that.  Anyway, in "full" recovery/restore
> scenarios, one may often want to first recover those initial
> filesystems, and may then opt to restart the recovered OS, quite
> possibly in single user mode, and then restore the remaining filesystems
> onto that running OS.  Also, in this particular case, since we're
> writing target to filesystem(s) - essentially (presumed) random access media,
> rather than sequential, the order isn't as important, but still may be
> fairly useful (and that bit of code, or quite similar, is also used in
> some other backup code I use - including code that also does backups to
> sequential media or media that's handled more-or-less as sequential in
> full recovery/restore scenarios).  The shell then shoves the list of
> filesystems desired into named parameter ("variable") backupmountpoints.
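
The priority ordering described above can also be sketched without Perl. This is not what the script does (it sorts in Perl); it is just an equivalent illustration with awk and sort, and the function name order_mountpoints is made up for the example:

```sh
# Read mount points (one per line) on stdin; print /boot, /, /usr, /var,
# /home first (when present), then everything else in plain sorted order.
order_mountpoints() {
    awk '
        BEGIN {
            pri["/boot"] = 1; pri["/"] = 2; pri["/usr"] = 3
            pri["/var"] = 4;  pri["/home"] = 5
        }
        {
            # Anything not in the priority list gets rank 9.
            rank = ($0 in pri) ? pri[$0] : 9
            print rank "\t" $0
        }
    ' |
    LC_ALL=C sort -t "$(printf '\t')" -k1,1n -k2 |
    cut -f2-
}

printf '%s\n' /var/log /home / /srv /boot /usr /var | order_mountpoints
```

Because a plain byte-wise sort puts a directory before anything beneath it, a filesystem that contains another filesystem's mount point still comes out first.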
> Perl is used in parsing the output of the mount command.  The only Perl
> bits that might not be quite so obvious, for those not somewhat familiar
> with Perl, are those about quoting and shell/Perl interaction.  The Perl
> program is executed as part of shell program/script, so it's given as a
> single argument to Perl's -e option.  To do that, the whole thing is put
> within single quotes ('), to protect it from interpretation by the
> shell.  That's all fine and dandy, except then how do we effectively do
> ' within the Perl program, since the shell is interpreting ' in that
> context?  We've two options:
> '\''
> q/STRING/
> The first of those, within the context of single quoted string within
> shell, gets interpreted as a single quote, and thus passed to Perl that
> way.  Or more precisely, it ends up as terminating the single quoted
> string, having a literal single quote, and then starting (resuming)
> single quoted string - which shell then parses as all part of same
> argument, leaving the literal single quote in, and discarding the
> surrounding single quotes, and passing that along as argument.  However,
> the '\'' context gets ugly to read.  It can, however, be used, as
> needed, recursively - but the parsing of such is best left to programs,
> as that does end up quite ugly.  In this case, however, we go for the
> second option.  In Perl, ' is just a more common shorthand for Perl's
> more generalized q operator.  By using it explicitly, starting with q,
> we can explicitly give our "single quote character" (or implied matched
> pair) to be used by Perl on that particular invocation of quoting.
> That makes it easier for the person familiar with Perl to read, than
> seeing '\'' and having to decode the shell context first, before Perl.
> It also can be a bit easier for the person looking at shell, as start
> and end of the single quoted string is easier to find/see/parse/search,
> without a bunch of use of '\'' within.  So, we use Perl's q in this
> case.
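
A small side-by-side sketch of the two quoting options. The strings are arbitrary examples, and the Perl part is skipped if perl isn't installed:

```sh
# Shell only: '\'' closes the quoted string, adds a literal ', and reopens it.
opt1=$(printf '%s' 'type is '\''ext4'\''')
printf '%s\n' "$opt1"

# With Perl: both forms hand Perl the same string, but the q:...: form
# needs no shell-level escaping inside the single-quoted program text.
if command -v perl >/dev/null 2>&1; then
    via_escape=$(perl -e 'print '\''ext4'\'';')
    via_q=$(perl -e 'print q:ext4:;')
    printf '%s %s\n' "$via_escape" "$via_q"
fi
```

The first printf shows the literal single quotes surviving the shell; the two perl invocations print the same thing, and the q:...: one is the easier of the two to read.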
> 
> In our rsync invocation, I use a bunch of non-default options, to
> accommodate two particular objectives.  First of all, I want rather high
> integrity backups, so that adds a spattering of non-default options,
> e.g.: --archive --numeric-ids --sparse --checksum --ignore-times.
> In this case, bit of double-edged sword, but we definitely want
> --numeric-ids, as we always want those interpreted consistently,
> regardless of where that backup may move to or what /etc/passwd and
> /etc/group or the like look like on the system having those backups.  We
> also chose to do it that way, as that data will never be directly used
> (e.g. run as operating system) on backup host - at least certainly not
> without suitable adjustments or context (e.g. also along with use of the
> backed up host's user/group context information).  The other objective
> is more-or-less attempting certain optimizations for our particular
> backup scenario and usage.  E.g. we use --relative, as we may have
> multiple source mountpoints, and we want to preserve their hierarchical
> relationship under the target directory.  We use --one-file-system, as
> we've explicitly selected all the filesystem(s) we wish to backup, and
> wish to not include any others.  We give --compress-level=9, as we wish
> to optimize for bandwidth, rather than CPU, even if that might make for
> slower over-all backups (we're more likely to have CPU to spare, and may
> not have bandwidth to spare or may wish to conserve bandwidth as
> feasible).  In other scenarios we might make a very different
> CPU/bandwidth tradeoff decision (e.g. >= Gigabit uncongested "free" or
> fixed cost bandwidth with desire to minimize backup time).
> We use some --filter= options to exclude some stuff we don't want to
> backup.  We'd excluded on a filesystem basis earlier; this bit is to
> exclude any bits that may be within filesystems - e.g. we don't want to
> backup the FHS volatile /tmp, nor do we want to backup easily reobtained
> OS ISO images (and related data), so we exclude the directories that
> hold only those.
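
The CPU/bandwidth tradeoff mentioned above could be made switchable rather than hard-coded. A rough sketch follows; the helper name compression_opts and its slow/fast labels are invented for illustration, and the only rsync option it emits is the --compress-level=9 already used in the script:

```sh
# Hypothetical helper: print rsync compression option(s) for a link type.
# usage: compression_opts slow|fast
compression_opts() {
    case $1 in
        slow)
            # Bandwidth is the scarce resource: spend CPU on compression.
            printf '%s\n' --compress-level=9
            ;;
        fast)
            # Plenty of bandwidth: print nothing, so rsync is left at its
            # default of not compressing.
            ;;
        *)
            echo "usage: compression_opts slow|fast" 1>&2
            return 2
            ;;
    esac
}
```

In an rsync command line it would be used unquoted, e.g. $(compression_opts slow) in place of the fixed --compress-level=9, so that the "fast" case expands to no argument at all.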
> 
> Well, hopefully that covers at least the bits that may not be so
> obvious, at least given other handy reference documentation (man pages,
> etc.).  Script/program is shown further below and also available at [6].
> 
> And yes, I did talk at least some bit about rsync at and immediately
> following the SF-LUG 2012-01-21 meeting, and have also discussed rsync
> at other meetings, e.g. [2].
> 
> If you actually find a bug, please certainly let me know.  But I'm not
> exactly looking for "feature requests" or the like - this is (almost) a
> one-off program, not (quite) designed/intended to more generally solve
> this particular type of backup scenario (but it's "general enough" I
> could use it for multiple systems, and in fact use it for at least 3
> hosts thus far).
> 
> references/excerpts:
> 1. http://linuxmafia.com/pipermail/sf-lug/2010q1/007678.html
> 2. http://www.buug.org/
> 3. http://linuxmafia.com/pipermail/sf-lug/2010q2/007732.html
> 4. http://linuxmafia.com/pipermail/sf-lug/2012q1/009159.html
>     http://www.wiki.balug.org/wiki/doku.php?id=system:vicki_debian_lenny_to_squeeze
>     http://www.wiki.balug.org/wiki/doku.php?do=index&idx=system
> 5. http://www.balug.org/
> 6. http://www.rawbw.com/~mp/unix/sh/examples/rsync_host2dir
> 7. http://www.pathname.com/fhs/
> 
> $ expand -t 4 < ~/bin/rsync_host2dir
> #!/bin/sh
> program=/home/m/michael/bin/rsync_host2dir
> 
> [ $# -eq 2 ] || {
>      1>&2 echo "usage: $0 host directory"
>      exit 1
> }
> [ -n "$1" ] || {
>      1>&2 echo "host cannot be null: usage: $0 host directory"
>      exit 1
> }
> [ -n "$2" ] || {
>      1>&2 echo "directory cannot be null: usage: $0 host directory"
>      exit 1
> }
> [ x$(id -u) = x0 ] || {
>      # make our directory absolute before cd /
>      directory=$(pwd -P)/"$2" || exit
>      set -- "$1" "$directory"; unset directory
>      cd / &&
>      {
>          exec sudo su - root -c "LC_ALL=C SSH_AUTH_SOCK=$SSH_AUTH_SOCK $program $1 $2" ||
>          exit
>      }
> }
> 
> host="$1"
> directory="$2"
> set --
> 
> [ -d "$directory" ] || {
>      1>&2 echo "$0: directory $directory doesn't exist, aborting"
>      exit 1
> }
> 
> # ssh -atx "$host" 'hostname; id'; exit
> 
> backupmountpoints=$(
>      ssh -ax "$host" 'cd / && umask 077 && exec mount' |
>      #/home/m/michael/src/backup/bin/device__mount_point__type__options
>      perl -e '
>          $^W=1;
>          use strict;
> 
>          #data fields to gather from output of mount(8)
>          my $match_mount=q:^(.+) on (.+) type (.+) \((.*?)\)\n*$:;
>          #                  device
>          #                          mount point
>          #                                    type
>          #                                           options
>          my @mount=();
> 
>          while (<>){
>              if (/$match_mount/) {
>                  my $device=$1;
>                  my $mount_point=$2;
>                  my $type=$3;
>                  my $options=$4;
>                  #skip filesystems we are not presently interested in
>                  (
>                      #must be one of these types ...
>                      $type =~
>                          /
>                              ^   (?:
>                                      ext[234] |
>                                      reiserfs
>                                  )
>                              $
>                          /ox
>                          ||
>                      #or one of these type and ...
>                      $type =~
>                          /
>                              ^   (?:
>                                      ntfs |
>                                      vfat |
>                                      fat
>                                  )
>                              $
>                          /ox
>                          &&
>                      #mounted readonly
>                      $options =~
>                          /
>                              (?:^|,)
>                                  ro
>                              (?:,|$)
>                          /ox
>                  )   &&
>                      #and not one of these mount points
>                      $mount_point !~
>                      m!
>                          ^
>                              (?:
>                                  /+mnt |
>                                  /+media |
>                                  /+var/+local/+pub/+iso |
>                                  /+var/+local/+tower |
>                                  /+home/+r/+root/+tmp/+mnt
>                              )
>                          (?:$|/)
>                      !ox
>                  or next;
>                  #push device mount_point type options on our array,
>                  #split out the options
>                  push @mount,[$device,$mount_point,$type,[split(/,/,$options)]]
>              }
>              else {
>                  print ("else\n");
>                  print STDERR ("$0: ",(m:^(.*?)\n*$:)," failed to match $match_mount\n");
>              }
>          }
> 
>          @mount=sort {
>              #handle highest priorities (if present) first:
>              #/boot, / (root), /usr, /var, /home
>              for my $pri (
>                  q:/boot:,
>                  q:/:,
>                  q:/usr:,
>                  q:/var:,
>                  q:/home:
>              ) {
>                  if(@$a[1] eq $pri && @$b[1] ne $pri) { return -1; }
>                  if(@$b[1] eq $pri && @$a[1] ne $pri) { return 1; }
>              }
>              #everything else is higher and compares normally
>              #print ("@$a[1] cmp @$b[1] ",@$a[1] cmp @$b[1],"\n");
>              @$a[1] cmp @$b[1];
>          }   @mount;
> 
>          my $mountpointsout=q::;
>          for my $line (@mount) {
>              #print join(q: :,(@{$line}[0..2]),join(q:,:,@{${$line}[3]})),"\n";
>              #print(@{$line}[1],"\n"); # just the mount points
>              if($mountpointsout ne q::){
>                  $mountpointsout .= q: :;
>                  $mountpointsout .= @{$line}[1];
>              }else{
>                  $mountpointsout = @{$line}[1];
>              };
>          }
>          print($mountpointsout);
>      '
> )
> 
> #echo "$backupmountpoints"
> 
> rsync \
>      --archive \
>      --acls \
>      --xattrs \
>      --hard-links \
>      --numeric-ids \
>      --relative \
>      --sparse \
>      --rsh='ssh -aTx -o BatchMode=yes ' \
>      --checksum \
>      --partial \
>      --one-file-system \
>      --delete-excluded \
>      --ignore-times \
>      --compress-level=9 \
>      --filter='-,/ /tmp/**' \
>      --filter='-,/ /var/local/pub/mirrored/cdimage.debian.org/**' \
>      --quiet \
>      "$host":"$backupmountpoints" "$directory"
> #   --verbose
> #   --bwlimit=KBPS
> #   --inplace
> #   --compress
> 
> 