[sf-lug] version control, rsync, list archive (improved: sf-lug.mbox ... rsync ... now gently backed up overnightly ...)

Michael Paoli Michael.Paoli at cal.berkeley.edu
Sun Mar 22 23:00:52 PDT 2015

Yes, version control for such has advantages and disadvantages.

In not necessarily any particular order ;-) ...

Some of the disadvantages:
o (some more) complexity/layer(s): how many folks do/don't know (the)
   version control, and can/can't navigate it with the documentation (man
   pages) well enough to at least avoid making a nasty mess of it should
   some (unusual) need come up
o sometimes inefficient: depending on the version control and number of
   changes, extracting version(s) from much earlier, or more recent, or
   somewhere in the middle, or whatever, can be rather inefficient
   (CPU/IO intensive), and the same may apply to check-ins/commits in
   some circumstances.  For some types of version control, certain types
   of data changes can be very inefficient on check-in/commit (though
   that last bit is probably not an issue for an append-mostly list
   archive or similar).  Removing earlier "middle" versions may also be
   inefficient.
o space - depending upon version control flavor and other factors, may
   not be the most space-efficient storage means (e.g. filesystem(s) with
   deduplication/compression may sometimes be much more efficient with
   space while also being more convenient)
o can be difficult in some scenarios (e.g. for legal reasons) to easily
   or feasibly purge one bit of code/data that's present throughout many
   versions
o There are probably other disadvantages too - but those are at least
   some that quickly pop to mind.

Some of the advantages :-) ...:
o relatively space efficient - especially if one wants/needs to keep
   "all" the earlier versions.
o *May* - at least for those who are or become reasonably familiar with
   (the) version control - offer a more intuitive overview of what
   versions exist from when, when they changed, and their differences
   (as opposed to, e.g., someone's one-off implementation of a
   collection of various older versions of files), and likewise offer a
   more intuitive interface for, in general, dealing with older versions
   (viewing, inspecting, retrieving, comparing, managing, etc.).
o *lots* of other advantages of version control - especially more
   generally, I'm not going to even attempt to list all or even many of
   them here.
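To put a rough number on the "relatively space efficient" point for an append-mostly file, here's a minimal Python sketch under a deliberately crude model (an assumption, not RCS's actual file format): the newest version is stored whole, and each older version is stored as a reverse delta against the next one, which is roughly how RCS handles trunk revisions. When an older version is a pure prefix of the newer one, its delta is just "truncate here", counted as 0 bytes in this model. The function names are hypothetical.

```python
def full_copy_bytes(versions: list[bytes]) -> int:
    """Bytes needed to keep every version as a separate full copy."""
    return sum(len(v) for v in versions)


def reverse_delta_bytes(versions: list[bytes]) -> int:
    """Crude estimate for a store that keeps the newest version whole and
    each older version as a reverse delta against the next one.

    Assumption: if the older version is a prefix of the newer one (a pure
    append), the delta is just "truncate here" and is counted as 0 bytes;
    otherwise this model pessimistically charges the whole older version.
    """
    total = len(versions[-1])
    for older, newer in zip(versions, versions[1:]):
        if not newer.startswith(older):
            total += len(older)
    return total


# Three daily snapshots of a tiny append-only "archive".
daily = [b"a\n", b"a\nb\n", b"a\nb\nc\n"]
print(full_copy_bytes(daily), reverse_delta_bytes(daily))
```

For a file that only grows, keeping full copies costs roughly quadratically more as the number of versions climbs, while the delta estimate stays close to the size of the newest file.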

And ... a bit more specifically for list mbox file archive & version
control (and the rsync bit):
o Avoids "gotcha" risks of wget --continue blindly presuming the
   earlier retrieved data hasn't changed (generally true in this case,
   but not guaranteed).
o Avoids "gotcha" risks of blindly trusting full content via rsync - in
   case any unexpected change should ever come in, with version control
   it can be rolled back, reviewed, or whatever.
o If, e.g., we ever find some very old "lost" (missing from archive)
   messages (I think some years back there was a hardware failure and
   subsequent restore from backup, and a small handful of messages went
   missing from the archive? ... but my recollection of that may be
   fuzzy or wrong), they could potentially be reinjected upstream
   (presuming they're "found") to "fix" the archive - and we'd have
   version control history of that, etc. (at least to a granularity of
   about daily, anyway).  Likewise, if we ever found something that
   wasn't properly inserted earlier (duplicate, wrong sequence, missing,
   whatever), it could likely be corrected similarly, and again, we'd
   have version control on that.  E.g. the off-list "list" while
   linuxmafia.com was down ... I thought of, but didn't raise, the
   question: did we ever ask for consent, or ask whether anyone objected
   to any of the "off-list" list emails they'd sent becoming
   "permanently" and publicly archived in the sf-lug list archives?
   Well, anyway, if someone *did* object and those bits were taken out,
   that could be done, and tracked in version control, etc. ... but
   version control could also be a disadvantage there, at least if they
   also wanted it out of the version control history and/or not publicly
   available (as I've set it up, the version control file is publicly
   available).
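The wget --continue "gotcha" can be made concrete with a minimal Python sketch. Assumptions: both files are held in memory, the function names are hypothetical, and this is not how wget or rsync are actually implemented (rsync's delta algorithm compares per-block checksums rather than one prefix digest). The blind version mimics resuming by requesting only the bytes past the local file's current length; the verified version first checks that the local copy is still an exact prefix of the remote file.

```python
import hashlib


def blind_continue(local: bytes, remote: bytes) -> bytes:
    """Mimic a resumed download: fetch only the bytes past len(local) and
    append them, presuming the already-retrieved prefix is unchanged."""
    return local + remote[len(local):]


def verified_continue(local: bytes, remote: bytes) -> tuple[bytes, bool]:
    """Append only if the local copy still matches the start of the remote
    file; otherwise fall back to a full copy.

    Returns (new_local, appended).  In a real transfer the prefix digest
    would be computed on the server side; here both sides are in memory.
    """
    prefix = remote[: len(local)]
    if hashlib.sha256(prefix).digest() == hashlib.sha256(local).digest():
        return local + remote[len(local):], True
    # Earlier content changed or was truncated upstream: refetch everything.
    return remote, False


old = b"From a\nmsg1\n"
# Upstream edited an earlier message (same length) and appended a new one.
upstream = b"From a\nMSG1\nFrom b\nmsg2\n"
print(blind_continue(old, upstream))     # silently keeps the stale "msg1"
print(verified_continue(old, upstream))  # detects the change, full copy
```

The blind approach is cheap and correct only as long as the archive really is append-only; the verified approach costs one extra comparison but catches edits, removals, and reinjected messages.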

So, ... anyway, on balance ... I thought the rsync method both quite
efficient for updating a "backup" copy of the file remotely (especially
for an append-mostly file) and of higher integrity than the
wget --continue approach.  And version control was quite "easy enough"
to implement, giving the advantage of saving "all" the versions (well,
daily or so, anyway) while being efficient on space.  In so doing, it
also mostly spared me from even having to think about how to prune out
older versions of the file, and what kind of algorithm to use for that
(mostly on account of space concerns) - that became effectively moot
with version control (though logrotate and friends are quite capable of
handling such ... but a more complex algorithm, like keeping some of the
older files but "thinning" the versions - e.g. for versions over a year
old, keep the most recent from each month and toss the rest - is a bit
more complex to do ... but with version control, no worries, as I've no
need/desire to thin out the older versions, and space is not a concern
for this type of file
stored mostly using version control).  Oh, and RCS ... sure, not the
most capable, spiffiest, or trendiest version control ... but for the
simple case of dealing with versions of effectively one file, it's
pretty dead simple, straightforward, and easy to use and implement, and
a relatively high percentage of sysadmins are already rather to quite
familiar with it (and don't have to learn the latest whiz-bang version
control ... which also may be passé in 5 or 10 years, replaced by
something even shinier yet to (re)learn - whereas RCS will still be
chuggin' along as reliably as ever, and consistent with - or at least
highly backwards compatible with - the interface it already presently
has ... whereas such may not (quite) be the case for the new kid
(version control) on the block ... at least not quite yet, anyway).
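For comparison, the "thinning" policy mentioned above (for versions over a year old, keep the most recent from each month and toss the rest) might look roughly like this minimal Python sketch. Assumptions: each saved version is identified by a datetime timestamp, "over a year old" means older than 365 days, and thin_versions is a hypothetical name. It only decides which versions to keep; it deletes nothing.

```python
from datetime import datetime, timedelta


def thin_versions(timestamps: list[datetime], now: datetime,
                  max_age: timedelta = timedelta(days=365)) -> list[datetime]:
    """Keep every version newer than max_age; among older versions keep
    only the most recent one from each calendar month.

    Returns the kept timestamps in sorted order.
    """
    cutoff = now - max_age
    keep = [t for t in timestamps if t >= cutoff]

    # For old versions, remember the newest timestamp seen per (year, month).
    newest_per_month: dict[tuple[int, int], datetime] = {}
    for t in timestamps:
        if t < cutoff:
            key = (t.year, t.month)
            if key not in newest_per_month or t > newest_per_month[key]:
                newest_per_month[key] = t

    keep.extend(newest_per_month.values())
    return sorted(keep)


now = datetime(2015, 3, 22)
saved = [datetime(2013, 1, 5), datetime(2013, 1, 20), datetime(2013, 2, 1),
         datetime(2014, 12, 31), datetime(2015, 3, 1)]
print(thin_versions(saved, now))
```

Even this small policy needs choices about calendar months, cutoffs, and ties, which is exactly the bookkeeping that keeping every revision in version control avoids.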

And ... version control in general :-) yes, for the most part excellent
and highly useful in the realms of systems administration and
programming.  Often very useful/valuable to be able to know what
changed, when, how it changed, and have the ability to retrieve older
versions or compare against such.  And often as important, or even more
so, is knowing *why*!  :-)  That's where appropriate
("hand") system log(s) / log book, or the like can come in very
useful/handy - and also comments (message / reason for change)
associated with various changes in version control (and similarly often
in code/configurations - the *what* is often relatively self-evident -
perhaps with a bit of digging/testing/research - but the *why* is often
not readily, nor even at all, apparent - so having that information can
often be rather to highly important (lest folks try otherwise, and
learn or rediscover the hard way why *not* ...)).

> From: "Rick Moen" <rick at linuxmafia.com>
> Subject: Re: [sf-lug] improved: sf-lug.mbox ... rsync ... now gently backed up overnightly ...
> Date: Sun, 22 Mar 2015 21:37:03 -0700

> Quoting Michael Paoli (Michael.Paoli at cal.berkeley.edu):
>> This also means, as least as far back as I (/we?) have it, older
>> versions of the sf-lug.mbox files are in RCS....
> I applaud the thoroughness.  FWIW, since the mbox is additive
> (other than on vanishingly rare occasions, that have never happened with
> sf-lug and maybe once in a decade on some other B.A. technical mailing
> lists, when the listadmins decide a posting is so problematic on, e.g.,
> legal ground that it needs to be removed), strikes me as a lot of work
> to protect against farfetched use cases -- like, dunno, something
> suddenly clobbering the cumulative mbox and nobody notices before all
> backups have been expired out).
> But it's your work, so you get to decide what problem's worth fixing --
> and certainly having things in version control is an impeccable
> administrative practice generally.
