[sf-lug] zsync, rsync, jigdo, bittorrent, metalink, ..., oh my!

Michael Paoli Michael.Paoli at cal.berkeley.edu
Sat Apr 23 10:04:07 PDT 2016

Yes, zsync works quite well in certain circumstances.  It operates
much like rsync, and the two typically perform comparably well under
similar conditions - most notably depending on the nature of the
differences in the data to be updated.  Essentially, both look for
chunks of the file that haven't changed and reuse them - at least when
pointed at an existing, generally earlier, version to be updated - then
grab the needed missing chunks and assemble the finished file.  One
advantage zsync has is convenient use over http/https URLs - their
standard default ports are less likely to be blocked, e.g. by
firewalls, thus generally more accessible.  Also, for many serving
hosts, it's one less additional service/protocol to run, as many such
hosts already offer HTTP.
The zsync files themselves can also be examined and interpreted,
whereas with rsync, those details are handled by the protocol itself.
However, rsync is quite a bit more versatile, supporting many options
and capabilities beyond zsync's more limited functionality.  Also,
rsync can be more easily integrated with tunneling over ssh (or rsh or
similar programs), and rsync can also be operated as its own
stand-alone server.
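A toy sketch of that chunk-reuse idea, runnable entirely locally (file
names, sizes, and the 64 KiB chunk size here are arbitrary; note that
real zsync/rsync also use rolling checksums, so unlike this
aligned-block comparison they can cope with data shifted by
insertions/deletions):

```shell
#!/bin/sh
# Build an "old" file and a "new" file differing in a few bytes, then
# checksum fixed-size chunks of each and count how many chunks differ -
# those differing chunks are all a delta tool would need to fetch.
set -e
dir=$(mktemp -d)
dd if=/dev/zero of="$dir/old" bs=1024 count=256 2>/dev/null
cp "$dir/old" "$dir/new"
# change 4 bytes at offset 100000 (falls within chunk index 1)
printf 'XXXX' | dd of="$dir/new" bs=1 seek=100000 conv=notrunc 2>/dev/null
chunk=65536
size=$(wc -c < "$dir/old")
i=0 changed=0 total=0
while [ $((i * chunk)) -lt "$size" ]; do
    a=$(dd if="$dir/old" bs="$chunk" skip="$i" count=1 2>/dev/null | cksum)
    b=$(dd if="$dir/new" bs="$chunk" skip="$i" count=1 2>/dev/null | cksum)
    [ "$a" = "$b" ] || changed=$((changed + 1))
    total=$((total + 1))
    i=$((i + 1))
done
echo "$changed of $total chunks changed"
rm -rf "$dir"
```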

Essentially, where zsync and rsync work quite well is in updating a
large image file where a significant percentage of the source data
already exists in the target, e.g. a slightly older edition of very
similar content.  In some circumstances, however, they're not as
efficient as some other means of updating a large image file.
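For instance, a zsync update of an Ubuntu desktop image from a prior,
similar ISO might look like the following (names and URL are
illustrative only; the echo makes this a dry run - drop it to actually
download):

```shell
# -i names an existing, similar, local file; zsync reuses its
# unchanged blocks and fetches only the differing blocks over HTTP
old=ubuntu-15.10-desktop-amd64.iso
url=http://releases.ubuntu.com/16.04/ubuntu-16.04-desktop-amd64.iso.zsync
cmd="zsync -i $old $url"
echo "$cmd"
```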

Use of jigdo can be great, and even more efficient, where the target
file image is mostly composed of many smaller individual files and one
already has some or many of those files.  E.g. that works quite well
with the standard (non-"live") versions of Debian CDs/DVDs/BDs, and
similarly for *buntu flavors/spins with the "alternate" installer
versions (not the "live" CD/DVD versions).  Those non-live image
versions mostly consist of a quite large number of .deb files, whereas
"live" versions almost always consist mostly of one very large
compressed file.  Hence jigdo isn't feasible for "live" images, whereas
zsync and rsync work well for "live" images.  On the other hand, if one
has a bunch of the constituent files (e.g. an earlier ISO image of
quite similar content, or one has been downloading and installing those
security and
critical bug fix updates and has a bunch of the .deb files in
/var/cache/apt/archives), it can be much more efficient to use jigdo.
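Sketching that jigdo case (URL illustrative; assuming jigdo-lite's
--scan option, which avoids the interactive "Files to scan" prompt;
again the echo makes it a dry run):

```shell
# jigdo-lite downloads the .jigdo + .template, scans the --scan path
# for already-present constituent files (e.g. cached .debs, or a
# mounted earlier ISO), and fetches only what's still missing
jig=http://cdimage.debian.org/debian-cd/current/amd64/jigdo-dvd/debian-8.4.0-amd64-DVD-1.jigdo
cmd="jigdo-lite --scan /var/cache/apt/archives $jig"
echo "$cmd"
```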

And bittorrent?  It's (mostly) peer-to-peer, so it can be *much*
lighter on the upstream servers (mostly just needs a "tracker" upstream
- much less data to deal with than hosting all the data to be
downloaded).  So, bittorrent can also be highly efficient in that
regard, especially if the bandwidth of clients wanting to download far
outstrips the available bandwidth of the main upstream hosting
server(s) - bittorrent takes great advantage of being able to pull
pieces from many peers, so often the downloading client is mostly only
limited by its aggregate download bandwidth, rather than that of some
congested upstream server(s).  However, bittorrent isn't nearly so
flexible in working
with an existing older copy or collection of files to assemble the
target - essentially it can't do that, though it can resume from a
partially constructed target.  With bittorrent, it expects the correct
chunks of data to be logically in the correct place in the target file
- where that's not the case, it downloads the needed chunks.  Beyond
that it can't analyze similar or constituent pieces available on target
host to start assembling the target file.
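That resume-from-a-partial-target behavior can be seen with, e.g.,
aria2 (torrent file name illustrative; echo = dry run): its
-V/--check-integrity re-validates the piece hashes of whatever partial
target file already exists and then downloads only the missing pieces,
while --seed-time=0 exits once complete rather than staying to seed.

```shell
cmd="aria2c -V --seed-time=0 ubuntu-16.04-desktop-amd64.iso.torrent"
echo "$cmd"
```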

And then there's also metalink - it supports multiple protocols (HTTP,
FTP, Bittorrent also for many clients), multiple/alternative sources,
many client implementations can download from multiple sources
simultaneously, it supports collections of files, and for at least some
browsers, there exist plugins for metalink which can make using
metalink essentially a one-click operation.  It would appear, however,
that metalink isn't used (at least in Linux contexts?) as much as many
of the other download mechanisms.  Perhaps that's in part due to the
complexity of its format?  (Apparently rather easy to use, but
comparatively complex to generate the metalink specification file.)
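E.g., aria2 also handles metalink: give it a .metalink file and it
will pull from multiple listed mirrors in parallel and verify the
checksums the metalink specifies (file name illustrative; echo = dry
run):

```shell
cmd="aria2c ubuntu-16.04-desktop-amd64.iso.metalink"
echo "$cmd"
```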

So ... I used to mostly use jigdo ... still do use it a lot - probably
more than zsync.  But once I'd encountered "live" images and wanted to
update those without need to download the entire new image, I quickly
learned of zsync (and I already knew of rsync, and also sometimes use
it to likewise update an ISO image).

And yes, on 2016-04-21 (Ubuntu release day), I did watch for when
Canonical actually made the downloads available:
(Lynx is just a wrapper script I have for lynx):
$ { while :; do Lynx -dump http://www.ubuntu.com/download/desktop |
> fgrep 16.04 > /dev/null 2>&1 && break; sleep 300; done; ...; } &
... that loop exited around 9:15 A.M. PDT.
I then updated my images using zsync (I probably could've better used
jigdo for the Ubuntu-Server image (which consists more of .deb files,
and less of a compressed filesystem image file), so I saved a bit of my
time by using zsync for all 5 images I had and wanted to update ... at
modest expense of (computer, but not my) time/bandwidth on the one of
the five where jigdo would've generally been more (computer/bandwidth)
optimal).  As I'd done
similarly a few times or so over the last week, including the day before
release, with the daily builds ... it turns out when I did zsync on
release day - the bits were identical to the last daily builds before
release - only the names of the files changed (and the HTTP
Last-Modified header times - which I also use to set the mtimes on the
assembled/downloaded files).  So, also, by coincidence, since there
were no changes to the data in the target file, the (non-)"update" of
the Ubuntu-Server probably happened to be more efficient than using
jigdo ... where normally jigdo would've been more efficient for that
particular image.
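That Last-Modified-to-mtime step can be done with GNU touch/date,
which parse the RFC 2822 date format HTTP uses; here a sample header
value stands in for what one would actually pull from, e.g.,
curl -sI or wget -S:

```shell
# sample header line, as an HTTP server would send it
lm='Last-Modified: Thu, 21 Apr 2016 16:15:00 GMT'
ts=${lm#Last-Modified: }              # strip the header name
f=$(mktemp)
touch -d "$ts" "$f"                   # GNU touch parses the RFC 2822 date
mtime=$(date -u -r "$f" '+%Y-%m-%d %H:%M')
echo "$mtime"                         # 2016-04-21 16:15
rm -f "$f"
```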

Oh, and if I do say so myself ;-), the jigdo article on Wikipedia
is pretty decent (I once found it rather lacking and did a substantial
overhaul on it).  Yes, you too can improve documentation - among other
formats and sources, most distributions also have wiki(s) with some, or
even lots, of documentation - often essentially anyone can work to
update/improve such documentation, and one can also improve stuff on
wikipedia.org, contribute on forums/lists, ....


> From: "Ken Shaffer" <kenshaffer80 at gmail.com>
> Subject: [sf-lug] Tried zsync for an Ubuntu ISO update and liked it
> Date: Wed, 20 Apr 2016 11:53:25 -0700

> Ubuntu 16.04 is days from final release, and I thought I'd update my
> month-old 16.04 beta2 ISO.  I'd never tried the ISO-update capability
> offered by the zsync utility, so I gave it a try.  After a false start
> because I didn't give the (changed) old ISO name, the download went to
> 75% when the network dropped.  Restarting used the partially
> downloaded information as a starting point, and the zsync download
> finished, and confirmed the checksum.  Only 360M needed to be
> downloaded, since the other 1.1G was unchanged from the beta2.  I'm
> happy with zsync and I'll be using it for my future ISO updates.

More information about the sf-lug mailing list