[sf-lug] sf-lug.mbox ... wget ... rsync? :-)

Michael Paoli Michael.Paoli at cal.berkeley.edu
Sun Feb 1 00:10:13 PST 2015


[Moving this thread over to the sf-lug list, as it seems much more relevant there]

So ...
http://linuxmafia.com/pipermail/sf-lug.mbox/sf-lug.mbox
and the potential hazards of
wget --continue
(of which I'm well aware) ...
Rick, I notice you've got an rsync server running on linuxmafia.com,
with a fair bit of publicly accessible content.  :-)
Is
http://linuxmafia.com/pipermail/sf-lug.mbox/sf-lug.mbox
also available via rsync?  If so, could you let me/us know the
rsync path to it?  If it isn't, could you make it available that way
and let us know?  Thanks for your consideration on this.

references/excerpts:

> From: jim <jim at well.com>
> To: balug-talk at lists.balug.org
> Subject: Re: [BALUG-Talk] Good News, Sad News
> Date: Sun, 25 Jan 2015 14:52:35 -0800
>
>     Thank you, Rick,
>     The man page from which you quoted is the same as is on
> my system. I think it's particularly well-written info, although
>
>     Below is a summary of my attempt to run the wget
> command that Rick sent me (my attempt worked).
>
> $ wget http://linuxmafia.com/pipermail/sf-lug.mbox/sf-lug.mbox
>
>     I'm inferring from the man page info that I can, in the near
> future, run
> $ wget -c ...
> and that wget will open the remote file, move the file pointer
> to the byte position (in the remote file) of the last byte in the
> local file (on the design assumption that there has been no
> change to the contents of the remote file below that byte
> position), and then commence copying the remote file,
> appending the contents to the existing local file.
>
> On 01/25/2015 02:26 PM, Rick Moen wrote:
>> Quoting Jim Stockford (jim at well.com):
>>
>>> I'm willing to read man pages....
>> Could help.  ;->
>>
>> $ man wget
>> [...]
>>        `-c' `--continue' Continue getting a partially-downloaded file.
>>        This is useful when you want to finish up a download started by a
>>        previous instance of Wget, or by another program.  For instance:
>>
>>        wget -c ftp://sunsite.doc.ic.ac.uk/ls-lR.Z
>>
>>        If there is a file named `ls-lR.Z' in the current directory, Wget
>>        will assume that it is the first portion of the remote file, and
>>        will ask the server to continue the retrieval from an offset equal
>>        to the length of the local file.
>>
>>        Note that you don't need to specify this option if you just want
>>        the current invocation of Wget to retry downloading a file should the
>>        connection be lost midway through.  This is the default behavior.
>>        `-c' only affects resumption of downloads started _prior_ to this
>>        invocation of Wget, and whose local files are still sitting
>>        around.
>>
>>        Without `-c', the previous example would just download the remote
>>        file to `ls-lR.Z.1', leaving the truncated `ls-lR.Z' file alone.
>>
>>        Beginning with Wget 1.7, if you use `-c' on a non-empty file, and
>>        it turns out that the server does not support continued
>>        downloading, Wget will refuse to start the download from scratch,
>>        which would effectively ruin existing contents.  If you really
>>        want the download to start from scratch, remove the file.
>>
>>        Also beginning with Wget 1.7, if you use `-c' on a file which is
>>        of equal size as the one on the server, Wget will refuse to
>>        download the file and print an explanatory message.  The same
>>        happens when the file is smaller on the server than locally
>>        (presumably because it was changed on the server since your last
>>        download attempt)--because "continuing" is not meaningful, no
>>        download occurs.
>>
>>        On the other side of the coin, while using `-c', any file that's
>>        bigger on the server than locally will be considered an incomplete
>>        download and only `(length(remote) - length(local))' bytes will be
>>        downloaded and tacked onto the end of the local file.  This behavior
>>        can be desirable in certain cases--for instance, you can use
>>        `wget -c' to download just the new portion that's been appended to
>>        a data collection or log file.
>>
>>        However, if the file is bigger on the server because it's been
>>        _changed_, as opposed to just _appended_ to, you'll end up with a
>>        garbled file.  Wget has no way of verifying that the local file is
>>        really a valid prefix of the remote file.  You need to be especially
>>        careful of this when using `-c' in conjunction with `-r', since
>>        every file will be considered as an "incomplete download" candidate.
>>
>>        Another instance where you'll get a garbled file if you try to
>>        use `-c' is if you have a lame HTTP proxy that inserts a "transfer
>>        interrupted" string into the local file.  In the future a
>>        "rollback" option may be added to deal with this case.
>>
>>        Note that `-c' only works with FTP servers and with HTTP servers
>>        that support the `Range' header.
>>
>> Note the caution about files that have been _changed_ as opposed to
>> merely appended to.  If an administrative user has, for some reason,
>> decided to purge some past mails from an mbox, your 'wget -c' fetch
>> of that file will append whatever follows the byte range you already
>> have, though the assumption that resuming from that point makes sense is
>> actually incorrect.  To be safe against that (unlikely) possibility, you
>> would omit '-c'.  As it happens, the mbox file isn't really very big.

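To make the "changed vs. merely appended" caution above concrete, here is a
rough shell sketch (not taken from the man page) of the check that wget -c
cannot do for you: confirming that the local file is still a byte-for-byte
prefix of the remote one.  The function name is_prefix and the sample file
names are made up for illustration, and the sketch assumes head -c is
available (GNU and BSD both provide it).  It needs a complete fresh copy to
compare against, so it saves no bandwidth; it only shows what "safe to
resume" means.

```shell
#!/bin/sh
# is_prefix LOCAL FULL   (hypothetical helper, for illustration only)
# Exit 0 if LOCAL is byte-for-byte identical to the first
# size-of-LOCAL bytes of FULL, 1 if it is not, 2 if a file is unreadable.
is_prefix() {
    [ -r "$1" ] && [ -r "$2" ] || return 2
    # Arithmetic expansion strips the padding some wc versions emit.
    ip_local_size=$(( $(wc -c < "$1") ))
    ip_full_size=$(( $(wc -c < "$2") ))
    # A local file longer than the full copy cannot be a prefix of it.
    [ "$ip_local_size" -le "$ip_full_size" ] || return 1
    # Compare LOCAL against the same number of leading bytes of FULL.
    head -c "$ip_local_size" "$2" | cmp -s - "$1"
}

# Demo with made-up data standing in for an mbox.
tmp=$(mktemp -d)
printf 'mail 1\nmail 2\n'         > "$tmp/local.mbox"
printf 'mail 1\nmail 2\nmail 3\n' > "$tmp/appended.mbox"  # only appended to
printf 'mail 2\nmail 3\nmail 4\n' > "$tmp/purged.mbox"    # old mail purged

is_prefix "$tmp/local.mbox" "$tmp/appended.mbox" \
    && echo "appended: resuming would have been safe"
is_prefix "$tmp/local.mbox" "$tmp/purged.mbox" \
    || echo "purged: resuming would have garbled the file"
rm -r "$tmp"
```

In the purged case both files differ from the very first byte, yet the
remote file is still larger than the local one, which is exactly the
situation where wget -c would happily append and leave a garbled result.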