[sf-lug] Jim Stockford (and/or others?): Do you have old list emails?

Michael Paoli Michael.Paoli at cal.berkeley.edu
Mon May 27 10:53:56 PDT 2019


> From: jim <jim at well.com>
> Subject: Re: [sf-lug] Jim Stockford (and/or others?): Do you have old list emails?
> Date: Mon, 27 May 2019 12:42:25 -0400

> what time period?

For BALUG ...
Well, the short version is everything <= 2016-02-18  8-O
Various pieces of that are missing or we only have non-ideal formats.
A fair bit more detail is below.

The listing further below covers what we do have - but much of that is
only in non-ideal formats (e.g. scraped from the web archive rather than
mbox format, which DreamHost didn't offer).  Effectively taking the
inverse of that listing gives what we're generally missing.  DreamHost
screwed up many, many times - every , in the listing is yet another
time they screwed up the respective list.

So, in the listing below, for each of the respective lists, what's shown
is what I have (at least one message at the start and end of each range,
but not necessarily all messages for those dates).  What is or may be
missing (or held only in non-ideal formats, at best) follows from that:
Anything <= the first date shown in the range (again, for each list
respectively).
Anywhere a , is shown, from and including the date immediately before
the , through the date immediately after the , .
Where the listing ends with {date}--, I have everything after {date}, but
where that's immediately preceded by a , I may be missing item(s) on
that {date}.
Hints: dates are in ISO format, with -- used to show a range (/ can also
be used for ranges in ISO date format, but / can't be in a Unix/Linux
filename, and I commonly indicate time ranges in filenames).
$ grep . $(find * -name 'archive_date_ranges' -type f -print)
balug-admin/archive_date_ranges:2005-03-18--2013-05-24,2014-01-11--2015-01-31,2015-05-01--2015-11-30,2016-02-18--
balug-announce/archive_date_ranges:2001-06-15--2013-07-12,2013-11-18--2014-09-30,2014-11-14--2015-01-31,2015-04-20--2015-11-30--2015-12-15,2016-02-15--
balug-talk/archive_date_ranges:2001-06-15--2013-07-13,2013-11-09--2014-10-19,2014-10-22--2015-01-31,2015-04-06--2015-12-05,2016-01-23--
$
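
For what it's worth, the "inverse" of that listing can be worked out
mechanically.  Below is a minimal sketch in Python, assuming the format
described above (ISO dates, -- for a range, , between ranges, a trailing --
meaning open-ended).  The function names are made up for illustration, and
treating a three-date segment such as 2015-04-20--2015-11-30--2015-12-15 as
running from its first date to its last is purely my assumption here.

```python
from datetime import date


def parse_ranges(spec):
    """Parse 'A--B,C--D,E--' into a list of (start, end) dates.

    end is None when the segment is open-ended (trailing '--').
    Assumption: a segment with more than two dates spans first..last.
    """
    ranges = []
    for segment in spec.strip().split(","):
        dates = [date.fromisoformat(p) for p in segment.split("--") if p]
        open_ended = segment.endswith("--")
        ranges.append((dates[0], None if open_ended else dates[-1]))
    return ranges


def possible_gaps(ranges):
    """Return (from, through) pairs, one per ',' in the listing.

    Both boundary dates are included, since messages on those dates
    may themselves be partly missing.
    """
    gaps = []
    for (_, prev_end), (next_start, _) in zip(ranges, ranges[1:]):
        gaps.append((prev_end, next_start))
    return gaps
```

Feeding it the balug-admin line from the listing would give three possible
gaps, the first running from 2013-05-24 through 2014-01-11.  Anything
before the first date of each list still has to be counted as missing
separately.
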
Also, for stuff before the starting dates shown for the lists above,
the older list stuff had different list names.  All I have on those
is non-ideal formats scraped from archive.org - and those miss anything
after the last bits archive.org picked up.  Likewise, that's the only
way I have any of them.
I can provide more information on those list names if that's needed.

Also, if browsing the existing archives appears to show more than just
what the listings above would imply: some of them are semi-messed up,
especially around boundaries where DreamHost messed up.  E.g. often they're
missing messages at/around there ... sometimes they even have some messages
in duplicate or triplicate (yes, they also messed up on restores and partial
restores ... but I think I may have already cleaned those out? ... or not?
I'll have to check again at some point).

As for SF-LUG ... Rick may more easily be able to provide some of those
details (or at least starting points thereof).  I don't recall in detail,
but I seem to be aware of a few gaps or semi-gaps.
There was the more recent one, some moderate number of years back,
when linuxmafia.com was down quite a while.  A bunch of stuff was
done off-list to a bunch 'o email addresses, and later (mostly? or
maybe entirely) reinjected, courtesy of Bobbie Sellers (if I'm recalling
correctly) providing most or all of that to Rick for reinjection (wouldn't
be bad for us(/me) to recheck against an alternative source, and see if
we got all of that back in there).
I also seem to recall that some (many) years earlier there was a shorter
outage due to, I think, a hardware problem ... some messages may have been
lost from that (I think at least some were - I think a restore from
backup was involved, and I believe those sent after the backup and before
the hardware failure didn't get restored).
There might be some other gap bits, but the only other one I'm explicitly
aware of would of course be anything before the list was hosted on
linuxmafia.com (so, that'd be <~=2005-12-26 - at least from what I can
easily see of the archives hosted there).
If Rick isn't so sure of dates of the more recent outage a few or so
years back, I can track that down easily enough.  The much earlier
briefer outage I seem to recall, I don't think I'd be able to easily
track that one down.  And I'm not aware of any others, but Rick
(and/or others) might be able to fill in or help fill in
information about any particular missing date ranges - for any
of the ranges where there were outages, etc.

I'll also see if I can clean up that metadata some more,
more specifically which ranges of which stuff are missing,
in non-ideal format, or possibly missing or partly missing.

I think I'll also (re)check for duplicates in the mbox format
archives, where some duplicates (other than one "best" copy)
should be removed.
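
For the duplicate check, something along these lines might do as a
starting point.  It's a minimal sketch using Python's standard mailbox
module, and it assumes duplicates share the same Message-ID header, which
won't catch copies whose headers got mangled along the way.  The name
find_duplicates is just illustrative.

```python
import mailbox


def find_duplicates(mbox_path):
    """Return {message_id: [keys, ...]} for Message-IDs seen more than once.

    Assumption: duplicate copies carry an identical Message-ID header.
    Messages with no Message-ID at all are skipped, not reported.
    """
    seen = {}
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for key, msg in box.iteritems():
            msg_id = str(msg.get("Message-ID", "")).strip()
            if msg_id:
                seen.setdefault(msg_id, []).append(key)
    finally:
        box.close()
    return {mid: keys for mid, keys in seen.items() if len(keys) > 1}
```

It only reports; it doesn't remove anything, so deciding which copy is the
"best" one to keep stays a human decision.
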

Also, for the SF-LUG stuff, I've got RCS version controlled backups
for the last few years or so ... I can investigate any changes that
weren't strictly append - see if anything may possibly have gotten
dropped that shouldn't have ... but I rather doubt there are any
issues on that, as I don't think I have any of that (meta) data
that predates the last significant outage - and I don't think there
have been any data glitches since then - so that's likely 100%
good/okay at least over that later range of metadata I have for SF-LUG.

> On 5/27/19 12:19 PM, Michael Paoli wrote:
>> Are you an archivist (or chronic hoarder of old emails?  ;-))
>> Would love to hear from you.
>>
>> Most notably, especially for BALUG - and also to a lesser extent SF-LUG,
>> there are some list postings that have been lost 8-O - in the case of
>> BALUG, many years' worth (no thanks to DreamHost, and also some folks
>> earlier switching list service/software, and not bothering to save
>> the older).  In the case of SF-LUG, I believe it's mostly more like a
>> moderate handful after some hardware issues on a couple of occasions
>> (most of which were restored, but I believe still some were lost).
>>
>> Anyway, if you've got collection of most or all list emails, and
>> especially older ones, I'm quite interested in grabbing the list
>> emails from older email collections, so any that may be found there
>> that are missing can be restored to the lists/archives.
>>
>> No, don't need to (human) read all your emails/collections ... I can write
>> script/program to extract just the list emails.  But may need to
>> have you (or I) initially scan some emails so (and notably for some
>> of the different lists and list software or services used at the time)
>> I/we can identify unique headers of items sent to the lists.
>> Once that's been determined, relatively straight-forward to write
>> program that would extract only email messages that were sent out
>> by the lists (can also add collecting items you sent to list(s), as,
>> depending on software/settings, lists may or may not send posting also
>> to poster).  Anyway, that way, can just extract items sent by(/to)
>> relevant lists, and don't need a human to be reading other emails in
>> email collections.
>>
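
Rough sketch of the sort of extraction I have in mind, in Python with the
standard mailbox module.  The header to match on is exactly what still
needs determining per list and per era, so the List-Id default below is
only an assumption (Mailman sets it; older list software may not), and
extract_list_messages is a made-up name.

```python
import mailbox


def extract_list_messages(src_path, dst_path, needle, header="List-Id"):
    """Copy messages whose `header` contains `needle` into dst_path.

    Matching is case-insensitive.  Returns the number of messages copied.
    No file locking is done, so run it on a copy, not a live mailbox.
    """
    src = mailbox.mbox(src_path, create=False)
    dst = mailbox.mbox(dst_path)  # created if it does not exist yet
    copied = 0
    try:
        for msg in src:
            value = str(msg.get(header, ""))
            if needle.lower() in value.lower():
                dst.add(msg)
                copied += 1
    finally:
        dst.close()  # close() flushes pending writes
        src.close()
    return copied
```

Only the matching header is inspected, so nobody (and no program) needs to
look at the bodies of unrelated mail.  Items the poster sent to the list
could be picked up with a second pass matching on a different header.
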
>> So, Jim Stockford ... let me know how we might arrange this some time.
>> I believe you said you save *all* emails, and have 'em going way back.  :-)
>> Pile 'o hard drives?  Certainly can be well used - I've got hardware which
>> can read most drive (interface) types and the data upon the drives.
>> "Of course" mbox format is easiest, but can likely also deal with other
>> formats (semi-)easily enough - again, I can write bit 'o code (or find
>> such), suitable for reading other formats, convert that to mbox (or
>> similar enough), and likewise then use appropriate header match criteria,
>> for extracting just the list emails.
>>
>> Likewise for anyone else that does or may have such email collections.  :-)
>>
>> As for BALUG (I think I also posted similar to some relevant BALUG
>> list(s) before ... but it's been quite a while - other than slight
>> regular mention ("volunteering to help BALUG" ...
>> "archivist/history/retrieval/etc."))
>> I can provide more specific details on what we're missing from what
>> ranges of time on what lists ... there's much we have; also much we
>> don't.  There's also some fair bit between, where we have less than
>> ideal format (what we could extract from archive.org, but those have web and
>> email mungings that can't be undone (e.g. s/ at /@/g does not undo
>> s/@/ at /g;
>> think, for example, of text such as:
>> John at example.com, use:
>> a=' at '; b='@'; if [ x"$a" != x"$b" ]; then foo; else bar; fi
>> ))
>>
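
To make that munging point concrete, here's a tiny Python illustration.
The munge and naive_unmunge names are made up; they just stand in for the
two sed substitutions quoted above.

```python
def munge(text):
    """Archive-style obfuscation: every '@' becomes ' at '."""
    return text.replace("@", " at ")


def naive_unmunge(text):
    """Attempted inverse: every ' at ' becomes '@'."""
    return text.replace(" at ", "@")


original = "John at example.com, use: a=' at '; b='@'"
round_trip = naive_unmunge(munge(original))

# The round trip also rewrites every ' at ' that was legitimately
# in the text to begin with, so the original is not recovered.
print(round_trip == original)
```

Once munged, a genuine " at " and a former "@" are indistinguishable, so no
substitution can tell which ones to put back.
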
>> In any case, even in the case of SF-LUG, not sure of the much earlier
>> list stuff - notably before the list being hosted on linuxmafia.com.
>> Anyway, if someone has those old emails it may be very possible to
>> reintroduce them to the archive ... if the earlier list(s) were
>> sufficiently different (e.g. different set of lists, or quite
>> different naming/purpose) we might want to alternatively preserve
>> those in some separate available read-only format for folks (and
>> search engines) to be able to peruse and provide useful (and
>> historical) information from.
>>
>> And yes, can reintroduce (or remove if necessary/warranted) items from
>> mailman archive.  One of my pre-"go live" tests for moving of
>> mailman hosting of BALUG's lists, from DreamHost to the balug VM
>> (hosted by yours truly), was testing that I could not only restore
>> archive, but also (re)inject emails to list archive, and also remove
>> emails from archive.  So I do also have all that info. somewhere in my
>> notes too (and looks like Rick's also covered that information on-list
>> too.  :-)).
>>
>>> From: "Rick Moen" <rick at linuxmafia.com>
>>> Subject: Re: [sf-lug] Mobile-friendlying the SF-LUG website (was Re: Status of SF-LUG etc) SF-LUG web site (mis?)information thereof
>>> Date: Sun, 26 May 2019 15:29:49 -0700
>>
>>> Quoting Michael Paoli (Michael.Paoli at cal.berkeley.edu):
>>>
>>>> Some of BALUG's older archives are also semi-missing - so that also
>>>> made it harder for me to check(/correct).
>>>
>>> If we (for you & BALUG values of 'we') have the older stuff in mbox or
>>> can-be-hammered-into-mbox format, it's actually really easy to add them
>>> into Pipermail's archive.  I've done so a bunch of times on my host and
>>> SVLUG's mail host.
>>>
>>> 1.  Use 'cat' to slam together multiple mboxes to make one big one, and
>>> make that become the new
>>> /var/lib/mailman/archives/private/balug-talk.mbox/balug-talk.mbox, which
>>> should be 0644 and owned by list:list .
>>>
>>> 2.  $ su -
>>>     # su - list
>>>     $ cd /var/lib/mailman/
>>>     $ bin/arch --wipe -q balug-talk archives/private/balug-talk.mbox/balug-talk.mbox
>>>     $ ## wait a long time, maybe 20 minutes
>>>     $ exit
>>>     # exit
>>>
>>> That's about the limit unless some of the constituent mbox material had
>>> one or more unescaped body text line starting flush-left with 'From ',
>>> in which case /var/lib/mailman/bin/arch (the Pipermail archiver
>>> program) will make hapless parsing errors, which you fix by finding
>>> those lines, escaping each such mbox line with a prefatory '>', and
>>> re-running 'arch'.
>>>
>>> Don't make the common mistake of running the Pipermail 'arch' program as
>>> the _root_ user, or the generated archives will be obscured by a 403
>>> error because file permissions will be wrong.
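
For hunting down those unescaped 'From ' lines before running arch, a
rough Python sketch.  The SEPARATOR pattern is my assumption about what a
genuine mbox separator line looks like ("From", a sender, then an
asctime-style date); any other line starting flush-left with 'From ' gets
a '>' put in front.  escape_from_lines is a made-up name.

```python
import re

# Assumed shape of a genuine separator line, e.g.
# "From rick at linuxmafia.com  Sun May 26 15:29:49 2019"
SEPARATOR = re.compile(
    r"^From \S.*?\s+(Mon|Tue|Wed|Thu|Fri|Sat|Sun) "
    r"(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) +\d{1,2} "
    r"\d{2}:\d{2}:\d{2} \d{4}\s*$"
)


def escape_from_lines(lines):
    """Prefix '>' to 'From ' lines that do not look like separators.

    Returns (fixed_lines, changed_line_numbers), line numbers 1-based.
    """
    fixed, changed = [], []
    for number, line in enumerate(lines, start=1):
        if line.startswith("From ") and not SEPARATOR.match(line):
            fixed.append(">" + line)
            changed.append(number)
        else:
            fixed.append(line)
    return fixed, changed
```

It's only a heuristic: a body line that happens to look like a separator
is left alone, and a separator in some other date format would get
escaped by mistake, so the reported line numbers are worth eyeballing
before the result is fed to arch.
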