[sf-lug] Jim Stockford (and/or others?): Do you have old list emails?

Michael Paoli Michael.Paoli at cal.berkeley.edu
Mon May 27 09:19:39 PDT 2019


Are you an archivist (or a chronic hoarder of old emails)?  ;-)
Would love to hear from you.

Most notably for BALUG - and also, to a lesser extent, SF-LUG - some list
postings have been lost 8-O.  In the case of BALUG, it's many years' worth
(no thanks to DreamHost, and also to some folks who earlier switched list
service/software without bothering to save the older postings).  In the
case of SF-LUG, I believe it's more like a moderate handful, lost after
hardware issues on a couple of occasions (most postings were restored,
but I believe some were still lost).

Anyway, if you've got a collection of most or all of the list emails,
especially older ones, I'm quite interested in grabbing the list emails
from it, so that any found there that are missing can be restored to the
lists/archives.
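One way to decide which found emails are actually "missing" is to compare Message-ID headers against what the archive already holds. A minimal Python sketch using the standard-library mailbox module, assuming both the existing archive and the found collection are mbox files and that messages carry Message-ID headers (messages without one are passed through for a human to check); the function name missing_messages is hypothetical:

```python
import mailbox

def missing_messages(archive_path, collection_path):
    """Yield messages from collection_path whose Message-ID is not in archive_path.

    Assumptions: both files are mbox; Message-ID is the identity key.
    Messages lacking a Message-ID are yielded too, for manual review.
    Duplicates within the collection are yielded only once.
    """
    archived = {
        msg["Message-ID"].strip()
        for msg in mailbox.mbox(archive_path, create=False)
        if msg["Message-ID"]
    }
    seen = set()
    for msg in mailbox.mbox(collection_path, create=False):
        mid = (msg["Message-ID"] or "").strip()
        if not mid:
            yield msg  # no ID to compare: let a human decide
        elif mid not in archived and mid not in seen:
            seen.add(mid)
            yield msg
```

Keying on Message-ID rather than on full message text avoids false "missing" hits when the same posting was saved with slightly different headers in different collections.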

No, I don't need to (human-)read all your emails/collections ... I can
write a script/program to extract just the list emails.  But you (or I)
may need to scan some emails initially, so that I/we can identify the
headers unique to items sent out by the lists (notably for each of the
different lists and the list software or services used at the time).
Once that's been determined, it's relatively straightforward to write a
program that extracts only the email messages sent out by the lists (it
can also collect items you sent to the list(s), since, depending on
software/settings, a list may or may not also send a posting back to its
poster).  That way, it can extract just the items sent by (or to) the
relevant lists, and no human needs to read the other emails in the
collections.
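A minimal sketch of such an extraction program in Python, using the standard-library mailbox module. The match rules are placeholders: List-Id (RFC 2919) and X-BeenThere are headers Mailman 2 commonly adds, but the header names and the "sf-lug" substring here are assumptions standing in for whatever the initial scan actually turns up; the names LIST_HEADER_RULES, is_list_message and extract_list_messages are hypothetical:

```python
import mailbox

# Hypothetical match criteria: (header name, substring) pairs. The real ones
# would come from scanning sample messages from each list/software era.
LIST_HEADER_RULES = [
    ("List-Id", "sf-lug"),
    ("X-BeenThere", "sf-lug"),
]

def is_list_message(msg, rules=LIST_HEADER_RULES):
    """True if any header named in rules contains its substring (case-insensitive)."""
    for name, needle in rules:
        for value in msg.get_all(name, []):  # header-name lookup is case-insensitive
            if needle.lower() in str(value).lower():
                return True
    return False

def extract_list_messages(src_path, dest_path, rules=LIST_HEADER_RULES):
    """Append only matching messages from the src mbox to the dest mbox.

    Returns the number of messages copied; non-matching mail is never written out.
    """
    src = mailbox.mbox(src_path, create=False)
    dest = mailbox.mbox(dest_path)  # created if it does not exist
    count = 0
    try:
        for msg in src:
            if is_list_message(msg, rules):
                dest.add(msg)  # keeps the original "From " separator line
                count += 1
    finally:
        dest.close()  # flushes pending additions to disk
        src.close()
    return count
```

Adding rules for the poster's own From address would cover the case where the list does not echo postings back to their author.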

So, Jim Stockford ... let me know how we might arrange this some time.
I believe you said you save *all* emails, and have 'em going way back.  :-)
Pile o' hard drives?  Those can certainly be put to good use - I've got
hardware that can read most drive (interface) types and the data on them.
"Of course" mbox format is easiest, but I can likely also deal with other
formats (semi-)easily enough - again, I can write a bit o' code (or find
such) to read other formats and convert them to mbox (or something similar
enough), then apply the appropriate header-match criteria to extract just
the list emails.
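As one example of such a conversion, Python's standard-library mailbox module can read Maildir (and also MH, Babyl and MMDF) and write mbox. A minimal sketch, assuming the collection is a single Maildir folder; the function name maildir_to_mbox is hypothetical:

```python
import mailbox

def maildir_to_mbox(maildir_path, mbox_path):
    """Append every message in one Maildir folder to an mbox file.

    Returns the number of messages converted. Maildir has no inherent
    ordering, so messages land in the mbox in arbitrary order.
    """
    src = mailbox.Maildir(maildir_path, factory=None, create=False)
    dest = mailbox.mbox(mbox_path)  # created if it does not exist
    count = 0
    try:
        for msg in src:
            # mboxMessage(msg) converts Maildir flags and synthesizes a
            # "From " separator line from the message's delivery date.
            dest.add(mailbox.mboxMessage(msg))
            count += 1
    finally:
        dest.close()  # flushes pending additions to disk
    return count
```

The resulting mbox can then go through the same header-match filtering as any other mbox.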

Likewise for anyone else that does or may have such email collections.  :-)

As for BALUG (I think I also posted something similar to some relevant
BALUG list(s) before ... but it's been quite a while, other than brief
recurring mentions ("volunteering to help BALUG" ...
"archivist/history/retrieval/etc.")), I can provide more specific details
on what we're missing, from which ranges of time, on which lists ...
there's much we have, and also much we don't.  There's also a fair bit in
between, where we have only a less-than-ideal format: what we could
extract from archive.org, but those copies have web and email mungings
that can't be undone.  E.g., s/ at /@/g does not undo s/@/ at /g; think,
for example, of a posting that contained:
John@example.com, use:
a=' at '; b='@'; if [ x"$a" != x"$b" ]; then foo; else bar; fi
Munging turns b='@' into b=' at ', and "unmunging" then turns both a and b
into '@', so the test that originally ran foo now runs bar.
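The non-invertibility can be checked mechanically. A quick Python sketch, modeling the munging as the plain global substitutions in the sed expressions above (an assumption: the actual archive.org/Pipermail munging may be narrower, e.g. only touching things that look like addresses); munge and unmunge are hypothetical names:

```python
def munge(text: str) -> str:
    """Model of archive-style obfuscation: s/@/ at /g."""
    return text.replace("@", " at ")

def unmunge(text: str) -> str:
    """Naive attempted reversal: s/ at /@/g."""
    return text.replace(" at ", "@")

original = (
    "John@example.com, use:\n"
    "a=' at '; b='@'; if [ x\"$a\" != x\"$b\" ]; then foo; else bar; fi\n"
)
restored = unmunge(munge(original))
print(restored == original)  # False
print(restored.splitlines()[1])  # a='@'; b='@'; if [ x"$a" != x"$b" ]; then foo; else bar; fi
```

The address line survives the round trip; the shell line does not, because munging maps two different originals (' at ' and '@') to the same text, and no reverse substitution can tell them apart afterward.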

In any case, even for SF-LUG, I'm not sure about the much earlier list
material - notably from before the list was hosted on linuxmafia.org.
Anyway, if someone has those old emails, it may very well be possible to
reintroduce them to the archive ... or, if the earlier list(s) were
sufficiently different (e.g. a different set of lists, or quite different
naming/purpose), we might instead want to preserve them in some separate
read-only format, available for folks (and search engines) to peruse for
useful (and historical) information.

And yes, items can be reintroduced to (or, if necessary/warranted,
removed from) the Mailman archive.  One of my pre-"go live" tests for
moving the Mailman hosting of BALUG's lists from DreamHost to the balug
VM (hosted by yours truly) was confirming that I could not only restore
the archive, but also (re)inject emails into the list archive and remove
emails from it.  So I also have all that info somewhere in my notes (and
it looks like Rick has also covered that information on-list.  :-)).
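With Pipermail, which rebuilds the archive from the list's mbox (as in the quoted steps below), one way to remove an item is to delete it from that mbox and then regenerate the archive. A minimal Python sketch of the deletion half, assuming the target messages are identified by Message-ID and that nothing else (e.g. Mailman itself) writes to the mbox while it runs; the function name remove_by_message_id is hypothetical:

```python
import mailbox

def remove_by_message_id(mbox_path, message_ids):
    """Delete messages whose Message-ID is in message_ids; return how many.

    The archive pages are not touched; they still need regenerating from
    the edited mbox afterward.
    """
    targets = {mid.strip() for mid in message_ids}
    box = mailbox.mbox(mbox_path, create=False)
    box.lock()  # fcntl/dot-lock on the mbox while editing
    try:
        doomed = [key for key, msg in box.iteritems()
                  if (msg["Message-ID"] or "").strip() in targets]
        for key in doomed:
            box.remove(key)
    finally:
        box.close()  # rewrites the file without removed messages, then unlocks
    return len(doomed)
```

Re-injection is the mirror image: append the recovered messages to the same mbox and regenerate.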

> From: "Rick Moen" <rick at linuxmafia.com>
> Subject: Re: [sf-lug] Mobile-friendlying the SF-LUG website (was Re:  
> Status of SF-LUG etc) SF-LUG web site (mis?)information thereof
> Date: Sun, 26 May 2019 15:29:49 -0700

> Quoting Michael Paoli (Michael.Paoli at cal.berkeley.edu):
>
>> Some of BALUG's older archives are also semi-missing - so that also
>> made it harder for me to check(/correct).
>
> If we (for you & BALUG values of 'we') have the older stuff in mbox or
> can-be-hammered-into-mbox format, it's actually really easy to add them
> into Pipermail's archive.  I've done so a bunch of times on my host and
> SVLUG's mail host.
>
> 1.  Use 'cat' to slam together multiple mboxes to make one big one, and
> make that become the new
> /var/lib/mailman/archives/private/balug-talk.mbox/balug-talk.mbox, which
> should be 0644 and owned by list:list .
>
> 2.  $ su -
>     # su - list
>     $ cd /var/lib/mailman/
>     $ bin/arch --wipe -q balug-talk archives/private/balug-talk.mbox/balug-talk.mbox
>     $ ## wait a long time, maybe 20 minutes
>     $ exit
>     # exit
>
> That's about the limit unless some of the constituent mbox material had
> one or more unescaped body-text lines starting flush-left with 'From ',
> in which case /var/lib/mailman/bin/arch (the Pipermail archiver
> program) will make hapless parsing errors, which you fix by finding
> those lines, escaping each such mbox line with a prefatory '>', and
> re-running 'arch'.
>
> Don't make the common mistake of running the Pipermail 'arch' program as
> the _root_ user, or the generated archives will be obscured by a 403
> error because file permissions will be wrong.
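The fix-up step Rick describes (escaping body lines that start with 'From ') can be partly automated. A minimal Python sketch; the hard part is telling genuine separator lines from body text, and the rule used here (a separator must start the file or follow a blank line, and must look like 'From sender' plus an asctime-style date) is a heuristic assumption, so review the output before re-running 'arch'. The names FROM_SEPARATOR, escape_body_from_lines and escape_mbox_file are hypothetical:

```python
import re

# Heuristic pattern for a genuine mbox separator ("From_") line, e.g.
#   From rick@linuxmafia.com Sun May 26 15:29:49 2019
# Assumption: separators carry this asctime-style date; collections may vary.
FROM_SEPARATOR = re.compile(
    r"^From \S+ +(Mon|Tue|Wed|Thu|Fri|Sat|Sun) "
    r"(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) +\d{1,2} "
    r"\d{1,2}:\d{2}:\d{2} (?:[A-Z]{3,4} )?\d{4}"
)

def escape_body_from_lines(lines):
    """Prefix '>' to 'From ' lines that do not look like real separators."""
    out = []
    prev_blank = True  # start of file counts as following a blank line
    for line in lines:
        if line.startswith("From ") and not (prev_blank and FROM_SEPARATOR.match(line)):
            line = ">" + line
        out.append(line)
        prev_blank = line.strip() == ""
    return out

def escape_mbox_file(src_path, dest_path):
    """Write an escaped copy of src_path to dest_path."""
    # latin-1 maps every byte to one character, so 8-bit mail round-trips unchanged;
    # newline="" keeps the original line endings.
    with open(src_path, encoding="latin-1", newline="") as f:
        lines = f.readlines()
    with open(dest_path, "w", encoding="latin-1", newline="") as f:
        f.writelines(escape_body_from_lines(lines))
```

Writing to a separate file leaves the original mbox intact for comparison (e.g. with diff) before the escaped copy replaces it.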



