[sf-lug] List postings per username in given timerange + useful pipe/script

aaronco36 aaronco36 at SDF.ORG
Mon Nov 15 07:45:00 PST 2021


Quoting top of Michael P's 'sf-lug: List: stats, etc.' at [1]:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The roster (list of subscribers), number of subscribers, by date:
$ sf-lug_roster_stats
YYYY-MM-DD
...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It might also be interesting, admittedly for just a few readers, to see 
the number of mailing-list postings per contributor in a given timerange, 
for example each person's postings during the current fourth quarter of 
2021.

Roughly eyeballing it by sorting the fourth-quarter 2021 archives by 
author[2] and then manually sorting by ascending posting frequency, I get 
approximately the following results to date...

User 1stname        Number of postings
================    =============================
John                2
Al                  3
Ronald              4
aaronco36 (self)    4 (including this posting)
Michael             13
Bobbie              15
Rick                16

Hmmm.... seems that I am currently at the median for number of postings in 
the given timerange.
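
As a quick sanity check of that median claim, here is a minimal Python 
sketch using only the hand-counted figures from the table above. The 
dictionary name `counts` and the variable `median_posts` are illustrative 
names only, not part of any existing script.

```python
from statistics import median

# Hand-counted postings per first name, copied from the table above.
counts = {
    "John": 2,
    "Al": 3,
    "Ronald": 4,
    "aaronco36": 4,
    "Michael": 13,
    "Bobbie": 15,
    "Rick": 16,
}

# With seven posters, the median is the fourth value in sorted order.
median_posts = median(counts.values())
print(median_posts)  # 4
```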

Whether it ends up as a bash pipeline, a bash script, Michael's preferred 
"sh will do fine, thankyouverymuch" of [3] ;-D, a perl script, a python 
script, or whatever else, would the following pseudocode rough draft be a 
reasonable first approximation for automating this?

[start script]
..Download the latest gzip'd text file, e.g. at [4], into 
localhost's|otherhost's /<downloadsubfolder>
..Gunzip /<downloadsubfolder>/2021q4.txt.gz
..Loop through each email message's 'From:' and 'Date:' fields in the 
/<downloadsubfolder>/2021q4.txt textfile
....Check whether the contents of each 'From:' for each 'Date:' timestamp 
are validated (by list-admins?) as genuine vs. spammy-seeming
....Check other validation tests on the same or on other fields (e.g., 
blank contents, much-too-long contents, attachments...) ?
..Back up (copy) the /<downloadsubfolder>/2021q4.txt textfile to 
/<downloadsubfolder>/2021q4<newname>, then remove the full contents of 
certain posts within that /<downloadsubfolder>/2021q4<newname> textfile, 
e.g., obvious duplicates, forged headers, etc.
..Loop through each email message's 'From:' field in the now-validated 
/<downloadsubfolder>/2021q4<newname> textfile
....Increment a tally counter for each post by each particular 
previously-validated 'From:' username in 
/<downloadsubfolder>/2021q4<newname>
..Display each unique 'From:' username along with the total _number_ of 
their posts for the time period covered by the gzip'd text file 
downloaded above
[end script]
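
The draft above could be sketched in Python roughly as follows. This is 
only a first approximation under several assumptions: that the gunzipped 
archive is close enough to mbox format for Python's standard `mailbox` 
module to split it into messages, that 'From:' headers look like 
"user at host (Real Name)", and that "obvious duplicates" can be detected 
by a repeated 'Message-ID:'. The function names `gunzip`, `tally_posts`, 
and `print_report` are made up for illustration. The download step and any 
list-admin validation of senders are left out.

```python
import gzip
import mailbox
import shutil
from collections import Counter


def gunzip(src_gz, dst_txt):
    """Decompress e.g. 2021q4.txt.gz into 2021q4.txt."""
    with gzip.open(src_gz, "rb") as fin, open(dst_txt, "wb") as fout:
        shutil.copyfileobj(fin, fout)


def tally_posts(txt_path):
    """Return a Counter of posts per normalised 'From:' value."""
    counts = Counter()
    seen_ids = set()
    box = mailbox.mbox(txt_path, create=False)
    try:
        for msg in box:
            sender = str(msg.get("From", "")).strip()
            if not sender:
                continue  # blank 'From:' contents: skip the post
            msg_id = str(msg.get("Message-ID", "")).strip()
            if msg_id:
                if msg_id in seen_ids:
                    continue  # assumed duplicate: same Message-ID seen before
                seen_ids.add(msg_id)
            # Assumed header shape: "user at host (Real Name)" -> "user at host"
            counts[sender.split("(")[0].strip().lower()] += 1
    finally:
        box.close()
    return counts


def print_report(counts):
    """Print senders in ascending order of posting frequency."""
    for sender, n in sorted(counts.items(), key=lambda kv: (kv[1], kv[0])):
        print(f"{sender:<30} {n}")
```

Usage would be along the lines of calling gunzip("2021q4.txt.gz", 
"2021q4.txt") on the file downloaded from [4], then 
print_report(tally_posts("2021q4.txt")). Body lines that happen to begin 
with "From " could be mistaken for message boundaries, so the totals 
should be spot-checked against the by-author archive page [2].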


-A

=========================================
REFERENCES/EXCERPTS
=========================================
[1]http://linuxmafia.com/pipermail/sf-lug/2021q4/015421.html
[2]http://linuxmafia.com/pipermail/sf-lug/2021q4/author.html
[3]http://linuxmafia.com/pipermail/sf-lug/2021q4/015450.html
[4]http://linuxmafia.com/pipermail/sf-lug/2021q4.txt.gz
=========================================

aaronco36 at sdf.org
--


