[conspire] Piping, redirection and shellscripts: 3/5/2025 7pm Eastern Standard time
Michael Paoli
michael.paoli at berkeley.edu
Mon Mar 3 09:07:32 PST 2025
On Mon, Mar 3, 2025 at 2:42 AM Ron <admin at bclug.ca> wrote:
> Steve Litt wrote on 2025-03-02 15:02:
> > Yesterday I was tasked with getting every domain name I owned into
> > my list at 444domains.com/domains , which is created by a
> > shellscript and Python program that read a Yaml file and convert it
> > to the web page. So I had to do the following:
... < much detail about going from web page(s), through programs,
comparisons, editor, data, and to web page(s) >
> > This sounds straightforward
> No, it does *not* sound straightforward!
Uhm, sounds pretty straightforward(ish) to me. ;-)
> The list of all domain info should be in a DB or at least a text file,
> for starters.
Ah, if only. In many cases the data is in non-ideal forms/sources,
and in many cases we have little to no choice about it.
> From there, do the sort(s), build the YAML, etc. Why sort a YAML file?!?
> Running `diff` and manually merging?
Typically there are better (semi-)automated ways, e.g.:
$ comm -23 <(sort -u < file1) <(sort -u < file2)
That, among multiple possible ways, will output exactly once each
unique line present in file1 that isn't present in file2.
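For instance, with some hypothetical file contents (the domain names
here are purely illustrative), it might look like this - note that the
<(...) process substitution requires bash/ksh/zsh, not plain POSIX sh:

```shell
#!/bin/bash
# Two sample files of domain names (illustrative contents only)
printf '%s\n' example.com example.net example.org > file1
printf '%s\n' example.net example.org > file2

# comm compares two sorted files; -2 suppresses lines unique to file2,
# -3 suppresses lines common to both, leaving lines unique to file1
comm -23 <(sort -u < file1) <(sort -u < file2)
# outputs: example.com
```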
> All this scraping seems pointless, why is the source of this data inside
> web pages?
Often the format/source isn't a choice, but what one needs to deal with.
If it only needs to be done once, or perhaps a few times, well, then
perhaps one just does it mostly to entirely manually - of course
leveraging standard utilities and commands and such as feasible.
But if there's a need to do it on a regular, repeated basis - automate it
- or at least to the extent that's feasible and makes sense.
E.g. I was migrating off of AT&T (landline), to another (mobile) carrier.
I wanted to download all my voicemail I had on their service.
It had exactly two interfaces - Plain Old Telephone Service (POTS)
with just DTMF/audio, or ...
web - all digital. And no, not some database I could merely query or the like.
So, I wrote a program that did what was needed.
Simply invoke it and it would handle all the web and related bits,
to cleanly download the relevant voicemail. Trivial? No. But it beats
the hell out of doing it all manually - especially multiple times:
http://linuxmafia.com/pipermail/conspire/2024-November/012879.html
And sure, better to have data in much more usable formats ... but
that's not always an option.