[sf-lug] bash <(...) >(...): Re: dash ...: Re: SF-LUG meeting notes for Monday 15 April 2019

Michael Paoli Michael.Paoli at cal.berkeley.edu
Tue Apr 16 16:50:31 PDT 2019


> From: "Michael Paoli" <Michael.Paoli at cal.berkeley.edu>
> Subject: [sf-lug] dash ...: Re: SF-LUG meeting notes for Monday 15 April 2019
> Date: Tue, 16 Apr 2019 10:37:50 -0700

> way).  For interactive stuff though, at least on Linux, I'm typically
> using bash, and for scripting, there are a *very* few bashisms I find
> sufficiently useful sometimes I'll use bash instead, ... but otherwise
> mostly code for POSIX/sh/dash.  And what few bashisms?  Besides interactive
> stuff, redirection to/from in way that works like file (functions as
> file argument), but is command, e.g.: <(some_command) >(some_other_command)
> ... is quite handy for stuff that requires some data from a "file" argument,
> reads it - or writes it - but doesn't seek on it, and can't take that
> data via stdin or stdout.  So, that's probably about the only bashism I
> wouldn't mind seeing POSIX pick up.  The rest of the bash stuff is
> bells 'n whistles and stuff handy for interactive but really unneeded
> for scripts - at least in my opinion.  :-)
> There are of course ways to do that type of redirection without that bash
> mechanism, but that then involves creating some (temporary) named pipes,
> and removing them after ... which is more code ... but bash can do all
> that for one with fairly straight-forward syntax.

Yes, something useful in (and relatively unique to?) bash, and not in
dash, POSIX (at least yet?), etc.
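For the write-side >(...) form, a quick sketch (the specific commands
and the nums.gz filename here are purely illustrative): tee is handed a
writable "file" that is really gzip's standard input:
$ seq 1 5 | tee >(gzip -c > nums.gz) | tail -n 1
5
tee copies its input both to stdout (on to tail) and to the >(...)
"file", so nums.gz ends up with a gzipped copy of the same data.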

The redirection I noted above ... found myself using it yet again today.
E.g., let's say I've got two files, file1 and file2; they're moderately
large files, each with many short(ish) lines (say <=20 isgraph ASCII
characters per line), there may be duplicates within each of the files,
and they're not sorted.
I want to know what lines are unique to file1 (not in file2), and likewise
those in file2 but not file1.  Now, comm(1) comes in very handy for that,
but it wants its inputs already sorted.  Well, what if I don't want to
change those original files, and would rather not create temporary
sorted versions myself, then also have to toss them away once I'm
done with them?
Well, with bash, we can do something like this:
$ comm -3 <(<file1 sort -u) <(<file2 sort -u)
And in the output, the first (non-indented) column will be lines in
file1 (but not file2), and the 2nd (tab-indented) column will be lines in
file2 but not file1 - lines won't be reported multiple times, and
they'll come out sorted (by line contents).
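To make that concrete, a tiny made-up example (the sample contents here
are mine, purely for illustration):
$ printf '%s\n' apple banana apple cherry > file1
$ printf '%s\n' banana date banana > file2
$ comm -3 <(<file1 sort -u) <(<file2 sort -u)
apple
cherry
        date
(date shows up tab-indented - the 2nd column - as it's only in file2;
banana, common to both, is suppressed by -3.)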
The bash shell implements that syntax with pipes - on Linux typically
as /dev/fd/N pathnames referring to (anonymous) pipes, falling back to
named pipes (FIFOs) where /dev/fd isn't available - so if the utility
(e.g. diff) reports the "names", you'll see those pipe pathnames rather
than ordinary file names (a named pipe is a type of file - not an
ordinary file, but "special" - a FIFO in this case - and no special
privileges are needed to create one).  A FIFO (First-In-First-Out) is
basically a pipe - something writes it, something reads it, like | in
shell - except that a named pipe has a pathname in the filesystem, so
one can open that pathname for reading or writing.
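One can peek at the pathnames bash actually hands the utility - on
Linux they'll typically be /dev/fd/NN entries referring to (anonymous)
pipes, with FIFOs under a temporary directory as the fallback where
/dev/fd isn't available; the exact numbers will vary:
$ echo <(true) >(true)
/dev/fd/63 /dev/fd/62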
Writing a pipe that nothing has open for reading gets one SIGPIPE/EPIPE
(and opening a FIFO for writing normally blocks until something opens it
for reading), one can't seek on a pipe, and generally one only wants one
thing reading it and one thing writing it - but other than that, it's
rather like an ordinary file: it can be read, it can be written.  Unlike
an ordinary file, though, it takes no disk space for its data - the data
is only a buffer in (kernel) memory.  So, more generally, pipes and named
pipes can be much more efficient than unnecessarily using, e.g.,
temporary files - one avoids all that drive I/O (and, in most cases, the
space usage), etc.
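And for comparison, doing roughly the same as that earlier comm line in
plain POSIX sh means creating and cleaning up the named pipes oneself -
a sketch, just one way to do it (mktemp -d isn't strictly POSIX, but is
near-universal):
#!/bin/sh
# roughly equivalent to: comm -3 <(<file1 sort -u) <(<file2 sort -u)
d=$(mktemp -d) || exit 1
trap 'rm -rf "$d"' EXIT HUP INT TERM
mkfifo "$d/f1" "$d/f2"
<file1 sort -u >"$d/f1" &   # each writer blocks until comm opens its FIFO for reading
<file2 sort -u >"$d/f2" &
comm -3 "$d/f1" "$d/f2"
wait                        # let the background sorts finish before cleanup
More lines, cleanup to remember, and temporary names to manage - exactly
what the <(...) syntax takes care of for us.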



