[conspire] uniq

Michael Paoli Michael.Paoli at cal.berkeley.edu
Thu Jul 30 00:09:02 PDT 2020


> From: "Ruben Safir" <ruben at mrbrklyn.com>
> Subject: Re: [conspire] desktop and my laptop can't ping one to another
> Date: Wed, 29 Jul 2020 12:50:05 -0400

> KISS
> KISS
> KISS
> KISS
> KISS
> KISS
> KISS
> KISS
> KISS
> KISS
> KISS

This would be a handy use of uniq(1).

E.g.:
$ yes KISS | head -n 11 | uniq
KISS
$

By default, uniq squashes consecutive identical lines to a single
such line.  Often folks think of:
... sort | uniq
That's quite a customary usage too - sorting first makes all duplicate
lines adjacent, so uniq squashes each set to a single such line.
uniq(1) itself does not know or care about ordering;
it only considers at most two lines at a time -
is the current line identical to the immediately
preceding line (if any)?  And based upon that, it outputs the
line, or not.  Behavior can also be modified by options,
e.g. -c so it will also count, indicating how many consecutive
times it saw that identical line,
or -d giving only duplicates, or -u giving only unique lines ... again,
unique only considering those that are adjacent.
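
A small sketch of the above, on a made-up five-line input (the sample
data and the variable names are only for illustration).  It shows that
plain uniq leaves non-adjacent duplicates alone, and what -d, -u and -c
each report:

```shell
# Made-up input: "b" repeats adjacently, "a" repeats only non-adjacently.
input=$(printf '%s\n' a b b a c)

# Plain uniq: only the adjacent "b b" pair collapses; both "a" lines stay.
plain=$(printf '%s\n' "$input" | uniq)

# sort first, so all duplicates become adjacent before uniq sees them.
sorted=$(printf '%s\n' "$input" | sort | uniq)

# -d: one copy of each line that was repeated adjacently.
dups=$(printf '%s\n' "$input" | uniq -d)

# -u: only the lines that were not repeated adjacently.
uniques=$(printf '%s\n' "$input" | uniq -u)

# -c: each output line prefixed with its adjacent repeat count
# (the amount of leading whitespace varies between implementations).
counts=$(printf '%s\n' "$input" | uniq -c)

printf 'plain:\n%s\n' "$plain"
printf 'sorted:\n%s\n' "$sorted"
printf 'dups:\n%s\n' "$dups"
printf 'uniques:\n%s\n' "$uniques"
printf 'counts:\n%s\n' "$counts"
```

Note that -u here still reports "a" twice: neither "a" had an
identical neighbor, so as far as uniq is concerned each is unique.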

Also handy for, e.g., text.  Say you have some big text input, just a bunch of
written out paragraphs ... but the spacing between paragraphs varies
... anywhere from one to three or more empty lines between paragraphs.
Want those all redone as a single blank line between paragraphs?
uniq(1).  That works as long as there aren't also
other lines within paragraphs that are consecutive duplicates
and are not supposed to be squashed to a single line.  But even in
such a case, one could fairly easily first check for that.
E.g.:
$ < text_file uniq -d | sort -u
If that gives nothing but empty/blank line(s), then that's all one would
be squashing to single line each with:
$ < text_file uniq
And, what if the whitespace on the empty/blank lines varies?
Could first make those all the same, e.g. with sed ... heck, even
if line endings might vary - some with a trailing CR,
e.g.:
$ < text_file sed -e 's/^[      ]*^M*$//'
Where, in the above, between those square brackets, we put a single space
character and a single tab character, and the ^M near the end would be
a literal carriage return character (not a caret followed by M).  That
would then turn all such blank lines into empty lines, so, where
consecutive, piping the result through uniq would squash them to a
single empty line - as they'd then identically match.
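
For anyone who'd rather not type literal tab and carriage return
characters, here is one possible way to write the same thing, having
printf produce those characters.  This is only a sketch: the sample
text and the variable names are made up, and it assumes a POSIX-ish
sh, sed, uniq, and grep.

```shell
# Have printf produce the literal tab and CR, so none need be typed.
# (Command substitution only strips trailing newlines, so both survive.)
tab=$(printf '\t')
cr=$(printf '\r')

# Made-up sample: two paragraphs separated by three differing "blank"
# lines - one empty, one with space and tab, one with just a CR.
sample=$(printf 'para one\n\n \t\n\r\npara two\n')

# Turn every blank line into a truly empty line.
normalized=$(printf '%s\n' "$sample" | sed -e "s/^[ $tab]*$cr*\$//")

# The check from above: count duplicated adjacent lines that are not empty.
# grep -c exits non-zero when the count is 0, hence the "|| true".
nonblank_dups=$(printf '%s\n' "$normalized" | uniq -d | grep -c . || true)

# Nothing but empty lines would be squashed, so go ahead.
squashed=$(printf '%s\n' "$normalized" | uniq)

printf 'non-empty duplicated lines: %s\n' "$nonblank_dups"
printf '%s\n' "$squashed"
```

This only touches lines that are entirely blank; a trailing CR on
a line that has other text is left as it was.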



