[sf-lug] regex: how to match any one to four character word in a file

Clyde Jones slash5toaster at gmail.com
Wed Dec 10 09:34:36 PST 2008


On Wed, Dec 10, 2008 at 08:37, jim <jim at well.com> wrote:
>
> i've given up on the online tutorials.
>
> i have a text file with over 100000 words (and lines,
> one word per line). i wanna grep out all words that
> are from one to four characters, e.g. 'a' or 'and'
> or "fact" but not "apple" or "zounds".
>
> $  grep '[.]{4}' words.txt
> got me a newline.
>
> i've got lots of other variations in my .bash_history
> if anyone wants a good laugh.

This works for me. It collects all the words from a file that are
1-4 characters long. To drop the short words and keep the longer
ones instead, change the egrep to egrep -v.

cat words.txt | tr -s '[:punct:][:blank:]' '\012' | egrep '^[[:alnum:]]{1,4}$'
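
For anyone who wants to try it without a 100000-line file, here is a
rough sketch on a tiny sample. The temp file and its five words are
made up for illustration (taken from the examples in the question), and
grep -E is used as the equivalent spelling of egrep. It also shows why
the '[.]{4}' attempt comes up empty: inside brackets the dot is a
literal dot, and without -E the braces are ordinary characters, so grep
looks for the literal text ".{4}".

```shell
# Hypothetical sample standing in for words.txt, one word per line.
words_file=$(mktemp)
printf '%s\n' a and fact apple zounds > "$words_file"

# Basic regex: [.] is a literal dot and { } are ordinary characters,
# so this counts lines containing the text ".{4}". None exist here.
# (|| true because grep exits non-zero when nothing matches.)
literal_hits=$(grep -c '[.]{4}' "$words_file" || true)
printf 'lines matching the original attempt: %s\n' "$literal_hits"

# Extended regex: {1,4} is a repetition count, and ^ ... $ anchor the
# match to the whole line, so only 1-4 character words get through.
short_words=$(grep -E '^[[:alnum:]]{1,4}$' "$words_file")
printf '%s\n' "$short_words"

# -v inverts the match and keeps everything else (the longer words).
long_words=$(grep -Ev '^[[:alnum:]]{1,4}$' "$words_file")
printf '%s\n' "$long_words"

rm -f "$words_file"
```

Since the file in question already has one word per line, the tr step
is probably optional there; it matters when the input is running text.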



-- 
We are what we think. All that we are arises with our thoughts. With
our thoughts, we make the world.
-Buddha
Franklin P. Jones  - "All women should know how to take care of
children. Most of them will have a husband some day."



