[sf-lug] regex: how to match any one to four character word in a file

Jeff Bragg jackofnotrades at gmail.com
Wed Dec 10 12:21:22 PST 2008


That won't work (I just verified for myself that it doesn't by trying it).
Those are line anchors, not word anchors.  The pattern will only match
lines that are no more than 4 characters long in total.  A trailing
carriage return counts toward that total; the newline itself does not.
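
To make the anchor behavior concrete, here is a small sketch. The sample
words and the temp files are made up for illustration (they are not jim's
data), and it assumes a grep that does not strip carriage returns, as on a
typical Linux system.

```shell
# Hypothetical sample data: one file with Unix line endings, one with DOS (CRLF).
unix=$(mktemp)
dos=$(mktemp)
printf '%s\n' a and fact apple > "$unix"
printf '%s\r\n' a and fact apple > "$dos"

# Anchored pattern: the whole line must be 1 to 4 characters long.
# Unix file: a, and, fact are counted.
grep -c '^.\{1,4\}$' "$unix"

# DOS file: each line carries an extra carriage return, so "fact\r" is
# 5 characters long and is no longer counted.
grep -c '^.\{1,4\}$' "$dos"
```

If the word list came from a DOS/Windows machine, that extra carriage
return could be one reason an anchored pattern appears to miss
four-letter words.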

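Something like '\w\{1,4\}' looks like it should work, but in practice it
doesn't honor the maximum (4 in this case): the pattern is unanchored, so
it also matches the first four characters of a longer word, and grep then
prints the whole line.  But you can pipe the output to another grep to
filter out the longer lines.  This seems to work:
grep '.\{1,4\}' myfile.txt | grep -v '.\{5,\}'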
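
The difference between the unanchored pattern and the two-stage filter can
be sketched as follows. The word list is hypothetical, \w is a GNU grep
extension rather than POSIX, and the -x variant at the end is an addition
that does not come from the thread: -x is the standard grep option that
requires the pattern to match the whole line.

```shell
# Hypothetical sample file, one word per line.
words=$(mktemp)
printf '%s\n' a and fact apple zounds > "$words"

# Unanchored: "appl" inside "apple" satisfies \w\{1,4\}, so every line is printed.
grep '\w\{1,4\}' "$words"

# Two-stage filter: keep non-empty lines, then drop lines with 5 or more characters.
grep '.\{1,4\}' "$words" | grep -v '.\{5,\}'

# Single-command alternative: -x makes the pattern match the entire line.
grep -x '.\{1,4\}' "$words"
```

The last two commands should print the same short words; the first one
prints the long words as well.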

On Wed, Dec 10, 2008 at 10:12 AM, jim <jim at well.com> wrote:

>
> from charles-henri (using an email address not
> registered on the sf-lug mailing list):
>
> jim wrote:
>        > i've given up on the online tutorials.
>        >
>        > i have a text file with over 100000 words (and lines,
>        > one word per line). i wanna grep out all words that
>        > are from one to four characters, e.g. 'a' or 'and'
>        > or "fact" but not "apple" or "zounds".
>        >
>        > $  grep '[.]{4}' words.txt
>        > got me a newline.
>        >
>
>
> [.] will match a literal '.'
> '{' needs to be escaped. \{4\} will match exactly 4 characters.
> Also, you need to anchor your regex (with ^ and $)
>
> So:
> grep '^.\{1,4\}$'
>
>
> --
> Charles-Henri