[sf-lug] regex: how to match any one to four character word in a file

Charles-Henri Gros chgros at coverity.com
Wed Dec 10 13:25:33 PST 2008


Jeff Bragg wrote:
> That won't work (I just verified for myself that it doesn't by trying it).
> Those are line anchors, not word anchors.  It will only match lines that
> have no more than 4 characters on them (including newlines, carriage
> returns, etc).
>
> Something like '\w\{1,4\}' should work, though in practice it doesn't seem
> to honor the maximum match condition (4 in this case).
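Newlines are not matched by '.', because grep strips the line terminator
before matching. (A stray carriage return from DOS line endings is an
ordinary character, though, and '.' does match it.)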

Also, your test is not much better: it doesn't use any anchors at all,
so the pattern is free to match just part of a longer word (hence the
"not honoring the maximum match condition").
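
To see the partial match, here is a small sketch. The two sample words
are made up for illustration, and \w is a GNU grep extension, so this
assumes GNU grep:

```shell
# Hypothetical sample input: one short word and one long word.
sample='cat
elephant'

# No anchors: both lines are selected, because a run of 1 to 4 word
# characters (e.g. "elep") exists inside "elephant" as well.
unanchored=$(printf '%s\n' "$sample" | grep '\w\{1,4\}')

# With -o the long word comes out as consecutive chunks of up to 4 characters.
chunks=$(printf '%s\n' "$sample" | grep -o '\w\{1,4\}')

printf '%s\n' "$unanchored"
printf '%s\n' "$chunks"
```

The second command prints cat, elep and hant, which shows that the
limit of 4 is honored; it is just applied to pieces of the long word.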

The word boundary is \b (\< for the start of a word only, \> for the end only).

So you can use:
'\b\w\{1,4\}\b'
or
'\<\w\{1,4\}\>'

but in either case, it will print the whole line that contained the 
matching word. Use grep -o to only print the matching text.
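
A quick sketch of the two patterns combined with -o, run on a made-up
sample line (assuming GNU grep, since \b, \<, \> and \w are GNU
extensions):

```shell
# Hypothetical sample line mixing short and long words.
line='the elephant saw a cat'

# -o prints each match on its own line instead of the whole input line.
with_b=$(printf '%s\n' "$line" | grep -o '\b\w\{1,4\}\b')

# Same idea with the explicit start-of-word / end-of-word anchors.
with_angle=$(printf '%s\n' "$line" | grep -o '\<\w\{1,4\}\>')

printf '%s\n' "$with_b"
```

Both variants should print the, saw, a and cat, one per line, and skip
"elephant" entirely.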

In any case, the original question said one word per line, so line
anchors should work.
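
For the one-word-per-line case, a minimal sketch follows. The exact
pattern from earlier in the thread is not quoted here, so '^.\{1,4\}$'
is an assumed stand-in, and the word list is made up:

```shell
# Hypothetical word list, one word per line.
words='cat
elephant
tree
a
hello'

# Line anchors: select only lines that are 1 to 4 characters long.
# Note that '.' also accepts non-word characters such as punctuation.
by_anchor=$(printf '%s\n' "$words" | grep '^.\{1,4\}$')

# Alternative: -x requires the pattern to match the whole line
# (\w is a GNU extension and restricts the match to word characters).
by_x=$(printf '%s\n' "$words" | grep -x '\w\{1,4\}')

printf '%s\n' "$by_anchor"
```

Both commands should print cat, tree and a.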

-- 
Charles-Henri




