[conspire] regex fun, etc. (was: The plural of regex is regrets)

Michael Paoli Michael.Paoli at cal.berkeley.edu
Thu Oct 29 20:16:21 PDT 2020


Regex ... yummy.  ;-)

Some semi-random bits.
Ran across this not too long ago:
https://regexcrossword.com/

And a useful tip I learned long ago from Oakland Perl Mongers
(the group has long gone inactive, but the list persists:
http://oakland.pm.org/
):
the /x modifier in Perl REs (some other languages have also
adopted that part of Perl REs).  From perlre(1) we have:
"/x" and  "/xx"
A single "/x" tells the regular expression parser to ignore most
whitespace that is neither backslashed nor within a bracketed character
class.  You can use this to break up your regular expression into more
readable parts.  Also, the "#" character is treated as a metacharacter
introducing a comment that runs up to the pattern's closing delimiter,
or to the end of the current line if the pattern extends onto the next
line.  Hence, this is very much like an ordinary Perl code comment.
...

So, that allows one to make pretty readable REs, e.g.:
# match an IPv4 dotted quad address
/^
     (
         (
             \d\d?|                      # a digit or two
             [01]\d\d|2[0-4]\d|25[0-5]   # or three (in range)
         )
         \.                              # dot
     ){3}                                # thrice that
     (
         \d\d?|                          # a digit or two
         [01]\d\d|2[0-4]\d|25[0-5]       # or three (in range)
     )
$/x

Compare that to, e.g.:
/^((\d\d?|[01]\d\d|2[0-4]\d|25[0-5])\.){3}(\d\d?|[01]\d\d|2[0-4]\d|25[0-5])$/
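For those not writing Perl, here is a rough sketch of the same idea in Python (not Perl), whose re.VERBOSE flag behaves much like /x. The names DOTTED_QUAD, COMPACT and is_dotted_quad are made up for illustration, and re.ASCII is my addition so that \d matches only 0-9, as one would likely want here.

```python
import re

# Verbose form: unescaped whitespace is ignored and "#" starts a comment,
# much like Perl's /x.  re.ASCII limits \d to the digits 0-9.
DOTTED_QUAD = re.compile(
    r"""
    ^
    (
        (
            \d\d?|                      # a digit or two
            [01]\d\d|2[0-4]\d|25[0-5]   # or three (in range)
        )
        \.                              # dot
    ){3}                                # thrice that
    (
        \d\d?|                          # a digit or two
        [01]\d\d|2[0-4]\d|25[0-5]       # or three (in range)
    )
    $
    """,
    re.VERBOSE | re.ASCII,
)

# The same pattern written on one line, for comparison.
COMPACT = re.compile(
    r"^((\d\d?|[01]\d\d|2[0-4]\d|25[0-5])\.){3}"
    r"(\d\d?|[01]\d\d|2[0-4]\d|25[0-5])$",
    re.ASCII,
)


def is_dotted_quad(text):
    """Return True if text matches the verbose dotted-quad pattern."""
    return DOTTED_QUAD.match(text) is not None
```

Both compiled patterns should accept and reject the same strings; only the readability differs. Note that this pattern, like the Perl original, accepts leading zeros such as 010.001.000.255.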

And "of course" (and this applies to more than just regular expressions):
o Don't reinvent the wheel.
o And especially don't reinvent the wheel ... poorly.
E.g., in Perl one will generally find modules that have good, solid REs for
handling the more complex yet commonly desired matches, e.g.
email addresses, IPv6 addresses, IPv4 addresses, IP addresses in general, etc.
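As one illustration of not reinventing the wheel, here is a small sketch in Python (not Perl) that leans on the standard library's ipaddress module rather than a hand-rolled RE. The helper name classify_ip is hypothetical, just for this example.

```python
import ipaddress


def classify_ip(text):
    """Return 'IPv4' or 'IPv6' for a valid address string, else None."""
    try:
        # ip_address() parses both IPv4 and IPv6 and raises ValueError
        # for anything that is not a valid address.
        addr = ipaddress.ip_address(text)
    except ValueError:
        return None
    return f"IPv{addr.version}"


if __name__ == "__main__":
    for candidate in ("192.168.0.1", "2001:db8::1", "256.1.1.1"):
        print(candidate, "->", classify_ip(candidate))
```

A library parser may be stricter or looser than any given RE. For instance, recent Python versions reject leading zeros in IPv4 octets, so it pays to check which behavior one actually wants.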

Oh, and I've also done presentations/sessions on regular expressions.
Some may find those materials useful.
The location may shuffle about, but at present they can be found
here:
http://www.mpaoli.net/~michael/public_html.rawbw/unix/regular_expressions/Regular_Expressions_by_Michael_Paoli.odp

> From: "Deirdre Saoirse Moen" <deirdre at deirdre.net>
> Subject: [conspire] The plural of regex is regrets
> Date: Thu, 29 Oct 2020 16:08:09 -0700

> I hadn’t heard this one before, but it sounds like it’s a reference  
> to an old joke.
>
> https://www.reddit.com/r/ProgrammerHumor/comments/jk7bv2/regret/
>
> When someone pasted the RFC 822 compliant (email address) regex, the  
> followup comment, "You really need to work on your ASCII art my  
> dude" got a good chortle out of me. Also possibly handled by the  
> regex (I haven’t checked, frankly), but 💩.la is an actual valid  
> domain (that’s a pile of poo emoji if your email client is doing bad  
> things to it), so it should handle domains like that.
>
> Anyhow, any long regex thread that involves programming always,  
> always, always mentions my favorite Stack Overflow answer ever, one  
> so golden they locked it and put a moderator note up not to report  
> it as it was actually rendering correctly.
>
> https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags



