[conspire] URL sanity

Michael Paoli Michael.Paoli at cal.berkeley.edu
Sun Jul 5 03:47:26 PDT 2020


> From: "Rick Moen" <rick at linuxmafia.com>
> Subject: Re: [conspire] URL sanity
> Date: Fri, 3 Jul 2020 22:34:07 -0700

> Quoting Michael Paoli (Michael.Paoli at cal.berkeley.edu):
>
>> at least delimiting by whitespace will sort-of-kind-of-mostly work,
>> with most clients.
>
> Achieves perfectly _my_ objectives.
>
> Note:  I really don't give a tinker's damn whether the recipient's MUA
> autoconverts a URL I mention in my e-mail into an HTML hotlink.  I
> send out _plaintext_, in compliance with Postel's Prescription[1].  If
> the receiving user wishes to feed a URL I send out into some software, I
> figure copy and paste still works super-fine, and the user's fingers
> probably aren't broken.
>
> [1] https://en.wikipedia.org/wiki/Robustness_principle
>
>> put your dang URLs within <>
>
> Nope, I'd very much rather not:  Whitespace delimiting achieves
> absolutely everything I consider worth achieving.

Well, nobody's forcing you to put stuff within <>.

I was curious too, so I looked over two clients that try, rather
aggressively, to automagically link things.  They behaved similarly,
but not quite identically.

Oh, and I think I got my first G-20.In wrong in the text;
it should've been:
for G-20.In July
And yes, I found some clients would take a line having just:
g-20.in.
and link the "g-20.in" portion, and others didn't link it.
Most properly handled the stuff within <>, the exception being that,
with both clients, where the path portion ended with .html. just
before the >, the link didn't include the . at the end of html.
And I got that behavior regardless of using <> or not.
So, encoding the . on the end would be prudent, e.g. this ought work:
<https://www.balug.org/tmp/dot.html%2e>
And presumably most clients would also do okay with this:
https://www.balug.org/tmp/dot.html%2e
and maybe even link these okay:
www.balug.org/tmp/dot.html%2e
balug.org/tmp/dot.html%2e
... though it's more debatable whether a client ought to take it upon
itself to link those last two.
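
For what it's worth, here is a minimal Python sketch of encoding a trailing
"." as %2e so that a client's autolinker is less likely to drop it.  The
function name encode_trailing_dots is made up for illustration.  It is done
by hand because urllib.parse.quote() treats "." as always safe and leaves it
alone.

```python
def encode_trailing_dots(url: str) -> str:
    """Percent-encode any '.' characters at the very end of a URL.

    Hypothetical helper, for illustration only. Dots elsewhere in the
    URL (host name, file extension) are left untouched.
    """
    stripped = url.rstrip(".")
    n_dots = len(url) - len(stripped)
    return stripped + "%2e" * n_dots


print(encode_trailing_dots("https://www.balug.org/tmp/dot.html."))
# prints: https://www.balug.org/tmp/dot.html%2e
```
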
I also saw differing client behavior with a URL ending with a ")"
character.  Some simply didn't include it in the link;
some guessed, based on an earlier non-encoded (, that the ending )
belonged to the URL, but if the earlier ( was encoded, they didn't
include the ending unencoded ).  But they all got it correct within <>.
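
To illustrate why the ")" case is ambiguous without <>, here is a rough
Python sketch.  The two regular expressions are simplistic assumptions of
mine, not what any particular client actually does, and the example.org URL
is a made-up placeholder.

```python
import re

# Assumed, deliberately naive patterns (no real client is claimed to use these).
ANGLE = re.compile(r"<(https?://[^\s<>]+)>")  # URL delimited by <>
BARE = re.compile(r"https?://\S+")            # URL delimited by whitespace only

text_angle = "(see <https://example.org/wiki/Foo_(bar)>)."
text_bare = "(see https://example.org/wiki/Foo_(bar))."

# Within <>, the end of the URL is unambiguous.
print(ANGLE.findall(text_angle))
# prints: ['https://example.org/wiki/Foo_(bar)']

# Whitespace-delimited, the trailing ")." of the sentence gets swept in,
# so a client has to guess how much of it to trim.
print(BARE.findall(text_bare))
# prints: ['https://example.org/wiki/Foo_(bar)).']
```
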

Also can be highly annoying dealing with clients that want to link
darn near everything, when most or all of those links are or would be
useless or invalid, or it's otherwise not desired to link them.
I can think, for example, of some commercial "wiki" software that behaves
very annoyingly so, and doesn't give one the capability to turn that
behavior off ... other than clicking on every dang one of 'em and telling
it to "remove link" or the like.  Very annoying when one is pasting in
text that has hundreds of bits of text that get automagically linked,
when absolutely none of it ought be linked.  But I digress.

So, sure, if you want to go by whitespace rather than <>, can do that ...
heck, I'm often lazy(/efficient?) and do so myself.  Especially if it's
unlikely most clients would misinterpret where the URL starts and
ends.  E.g. if it starts with http or https, and ends with / or
an alpha-numeric character, most clients will guess reasonably correctly.
But when in doubt, within <> is less likely to be screwed up by
the client's interpretation (and maybe even so for the humans :-)).
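
The rule of thumb above could be sketched in Python roughly as follows.
The function names needs_angle_brackets and delimit are hypothetical, and
the check is only the heuristic described here (starts with http or https,
ends with / or an alphanumeric character), not a real URL validator.

```python
def needs_angle_brackets(url: str) -> bool:
    """Return True when the URL is likely to be mis-delimited if left bare.

    Heuristic only: a bare URL is considered "safe" when it starts with
    http:// or https:// and ends with "/" or an alphanumeric character.
    """
    starts_ok = url.startswith(("http://", "https://"))
    ends_ok = url.endswith("/") or url[-1:].isalnum()
    return not (starts_ok and ends_ok)


def delimit(url: str) -> str:
    """Wrap the URL in <> only when the heuristic says it is in doubt."""
    return f"<{url}>" if needs_angle_brackets(url) else url


print(delimit("https://www.balug.org/tmp/dot.html%2e"))
# prints: https://www.balug.org/tmp/dot.html%2e
print(delimit("www.balug.org/tmp/dot.html%2e"))
# prints: <www.balug.org/tmp/dot.html%2e>
```
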



