[sf-lug] HTML, <PRE></PRE>, <-- -->, email & mime encoded ...

Michael Paoli Michael.Paoli at cal.berkeley.edu
Wed Oct 2 07:16:34 PDT 2019


So, taking some bits from some other thread(s),
and "moving" to here, as folks may have some more general
interest(s) regarding such ...

So, let's say one has, in an email, some example bits of
code ... maybe even including HTML,
and for simplicity's sake, we'll say it's only composed of isprint(3)
ASCII characters and newline characters, and no trailing spaces, and no
tabs:
a=' !"#$%&'\''()*+,-./0123456789:;<=>?@ABCDEFGHIJKLM'
b='NOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~'
isprint="$a$b"
<HTML>
<HEAD>
<TITLE>HTML snippet example 2019-10-02T13:28:46+00:00.721100596Z</TITLE>
</HEAD>
<BODY>
<UL>
  <LI>foo</LI>
  <LI>bar</LI>
  <LI>baz</LI>
</UL>
</BODY>
</HTML>

With various common email encodings, that might be MIME quoted-printable
or base64 encoded.  So, those might then look like
this (quoted-printable):
a=3D' !"#$%&'\''()*+,-./0123456789:;<=3D>?@ABCDEFGHIJKLM'
b=3D'NOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~'
isprint=3D"$a$b"
<HTML>
<HEAD>
<TITLE>HTML snippet example 2019-10-02T13:28:46+00:00.721100596Z</TITLE>
</HEAD>
<BODY>
<UL>
  <LI>foo</LI>
  <LI>bar</LI>
  <LI>baz</LI>
</UL>
</BODY>
</HTML>
or this (base64):
YT0nICEiIyQlJidcJycoKSorLC0uLzAxMjM0NTY3ODk6Ozw9Pj9AQUJDREVGR0hJSktMTScKYj0n
Tk9QUVJTVFVWV1hZWltcXV5fYGFiY2RlZmdoaWprbG1ub3BxcnN0dXZ3eHl6e3x9ficKaXNwcmlu
dD0iJGEkYiIKPEhUTUw+CjxIRUFEPgo8VElUTEU+SFRNTCBzbmlwcGV0IGV4YW1wbGUgMjAxOS0x
MC0wMlQxMzoyODo0NiswMDowMC43MjExMDA1OTZaPC9USVRMRT4KPC9IRUFEPgo8Qk9EWT4KPFVM
PgogIDxMST5mb288L0xJPgogIDxMST5iYXI8L0xJPgogIDxMST5iYXo8L0xJPgo8L1VMPgo8L0JP
RFk+CjwvSFRNTD4K
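
To reproduce encodings like the two above without a mail client, one
option is Python's standard quopri and base64 modules.  What follows is
only a minimal sketch: Python is chosen here for illustration and is not
the tooling used elsewhere in this message, and the names
encode_quoted_printable, encode_base64, and sample are made up for the
example.

```python
import base64
import quopri


def encode_quoted_printable(data: bytes) -> bytes:
    """Quoted-printable encode; a literal '=' becomes '=3D'."""
    return quopri.encodestring(data)


def encode_base64(data: bytes) -> bytes:
    """Base64 encode, wrapped at 76 characters per line as is usual in MIME."""
    return base64.encodebytes(data)


if __name__ == "__main__":
    # Hypothetical sample: the third line of the shell snippet above.
    sample = b'isprint="$a$b"\n'
    print(encode_quoted_printable(sample).decode("ascii"), end="")
    print(encode_base64(sample).decode("ascii"), end="")
```

Both functions take and return bytes, since these encodings are defined
on octets rather than on characters.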

Well, we can convert either of those back to the original.
There are various ways to do that.  I used to use a handy
mimencode utility.  But alas, that went away from the distro
I was using ... and I quite liked it and used it a lot.  Ah,
fear not ... I easily enough wrote my own similar utility,
with the same name ... well, at least reimplementing the features
I actually used anyway.
http://www.rawbw.com/~mp/perl/mimencode

So, then we can convert back easily enough,
e.g. from our base64:
mimencode -u
or from quoted-printable:
mimencode -u -q
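
If mimencode isn't at hand, roughly the same decoding can be sketched
with Python's standard library.  This is an illustration under
assumptions, not a description of how the mimencode utility above is
implemented; the function names decode_quoted_printable and
decode_base64 are hypothetical.

```python
import base64
import quopri


def decode_quoted_printable(data: bytes) -> bytes:
    """Undo quoted-printable, e.g. '=3D' back to '='."""
    return quopri.decodestring(data)


def decode_base64(data: bytes) -> bytes:
    """Undo base64; newlines between the encoded lines are ignored."""
    return base64.b64decode(data)


if __name__ == "__main__":
    # Small pieces of the two encoded examples from this message.
    print(decode_quoted_printable(b'isprint=3D"$a$b"\n').decode("ascii"), end="")
    print(decode_base64(b"YT0n\n").decode("ascii"))
```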

Great, so far, so good.
Now let's say we want to put our code example on a web page,
preserving it as closely as possible.
There are the <PRE></PRE> tags, for PREformatted;
they mostly cover that ... but not quite 100%.
Certain characters (notably <, >, and &) will still be interpreted as
HTML within them.
How to deal with that?  HTML encode them so that they represent
and are rendered as the literal intended character.

For that, I long ago wrote htmlquote:
http://www.rawbw.com/~mp/unix/sh/examples/htmlquote
And also, its complement, to undo the same:
http://www.rawbw.com/~mp/unix/sh/examples/htmlunquote
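
The general idea can be sketched in a few lines of Python using the
standard html module.  This is not the htmlquote/htmlunquote code linked
above, and it assumes that escaping &, <, and > is sufficient for text
placed inside <PRE></PRE>; the names html_quote and html_unquote are
made up for the example.

```python
import html


def html_quote(text: str) -> str:
    """Escape &, < and > so the text displays literally in HTML."""
    return html.escape(text, quote=False)


def html_unquote(text: str) -> str:
    """Reverse html_quote by turning character references back into characters."""
    return html.unescape(text)


if __name__ == "__main__":
    snippet = "<LI>foo & bar</LI>"
    quoted = html_quote(snippet)
    print("<PRE>" + quoted + "</PRE>")
    # Unquoting once gets the original text back.
    assert html_unquote(quoted) == snippet
```

Note that html.unescape decodes any character reference it recognizes,
not only the three produced here, so it is a superset of the strict
inverse.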

That can be handy in multiple ways in HTML.
Not only within <PRE></PRE> to show code examples,
but also to effectively comment out HTML.
HTML has comments - start with:
<!--
and close with:
-->
Well, that's all fine and dandy, except among other things, they don't
nest - so what if the HTML code one wants to comment out already has
HTML comments using <!-- and --> within?
Again, htmlquote to the rescue.  That will (also) encode
the otherwise problematic --> in the code.
One can also again reverse it later with htmlunquote.
So, say we want to comment out:
<!-- comment -->
use htmlquote and we then have:
&lt;!-- comment --&gt;
and we can then comment that out:
<!--
&lt;!-- comment --&gt;
-->
And we can nest that indefinitely by repeating, e.g. another level:
<!--
&lt;!--
&amp;lt;!-- comment --&amp;gt;
--&gt;
-->
And then also easy enough to undo with htmlunquote.
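
The quote-then-wrap step, and its reverse, can be sketched as a pair of
Python functions.  This is only an illustration: comment_out and
uncomment are hypothetical names, and html.escape/html.unescape stand in
for htmlquote/htmlunquote on the assumption that escaping &, <, and > is
what matters here.

```python
import html


def comment_out(markup: str) -> str:
    """Escape the markup, then wrap it in one HTML comment."""
    return "<!--\n" + html.escape(markup, quote=False) + "\n-->"


def uncomment(commented: str) -> str:
    """Strip one wrapper added by comment_out and unescape one level."""
    prefix, suffix = "<!--\n", "\n-->"
    if not (commented.startswith(prefix) and commented.endswith(suffix)):
        raise ValueError("not in the form produced by comment_out")
    inner = commented[len(prefix):len(commented) - len(suffix)]
    return html.unescape(inner)


if __name__ == "__main__":
    once = comment_out("<!-- comment -->")
    twice = comment_out(once)
    print(twice)
    # Two levels in, two levels back out.
    assert uncomment(uncomment(twice)) == "<!-- comment -->"
```

Each level escapes the & of the level before it, which is why the
nesting can be undone one layer at a time.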

Tabs ... are another issue - mostly not covered here except this mention.
Characters that aren't ASCII isprint(3) can be problematic (there may be
other ways to encode/decode those).
Trailing spaces?  Those may cause issues, especially with
copy/paste and many common clients, where trailing spaces
are often dropped.  They can also be a hazard in code inspection,
as they may not be particularly obvious, so they're better
avoided where feasible.
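
One way to spot these before sending is a small check over the text.
The following Python sketch is an assumption about what such a check
might look like, not an existing tool; find_problem_lines is a made-up
name, and "printable ASCII" is taken here to mean the characters from
space (0x20) through tilde (0x7E).

```python
def find_problem_lines(text: str):
    """Yield (line_number, reason) for tabs, trailing spaces, and
    characters outside printable ASCII (0x20 through 0x7E)."""
    # Split on "\n" only, so a stray carriage return stays visible
    # and is reported as non-printable.
    for number, line in enumerate(text.split("\n"), start=1):
        if "\t" in line:
            yield number, "tab"
        if line.endswith(" "):
            yield number, "trailing space"
        if any(not (" " <= ch <= "~") and ch != "\t" for ch in line):
            yield number, "non-printable or non-ASCII character"


if __name__ == "__main__":
    sample = "ok line\ntrailing \n\tindented\ncaf\u00e9\n"
    for number, reason in find_problem_lines(sample):
        print(f"line {number}: {reason}")
```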



