[conspire] I renounce the devil Unicode and all of its works

Edward Cherlin echerlin at gmail.com
Thu May 19 23:20:56 PDT 2011


On Thu, May 19, 2011 at 03:00, Rick Moen <rick at linuxmafia.com> wrote:
> (Tongue lodged in teeth?  Not telling.)
>
>
> To: Bhikkhu Pesala <pesala at aimwell.org>
> Cc: "Eric S. Raymond" <esr at thyrsus.com>
> Subject: Re: Wrongly encoded Web Page
> In-Reply-To: <op.vvpriabrq5zcno at aimwell-org.cable.virginmedia.net>
> Organization: If you lived here, you'd be $HOME already.
> X-Mas: Bah humbug.
>
> Quoting Bhikkhu Pesala (pesala at aimwell.org):
>
>> Your web page at
>> http://catb.org/~esr/faqs/smart-questions.html#writewell is wrongly
>> encoded so smart quotes display as code, e.g.
>>
>> â?~B??~SGood question!â?~B?�
>
> Awfully good point.  Thank you, Bhikkhu.
>
> Eric:
>
> I personally am a Norwegian-American reactionary about charsets

I have never seen such an abomination as

Don't confuse
<span class="quote">“<span class="quote">its</span>”</span> with
<span class="quote">“<span class="quote">it's</span>”</span>,
<span class="quote">“<span class="quote">loose</span>”</span> with

<span class="quote">“<span class="quote">lose</span>”</span>, or
<span class="quote">“<span class="quote">discrete</span>”</span>
with
<span class="quote">“<span class="quote">discreet</span>”</span>.

Don't confuse “its” with “it's”, “loose” with “lose”,
or “discrete” with “discreet”.

on a supposedly UTF-8 HTML page. To think that this follows directly
after "Spell, punctuate, and capitalize correctly." What was this
dreck edited in?

^_^
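The garble in Bhikkhu's sample is the usual symptom of UTF-8 bytes being read with a single-byte charset. Below is a minimal Python sketch. It assumes the reader's software fell back to Windows-1252, which browsers substitute for a page labelled ISO-8859-1 under the WHATWG Encoding Standard. The `misdecode` helper is hypothetical. The archived sample has evidently been mangled again on its way through mail, so it will not match byte for byte:

```python
def misdecode(text: str, wrong_charset: str = "cp1252") -> str:
    """Encode text as UTF-8, then decode the bytes with a single-byte charset.

    Hypothetical helper. It mimics software that ignores (or never sees) the
    page's charset=UTF-8 declaration. Bytes the wrong charset leaves undefined
    (0x9D in cp1252, for instance) become U+FFFD instead of raising an error.
    """
    return text.encode("utf-8").decode(wrong_charset, errors="replace")


# U+201C and U+201D are the curly double quotes; each takes 3 bytes in UTF-8.
original = "\u201cGood question!\u201d"
garbled = misdecode(original)
# garbled == "\u00e2\u20ac\u0153Good question!\u00e2\u20ac\ufffd", i.e. "â€œGood question!â€�"
```

Decoding as strict Latin-1 instead (`misdecode(original, "latin-1")`) yields invisible C1 control characters where cp1252 shows `€œ`.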

Check out the HTML of

http://www.i18nguy.com/unicode-example.html

to see it done right. For example,

<TR>
<TD class="english">Denmark</TD>
<TD class="english">Soren Hauch-Fausboll</TD>
<TD class="native" lang="da">Danmark</TD>
<TD class="native" lang="da">Søren Hauch-Fausbøll</TD>
</TR>

<TR>
<TD class="english" title="Egypt = ar-EG">Egypt<br>(Masr)</TD>
<TD class="english">Abdel Halim Hafez<br>(singer)</TD>
<TD class="rtlnative" lang="ar-EG">مصر</TD>
<TD class="rtlnative" lang="ar-EG">عبدالحليم حافظ</TD>
</TR>
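Part of what makes that page a good demonstration is that no single-byte charset could carry it. Here is a quick Python check using the strings from the rows above. The `encodable_in` helper and the charset list are illustrative choices, not anything from the page:

```python
# Charsets discussed in this thread, by their Python codec names.
CHARSETS = ("ascii", "latin-1", "iso8859_15", "utf-8")


def encodable_in(text: str) -> list:
    """Return the charsets in CHARSETS that can represent every character of text."""
    usable = []
    for charset in CHARSETS:
        try:
            text.encode(charset)
        except UnicodeEncodeError:
            continue  # at least one character has no slot in this charset
        usable.append(charset)
    return usable


# Strings copied from the i18nguy.com table rows quoted above.
rows = ["Soren Hauch-Fausboll", "Søren Hauch-Fausbøll", "مصر", "عبدالحليم حافظ"]
support = {name: encodable_in(name) for name in rows}
```

The transliterated Danish name fits in plain ASCII. The native spelling needs Latin-1 or Latin-9, and the Arabic fits only in UTF-8.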


> when
> writing in the English language, preferring literal ISO 8859-1
> (Latin-1) or (more precisely) its close cousin ISO 8859-15 (informally,
> Latin-9 AKA Western European), which replaces eight symbols with
> more-useful ones, notably the Euro symbol.

Writing English correctly requires something on the order of 500
characters, not counting math symbols, as the character sets of
typesetting fonts for the publishing industry readily show.
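Rick's "eight symbols" is easy to confirm. The Python sketch below compares the two charsets byte by byte using the standard `latin-1` and `iso8859_15` codecs; the function name is only illustrative. Whichever table you pick, it still has just 256 slots, far short of the ~500 characters above:

```python
def latin1_vs_latin9() -> list:
    """Return (byte, Latin-1 char, Latin-9 char) for each byte the two charsets map differently."""
    diffs = []
    for value in range(256):
        raw = bytes([value])
        latin1 = raw.decode("latin-1")      # ISO 8859-1
        latin9 = raw.decode("iso8859_15")   # ISO 8859-15, informally Latin-9
        if latin1 != latin9:
            diffs.append((f"0x{value:02X}", latin1, latin9))
    return diffs


differences = latin1_vs_latin9()
```

It reports exactly eight bytes. At 0xA4, ¤ becomes €. At 0xA6, 0xA8, 0xB4, 0xB8, 0xBC, 0xBD and 0xBE, the characters ¦ ¨ ´ ¸ ¼ ½ ¾ become Š š Ž ž Œ œ Ÿ.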

> UTF-8 is the Wave of the
> Future and Gateway to Unicode<tm>, which IMVAO is reason enough to
> loathe it and all its overengineered baggage.  But you should of course
> make up your own mind.

It is quite proper to loathe certain sections of Unicode, particularly
the 11,000+ precomposed Korean Hangeul syllables. Almost everything
else loathsome in Unicode is there because it was in a pre-existing
national or industry standard or pseudo-standard.
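For the record, the block holds exactly 11,172 precomposed syllables (U+AC00 to U+D7A3). Every one of them can be computed from its component jamo by the arithmetic in the Unicode Standard's Conjoining Jamo Behavior section, which is what makes them redundant. Below is a Python sketch of that arithmetic, checked against the standard library's NFD decomposition. The constant and function names are my own:

```python
import unicodedata

# Constants from the Unicode Hangul syllable composition algorithm.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28   # leading consonants, vowels, trailing slots (incl. none)
N_COUNT = V_COUNT * T_COUNT              # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT              # 11,172 precomposed syllables


def compose(lead: str, vowel: str, trail: str = "") -> str:
    """Build a precomposed syllable from a leading consonant, a vowel and an optional final."""
    l_index = ord(lead) - L_BASE
    v_index = ord(vowel) - V_BASE
    t_index = ord(trail) - T_BASE if trail else 0   # 0 means "no final consonant"
    return chr(S_BASE + l_index * N_COUNT + v_index * T_COUNT + t_index)


def decompose(syllable: str) -> str:
    """Split a precomposed syllable back into its conjoining jamo."""
    s_index = ord(syllable) - S_BASE
    l_index, rest = divmod(s_index, N_COUNT)
    v_index, t_index = divmod(rest, T_COUNT)
    jamo = chr(L_BASE + l_index) + chr(V_BASE + v_index)
    if t_index:
        jamo += chr(T_BASE + t_index)
    return jamo


# Every precomposed syllable round-trips and matches Python's own NFD decomposition.
syllables = [chr(S_BASE + i) for i in range(S_COUNT)]
assert all(decompose(s) == unicodedata.normalize("NFD", s) for s in syllables)
assert all(compose(*decompose(s)) == s for s in syllables)
```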

> (I won't hate you even if you start stumping for Java.  ;->  )

APL? I once won a Stan Kelly-Bootle programming contest with the expression

⍎X←'⍎X'

read aloud as

execute X gets quote execute X quote

The contest was to produce the greatest ratio of error text to source
text. In I-APL, which ran in 64K, this line of seven characters
produced the message WS FULL followed by about 1500 stack frames. On
the largest Sun workstation available at the time it would have
produced WS FULL followed by many millions of stack frames. There is
an actual point to this expression: it is a powerful test of how an APL
implementation handles unbounded recursion and exhaustion of its workspace.
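APL is hard to come by these days, but the shape of the trick translates. Below is a rough Python analogue, not the APL original. Here `x` holds source text whose only job is to execute `x` again. Python stops with a RecursionError at its recursion limit (1,000 frames by default) rather than WS FULL. Its tracebacks also collapse repeated frames, so the error-to-source ratio is far tamer. The helper name is hypothetical:

```python
import traceback


def run_self_executing(src: str = "exec(x)") -> tuple:
    """Execute src with x bound to src itself; return (source length, error-text length).

    As with the APL line, executing x runs code that executes x, so the
    recursion never bottoms out and only the interpreter's limit stops it.
    """
    namespace = {"x": src}
    try:
        exec(src, namespace)
    except RecursionError as err:
        error_text = "".join(traceback.format_exception(type(err), err, err.__traceback__))
        return len(src), len(error_text)
    return len(src), 0  # not reached for the default src


source_len, error_len = run_self_executing()
# source_len == 7, the same length as the APL line; error_len is much larger.
```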

One of my proudest achievements is my personal entry in Stan's
Computer Contradictionary.

> So, me, I'd fix the problem by using the vertical ", which
> Latin-1/Latin-9 carry over from primordial US-ASCII (still in charset
> position 34 decimal), or by using its HTML-entity equivalent &quot; .

U+0022, which UTF-8 encodes as the single byte 0x22, same as ASCII.

What's the problem?
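Here is the same point in Python terms. The straight quote is byte 0x22 in ASCII, Latin-1, Latin-9 and UTF-8 alike, and `html.escape` turns it into the `&quot;` entity Rick mentions. The curly pair, by contrast, exists only in UTF-8 among these charsets. A small sketch follows; the `quote_bytes` helper is illustrative:

```python
import html

STRAIGHT = '"'              # U+0022 QUOTATION MARK
CURLY = "\u201c\u201d"      # U+201C / U+201D LEFT and RIGHT DOUBLE QUOTATION MARK
CHARSETS = ("ascii", "latin-1", "iso8859_15", "utf-8")


def quote_bytes(text: str) -> dict:
    """Map each charset to text's encoded bytes, or None if the charset cannot hold it."""
    result = {}
    for charset in CHARSETS:
        try:
            result[charset] = text.encode(charset)
        except UnicodeEncodeError:
            result[charset] = None
    return result


straight = quote_bytes(STRAIGHT)   # b'"' (0x22) in every charset
curly = quote_bytes(CURLY)         # None except under UTF-8
entity = html.escape(STRAIGHT)     # '&quot;'
```

Windows-1252 does carry the curly quotes, at 0x93 and 0x94, which is one way they leak into text that is nominally Latin-1.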

> By contrast, if the curly things are deemed obligatory, in contrast to
> ASCII-like vertical double-quotation marks, then Latin-1/Latin-9 cannot
> suffice, whereas UTF-8 does.
>
>
> _______________________________________________
> conspire mailing list
> conspire at linuxmafia.com
> http://linuxmafia.com/mailman/listinfo/conspire
>



-- 
Edward Mokurai (默雷/धर्ममेघशब्दगर्ज/دھرممیگھشبدگر ج) Cherlin
Silent Thunder is my name, and Children are my nation.
The Cosmos is my dwelling place, the Truth my destination.



