[conspire] Lettercase is simple, right?

Rick Moen rick at linuxmafia.com
Thu Mar 28 02:34:56 PDT 2019


I distinctly remember computing that was entirely in 7-bit US-ASCII,
pounded out on our ASR-33 Teletypes -- doubtless a sad time for the
inhabitants of Ålesund, among other places, and for that matter the
entire Møre og Romsdal County that town is in, not to mention anyone
seeking to write γνῶθι σεαυτόν or בראשית ברא אלהים.  

Most of the Western world (all of its main languages, as far as I know)
gets along these days with UTF-8, which we treat as 'basically assume
it's ASCII, except that you can spell Ålesund[1] properly'.  But it really
isn't, because it's an encoding of all of Unicode, and Unicode can be
maddeningly alien.  Let us stop to pity the programmers obliged to deal
with Unicode, which means many coders these days, because they end up
having to deal with things like this:


https://www.b-list.org/weblog/2018/nov/26/case/


   Truths programmers should know about case
   Published: November 26, 2018

   [...] I hinted, briefly, at the deeper complexity of case in Unicode,
   and I want to take some time to talk about that in more detail, because
   it’s interesting and because understanding it can help you make better
   choices when designing and writing code that processes text. So here, in
   opposition to “falsehoods programmers believe”, is my inaugural “truths
   programmers should know”, on the topic of case.
   [...]

The author's subheadings follow.  Visit the link for a lot of
fascinating detail.

   There are more than two cases

   There’s more than one way to determine case

   You can’t tell a character’s case from looking at it (or from its name)

   Some characters have no case

   Some characters may appear to have multiple cases

   Case is context-sensitive

   Case is locale-sensitive

   Case-insensitive comparison requires case folding
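
A few of those headings can be poked at from a Python 3 prompt.  What
follows is only a rough sketch of my own, not code from the linked
article: the helper name same_ignoring_case is made up for
illustration, and the sample characters are ones I picked.  It relies
on nothing beyond the standard str methods and the unicodedata module.

```python
import unicodedata


def same_ignoring_case(a: str, b: str) -> bool:
    """Caseless match using full case folding (hypothetical helper name)."""
    return a.casefold() == b.casefold()


# More than two cases: U+01C5 is a titlecase letter, general category "Lt".
print(unicodedata.category("\u01c5"))

# Some characters have no case: a CJK ideograph is category "Lo",
# and upper() hands it back unchanged.
print(unicodedata.category("\u4e2d"), "\u4e2d".upper() == "\u4e2d")

# Context-sensitive: a capital sigma at the end of a word lowercases
# to final sigma (U+03C2), not to the ordinary small sigma (U+03C3).
print("\u039f\u0394\u039f\u03a3".lower())

# Locale-sensitive: str.lower() knows nothing about locale, so "I" becomes
# "i" even where Turkish rules would call for dotless i (U+0131).
print("I".lower() == "\u0131")

# Case folding: lower() leaves the sharp s alone, casefold() maps it to "ss".
print("Stra\u00dfe".lower() == "STRASSE".lower())
print(same_ignoring_case("Stra\u00dfe", "STRASSE"))
```

Note that casefold() by itself does no normalisation, so strings that
differ only in composed versus decomposed accents will still compare
unequal unless you normalise them first.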

Seriously, I think the piece is worth people's time, although there
were certainly passages I skimmed through quickly.


I found it via a link from an LWN.net story about the controversy over
a patch that would let ext4 support case-insensitive filenames.
(Hint:  Really not a good idea, despite what desktop users think.
Causes distressing problems.)


[1] That's not an accent over a letter 'A', but rather the 29th and
final letter (pronounced like 'oh') of the Danish/Norwegian alphabet,
whose order is abcdefghijklmnopqrstuvwxyzæøå.  So, in an alphabetical
list of Norway's towns, Vardø comes before Ænes, Østby, and Ålesund.
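
For the curious, here is a minimal Python sketch of that ordering,
assuming nothing fancier than the 29-letter alphabet string above.
The names NORWEGIAN_ALPHABET, RANK, and norwegian_key are my own
inventions for illustration; real software would more likely lean on
a proper locale-aware collation library than on a hand-rolled key
like this one.

```python
import unicodedata

# The alphabet order quoted in the footnote above.
NORWEGIAN_ALPHABET = "abcdefghijklmnopqrstuvwxyzæøå"
RANK = {ch: i for i, ch in enumerate(NORWEGIAN_ALPHABET)}


def norwegian_key(name: str) -> list[int]:
    """Sort key ranking each letter by its place in NORWEGIAN_ALPHABET."""
    # Normalise first so that 'A' plus a combining ring becomes one 'Å'.
    text = unicodedata.normalize("NFC", name).lower()
    # Anything outside the alphabet (spaces, digits) sorts after 'å'.
    return [RANK.get(ch, len(RANK)) for ch in text]


towns = ["Ålesund", "Østby", "Vardø", "Ænes"]

# Plain sorted() compares code points, which puts Å (U+00C5) ahead of
# Æ (U+00C6) and Ø (U+00D8): not the Norwegian order.
print(sorted(towns))

# With the custom key, Å lands last, as in the footnote.
print(sorted(towns, key=norwegian_key))
```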


