[conspire] "Time zones exist to make programmers' lives miserable" ...

Michael Paoli Michael.Paoli at cal.berkeley.edu
Sun Feb 7 02:29:05 PST 2021


> From: "Rick Moen" <rick at linuxmafia.com>
> Subject: Re: [conspire] "Time zones exist to make programmers' lives miserable" ...
> Date: Sun, 7 Feb 2021 00:33:09 -0800

> Quoting Michael Paoli (Michael.Paoli at cal.berkeley.edu):
>
> [snippety-snip]
>
>> Oh, and how many timezone names and variations currently?
>> $ find /usr/share/zoneinfo/*/ -follow -type f -print | wc -l
>> 1779
>> $
>
> Yes, _but_ that includes these three files that are not actually timezones:
> /usr/share/zoneinfo/iso3166.tab
> /usr/share/zoneinfo/zone.tab
> /usr/share/zoneinfo/posixrules
>
> And _that_ is why my tzall shell function greps them out.  ;->

That's why I used:
/usr/share/zoneinfo/*/
rather than any of these:
/usr/share/zoneinfo/*
/usr/share/zoneinfo/
/usr/share/zoneinfo
For non-ancient *nix, in general, ending a pattern with / limits matches
to directories, or at least to files that resolve to a directory.
Doing so starts the (e.g. find) search at the first directory level
below zoneinfo, and thus likewise misses those files (which are all at
the top level under zoneinfo, whereas all the timezone files are at
least one directory level lower).
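
A minimal sketch of that trailing-slash behavior, run against a throwaway
directory rather than the real zoneinfo tree. The directory layout and the
variable names (demo_dir, plain, dirs_only, found) are made up for
illustration.

```shell
#!/bin/sh
# Sketch: a glob ending in / matches only directories (or symlinks to them).
# demo_dir and its contents are hypothetical stand-ins for zoneinfo.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/America" "$demo_dir/posix"
: > "$demo_dir/zone.tab"
: > "$demo_dir/America/New_York"

cd "$demo_dir" || exit 1
plain=$(echo *)       # every top-level entry, files included
dirs_only=$(echo */)  # only entries that resolve to a directory
echo "$plain"
echo "$dirs_only"

# Starting find at */ never visits regular files sitting at the top level.
found=$(find */ -type f -print | sort)
echo "$found"

cd / && rm -rf "$demo_dir"
```

Here zone.tab shows up in the plain glob but not in the */ glob, and find
started from */ reports only America/New_York.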

Might also want to grep out some additional files, if you have a
newer version:
$ find * -type f -exec file \{\} \; 2>>/dev/null | sed -ne '/: timezone data.*$/d;s/: .*$//p'
iso3166.tab
leap-seconds.list
zone.tab
zone1970.tab
$
And the leap seconds listing was a very logical file to include - quite
handy for NTP, and with the timezone data is about the most logical place
to put it - certainly of any existing packages.

Well, interestingly, file classifies posixrules as ...:
$ file posixrules
posixrules: timezone data, version 2, 5 gmt time flags, 5 std time flags, no leap seconds, 236 transition times, 5 abbreviation chars
$
I guess it sort-of-kind-of actually is, ... in a funky way.  One can set
TZ to a POSIX rule-type specification, and Linux is backward compatible
with that.  The file doesn't correspond to a single timezone at all, but
I'm presuming it's what gets consulted when a POSIX rule-style TZ
specification is used, with the requisite calculations done from there.
Anyway, I did count it in my counts in this email.
And, interestingly, looking at my counts and such ... implies it's not
unique ...
$ sha512sum posixrules
e960f4655b6b80056dc7328f02cbca9ed9f759f4ab32434058f2321088ce2f91f5df2f7e2acdbd927eeae5a1857c0b7622f5686b8e2619d5c348e7153615229a  posixrules
$ find * -follow -type f -exec sha512sum \{\} \; | fgrep e960f4655b6b80056dc7328f02cbca9ed9f759f4ab32434058f2321088ce2f91f5df2f7e2acdbd927eeae5a1857c0b7622f5686b8e2619d5c348e7153615229a | awk '{print $2;}'
America/New_York
SystemV/EST5EDT
US/Eastern
posix/SystemV/EST5EDT
posix/US/Eastern
posix/America/New_York
posixrules
$
So, some "fixed" TZ values that don't have POSIX "rules" in them
nevertheless share that posixrules data as their timezone data.
That sort of makes sense - effectively a short-cut, as the POSIX rules
type TZ specification is (or at least was) backwards compatible with the
simpler form, e.g. PST8PDT, that didn't specify the whole rule set - it
would default to the applicable US rules for at least those certain known
timezones.
But, somewhat more interesting is the non-uniqueness - it's not a shortcut
to several of the common US timezones, but to exactly one - EST5EDT.  I
wonder if that happened from some modeling of or using some other code
base?  E.g. even though UNIX defaults to GMT0 (/UTC), I noticed some years
ago that at least one commercial UNIX (notably HP-UX) defaulted not to
GMT0 but rather to EST5EDT (my theory: too much influence from US
government contracts).  So ... maybe Linux picked that up from somewhere
or modeled part(s) of its behavior upon such?
$ date; TZ=posixrules date
Sun Feb  7 02:25:36 PST 2021
Sun Feb  7 05:25:36 EST 2021
$
Well, if it looks like a duck, quacks like a duck, walks like a duck ...
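
For comparison with the TZ=posixrules result above, here is a minimal
sketch using a full POSIX rule-style TZ value directly.  It assumes a
date(1) that accepts -d @SECONDS (GNU coreutils does); the variable names
are made up for illustration, and the epoch value is the same instant as
in the transcript above.

```shell
#!/bin/sh
# Sketch: a POSIX rule-style TZ value spells out the offsets and the DST
# transitions itself, so the result does not depend on a named zone file.
# Assumption: date(1) supports -d @SECONDS, as GNU date does.
when=1612693536   # 2021-02-07 10:25:36 UTC

# Format: std offset dst,start,end
# M3.2.0  = second Sunday of March; M11.1.0 = first Sunday of November.
eastern=$(TZ='EST5EDT,M3.2.0,M11.1.0' date -d "@$when" '+%H:%M:%S %Z')
pacific=$(TZ='PST8PDT,M3.2.0,M11.1.0' date -d "@$when" '+%H:%M:%S %Z')

echo "$eastern"   # 05:25:36 EST
echo "$pacific"   # 02:25:36 PST
```

The two lines match the times in the transcript above, which is at least
consistent with posixrules carrying the US Eastern rules.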

... ah, but alas, upon double-checking, starting from .../*/ missed
some files that should've been included: the timezone files that sit
directly at the top level (e.g. EST5EDT, PST8PDT, UTC).
$ pwd -P && ls -FAb
/usr/share/zoneinfo
Africa/      Cuba      GMT+0@      Kwajalein  Poland      WET
America/     EET       GMT-0@      Libya      Portugal    Zulu@
Antarctica/  EST       GMT0@       MET        ROC         iso3166.tab
Arctic/      EST5EDT   Greenwich@  MST        ROK         leap-seconds.list
Asia/        Egypt     HST         MST7MDT    Singapore   localtime@
Atlantic/    Eire      Hongkong    Mexico/    SystemV/    posix/
Australia/   Etc/      Iceland     NZ         Turkey      posixrules
Brazil/      Europe/   Indian/     NZ-CHAT    UCT         right/
CET          Factory   Iran        Navajo     US/         zone.tab
CST6CDT      GB        Israel      PRC        UTC@        zone1970.tab
Canada/      GB-Eire@  Jamaica     PST8PDT    Universal@
Chile/       GMT       Japan       Pacific/   W-SU
$

So, correcting that bit, have ...
$ find * -follow -type f -exec file -L \{\} \; | fgrep timezone\ data | wc -l
1826
$
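
An alternative sketch that doesn't depend on the wording of file(1)
output: compiled timezone files begin with the four magic bytes "TZif",
so one can test for those directly.  count_tzif is a hypothetical helper
name, the demo files are fabricated stand-ins, and head -c is assumed to
be available (it is on GNU and BSD systems).

```shell
#!/bin/sh
# Sketch: count files whose first four bytes are the "TZif" magic.
# count_tzif is a hypothetical helper, not part of any package.
count_tzif() {
    # $1: directory to scan; -follow resolves symlinks, as in the find above
    find "$1" -follow -type f -print | while IFS= read -r f; do
        if head -c 4 "$f" 2>/dev/null | grep -q '^TZif$'; then
            echo "$f"
        fi
    done | wc -l | tr -d ' '
}

# Demo on a throwaway directory with fake file contents.
demo=$(mktemp -d)
printf 'TZif2' > "$demo/Zone_A"
printf 'TZif2' > "$demo/Zone_B"
printf 'AD\tAndorra\n' > "$demo/iso3166.tab"
tz_count=$(count_tzif "$demo")
echo "$tz_count"
rm -rf "$demo"
```

Against a real tree the call would be count_tzif /usr/share/zoneinfo; the
number it reports depends on the installed tzdata version.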

>> And ... how many actually with unique timezone data?
>> $ find /usr/share/zoneinfo/*/ -follow -type f -exec sha512sum \{\} \; | awk '{print $1;}' | sort -u | wc -l
>> 774
>> $
>
> Well, yes, those are the ones that have unique bitwise contents, _but_
> the multiple names to some of them are nonetheless each (separately)
> semantically meaningful, for sundry reasons.
>
>> $ echo $(find /usr/share/zoneinfo/*/ -follow -type f -exec sha512sum \{\} \; | fgrep 01e5216822bc00070c7728249ed4443b070f901f6337de4ee72b7f4b6623b2638be69f72e5eb0838ad3c78e70618f1c839e681928316305f9b0ab9922c039f51 | sed -e 's/^[^ ]*  *\/usr\/share\/zoneinfo\///' | sort) | fold -s -w 72
>
> You know you can specify a non-'/' delimiter to sed, right?  And
> avoid the picket-fence ugliness?  That's what I do when having to
> process pathspecs using sed.  I really hate having to backslash-escape
> my forward slashes:  Legibility goes down through the floor.

Ah yes, there is that ... old habit.  Perl refers to it as
Leading Toothpick Syndrome (LTS).

> Just put your preferred delimiter after the 's instead of the /.
>
> (I tend to use _ in such cases.)
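
A small sketch of the delimiter point, stripping the zoneinfo prefix from
one sample path both ways.  The variable names are made up for
illustration.

```shell
#!/bin/sh
# Sketch: the same prefix strip with / and with _ as the s delimiter.
path='/usr/share/zoneinfo/America/New_York'

# Picket-fence form: every / in the pattern must be backslash-escaped.
with_slash=$(printf '%s\n' "$path" | sed -e 's/^\/usr\/share\/zoneinfo\///')

# Same substitution with _ as the delimiter: no escaping needed.
with_underscore=$(printf '%s\n' "$path" | sed -e 's_^/usr/share/zoneinfo/__')

echo "$with_slash"        # America/New_York
echo "$with_underscore"   # America/New_York
```

The delimiter only has to be absent from the pattern and replacement; the
underscore in the input (New_York) is harmless, since it is data rather
than part of the sed expression.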

And, slightly corrected to count how many unique timezone file data
contents ... well, heck, we can even improve further ... don't need
to examine symbolic links ...

$ sha512sum $(find * -type f -exec file \{\} \; 2>>/dev/null | sed -ne 's/: timezone data.*$//p') | awk '{print $1;}' | sort -u | wc -l
774
$
So, the earlier command just happened to also give the correct result -
perhaps by chance, or maybe on account of exactly where the actual
timezone data files are or happen to be stored in the hierarchy, or
maybe anything it missed at top level was redundant as far as uniqueness
of contents goes ... and peeking a bit, it would seem that last is the
correct explanation.
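
The counting-by-hash idea can be sketched on a throwaway tree where the
answer is known in advance.  The file names and contents below are
fabricated stand-ins, not real timezone data, and the sketch uses
-exec ... {} + rather than \; simply to run sha512sum fewer times.

```shell
#!/bin/sh
# Sketch: count distinct file contents, then list the names sharing a hash.
# All files here are hypothetical stand-ins for zoneinfo entries.
demo=$(mktemp -d)
mkdir -p "$demo/America" "$demo/US"
printf 'eastern-data' > "$demo/America/New_York"
printf 'eastern-data' > "$demo/US/Eastern"
printf 'pacific-data' > "$demo/US/Pacific"
cd "$demo" || exit 1

# Three names, but only two distinct contents.
unique=$(find . -type f -exec sha512sum {} + | awk '{print $1}' |
    sort -u | wc -l | tr -d ' ')

# Names whose contents match US/Eastern, compared by hash.
target=$(sha512sum < US/Eastern | awk '{print $1}')
same=$(find . -type f -exec sha512sum {} + |
    awk -v h="$target" '$1 == h {print $2}' | sort | paste -sd ' ' -)

echo "$unique"   # 2
echo "$same"     # ./America/New_York ./US/Eastern

cd / && rm -rf "$demo"
```

Names that duplicate contents already counted elsewhere add nothing to
the unique total, which is the same reason the earlier .../*/ command
still came out at 774.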



