[conspire] GiB/TiB vs. GB/TB - SI vs. binary, etc.

Michael Paoli Michael.Paoli at cal.berkeley.edu
Sun Apr 4 11:34:21 PDT 2010


Ah, this is commonly an area of confusion - I had to explain it within
the last week or two to a co-worker asking "where'd all the missing
space go?"
A wee bit of explanation, and some suitable references:
http://en.wikipedia.org/wiki/Binary_prefix
http://en.wikipedia.org/wiki/SI_prefix
and rather suddenly all was quite clear to said co-worker.

Opinions may and will differ :-) ... but my opinion/perspective is that
when referring to binary units, use the binary terms/abbreviations, e.g.
KiB, MiB, GiB, TiB, etc.  May take some getting used to (and things like
TebiByte still don't roll off my tongue), but I think in general it's
better and avoids ambiguity.  Such ambiguity becomes more significant
with each larger prefix: the binary unit exceeds its SI counterpart by
about 2.4% at kilo/kibi, growing to about 12.6% at peta/pebi:
$ perl -e 'for (1..5){printf("%5.2f%%\n",((2**10)**$_-(10**3)**$_)/((10**3)**$_)*100);}'
 2.40%
 4.86%
 7.37%
 9.95%
12.59%
$
Although one may try to reclaim binary definitions for, e.g. KB, MB, etc.,
I tend to think that will be a losing battle - rather like trying to
regain the original definition of hacker.  In such cases it's often better
to just move on and use newer less ambiguous terms where feasible, e.g.
cracker.

I also note that many programs/utilities, including and perhaps
especially in Open Source, are tending towards using and explicitly
stating binary units.  I think this is, in general, the preferable way
to go, as it avoids the otherwise potential ambiguity and confusion.
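
As an illustration of what "explicitly stating binary units" can look
like in a utility, here is a minimal Python sketch.  The function name
format_iec and the two-decimal output format are my own assumptions for
illustration, not taken from any particular program:

```python
# Hypothetical helper: render a byte count with binary (IEC) prefixes.
IEC_UNITS = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]

def format_iec(n_bytes: int) -> str:
    """Divide by 1024 until the value fits the current unit."""
    value = float(n_bytes)
    for unit in IEC_UNITS:
        # Stop once the value is below 1024, or we run out of units.
        if value < 1024 or unit == IEC_UNITS[-1]:
            break
        value /= 1024
    return f"{value:.2f} {unit}"

print(format_iec(1_000_000_000))  # one SI gigabyte, shown in binary units
print(format_iec(2**40))          # exactly one tebibyte
```

Labeling the output MiB/GiB/TiB rather than MB/GB/TB is the whole point:
the reader never has to guess which base was used.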

Note also that for communications (e.g. bits/bytes/symbols per second,
etc.), convention is to use SI, rather than binary units (e.g. 1 Mb/s
is 1,000,000 bits per second).  So it's probably generally "best" to
consistently follow convention there.
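
Mixing the two conventions is where arithmetic slips in, e.g. a file
sized in binary units sent over a link rated in SI units.  A rough
Python sketch (the function name transfer_seconds is hypothetical, and
protocol overhead is deliberately ignored):

```python
# Hypothetical helper: idealized transfer time, ignoring protocol overhead.
GIB = 2**30    # binary: bytes in one GiB
MBIT = 10**6   # SI: bits in one megabit, per communications convention

def transfer_seconds(size_bytes: int, link_bits_per_second: float) -> float:
    """Seconds to move size_bytes over a link of the given bit rate."""
    return size_bytes * 8 / link_bits_per_second

# 1 GiB over a 100 Mb/s link, versus 1 (SI) GB over the same link.
print(f"{transfer_seconds(GIB, 100 * MBIT):.1f} s")
print(f"{transfer_seconds(10**9, 100 * MBIT):.1f} s")
```

The difference between the two results is the same 7.37% gap shown
earlier for the giga/gibi step.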

For most other data contexts, binary (and explicit use of binary units)
is probably generally preferable, hard drive marketing notwithstanding.
To give them their due, most of the hard drive manufacturers have, in
their specifications and typically in their marketing/labeling, at least
had the decency to put an (often itty bitty teensy) asterisk after their
MB, GB, and TB, and somewhere further down (typically in teensy print)
indicated something like:
* MB is 1,000,000 bytes
* GB is 1,000,000,000 bytes
* TB is 1,000,000,000,000 bytes
I think, however, most technical folks generally prefer to use and
report in binary units (even to non-technical folks) to help avoid
confusion (the non-technical folks will often, at least eventually,
notice the i (e.g. TiB), and thus learn/discover that there's a
difference and that it can be fairly significant).
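
To make the "missing space" concrete, here is a small Python sketch
converting a labeled (SI) capacity into TiB.  The names SI_BYTES and
advertised_to_tib are hypothetical, chosen just for this example:

```python
# SI definitions, as in the drive makers' fine print.
SI_BYTES = {"MB": 10**6, "GB": 10**9, "TB": 10**12}
TIB = 2**40  # bytes in one tebibyte

def advertised_to_tib(amount: float, unit: str) -> float:
    """Convert a labeled capacity such as (2, "TB") into TiB."""
    return amount * SI_BYTES[unit] / TIB

tib = advertised_to_tib(2, "TB")
print(f'2 TB on the label is about {tib:.2f} TiB')
```

So a drive labeled 2 TB shows up as roughly 1.82 TiB in a tool that
reports binary units, before any filesystem or RAID overhead.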

Random disk/filesystem/RAID space related comment:  Once upon a time, I
set up a "worksheet" that folks (including non-technical folks) could
use to plug in basic disk information, RAID type/configuration,
filesystem type, etc., and see exactly how much usable space they
would end up with.  This was often quite useful in avoiding
unpleasant user surprises of "But we ordered and received 2 TB of hard
drives, why can't we store 2 TiB of file data on there with our
such-and-such filesystem on it on LVM on RAID-n striped across N drives
with M spares?"
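
The original worksheet isn't reproduced here, but the general idea can
be sketched in a few lines of Python.  Everything below is an
assumption for illustration: the function name usable_bytes, the
textbook RAID capacity rules, and especially the flat 5% filesystem
overhead, which is a placeholder rather than a measured figure for any
real filesystem or LVM setup:

```python
# Rough sketch of a usable-space estimate; not the original worksheet.
TIB = 2**40

def usable_bytes(drive_bytes: int, total_drives: int, spares: int,
                 raid_level: str, fs_overhead: float = 0.05) -> int:
    """Estimate usable bytes after spares, RAID redundancy, and overhead."""
    active = total_drives - spares
    # Textbook count of drives' worth of data capacity per RAID level.
    data_drives = {
        "raid0": active,        # striping, no redundancy
        "raid1": 1,             # all active drives mirror one another
        "raid5": active - 1,    # one drive's worth of parity
        "raid6": active - 2,    # two drives' worth of parity
        "raid10": active // 2,  # striped mirror pairs
    }[raid_level]
    if data_drives < 1:
        raise ValueError("not enough active drives for this RAID level")
    # fs_overhead is an assumed placeholder fraction, not a real measurement.
    return round(drive_bytes * data_drives * (1 - fs_overhead))

# "2 TB of hard drives": four 500 GB drives, one spare, RAID-5.
space = usable_bytes(500 * 10**9, total_drives=4, spares=1, raid_level="raid5")
print(f"{space / TIB:.2f} TiB usable")
```

With those assumed inputs, "2 TB of hard drives" ends up well under
1 TiB of usable space, which is exactly the sort of surprise the
worksheet was meant to head off.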

Thanks also, for pointing out the UK vs. American English billion,
trillion, etc. (at least historic) differences - I wasn't aware of
those.  Looks like the UK eventually gave in there (well, at least
mostly - looks like the battle isn't 100% over yet).  Yielding or
adjusting to plurality opinion/usage isn't always "best", but it's often
the more practical option.

references:
http://en.wikipedia.org/wiki/Binary_prefix
http://en.wikipedia.org/wiki/SI_prefix
http://linuxmafia.com/pipermail/conspire/2010-April/005429.html, et seq.
http://linuxmafia.com/pipermail/conspire/2010-April/005433.html, et seq.
http://linuxmafia.com/pipermail/conspire/2010-April/005439.html
http://www.guardian.co.uk/notesandqueries/query/0,5753,-61424,00.html
http://en.wikipedia.org/wiki/Long_and_short_scales




