[sf-lug] .signature files collection

Rick Moen rick at linuxmafia.com
Thu Jan 22 01:34:22 PST 2015


Quoting Daniel Gimpelevich (daniel at gimpelevich.san-francisco.ca.us):

> On Thu, 2015-01-22 at 01:01 -0800, Rick Moen wrote:
> > Quoting Daniel Gimpelevich (daniel at gimpelevich.san-francisco.ca.us):
> >  
> > > UTF-8 .txt files begin with two bytes of magic that identify them as
> > > such to any software that may read them.
> > 
> > Here's the ASCII file:
> > 
> > linuxmafia:/var/www/pub/humour# file sigs-rickmoen-old 
> > sigs-rickmoen-old: UTF-8 Unicode English text
> > linuxmafia:/var/www/pub/humour# 
> 
> I don't mean to suggest anything should've been done differently at this
> point, but:
> 
> danielg4 at chimera:/tmp$ hd sigs-rickmoen-old | head -1
> 00000000  41 72 63 68 69 76 69 73  74 27 73 20 4e 6f 74 65  |Archivist's Note|

So, you're saying the two bytes of magic are not there?  '41' is 'A',
and so on.

Looking quickly through docs, I'm guessing you're talking about the byte
order mark ('BOM').  Setting that in vim and writing it out produces
this:

linuxmafia:/var/www/pub/humour# file sigs-rickmoen-old
sigs-rickmoen-old: UTF-8 Unicode (with BOM) English text
linuxmafia:/var/www/pub/humour# hd -d sigs-rickmoen-old | head -1
00000000  ef bb bf 41 72 63 68 69  76 69 73 74 27 73 20 4e |...Archivist's N|
linuxmafia:/var/www/pub/humour#
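
The three leading bytes in that second dump, ef bb bf, are U+FEFF encoded
as UTF-8, so the mark is three bytes long rather than two.  A minimal
Python sketch of the same check, assuming the first few bytes of the file
are already in hand; the helper name has_utf8_bom is hypothetical, and
only the standard codecs module is used:

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if data starts with the UTF-8 byte order mark (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


# Leading bytes copied from the two hex dumps quoted in this thread.
without_bom = bytes.fromhex("41 72 63 68 69 76 69 73")
with_bom = bytes.fromhex("ef bb bf 41 72 63 68 69")

print("before 'set bomb':", has_utf8_bom(without_bom))
print("after 'set bomb': ", has_utf8_bom(with_bom))
```

A file without the mark can still be perfectly valid UTF-8, which is
consistent with file(1) reporting "UTF-8 Unicode" for both versions.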

However, the symptom persists.  Moreover,
http://vim.wikia.com/wiki/Working_with_Unicode says:


In the above example, 'set bomb' is commented out because it can cause
problems if your encoding is utf-8, and is not really necessary. From
the Wikipedia BOM page:

"While Unicode standard allows BOM in UTF-8, it does not require or
recommend it. Byte order has no meaning in UTF-8 so a BOM only serves to
identify a text stream or file as UTF-8 or that it was converted from
another format that has a BOM."

The advantage of setting BOM is that Vim can very easily determine that
the file is encoded in UTF-8, but is often not understood,
misrepresented, or even considered invalid in other programs, such as
compilers, web browsers, or text editors not as nice as Vim.
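
Where a BOM does trip up some other program, one option is to drop it
before handing the data over.  A minimal Python sketch, assuming the
whole file fits in memory; strip_utf8_bom is a hypothetical helper name,
while the "utf-8-sig" codec is part of the Python standard library:

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM; other input is returned unchanged."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Decoding with "utf-8-sig" skips a leading BOM if one is present,
# and behaves like plain "utf-8" if it is not.
sample = codecs.BOM_UTF8 + b"Archivist's Note"
text = sample.decode("utf-8-sig")
print(repr(text))
print(strip_utf8_bom(sample))
```

Whether removing the mark makes any difference to the symptom discussed
here is a separate question; the sketch only shows the byte-level
operation.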



