[sf-lug] convert to lowercase

Michael Paoli Michael.Paoli at cal.berkeley.edu
Fri Sep 21 06:23:49 PDT 2007


Quoting Alden Meneses <aldenm at gmail.com>:

> Thanks all for the tr command. I forgot about that one and was trying to
> figure out a way to do it with awk. You guys saved me lots of time.
> 
> Not familiar with the ex utility. Will need to read up more on it but can
> you tell me why preserving the inode is important? Maybe some real world
> examples of doing this method versus tr.

$ id
uid=607(mpaoli) gid=607(mpaoli) groups=607(mpaoli),700(balug)
$ ls -lidba . file subdir/file subdir | sort -bn
16002 drwxrwsr-x  3 balug balug 4096 Sep 21 05:17 .
16003 -rw-rw----  2 balug balug    8 Sep 21 05:18 file
16003 -rw-rw----  2 balug balug    8 Sep 21 05:18 subdir/file
16004 drwxrwsr-x  2 balug balug 4096 Sep 21 05:07 subdir
//  | |           | |     |     |    |            |
//  | |           | |     |     |    |            ^ link name
//  | |           | |     |     |    ^ mtime
//  | |           | |     |     ^ size
//  | |           | |     ^ group
//  | |           | ^ user
//  | |           ^ link count
//  | ^ permissions
//  ^ inode number
$ cat file
FOO BAR
$ cat subdir/file
FOO BAR
//note that file and subdir/file have the same contents
//they are the same file, different links
//note the link count of 2
//note the permissions and ownerships: -rw-rw---- balug balug
$ ex file << \__EOT__
> %s/[A-Z]/\l&/g
> w
> q
> __EOT__
$ cat file
foo bar
$ cat subdir/file
foo bar
$ ls -lidba file subdir/file
16003 -rw-rw----  2 balug balug 8 Sep 21 05:20 file
16003 -rw-rw----  2 balug balug 8 Sep 21 05:20 subdir/file
//note we still have the one same file (with updated contents)
//with the same two links, same inode number, permissions, and ownerships
//now let's convert again ... this time from lower to upper case,
//but this time using a common technique that doesn't keep the same inode:
$ perl -pi -e 's/(.*)/\U$1/;' file
$ cat file
FOO BAR
$ cat subdir/file
foo bar
$ ls -lidba file subdir/file
16006 -rw-rw----  1 mpaoli balug 8 Sep 21 05:34 file
16003 -rw-rw----  1 balug  balug 8 Sep 21 05:20 subdir/file
//note that file was replaced by a new inode; note the ownership change
//the permissions happen to remain the same - probably perl's doing in
//this case, as we have:
$ umask
0077
//and would otherwise have expected a result of -rw------- permissions
//perl may have even possibly tried to preserve the ownership, but
//not having privilege to do so in this case couldn't set that original
//ownership on the new file
//also, in this case, the original file wasn't changed; it was
//unlinked and a new file put in its place, ... however there
//remains another link to the original file, and that original file
//was unchanged.

Each method has its advantages and disadvantages, or more notably,
different effects, which may or may not be the desired or preferred
behavior, depending upon the objectives.

overwrite: preserves permissions, ownerships, and (hard) link
relationships; changes the original file, and thus all (hard) links
will be to the updated file.  The operating system is multi-user and
multi-tasking, so other things accessing (most notably reading) the
file while it's being written may get an inconsistent read of the file
(something between the original and final form) - this risk increases
as file size increases (some other factors also influence this risk).
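
Since the question was about tr, here's a rough sketch of the overwrite
approach done with tr(1) instead of ex(1).  Assumptions on my part:
mktemp(1) is available (it's common but not in POSIX), and the scratch
directory and the names "file" and "link" are made up for illustration.

```shell
# Sketch: lowercase a file in place with tr(1), keeping the same inode.
set -e
dir=$(mktemp -d)                  # scratch directory, hypothetical demo area
cd "$dir"
printf 'FOO BAR\n' > file
ln file link                      # second hard link to the same inode

before=$(ls -i file | awk '{print $1}')
tmp=$(mktemp)
tr '[:upper:]' '[:lower:]' < file > "$tmp"   # convert into a scratch copy
cat "$tmp" > file                 # truncate and rewrite the ORIGINAL inode
rm -f "$tmp"
after=$(ls -i file | awk '{print $1}')

cat link                          # the other link shows the new contents
[ "$before" = "$after" ] && echo "inode unchanged"
```

The cat "$tmp" > file step is where a concurrent reader could catch the
file truncated or only partly rewritten.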

replace file: may not preserve permissions/ownerships; does not preserve
(hard) link relationships (symbolic links aren't impacted); does not
change the original file, but rather unlinks it and creates a new inode.
The operating system is multi-user and multi-tasking, but if the file is
replaced via rename(2) (mv(1) will typically try to use rename(2) where
it can), then the action is atomic - anything attempting to read/access
the file's data gets either the old original data, or the new data - it
will always find the file there, and never find the file not there or
find the data to be between the old and new states.
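
And a rough sketch of the replace approach, again with tr(1): build the
new version beside the original, then mv(1) it into place.  Same
assumptions as before: mktemp(1) is available, and the names "file",
"file.new" and "link" are made up for illustration.  The new file is
created in the same directory, so mv(1) can use rename(2) rather than
falling back to copying.

```shell
# Sketch: replace a file with a new inode via rename(2).
set -e
dir=$(mktemp -d)                  # scratch directory, hypothetical demo area
cd "$dir"
printf 'foo bar\n' > file
ln file link                      # second hard link to the original inode

old=$(ls -i file | awk '{print $1}')
tr '[:lower:]' '[:upper:]' < file > file.new   # new inode, same filesystem
mv -f file.new file               # the name "file" now points at the new inode
new=$(ls -i file | awk '{print $1}')

cat file                          # prints: FOO BAR
cat link                          # prints: foo bar (the original, unchanged)
[ "$old" != "$new" ] && echo "inode changed"
```

Note that file.new gets whatever permissions the umask and the invoking
user dictate, so restoring the original mode and ownership (where one
has the privilege to) is left as a separate step.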

references:
sh(1)
ex(1)
perl(1)
perlre(1)
perlrun(1)
tr(1)
rename(2)
mv(1)
ln(1)
rm(1)
unlink(2)

> On 9/17/07, Michael Paoli <Michael.Paoli at cal.berkeley.edu> wrote:
> >
> > $ cat file
> > ABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz
> > abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ
> > $ ex file << \__EOT__
> > > %s/[A-Z]/\l&/g
> > > w
> > > q
> > > __EOT__
> > $ cat file
> > abcdefghijklmnopqrstuvwxyz abcdefghijklmnopqrstuvwxyz
> > abcdefghijklmnopqrstuvwxyz abcdefghijklmnopqrstuvwxyz
> > $
> >
> > One of the advantages of the method shown above, is that it
> > preserves the inode and (hard) link relationships (and ownerships
> > and permissions) of the file - no explicit intermediary file required.
> >
> > Extra credit: name a disadvantage with the above (hint: multiuser,
> > multitasking)
> >
> > Quoting Alden Meneses <aldenm at gmail.com>:
> >
> > > I was wondering if anyone knew how to convert a text file to all
> > > lowercase.



