[lug] grep question

Jeffrey Haemer jeffrey.haemer at gmail.com
Tue Jun 12 09:21:35 MDT 2007


Collins,

> OK, read your explanation (LANG=C), but I have LANG=en_US.UTF-8, and I
> don't have the problem.


Lucky you!

POSIX made the, oh, interesting choice of specifying an I18N mechanism
("internationalization," because there are 18 letters in the word between
'i' and 'n') without specifying a behavior.

With the exception of LANG=C and its synonyms, you could, I think,
legitimately create a character set with Tibetan characters only, and call
it en_US.UTF-8.  It would be left to the marketplace to select against your
distro.

LANG=C permits portability, LANG=something_else permits flexibility.  (I
have the nagging suspicion that the Linux Standard Base may now specify
some of these other values, but I confess I don't know.)

In my experience, the behavior Chip complained about -- the opposite
behavior from yours -- appears to be a typical, default, Linux desktop
behavior.  ASCII is wired into my firmware, so this breaks lots of scripts I
write.  In self-defense, my (GNU) makefiles often say "export LANG := C"
early on.
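
The same self-defense works in a plain shell script. What follows is only a
minimal sketch, not anything from the thread: the helper name has_upper is
made up for illustration, and it sets LC_ALL rather than LANG on the
assumption that some other LC_* variable might already be set, since LC_ALL
overrides both LANG and the individual LC_* settings.

```shell
#!/bin/sh
# Force the C (POSIX) locale for every command this script runs.
# LC_ALL takes precedence over LANG and the individual LC_* variables.
LC_ALL=C
export LC_ALL

# Hypothetical helper: print the input lines that contain an ASCII capital.
# In the C locale, the range A-Z covers only the 26 upper-case letters.
has_upper() {
    grep '[A-Z]'
}

printf 'abc\nAbc\nxyz\n' | has_upper
```

With the locale pinned, only the line "Abc" should come through, whatever
the login environment says.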

> I find it hard to believe that any distro
> would have a LANG= setting that would include lower case characters in
> the range A-Z.


Don't start me on things that I find hard to believe. :-)

For arbitrary locales, '[A-Z]' means "Any character that falls
between 'A' and 'Z,' inclusive, in the collating sequence."  If the locale
collates as aAbB...zZ, that range takes in most of the lower-case letters,
too.  For guaranteed lower case, POSIX offers the
impossible-to-remember-much-less-type expression "[[:lower:]]".

I am, as Dave Barry would say, not making this up.  And yes, you can use
this with your grep:  try " grep '[[:lower:]]' "
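
Here is a small sketch of that suggestion, for anyone who wants to try it
without hunting for a test file.  The helper name lines_with_lower and the
sample input are invented for the example; the only part taken from above is
the bracket expression itself.

```shell
#!/bin/sh
# Hypothetical helper: print the input lines that contain at least one
# lower-case letter.  The character class is defined by the locale's
# notion of "lower case", not by a range in the collating sequence.
lines_with_lower() {
    grep '[[:lower:]]'
}

printf 'HELLO\nHello\n1234\n' | lines_with_lower
```

Of the three sample lines, only "Hello" contains a lower-case letter, so it
is the only one printed.  The matching class for capitals is "[[:upper:]]".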

> OTOH, a distro with /usr/bin/grep is already quite non-standard.


Unusual, yes.  Non-standard, no.  I have run on POSIX-conforming boxes in
which /usr/bin was just a symlink to /bin.  The first time I saw it, I found
it, well, hard to believe.

Look at it this way: You can't teach an old dog new tricks.  Me, I try to
learn something new every day as a sentinel; when I finally succeed, I'll
know I'm finally entering my second childhood.

-- 
Jeffrey Haemer <jeffrey.haemer at gmail.com>
720-837-8908 [cell]
http://goyishekop.blogspot.com