[lug] grep question

Chip Atkinson chip at pupman.com
Mon Jun 11 07:21:38 MDT 2007


Yes, thanks for the explanation!

Chip

On Mon, 11 Jun 2007, Jeffrey Haemer wrote:

> Okay, now I have time.  Here's a little more background, in five easy
> steps.
> 
> (1) In Unix, collation (the order of characters) and expressions built on
> collation order ("[A-Z]") originally used ASCII collating order.
> A few things made people re-think that assumption.
> 
> The obvious thing was character sets with more than 128 characters.  Only a
> few languages can be written without funny letters.  Of modern languages, I
> think the list is English, Indonesian, Hawaiian, and Swahili.  If you do an
> ls(1), where should files that start with Thai characters sort, and what
> order should they come in?  What should sort(1) do with a list of Danish
> first names?  The Germans and Japanese finally got enough money that Unix
> vendors cared.
> 
> Different, but in the same category, was EBCDIC.  If you wanted to make a
> Unix work-alike -- say, grep(1) -- for an old, IBM mainframe, how should it
> behave?  IBM had always had enough money, but finally started caring about
> Unix.
> 
> Different, but in a different category, was the desktop market.  MS-DOS had
> case-insensitive filenames, and everyone's marketing department thought that
> they could finally sell Unix to some people who'd gotten used to Windows.
> 
> (2) To address these, POSIX invented a mechanism to specify a collating
> order that's separate from the character-set order.  Used to be that if you
> wanted to sort backwards, you'd say "sort -r".  Today, you can create a new
> collating sequence, install it, tell the system to use that order, and then
> call "sort" without a flag.  See how much better that solution is?  Me
> neither.  And when's the last time anyone asked us, anyway?
> 
> (3) This mechanism was one of several innovations that came to Unix around
> the same time, all for similar reasons.  For example, your keyboard has a
> dollar sign; some keyboards have pound signs or Euro symbols; some even have
> more than one.  Some places, they write ten thousand as "10000", some as
> "10,000", some as "10.000", some as "1,0000".  Don't you want to be able to
> tell a system how to print prices in Saudi Riyals or Kuwaiti Dinars?  Yeah,
> me neither.  People who make really a lot of money selling computers all do.
> 
> (4) On systems that approximate POSIX-conformance, these behaviors are
> governed by environment variables called things like LC_MONETARY and LC_TIME
> and LC_COLLATE.  There is, however, one ring that rules them all.  Okay,
> two rings:  LANG and LC_ALL.  They differ in subtle but boring ways.  Use
> LANG: it's fewer characters to type.  If you try "echo $LANG" you'll see
> what rules someone has told your system you want.
> 
> (5) To provide normal, predictable, sane behavior -- or, as it's known in
> marketing circles, "traditional Unix behavior" -- say LANG=C.  You can say
> other stuff that works, too, like LANG=POSIX or LANG=XOPEN or even (I'm
> pretty sure -- all of this is from memory) unset LANG.
> The first of these, LANG=C, is the fewest characters to type.
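
The effect described in steps (4) and (5) can be sketched roughly as
follows.  This is an illustration only, not part of the original message:
the word list and the variable names (c_sorted, upper_only,
override_sorted) are made up for the example, and en_US.UTF-8 is just an
assumed locale name that need not be installed.  The sketch uses LC_ALL
rather than LANG because LC_ALL, when set, takes precedence over LANG and
over the individual LC_* variables, so LANG=C alone changes nothing on a
system where LC_ALL or LC_COLLATE is already set.

```shell
# Sort a small mixed-case list in the C locale.  Byte (ASCII) order puts
# every uppercase letter before every lowercase letter.
c_sorted=$(printf '%s\n' banana Apple cherry apple Banana | LC_ALL=C sort)
printf '%s\n' "$c_sorted"

# In the C locale, the range [A-Z] covers only the uppercase ASCII letters,
# so only lines starting with a capital letter are kept.
upper_only=$(printf '%s\n' apple Apple banana Banana | LC_ALL=C grep '^[A-Z]')
printf '%s\n' "$upper_only"

# LC_ALL wins over LANG: the result is still the C-locale order, whatever
# LANG says.  (en_US.UTF-8 is an assumed name; it does not have to exist.)
override_sorted=$(printf '%s\n' banana Apple cherry apple Banana |
    LANG=en_US.UTF-8 LC_ALL=C sort)
printf '%s\n' "$override_sorted"

# Show what the current shell has been told to use.
echo "LANG=${LANG:-unset} LC_ALL=${LC_ALL:-unset}"
```

If LANG=C seems to be ignored, the locale(1) utility prints every setting
currently in effect and usually shows which variable is overriding it.
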
> 
> This help?
> 
> On 6/11/07, karl horlen <horlenkarl at yahoo.com> wrote:
> >
> >
> > --- Jeffrey Haemer <jeffrey.haemer at gmail.com> wrote:
> >
> > > export LANG=C
> > >
> > > will cure this problem.
> > >
> > > (If you want a long explanation, let me know and
> > > I'll write one tomorrow.
> > > Right now, I'm, uh, otherwise occupied.)
> >
> > i'm not having the problem, but i'd be curious to hear
> > your explanation...
> >
> > _______________________________________________
> > Web Page:  http://lug.boulder.co.us
> > Mailing List: http://lists.lug.boulder.co.us/mailman/listinfo/lug
> > Join us on IRC: lug.boulder.co.us port=6667 channel=#colug
> >
> 
> 
> 
> -- 
> Jeffrey Haemer <jeffrey.haemer at gmail.com>
> 720-837-8908 [cell]
> http://goyishekop.blogspot.com
> 



