[lug] Locales/character encodings, etc...

Harris, James James_Harris at maxtor.com
Tue Feb 11 08:49:57 MST 2003


Hey all --
 
I would like to get a grasp on locales/internationalization and character
sets, and googling around has left me feeling somewhat overwhelmed.
Basically, I really suffer from being a spoiled, isolated American who has
no clue about any language other than English and I want to understand the
implications of locales better in Linux/computing.  Does anyone have any
recommendations for good reading?  I'm not even 100% concerned with its
direct relation to Linux.  I'd just like to find out more about how all of
this works, on a fundamental level...
 
For example, I noticed that $LANG in RH 8.0 is now set to en_US.UTF-8.  This
seriously messes up man pages when I try to view them from an xterm on an RH
7.3 machine.  I found that setting LANG to en_US or C or POSIX fixes this.
I also tried launching xterm with '-u8' which supposedly turns on UTF-8 mode,
which helps, but things still aren't 100% right.  Why is that?  What is the
drastic difference between en_US and en_US.UTF-8?
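
A minimal sketch of what the difference might boil down to, assuming en_US
maps to ISO-8859-1 (as it typically does with glibc) and en_US.UTF-8 to
UTF-8. The sample character is hypothetical; the point is only that the same
character becomes different bytes, and that a terminal expecting one encoding
will misread the other.

```python
# Hypothetical sample text; any non-ASCII character would do.
text = "\u00e9"  # e with acute accent

# Assumption: en_US uses ISO-8859-1, en_US.UTF-8 uses UTF-8.
latin1_bytes = text.encode("iso-8859-1")  # a single byte
utf8_bytes = text.encode("utf-8")         # two bytes

# What a terminal expecting ISO-8859-1 would show if it were fed UTF-8:
# each byte is treated as its own character.
misread = utf8_bytes.decode("iso-8859-1")

print(latin1_bytes.hex(), utf8_bytes.hex(), repr(misread))
```

If that assumption holds, man pages formatted under a UTF-8 locale would send
multi-byte sequences that an xterm on the older machine displays as several
unrelated characters.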
 
Another example that threw me at home: I've recently ripped a few CDs that
are Icelandic and a few that are Celtic which contain foreign characters in
their titles.  For giggles, I ripped them with "high bit" support in the
file names just to see what it would look like.  To my pleasant surprise,
not only is XMMS completely happy with them, but my portable MP3 player is
also happy.  What blew me away, however, was that any GUI I used to view the
filesystem showed the names fine, but an xterm wouldn't.  Everything was a
bunch of question marks.  This was on my woody system which defaults to 'C',
so I set LANG to POSIX and it didn't change anything.  I then set LANG to
en_US.UTF-8 since I'd seen it on RH 8.0 and the names still looked like
poop.  I then set it to en_US and now everything looks wonderful.  So, as
you can imagine, I'm now really curious to understand all of this.
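
A rough sketch of one possible explanation, assuming the ripper wrote the
file names as ISO-8859-1 bytes (I have not verified this). The file name and
the render() helper are hypothetical; the sketch just decodes the same bytes
under three character sets and substitutes '?' wherever a byte is not valid.

```python
# Hypothetical file name, assumed to be stored on disk as ISO-8859-1 bytes.
name_bytes = "\u00cdsland.mp3".encode("iso-8859-1")


def render(raw, charset):
    """Decode raw bytes in the given charset, showing '?' for invalid bytes."""
    return raw.decode(charset, errors="replace").replace("\ufffd", "?")


as_c = render(name_bytes, "ascii")            # C/POSIX locale: ASCII only
as_utf8 = render(name_bytes, "utf-8")         # en_US.UTF-8
as_latin1 = render(name_bytes, "iso-8859-1")  # en_US (assumed ISO-8859-1)

for label, shown in (("C", as_c), ("UTF-8", as_utf8), ("ISO-8859-1", as_latin1)):
    print(label, shown)
```

Under that assumption, only the ISO-8859-1 interpretation yields the intended
name, which would match what I saw when switching LANG to en_US.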
 
Thanks all for any resource recommendations you can give me.  Like I said,
this is just curiosity on my part and nothing terribly important in
the ultimate scheme of things, but I'd sure love to get a better grasp on
it.  ;)
 
Jim


