[lug] OT: Saving Greek in Postgresql

D. Stimits stimits at idcomm.com
Thu Apr 18 01:20:17 MDT 2002


John Starkey wrote:
> 
> On Wed, 2002-04-10 at 07:00, rm at fabula.de wrote:
> > On Wed, Apr 10, 2002 at 04:00:22AM -0600, D. Stimits wrote:
> > Might be an awfully silly question, but how is the encoding of the HTML
> > input form set?
> >
> > I'd also second D.'s doubts about PHP's type safeness.
> > Could you look at a dump of your request when it comes in to see how
> > the data actually is encoded (I use tcpdump/ethereal for that)?
> 
> Sorry for the incredible delay in getting back to this. I'm in the
> middle of a move and trying to finish up a few projects.
> 
> I went in and recoded the php file to echo the form vars. It came back
> as nonsense. The form is XHTML 1 and encoded UTF-8. I'm wondering now
> whether Apache is cramming the data. Is there a setting in apache for
> locales? I've tried the languages setting, but nothing changed and it
> didn't really make sense that it'd be related. I guess research on that
> end is the next step.

There is indeed some fairly advanced locale stuff in Apache, but I have
never had to configure it. My Apache setup is all behind a firewall,
hidden from the outside. I use it to edit and model public website
items, which, when satisfactory, get scp'd to the real outside site
(which is Zeus, a mass hosting virtual web generator system designed
to look almost exactly like Apache from the web designer's point of
view...even the .htaccess files are the same; it's very nice and fast).

Now, I don't know what is in UTF-8. But here is an idea: save a UTF-8
page as source from a web browser, then get the same page via telnet to
port 80 by manually issuing the GET command. Then compare the output.
Expect to need to edit both files to get to the UTF-8 part and delete
the rest. Add newlines or whatever to get them to sit side by side in
the same format. Suppose you want to get the exact bytes of output for
the file index.html in the localhost web server's root directory. The
command to start the log is:
  telnet localhost 80 | tee http_log.txt

What will happen is you will see something like this:
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'

When you see that, you will type the following and then press Enter
twice (the blank line ends the request):
GET /index.html HTTP/1.0

The web page will automagically display, and the log file will be
created. Your favorite hex editor would be good for getting rid of all
but the UTF-8 portions (the log also holds telnet's connection messages
and the HTTP response headers). Compare the version saved in the web
browser with the raw output version. Since you are comparing actual
bytes, this will give you an idea whether something more is going on.
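
If telnet and a hex editor get tedious, the same comparison can be
scripted. What follows is only a rough sketch in Python, not something
from the original setup: the function names (build_request, fetch_raw,
split_response, first_difference) and the file name saved_page.html are
made up for illustration. It sends the same HTTP/1.0 GET over a plain
socket, separates the headers from the body, and reports the first byte
offset where the body differs from the copy saved in the browser.

```python
import socket


def build_request(path):
    # HTTP/1.0 request line followed by the blank line that ends the request.
    return "GET {} HTTP/1.0\r\n\r\n".format(path).encode("ascii")


def fetch_raw(host, port, path):
    # Read until the server closes the connection, as HTTP/1.0 servers do.
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_request(path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


def split_response(raw):
    # Headers and body are separated by the first empty line (CRLF CRLF).
    head, _, body = raw.partition(b"\r\n\r\n")
    return head, body


def first_difference(a, b):
    # Index of the first differing byte, or -1 if the two are identical.
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return -1 if len(a) == len(b) else min(len(a), len(b))


# Example use (assumes a web server on localhost:80 and a browser-saved copy):
#   head, body = split_response(fetch_raw("localhost", 80, "/index.html"))
#   with open("saved_page.html", "rb") as f:
#       print(first_difference(body, f.read()))
```

Keep in mind that a browser may rewrite the page when saving it, so a
reported difference is only a hint about where to start looking.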

A twist I would look out for is base64 encoding, which could also be
in play. If the data is also being base64 encoded, the bytes will need
base64 decoding before they will look the same. If they already look
the same, then something is interpreting the bytes in different slice
sizes (for example, reading a multi-byte character one byte at a time),
and the data itself is not at fault in terms of base64 or other odd
codings (like MIME).
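
Both cases can be tried out in a few lines of Python. Again, this is
only a sketch under assumptions: try_base64 is a made-up helper, the
sample string is arbitrary, and Latin-1 is just one plausible wrong
guess for the single-byte reading. Note that a short run of plain
ASCII can happen to be valid base64, so a successful decode is a hint
and not proof.

```python
import base64
import binascii


def try_base64(data):
    # Return the decoded bytes if data is strictly valid base64, else None.
    try:
        return base64.b64decode(data, validate=True)
    except (binascii.Error, ValueError):
        return None


greek = "αβγ"
utf8_bytes = greek.encode("utf-8")        # each Greek letter becomes two bytes

# Case 1: the bytes were base64 encoded somewhere along the way.
encoded = base64.b64encode(utf8_bytes)
recovered = try_base64(encoded)           # decodes back to the UTF-8 bytes

# Case 2: the bytes are intact but read in the wrong "slice size",
# here one character per byte (Latin-1) instead of UTF-8.
misread = utf8_bytes.decode("latin-1")    # six nonsense characters, not three
repaired = misread.encode("latin-1").decode("utf-8")
```

In the second case the bytes never changed, which is why the nonsense
can be turned back into the original text.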

D. Stimits, stimits at idcomm.com

> 
> Thanks much.
> 
> John
> _______________________________________________
> Web Page:  http://lug.boulder.co.us
> Mailing List: http://lists.lug.boulder.co.us/mailman/listinfo/lug


