
Re: [idn] URL encoding in html page



> >> As an implementor of a web server, how do I know how to interpret the
> >> "host:" field in the HTTP header?  If I see something that looks like
> >> ACE, I suppose I would need to decode it to the original UTF-8 to
> >> compare with my virtual domain definitions.
> >
> >No, you wouldn't.  If you don't know anything about IDNs, then in your
> >mind all host names are ASCII, so you would do an ASCII comparison
> >between the Host: field (which is ASCII) and your virtual domain
> >definitions (which are also ASCII), and that would work just fine.
>
> This is only in the mind of an IETF hacker.

Totally agree!! If we use ACE in URL links, then every web designer in the
world will need to be retrained in ACE conversion... schools will have to
start offering a course on ACE compression and decompression in their Comp
Sci or Engineering programs : ))
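
To make the Host: comparison discussed above concrete, here is a rough
sketch of what a server would do if names arrive in ACE form. The thread
does not say which ACE scheme is meant, so this assumes the Punycode-based
"xn--" encoding that Python's built-in "idna" codec implements. The names
to_ace, lookup and VIRTUAL_HOSTS, the host names and the paths are all made
up for illustration.

```python
# Sketch only: assumes ACE means the Punycode-based "xn--" form produced by
# Python's built-in "idna" codec. All names and paths here are hypothetical.

def to_ace(hostname: str) -> str:
    """Convert a possibly non-ASCII host name to its ASCII-compatible form."""
    return hostname.encode("idna").decode("ascii").lower()

# Virtual host table keyed by ACE form, so the server only compares ASCII.
VIRTUAL_HOSTS = {
    to_ace("bücher.example"): "/srv/buecher",
    "example.com": "/srv/default",
}

def lookup(host_header: str):
    """Match a Host: header value against the table (ASCII, case-insensitive)."""
    # Drop an optional ":port" suffix; IPv6 literals are not handled here.
    host = host_header.strip().partition(":")[0].lower()
    return VIRTUAL_HOSTS.get(host)

print(to_ace("bücher.example"))              # xn--bcher-kva.example
print(lookup("XN--BCHER-KVA.example:8080"))  # /srv/buecher
```

The conversion happens once, when the table is built, so the person
configuring the server can type the native name and only the software ever
sees the ACE form.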

> In the mind of the common man, a host name is a name composed of
> letters, digits and some more characters. Any letters, not just English
> letters.
> The limitation to English letters is an artificial limit and accepted by
> common man. When entering host names, no matter where, the common man
> will use his native character set, with any letters used in his native
> language.
>
> The same applies to URLs or URIs. They can contain any letter. Today
> lots of people use non-ASCII letters in URLs, embed them in links in
> HTML, and enter them in browsers. Quite often they work as expected.
> Common man cannot understand why IETF does not take the easy route and
> change its artificial and incomprehensible limits on the characters
> allowed. Just update the current RFC to allow any character outside the
> ASCII range.

RFCs should be updated as new things come along. If the argument is that the
RFC doesn't allow non-ASCII in URLs/URIs, then what about IPv6? Before IPv6
came out, I don't think URL/URI syntax allowed an IPv6 representation
either...

> It is time everybody faced the facts: if you define an RFC restricting
> things to ASCII where it is natural to allow any letter, people will
> ignore the RFC and use any letter anyway. There is so much talk about
> backward compatibility - but you only think about ASCII. There is a lot
> of unofficial use of non-ASCII in URLs, host names and other places.
> We must support them also.

As far as I know, a lot of Chinese, Japanese and Korean websites use CJK
in their URIs for directories... If you look at big5.com, they have been
offering Chinese URIs at the directory level for a long time!
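
For anyone wondering what such a directory name looks like on the wire,
here is a small sketch using Python's urllib.parse.quote. The path is an
invented example and escape_path is a hypothetical helper. I am not
claiming this is how any particular site does it. The point is only that
the same visible name gives different bytes depending on the charset
chosen, which is a large part of the interoperability problem.

```python
from urllib.parse import quote

def escape_path(path: str, charset: str = "utf-8") -> str:
    """Percent-escape the non-ASCII bytes of a URI path in the given charset."""
    return quote(path, safe="/", encoding=charset)

path = "/中文/"                     # invented example directory
print(escape_path(path))           # /%E4%B8%AD%E6%96%87/
print(escape_path(path, "big5"))   # same name, different bytes under Big5
```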

David Leung
Chief Technology Officer
Neteka Inc.
T: (416) 971-4302
http://w!.neteka.com