
Re: [idn] URL encoding in html page



Adam M. Costello wrote:


>> As an implementor of a web server, how do I know how to interpret the
>> "host:" field in the HTTP header?  If I see something that looks like
>> ACE, I suppose I would need to decode it to the original UTF-8 to
>> compare with my virtual domain definitions.
>
>No, you wouldn't.  If you don't know anything about IDNs, then in your
>mind all host names are ASCII, so you would do an ASCII comparison
>between the Host: field (which is ASCII) and your virtual domain
>definitions (which are also ASCII), and that would work just fine.
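
As an illustration of the ASCII comparison described in the quoted text, here is a minimal Python sketch. It is not taken from any particular web server: the names normalize_host, to_ace, lookup and VIRTUAL_HOSTS are hypothetical, and it assumes that "ACE" means the xn-- form produced by Python's built-in "idna" codec. The non-ASCII name is converted to ACE once, when the virtual host table is built, so the request path only ever compares ASCII strings.

```python
def normalize_host(host_field: str) -> str:
    """Strip an optional ":port" and lowercase the ASCII host name."""
    host = host_field.strip()
    # Drop a trailing ":port" if present (IPv6 literals are not handled here).
    name, sep, port = host.rpartition(":")
    if sep and port.isdigit():
        host = name
    # Host names compare case-insensitively; a trailing dot is ignored.
    return host.rstrip(".").lower()


def to_ace(name: str) -> str:
    """Convert a possibly non-ASCII host name to its ASCII-compatible form."""
    return name.encode("idna").decode("ascii")


# Hypothetical virtual host table; the paths are placeholders.
VIRTUAL_HOSTS = {
    "www.example.com": "/srv/www/example",
    to_ace("bücher.example"): "/srv/www/buecher",
}


def lookup(host_field: str):
    """Return the document root for a Host: field value, or None."""
    return VIRTUAL_HOSTS.get(normalize_host(host_field))
```

With this table, lookup("XN--BCHER-KVA.example:8080") finds the second entry without any decoding of the Host: field. A server that receives raw non-ASCII bytes in the Host: field, the case raised in the reply below, is not covered by this sketch.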

This is only in the mind of an IETF hacker.

In the mind of the common man, a host name is a name composed of
letters, digits and some more characters. Any letters, not just English
letters.
The limitation to English letters is an artificial limit and accepted by
common man. When entering host names, no matter where, the common man
will use the native character set, with any letters used in the native
language.

The same applies to URLs or URIs. They can contain any letter. Today
lots of people use non-ASCII letters in URLs, embed them in links in
HTML, and enter them in browsers. Quite often they work as expected.
The common man cannot understand why the IETF does not take the easy
route and change its artificial and incomprehensible limits on the
characters allowed. Just update the current RFC to allow any character
outside the ASCII range.

It is time everybody faced the facts: if you define an RFC restricting
things to ASCII where it is natural to allow any letter, people will
ignore the RFC and use any letter anyway. There is so much talk about
backward compatibility - but you only think about ASCII. There is a lot
of unofficial use of non-ASCII in URLs, host names and other places.
We must support that use as well.

   Dan