Re: [idn] URL encoding in html page



> At 8:47 AM +0100 3/27/02, Dan Oscarsson wrote:
> >It is important that existing RFCs, like the one for the URL, immediately
> >be updated to allow non-ASCII letters. And do not use the IETF hacker
> >language and call it IRI; for the common man it will be a URL and URI.
> >People will not ACE encode host names in URLs, people will not
> >%-encode paths in URLs. They will do like they do today: use native
> >character set of the HTML document. Ignoring reality does not work.
>
> Quite true. The proposal to allow anyone to enter host names in URLs
> using any native encoding scheme, certainly ignores the reality that
> every DNS server would have to have equivalents for every name in
> every conceivable encoding scheme.

I think how the name is displayed in the HTML or UI can be a separate
issue from how it is resolved at the back end. We can have IDNA with
ACE in the name lookup functions rather than in the interface, where users
would see the ACE... should we treat that as leakage as well?
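
A rough sketch of that separation, assuming Python's built-in "idna" codec
(which implements IDNA as it was later standardized, with the "xn--" prefix,
and is not necessarily the ACE form under discussion here). The function
names are hypothetical; the point is only that the ACE form lives in the
lookup path while the UI keeps the native form.

```python
def ace_for_lookup(display_name: str) -> str:
    """Convert the name the user sees into the ASCII form sent to the resolver."""
    # The "idna" codec applies nameprep and the ACE conversion per label.
    return display_name.encode("idna").decode("ascii")


def display_for_ui(ace_name: str) -> str:
    """Convert an ACE name back to Unicode so the ACE never reaches the user."""
    return ace_name.encode("ascii").decode("idna")


shown = "bücher.example"          # what the HTML page or UI displays
wire = ace_for_lookup(shown)      # what the lookup function would use
print(wire)                       # xn--bcher-kva.example
print(display_for_ui(wire))       # bücher.example
```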

> Similarly, the proposal to allow people to enter host names in URIs
> using only UTF-8 ignores the reality that many people enter text in
> many different encoding schemes and often have no idea what scheme
> they are using at the moment (the text on the screen appears the same
> regardless of the encoding scheme).

However, this will happen with IDNA as well. IDNA with ACE ASSUMES that all
input is going to be in Unicode. If the user is inputting BIG5 or GB or
any other local encoding, you have to specify it in the IDNA client's
settings before it can properly convert the input to Unicode and then to ACE
anyway... so IDNA and ACE are "ignoring the reality" too!!
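
To make the point concrete, here is a minimal sketch, again assuming
Python's built-in "idna" codec as a stand-in for the ACE conversion. The
helper name to_ace and the sample host name are made up for illustration.
The same name arrives as different bytes depending on the local encoding,
and the client only gets the right ACE if it is told which encoding to
decode from.

```python
def to_ace(raw: bytes, declared_encoding: str) -> str:
    """Decode raw input using the declared local encoding, then convert to ACE."""
    unicode_name = raw.decode(declared_encoding)  # local encoding -> Unicode
    return unicode_name.encode("idna").decode("ascii")  # Unicode -> ACE


name = "中文.example"
big5_bytes = name.encode("big5")      # what a BIG5 user's system would send
gb_bytes = name.encode("gb2312")      # what a GB user's system would send

# With the correct encoding declared, both inputs give the same ACE name.
same = to_ace(big5_bytes, "big5") == to_ace(gb_bytes, "gb2312")
print(same)

# With the wrong encoding declared, the bytes still decode, but to different
# characters, so the client silently produces a different ACE name.
wrong = to_ace(big5_bytes, "gb2312")
print(wrong != to_ace(big5_bytes, "big5"))
```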

But in newer OSes and applications I can see that there is more support for
UTF-8, and local encodings are slowly being replaced... I think there should
be support for local encodings during the TRANSITIONAL period to make the
deployment of IDN smoother until users can move to UTF-8.