
Re: [idn] URL encoding in html page




----- Original Message ----- 
From: "Soobok Lee" <lsb@postel.co.kr>
To: "IETF idn working group" <idn@ops.ietf.org>
Sent: Sunday, March 24, 2002 9:36 PM
Subject: Re: [idn] URL encoding in html page


> 
> ----- Original Message ----- 
> From: "Soobok Lee" <lsb@postel.co.kr>
> > > Not necessary, since the HTML and URI specs already limit the host to
> > > ASCII letters, digits, hyphens, and dots.
> > 
> > We experts already know this. But many ML.com registrants don't know about this
> > unfortunate fate of ML.com. They want to use native ML.com names in their HTML homepages.
> > 
> > If we want interoperable URIs that support native IDN, we must revise BOTH the
> > URI spec and the HTTP spec. But native IDN support brings potential legacy-code
> > versioning and code interoperability problems.
> > Would anyone provide an in-depth analysis of this caveat?
> > 
> 
>  
>  Even if we stay with the current HTTP/1.1, which allows only ASCII Host: header values,
>  we could still revise the URI spec to allow native (UTF-8 or legacy-encoded) IDNs in URIs.
> 
>  1) With IDNA and HTTP/1.1, the web browser can convert the native IDN in the URI into its
>  ACE form, and then open an HTTP/1.1 session to the ACE hostname with an ACE Host: value.
> 
>  2) With IDNA and a revised HTTP with UTF-8 host support, the web browser can convert the
>  UTF-8 IDN in the URI into its ACE form, and then open an HTTP session to the ACE hostname
>  with a UTF-8 Host: value.
> 
>  3) With UTF-8-based IDN and a revised HTTP with UTF-8 host support, the browser can check
>  whether the native IDN is in UTF-8 and, if not, convert it into UTF-8, and then open an
>  HTTP session to the UTF-8 web host with a UTF-8 Host: value.
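The three scenarios differ only in which name is resolved in DNS and which octets go into the Host: header. A minimal sketch, assuming Python's built-in `idna` codec (IDNA2003: Nameprep plus Punycode with the `xn--` prefix) as the ACE; no ACE had been chosen when this was written, and `plan_request` is a hypothetical helper, not part of any browser.

```python
# Sketch only: maps each scenario above to (name resolved in DNS, Host: value).
# ACE is approximated by Python's "idna" codec (IDNA2003), an assumption.

def plan_request(native_host: bytes, page_charset: str, scenario: int) -> tuple[bytes, bytes]:
    """Return (name resolved in DNS, raw Host: header value)."""
    if scenario not in (1, 2, 3):
        raise ValueError("scenario must be 1, 2 or 3")
    # Every scenario first interprets the URI's host in the page's charset.
    unicode_host = native_host.decode(page_charset)
    utf8_host = unicode_host.encode("utf-8")
    if scenario == 3:   # UTF-8 IDN + revised HTTP: UTF-8 in DNS and in Host:
        return utf8_host, utf8_host
    ace_host = unicode_host.encode("idna")    # ASCII-only, "xn--" labels
    if scenario == 1:   # IDNA + HTTP/1.1: ACE in DNS and in Host:
        return ace_host, ace_host
    return ace_host, utf8_host  # scenario 2: ACE in DNS, UTF-8 in Host:

# A host name as it might appear inside an EUC-KR encoded page.
host = "한국.com".encode("euc-kr")
for s in (1, 2, 3):
    print(s, plan_request(host, "euc-kr", s))
```

Only scenario 1 keeps every octet on the wire ASCII, which is why it works with unmodified HTTP/1.1 servers.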
> 
> 
>  2) and 3) may be infeasible due to HTTP's lack of a capability negotiation feature like ESMTP's,

  s/and 3)//  :-)  In 3), the web server surely supports native UTF-8 Host: values.
  

>  because a new web browser with native IDN URI support can't tell whether the web server supports
>  native IDN or only ASCII (ACE) hosts in the Host: value without trying twice with both forms
>  of the Host: value (UTF-8 first, then ACE if needed). Using an ACE Host: value is always safe in 1) and 2).
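The try-twice behaviour can be sketched as a client-side fallback. This is a hypothetical illustration: `send` stands in for one HTTP exchange and only returns a status code, and it assumes a server that understands only ASCII hosts answers 400 Bad Request to a UTF-8 Host: value, which no specification guarantees.

```python
from typing import Callable, Tuple

# Hypothetical stand-in for one HTTP exchange: takes the raw Host: value
# and returns the server's status code. A real client would open a TCP
# connection and send a full request.
Send = Callable[[bytes], int]

def fetch_with_fallback(unicode_host: str, send: Send) -> Tuple[int, bytes, int]:
    """Try a UTF-8 Host: value first, then retry with ACE if it is rejected.

    Returns (status, Host: value finally used, number of attempts).
    Assumes an ASCII-only server answers 400 Bad Request.
    """
    utf8_host = unicode_host.encode("utf-8")
    status = send(utf8_host)
    if status != 400:
        return status, utf8_host, 1
    # No capability negotiation: the only way to learn is a second round trip.
    ace_host = unicode_host.encode("idna")
    return send(ace_host), ace_host, 2

# Two toy servers for illustration.
legacy_server: Send = lambda host: 200 if host.isascii() else 400
utf8_server: Send = lambda host: 200

print(fetch_with_fallback("한국.com", legacy_server))  # second attempt, ACE host
print(fetch_with_fallback("한국.com", utf8_server))    # first attempt, UTF-8 host
```

Every request to a legacy server pays an extra round trip, whereas sending ACE first never needs a retry.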
> 
>  BTW, in 1) and 2), we cannot avoid legacy versioning problems, because
>  most ACE conversions would be done as "ACE(NFKC(CaseFold(legacy-to-Unicode(native label))))".
>  Most homepages in East Asia are in legacy encodings, and that monopoly (near 100%) won't change
>  in the foreseeable future.
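The conversion chain ACE(NFKC(CaseFold(legacy-to-Unicode(native label)))) can be sketched step by step. This is a simplified, hypothetical `to_ace_label` that assumes Punycode with the `xn--` prefix as the ACE (what IDNA later standardized) and omits Nameprep's prohibited-character, bidi and length checks.

```python
import unicodedata

def to_ace_label(native_label: bytes, legacy_charset: str) -> str:
    """ACE(NFKC(CaseFold(legacy-to-Unicode(native label)))), simplified."""
    # legacy-to-Unicode: only as good as this client's table for the charset.
    text = native_label.decode(legacy_charset)
    # CaseFold first, then NFKC, in the order of the formula above.
    text = unicodedata.normalize("NFKC", text.casefold())
    # ACE: pure-ASCII labels stay as they are; others get Punycode + "xn--".
    if text.isascii():
        return text
    return "xn--" + text.encode("punycode").decode("ascii")

print(to_ace_label("한국".encode("euc-kr"), "euc-kr"))
print(to_ace_label("ＫＯＲＥＡ".encode("utf-8"), "utf-8"))  # full-width letters fold to "korea"
```

The first step is the only one that depends on a per-charset table the client must ship; the remaining steps are fixed Unicode operations.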
> 
>  New legacy codes may be created after IDN-aware browsers are distributed, and
>  old legacy codes may get new code points for newly added characters.
>  If IDN-aware browsers/applications are not upgraded with the new legacy-to-Unicode mappings,
>  they will occasionally fail to convert a legacy-encoded IDN into Unicode.
>  That kind of IDN failure has never been seen in LDH DNS.
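A similar failure can be reproduced with Python's real codecs, as an analogy rather than the exact versioning case above: CP949 (Microsoft's Unified Hangul Code) extends EUC-KR with Hangul syllables that KS X 1001 lacks, so a client holding only an EUC-KR table cannot convert a label containing one of them. `label_to_ace` is a hypothetical helper, and the built-in `idna` codec stands in for ACE.

```python
from typing import Optional

def label_to_ace(native_label: bytes, charset: str) -> Optional[str]:
    """Return the ACE form of a legacy-encoded label, or None when this
    client's legacy-to-Unicode table for `charset` has no mapping for it."""
    try:
        text = native_label.decode(charset)   # legacy-to-Unicode
    except UnicodeDecodeError:
        return None                            # conversion fails; no lookup possible
    return text.encode("idna").decode("ascii")

# "똠" (U+B620) is encodable in CP949, which extends EUC-KR with Hangul
# syllables that KS X 1001 (EUC-KR) does not contain.
label = "똠방".encode("cp949")
print(label_to_ace(label, "cp949"))    # client whose table covers the extension
print(label_to_ace(label, "euc_kr"))   # client with only the EUC-KR table -> None
```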
> 
> Soobok Lee