
Re: [idn] URL encoding in html page




----- Original Message ----- 
From: "Bruce Thomson" <bthomson@fm-net.ne.jp>
To: "Soobok Lee" <lsb@postel.co.kr>; "IETF idn working group" <idn@ops.ietf.org>
Sent: Friday, March 22, 2002 5:14 PM
Subject: Re: [idn] URL encoding in html page


> Encodings of anything on an HTML page are an issue as well. Various
> methods are already defined for the codeset to be determined, so
> whether UTF-8, legacy encodings, or ACE are used, there shouldn't
> be a problem. If an author uses a different encoding for the URL than
> for the rest of the page, he will have a problem,

What if all the HTML-viewable text is in English, but only the href URL contains
legacy (Korean) encoded hostnames? Chinese visitors would see a clean English homepage,
but fail to click through to the Korean link.
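A minimal Python sketch of this scenario, assuming the author's page stores the hostname label in EUC-KR and declares no charset, so a Chinese visitor's browser falls back to GB2312. The label `한국` is purely illustrative.

```python
# Hypothetical hostname label, as the Korean author typed it.
label = "한국"

# The author's editor saves the page in EUC-KR (legacy Korean encoding).
raw_bytes = label.encode("euc_kr")

# A Korean visitor's browser, defaulting to EUC-KR, recovers the label.
as_seen_in_korea = raw_bytes.decode("euc_kr")

# A Chinese visitor's browser, lacking a <meta> charset, assumes GB2312
# and turns the same bytes into different characters.
as_seen_in_china = raw_bytes.decode("gb2312", errors="replace")

print(repr(as_seen_in_korea), repr(as_seen_in_china))
```

The bytes on the page are identical; only the visitor whose default encoding matches the author's looks up the intended name.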

> but that's no different than if he tried to mix
> two different legacy encodings in his text without intervening META
> tags.
> 
> By the way, IDNs in HTML already work pretty well, except that IE can't handle
> an IDN in its BASE for relative URLs (unless of course you specify
> explicit ACE in your BASE directive). Until Microsoft fixes this, just
> use absolute URLs.

Does MS IE 6.0 already support IDNA or UDNS? Otherwise, do you mean that IE determines
the encoding correctly if <META> charset tags are present in the appropriate section
of HTML pages?

> 
> IDNA probably doesn't need to make a recommendation here. What is more
> critical is how the browser sends the data in the HTTP headers.

Is HTTP/1.2 being planned to carry IDN Host: values?
If not, HTTP Host: values should not contain any legacy or UTF-8 encoding.
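A minimal Python sketch of the ASCII-only alternative, assuming the client already has the hostname as Unicode text: it converts the name to ACE before it reaches the Host: header of an HTTP/1.1 request. It relies on Python's built-in `idna` codec (IDNA 2003, RFC 3490), whose `xn--` prefix differs from the draft `bq--` prefix quoted later in this thread; `build_request_head` and the hostname are hypothetical.

```python
def build_request_head(unicode_host: str, path: str = "/") -> bytes:
    """Build an HTTP/1.1 request head whose Host: value is pure ASCII."""
    # Convert each label to ACE; the result contains only LDH characters.
    ace_host = unicode_host.encode("idna").decode("ascii")
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {ace_host}",
        "Connection: close",
        "",  # these two empty entries produce the blank line
        "",  # that terminates the header section
    ]
    # Encoding as ASCII fails loudly if any non-ASCII text slipped through.
    return "\r\n".join(lines).encode("ascii")


head = build_request_head("www.bücher.com")
print(head.decode("ascii").splitlines()[1])  # Host: www.xn--bcher-kva.com
```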

Soobok Lee


> 
> Bruce
> 
> > 
> > If a simple HTML page contains the following tag,
> >   <a href=http://www.<ML>.com>Hello World!</a>
> > in which <ML> may be in a native legacy encoding or UTF-8, it is easy to imagine that
> > some visitors who click that link may be led to the wrong site or nowhere.
> > 
> > When the <ML> encoding is not specified by an HTML <meta> charset tag and
> > the author and visitors have different default character encodings,
> > interoperability problems will arise, no matter which architecture we
> > choose between IDNA and UTF-8-based IDN. Non-IDN-aware and even IDN-aware
> > Web browsers would not be able to decide which encoding the author used.
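A sketch of why the <meta> declaration matters, under simplifying assumptions: the hypothetical helper `declared_charset` finds the charset with a regular expression (real browsers use far more careful detection), and the hypothetical `href_host` decodes the link's hostname bytes with it, falling back to the visitor's default when the page declares nothing. The fallback branch is where the interoperability failure described above occurs.

```python
import re
from typing import Optional

# Simplified pattern: finds charset=NAME inside a <meta ...> tag.
_META_CHARSET = re.compile(rb'<meta[^>]+charset=["\']?([A-Za-z0-9_-]+)', re.IGNORECASE)


def declared_charset(page: bytes) -> Optional[str]:
    """Return the charset named in a <meta> tag, or None if absent."""
    m = _META_CHARSET.search(page)
    return m.group(1).decode("ascii").lower() if m else None


def href_host(host_bytes: bytes, page: bytes, fallback: str) -> str:
    """Decode hostname bytes with the page's charset, else the browser default."""
    charset = declared_charset(page) or fallback
    return host_bytes.decode(charset, errors="replace")


host = "한국".encode("euc_kr")
labeled = b'<head><meta http-equiv="Content-Type" content="text/html; charset=euc-kr"></head>'
unlabeled = b"<head></head>"
print(href_host(host, labeled, fallback="gb2312"))    # 한국
print(href_host(host, unlabeled, fallback="gb2312"))  # mis-decoded, not 한국
```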
> > 
> > Non-IDNA-aware browsers will always fail to resolve natively encoded Web hostnames in URLs,
> > even when the HTML page has specified its charset encoding in the <head> section.
> > In this case, IDNA backward compatibility cannot save old browsers from failure
> > without DNS/Web server workarounds that heuristically convert native encodings to ACE on the fly.
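The on-the-fly native-to-ACE heuristic mentioned above might look roughly like this hypothetical sketch; the function name `guess_ace`, the candidate list, and its order are assumptions, and Python's `idna` codec stands in for whatever ACE is finally adopted. Its weakness shows in the loop itself: the first encoding that decodes without error wins, so a GB2312 hostname could be silently read as EUC-KR, since both use overlapping byte ranges.

```python
from typing import Optional, Sequence

# Assumed order of guesses; a real deployment would have to tune this.
CANDIDATES: Sequence[str] = ("ascii", "utf-8", "euc_kr", "gb2312", "shift_jis")


def guess_ace(raw_host: bytes, candidates: Sequence[str] = CANDIDATES) -> Optional[str]:
    """Guess the encoding of raw hostname bytes and return an ACE hostname."""
    for enc in candidates:
        try:
            text = raw_host.decode(enc)
        except UnicodeDecodeError:
            continue  # not valid in this encoding; try the next guess
        try:
            return text.encode("idna").decode("ascii")
        except UnicodeError:
            continue  # decoded, but not a valid IDN; try the next guess
    return None  # no candidate produced a usable hostname
```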
> >  
> > With LDH-only URLs, we had no such problems or headaches.
> > 
> > Someone may argue that the HTML should contain the ACE-encoded URL, like this:
> > 
> >   <a href=http://www.bq--blahblah.com>Hello World!</a>
> > 
> > Should IDNA recommend that all HTML authors use such ACE-encoded URLs for backward compatibility
> > and error-free, fast deployment?
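If authors were asked to publish ACE URLs, an authoring tool could convert the hostname before the page is saved. A minimal sketch, assuming the tool still has the URL as Unicode text and the host is a registered name (no IPv6 literal); it uses Python's `idna` codec, which emits the final `xn--` prefix rather than the draft `bq--` prefix shown above, and `ace_url` is a hypothetical helper name.

```python
from urllib.parse import urlsplit, urlunsplit


def ace_url(url: str) -> str:
    """Return url with its hostname in ACE form; everything else unchanged."""
    parts = urlsplit(url)
    if not parts.netloc:
        return url  # relative URL: no hostname to convert
    # Split "user@host:port" without touching the userinfo or port.
    userinfo, at, hostport = parts.netloc.rpartition("@")
    host, colon, port = hostport.partition(":")
    ace_host = host.encode("idna").decode("ascii")
    netloc = f"{userinfo}{at}{ace_host}{colon}{port}"
    return urlunsplit(parts._replace(netloc=netloc))


print(ace_url("http://www.bücher.com/index.html"))
# http://www.xn--bcher-kva.com/index.html
```

The same conversion could in principle be applied to a BASE href, which is where IE was reported to need explicit ACE.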