
Re: [idn] URL encoding in html page



Encodings of anything on an HTML page are an issue as well. Various
methods are already defined for determining the codeset, so whether
UTF-8, legacy encodings, or ACE are used, there shouldn't be a
problem. If an author uses a different encoding for the URL than for
the rest of the page, he will have a problem, but that's no different
than if he tried to mix two different legacy encodings in his text
without intervening META tags.
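
To illustrate why the declared codeset matters, here is a minimal Python
sketch. It assumes Python's built-in "idna" codec (RFC 3490 style, which
emits the "xn--" ACE prefix rather than the "bq--" prefix used in the
example quoted below). The helper name ace_for_bytes and the sample
label are made up for illustration.

```python
def ace_for_bytes(raw: bytes, charset: str) -> str:
    """Decode a raw host label with the given charset, then convert it to ACE.

    Hypothetical helper: it mimics a browser that must first guess or be
    told the page's charset before it can build the ACE form of a host.
    """
    label = raw.decode(charset)
    return label.encode("idna").decode("ascii")


# The same six bytes, as they might appear inside an href on a page.
raw_label = b"b\xfccher"

# Read as ISO-8859-1, byte 0xFC is a Latin u with diaeresis.
as_latin1 = ace_for_bytes(raw_label, "iso-8859-1")

# Read as windows-1251, byte 0xFC is a Cyrillic soft sign.
as_cp1251 = ace_for_bytes(raw_label, "cp1251")

print(as_latin1)
print(as_cp1251)
```

The two printed names differ, so a reader whose browser assumes the
wrong charset would look up a different host than the author intended.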

By the way, IDNs in HTML already work pretty well, except that IE can't
handle an IDN in its base for relative URLs (unless, of course, you specify
explicit ACE in your BASE directive). Until Microsoft fixes this, just
use absolute URLs.
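
As a rough sketch of the "explicit ACE in your BASE" workaround, the
Python below rewrites only the host part of a base URL to ACE and then
resolves a relative link against it. The function name to_ace_url and
the example.* host are assumptions for illustration, and the built-in
"idna" codec is again assumed (so the prefix is "xn--").

```python
from urllib.parse import urljoin, urlsplit, urlunsplit


def to_ace_url(url: str) -> str:
    """Return the URL with its host converted to ACE; other parts unchanged.

    Simplification: any user:password@ part of the URL is dropped.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    ace_host = host.encode("idna").decode("ascii")
    netloc = ace_host
    if parts.port is not None:
        netloc = f"{ace_host}:{parts.port}"
    return urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )


# Value an author could put in <BASE href=...> instead of the native form.
ace_base = to_ace_url("http://www.b\u00fccher.example/docs/")

# Relative links then resolve against a plain LDH host name.
absolute = urljoin(ace_base, "page.html")
print(absolute)  # http://www.xn--bcher-kva.example/docs/page.html
```

Running the same conversion over every href at publishing time would give
the all-absolute, all-ACE page that older browsers can follow.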

IDNA probably doesn't need to make a recommendation here. What is more
critical is how the browser sends the data in the HTTP headers.

Bruce

> 
> If a simple HTML page contains the following tag,
>   <a href=http://www.<ML>.com>Hello World!</a>
> in which <ML> may be in a native legacy encoding or UTF-8 encoding, it is easy to imagine that
> some visitors who click that link may be led to wrong sites or nowhere.
> 
> When the <ML> encoding is not specified by HTML <meta> charset tags and
> the author and visitors have different default char encodings,
> interoperability problems will arise, no matter which architecture we
> choose between IDNA and UTF8-based IDN. IDN-non-aware and even IDN-aware
> Web browsers would not be able to decide which encoding was used by the author.
> 
> IDNA-non-aware browsers will always fail to resolve natively encoded Web hostnames in the URL,
> even when the HTML page has specified its charset encoding in the <head> section.
> In this case, IDNA backward compatibility cannot save the old browser from failures
> without DNS/webserver workarounds for on-the-fly native-to-ACE heuristic encodings.
>  
> With LDH-only URLs, we had no such problems and headaches.
> 
> Someone may argue that the HTML URL should contain an ACEed URL like this:
> 
>   <a href=http://www.bq--blahblah.com>Hello World!</a>
> 
> Should IDNA recommend that all HTML authors use such ACEed URLs for backward compatibility
> and error-free fast deployment?