[idn] URL encoding in html page




If a simple HTML page contains the following tag,
  <a href=http://www.<ML>.com>Hello World!</a>
in which <ML> may be in a native legacy encoding or in UTF-8, it is easy to imagine that
some visitors who click that link will be led to the wrong site or nowhere.

When the encoding of <ML> is not specified by an HTML <meta> charset tag and
the author and visitors have different default character encodings,
interoperability problems will arise, no matter which architecture we
choose between IDNA and UTF8-based IDN. Neither IDN-non-aware nor IDN-aware
Web browsers would be able to decide which encoding the author used.
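
To make the ambiguity concrete, here is a minimal Python sketch. The label "münchen" is a hypothetical stand-in for <ML> and does not come from any real page; the two encodings are just one possible author/visitor mismatch.

```python
# Hypothetical label standing in for <ML>; "\u00fc" is u-umlaut.
label = "m\u00fcnchen"

# The same label as an author might save it under two different defaults.
utf8_bytes = label.encode("utf-8")      # b'm\xc3\xbcnchen'
latin1_bytes = label.encode("latin-1")  # b'm\xfcnchen'

# A visitor whose browser assumes Latin-1 misreads the UTF-8 bytes
# as a different, longer label (two characters instead of one).
misread = utf8_bytes.decode("latin-1")

# A visitor whose browser assumes UTF-8 cannot decode the Latin-1 bytes at all.
try:
    latin1_bytes.decode("utf-8")
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False
```

Without a declared charset, nothing in the bytes themselves tells the browser which of these readings the author intended.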

IDNA-non-aware browsers will always fail to resolve a natively encoded Web hostname in the URL,
even when the HTML page specifies its charset encoding in the <head> section.
In this case, IDNA backward compatibility cannot save the old browser from failure
without DNS/web server workarounds that heuristically convert native encodings to ACE on the fly.
 
With LDH-only URLs, we had no such problems or headaches.

Someone may argue that the URL in the HTML should contain an ACE-encoded hostname, like this:

  <a href=http://www.bq--blahblah.com>Hello World!</a>

Should IDNA recommend that all HTML authors use such ACE-encoded URLs for backward compatibility
and error-free, fast deployment?
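
As a rough illustration of what writing ACE-encoded links would mean for an author or an authoring tool, the sketch below converts a hostname to its ACE form. It assumes Python's built-in "idna" codec, which emits the "xn--" prefix rather than the "bq--" prefix used in the example above; the helper name to_ace_host and the sample hostname are hypothetical.

```python
def to_ace_host(host: str) -> str:
    """Return an LDH-only (ACE) form of a hostname.

    Assumption: uses Python's built-in "idna" codec, whose output
    carries the "xn--" prefix, not the "bq--" prefix shown in the post.
    """
    # The codec converts each dot-separated label on its own;
    # labels that are already plain ASCII pass through unchanged.
    return host.encode("idna").decode("ascii")


# Hypothetical hostname; "\u00fc" is u-umlaut.
ace = to_ace_host("www.m\u00fcnchen.com")
link = "<a href=http://" + ace + ">Hello World!</a>"
```

The resulting href is pure ASCII, so its meaning no longer depends on the page's charset or on the visitor's default encoding.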