
Re: [idn] URL encoding in html page



Soobok Lee <lsb@postel.co.kr> wrote:

> If a simple HTML page contains the following tag,
>   <a href=http://www.<ML>.com>Hello World!</a>
> in which <ML> may be in a native legacy encoding or UTF-8 encoding, it
> is easy to imagine that some visitors who click that link may be led to
> wrong sites or nowhere.

Very easy to imagine indeed, because the HTML spec says that the href
attribute must contain a URI, and the URI spec says that the host must
contain only ASCII letters, digits, hyphens, and dots (or it may be a
bracket-enclosed IPv6 address literal).
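
To make that restriction concrete, here is a minimal sketch in Python.
The function name is_plain_ascii_host is hypothetical, and the check is
deliberately simplified: it does not implement the full URI grammar and
does not validate the contents of a bracketed IPv6 literal.

```python
import re

# One host label: ASCII letters, digits, and hyphens only.
# Assumption: this simplified pattern ignores finer rules such as
# where hyphens may appear within a label.
_LABEL = re.compile(r"[A-Za-z0-9-]+")

def is_plain_ascii_host(host):
    """Return True if host looks like a traditional ASCII host name."""
    if host.startswith("[") and host.endswith("]"):
        # Bracket-enclosed IPv6 literal; contents not checked in this sketch.
        return True
    # Every dot-separated label must be non-empty and pure ASCII.
    return all(_LABEL.fullmatch(label) is not None
               for label in host.split("."))

print(is_plain_ascii_host("www.example.com"))       # True
print(is_plain_ascii_host("www.\u00e9xample.com"))  # False
```

A host written in a legacy encoding or raw UTF-8 fails this check, which
is why such an href is not a valid URI to begin with.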

> Should IDNA recommend that all HTML authors use such ACE-encoded URLs
> for backward compatibility and error-free fast deployment?

Not necessary, since the HTML and URI specs already limit the host to
ASCII letters, digits, hyphens, and dots.

> Is HTTP/1.2 being planned for IDN Host: values?

Bruce Thomson <bthomson@fm-net.ne.jp> replied:

> Well, depending on how you want it to work, version 1.1 might be OK.
> It allows %-escaped UTF-8 I believe.

HTTP 1.1 says that the Host: header field must contain the <authority>
part of the URI (that is, <host>[:<port>] ), and the URI spec forbids
%-escapes in <authority>.
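
A rough sketch of what that means for a client, using Python's standard
urllib.parse module. The helper name host_header_value is hypothetical,
and the sketch assumes a URL with no userinfo part and no IPv6 literal.

```python
from urllib.parse import urlsplit

def host_header_value(url):
    """Build the value of the Host: header field as <host>[:<port>]."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    # %-escapes are not permitted in the host, so refuse them here
    # rather than sending them on the wire.
    if "%" in host:
        raise ValueError("%-escapes are not allowed in the host: " + host)
    return host if parts.port is None else "%s:%d" % (host, parts.port)

print(host_header_value("http://www.example.com:8080/index.html"))
# www.example.com:8080
```

Passing a URL such as http://www.%E4%BE%8B.com/ to this helper raises
ValueError instead of producing a Host: value.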

> Would using ACE here be a change to the HTTP spec

No.  ACE host labels are honest-to-goodness valid ASCII host labels, so
you can use them wherever traditional ASCII host labels are allowed.
You don't need any special permission or invitation.
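
For illustration only, here is a sketch using Python's built-in "idna"
codec. That codec implements the ACE form standardized in RFC 3490
(Punycode with an "xn--" prefix), which is an assumption on my part and
not necessarily the ACE under discussion in this thread. The host name is
a made-up example.

```python
import re

name = "www.b\u00fccher.com"  # hypothetical internationalized host name

# Encode each label to its ACE form; the result is plain ASCII.
ace = name.encode("idna").decode("ascii")
print(ace)  # www.xn--bcher-kva.com

# The ACE form passes the same test as any traditional host name:
# ASCII letters, digits, hyphens, and dots only.
print(re.fullmatch(r"[A-Za-z0-9.-]+", ace) is not None)  # True

# Decoding recovers the original name.
print(ace.encode("ascii").decode("idna") == name)  # True
```

Because the encoded name is ordinary ASCII, it can go into an href or a
Host: header field without any change to HTML, URI, or HTTP.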

AMC