
Re: [idn] IRIs ought to use internationalized *host* names



Martin Duerst <duerst@w3.org> wrote:

> The IRI proposal is based on defining various important aspects of
> IRIs in terms of their mapping to URIs.

A very good approach.

> The mapping is completely uniform, based on UTF-8 and %HH.  There are
> strong reasons for keeping it uniform, because you can't really look
> into URIs (and therefore IRIs) in the general case.

Okay, but you can easily distinguish generic-syntax [UI]RIs (which
always have a slash just after the scheme:) from all other [UI]RIs
(which never have a slash just after the scheme:).  And you can find the
host field of any generic-syntax [UI]RI.
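For illustration, here is a minimal Python sketch of that test and of locating the host field. It assumes the RFC 2396 scheme syntax and treats a host as present only when "//" (an authority) follows the scheme. The helper names are hypothetical, and bracketed IPv6 literals are not handled.

```python
import re

# Hypothetical helpers for illustration; scheme syntax per RFC 2396:
# ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ) followed by ":".
_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.\-]*:")


def is_generic_syntax(ri: str) -> bool:
    """True if a slash immediately follows 'scheme:'."""
    m = _SCHEME.match(ri)
    return m is not None and ri.startswith("/", m.end())


def host_field(ri: str):
    """Return the host of a generic-syntax [UI]RI that has an authority
    ('//' after the scheme), or None. Drops any 'userinfo@' and ':port'.
    Bracketed IPv6 literals are not handled in this sketch."""
    m = _SCHEME.match(ri)
    if m is None or not ri.startswith("//", m.end()):
        return None
    start = m.end() + 2
    # The authority runs up to the first '/', '?' or '#' (or the end).
    ends = [i for i in (ri.find(c, start) for c in "/?#") if i != -1]
    authority = ri[start:min(ends, default=len(ri))]
    hostport = authority.rpartition("@")[2]   # drop userinfo, if any
    return hostport.partition(":")[0]          # drop port, if any
```

With these helpers, "http://user@bücher.example:8080/a" is generic-syntax with host "bücher.example". "mailto:user@domain" is not generic-syntax and has no host field.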

Let me refine my proposal:  To convert IRIs to URIs, you are welcome to
use UTF8-NFC-%HH (normalize to NFC, encode as UTF-8, then write each
non-ASCII byte as %HH) wherever %HH is allowed, which is almost
everywhere.
But in the host field of generic-syntax URIs, where %HH is not allowed,
use ToASCII.  Optionally, if you know the syntax of the IRI scheme and
you know where the domain names are, you can use ToASCII instead of
UTF8-NFC-%HH to increase the chance that the recipient will not choke on
the name.
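A rough Python sketch of the refined conversion follows. It assumes that a host field exists only after "//", and it only %HH-escapes non-ASCII characters. For ToASCII it uses Python's built-in "idna" codec, which applies RFC 3490 ToASCII label by label. The function names are hypothetical, and the optional scheme-specific ToASCII step is not implemented.

```python
import re
import unicodedata

_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.\-]*:")


def utf8_pct(s: str) -> str:
    """UTF-8-encode every non-ASCII character and write each byte as %HH."""
    return "".join(
        ch if ord(ch) < 0x80 else "".join(f"%{b:02X}" for b in ch.encode("utf-8"))
        for ch in s
    )


def iri_to_uri(iri: str) -> str:
    """Hypothetical IRI-to-URI conversion: ToASCII in the host field of
    generic-syntax IRIs, UTF8-NFC-%HH everywhere else."""
    iri = unicodedata.normalize("NFC", iri)          # the "NFC" step
    m = _SCHEME.match(iri)
    if m is None or not iri.startswith("//", m.end()):
        # No host field: UTF8-NFC-%HH everywhere (e.g. mailto:).
        return utf8_pct(iri)
    start = m.end() + 2
    ends = [i for i in (iri.find(c, start) for c in "/?#") if i != -1]
    end = min(ends, default=len(iri))
    userinfo, at, hostport = iri[start:end].rpartition("@")
    host, colon, port = hostport.partition(":")
    # Python's built-in "idna" codec applies RFC 3490 ToASCII per label.
    ascii_host = host.encode("idna").decode("ascii") if host else host
    return (iri[:start] + utf8_pct(userinfo) + at + ascii_host
            + colon + utf8_pct(port) + utf8_pct(iri[end:]))
```

Under these assumptions, "http://bücher.example/ä" becomes "http://xn--bcher-kva.example/%C3%A4". "mailto:user@dömain.example" keeps %HH in the domain ("mailto:user@d%C3%B6main.example") because this sketch does not know where mailto puts its domain names.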

Notice that unlike http://www.w%33.org/, mailto:user@dom%61in is valid,
and any browser that doesn't already understand it is broken (which is
probably a lot of them, unfortunately).
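For a client to accept mailto:user@dom%61in, it has to %HH-decode the address before using it, since %61 is just an escaped "a". Below is a minimal sketch of that decoding using the standard library's urllib.parse.unquote. The mailto_address helper is hypothetical, and it simply drops any "?header=..." part.

```python
from urllib.parse import unquote


def mailto_address(uri: str) -> str:
    """Return the percent-decoded address part of a mailto: URI."""
    scheme, sep, rest = uri.partition(":")
    if not sep or scheme.lower() != "mailto":
        raise ValueError("not a mailto: URI")
    address = rest.split("?", 1)[0]   # ignore any ?subject=... headers
    return unquote(address)           # %61 -> "a"
```

Here mailto_address("mailto:user@dom%61in") yields "user@domain".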

> 2) Extending the host name part of URIs to use %HH
>    (in some browsers, that already works, in others, it doesn't,
>     for ASCII; try e.g. http://www.w%33.org in a few browsers)

www.w%33.org does not work in Netscape 4.77 for Linux, nor in Mozilla
0.9.8 for Linux, nor in w3m 0.3 for Linux, nor in lynx 2.8.4rel.1 for
Linux.  It does work in Konqueror 2.2.2 for Linux.

The idea of having two concepts, IRIs and URIs, with every IRI being
equivalent to some URI, is a good idea, because it allows us to add
internationalization without changing the URI syntax and without
breaking any existing software that continues to use that syntax.  It's
the exact same technique being employed by IDNA.

So it would be a shame to change the URI syntax unnecessarily.

AMC