Re: [idn] IRIs ought to use internationalized *host* names



Some more explanations:

Please note that the goal of draft-ietf-idn-uri-01
(and of draft-masinter-url-i18n-08) is not to have
more and more URIs with lots of %HH in the host name
part. Indeed, ideally none would ever show up anywhere.

The idea is to use IRIs only, as far as possible.
When they are resolved, they should be resolved directly,
i.e. an IRI resolver wouldn't need to produce the %HH
URI representation; it should apply toASCII directly as
soon as it has identified a host name.
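
A minimal sketch of what "resolving directly" could look like, assuming
Python's built-in "idna" codec as a stand-in for toASCII (the drafts do not
prescribe any particular implementation, and the function name host_to_ascii
and the example host are hypothetical):

```python
from urllib.parse import urlsplit


def host_to_ascii(iri: str) -> str:
    """Return the ASCII (ACE) form of the host name found in an IRI.

    Assumption: Python's built-in "idna" codec is used as the toASCII
    step. No %HH form of the host name is produced along the way.
    """
    host = urlsplit(iri).hostname
    if host is None:
        raise ValueError("no host name found in IRI")
    # The codec converts each label separately; pure ASCII labels
    # pass through unchanged.
    return host.encode("idna").decode("ascii")


if __name__ == "__main__":
    print(host_to_ascii("http://bücher.example/päth"))
```

Only the host name is touched here; the rest of the IRI is left for the
scheme-specific machinery to handle.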

In the current drafts, that's not yet clear, but I plan
to make this clear in the next round.

Of course, that can't be realized overnight. So I think
it's perfectly acceptable for people to put an ACE-encoded
host name into an IRI slot if they don't want to risk that
it won't get resolved otherwise.

So the work on IRIs and URIs with these two drafts is
just part of what many people have been talking about,
namely that we have to look at every IETF spec and see
how it can be upgraded to deal with IDNs. Every upgrade
will take some time. But if we don't write a spec,
receiver-side implementers won't know what to prepare for,
and therefore, the sender side will never be able to
move forward.

Regards,  Martin.

At 17:14 02/04/04 +0900, Martin Duerst wrote:

>At 03:30 02/03/27 +0000, Adam M. Costello wrote:
>
>>The IRI proposal (draft-masinter-url-i18n-08) calls for the host labels
>>to be ASCII LDH only, just like in URIs.
>>
>>When converting an IRI to a URI, you have to convert the path components
>>from the local charset to Unicode, then do Unicode normalization, UTF-8
>>encoding, and %-escaping.  But you don't do anything to the host labels
>>because they're already LDH.
>>
>>I suspect that the reason the IRI proponents don't internationalize the
>>host field is that they don't yet have an official IDN spec to point at.
>>When they do, I suspect they'll want to revise their proposal so that
>>the host field can use the local charset.
>
>This is very close, but it's actually a tiny little bit more
>complicated.
>
>The IRI proposal is based on defining various important aspects
>of IRIs in terms of their mapping to URIs. This way, the IRI
>proposal doesn't have to deal with questions such as "what's a resource".
>The mapping is completely uniform, based on UTF-8 and %HH.
>There are strong reasons for keeping it uniform, because you
>can't really look into URIs (and therefore IRIs) in the general
>case.
>
>So there are three pieces:
>
>1) Extending URIs to use non-ASCII characters (i.e. IRIs)
>
>2) Extending the host name part of URIs to use %HH
>    (for ASCII, that already works in some browsers but not in
>     others; try e.g. http://www.w%33.org in a few browsers)
>
>3) Extending IRIs to use non-ASCII characters in hostnames
>
>The way the drafts are currently structured, it's:
>
>draft-masinter-url-i18n-08: 1)
>
>draft-ietf-idn-uri-01: 2) and 3)
>
>It would be more straightforward to have it as follows:
>
>draft-ietf-idn-uri: 2)
>
>draft-masinter-url-i18n: 1) and 3)
>
>The main advantage would be that people implementing IRIs
>would find everything they need in one place, and wouldn't
>have to do the implementation in two stages.
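
As a rough illustration of pieces 1) and 2) in the quoted message, the
uniform mapping (Unicode normalization, UTF-8, %HH) and the handling of a
%HH host name might be sketched as follows. This assumes NFC as the
normalization form and Python's built-in "idna" codec as toASCII; the
function names are hypothetical and not taken from either draft:

```python
import unicodedata
from urllib.parse import quote, unquote


def component_to_uri(text: str) -> str:
    """Map a non-ASCII path component to its %HH form.

    Assumption: NFC is the normalization form; quote() encodes the
    result as UTF-8 and escapes every non-ASCII byte as %HH.
    """
    nfc = unicodedata.normalize("NFC", text)
    return quote(nfc, safe="/")


def percent_host_to_ascii(host: str) -> str:
    """Undo %HH escaping (UTF-8) in a host name, then apply toASCII.

    Assumption: the "idna" codec stands in for toASCII.
    """
    return unquote(host).encode("idna").decode("ascii")


if __name__ == "__main__":
    print(component_to_uri("/päth"))
    print(percent_host_to_ascii("www.w%33.org"))
    print(percent_host_to_ascii("b%C3%BCcher.example"))
```

The host name from the quoted example, www.w%33.org, simply unescapes to
www.w3.org, while a host name escaped from non-ASCII characters ends up in
its ACE form.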