
Re: [idn] Re: idn-uri document

To: Erik Nordmark <Erik.Nordmark@sun.com>
Subject: Re: [idn] Re: idn-uri document
From: Martin Duerst <duerst@w3.org>
Date: Sun, 03 Nov 2002 15:49:18 +0900
Cc: idn@ops.ietf.org
In-reply-to: <Roam.SIMC.2.0.6.1036104046.2114.nordmark@bebop.france>

At 23:40 02/10/31 +0100, Erik Nordmark wrote:

> Ah, I see, of course the resolvers pass things through and then
> pass the negative result back, so they don't actually reject it.
> So now the sentence reads:
>
> However, such syntax should never be used, and will never be
> resolved because no such domains will be registered.

ok


> >The defined syntax rules declare certain ASCII domain names illegal
> >(such as *.example.org). Where is the check for illegal names assumed to
> >be performed? For IDNA it probably makes sense to only apply these types
> >of checks (setting the UseSTD3ASCIIRules flag) when verifying domain name
> >registrations and not do such checks in the clients.
>
> This is an IDNA question, not an idn-uri question. As far as I remember,
> the idea was to have the checks done on the clients, too (with some
> leeway for unassigned characters to stay forward-compatible with
> new character assignments). The reason for this was to create
> pressure on registries to follow the rules.

My point is that the idn-uri document is more restrictive in its
syntax than the IDNA document. I don't know if this is a good idea
or a bad idea, and we need to understand which type of idea it is.

URIs [RFC 2396] use the same restrictions as STD3. I think it
therefore makes sense to also use these restrictions in this
upgrade.

On the other hand, while URI syntax contains these checks, URI
implementations are not supposed to perform them, because they would
need scheme-specific knowledge to do so, which would restrict the
deployment of new schemes even more than it is currently restricted.
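For reference, the STD3 restriction that makes a name such as
*.example.org illegal corresponds to step 3 of ToASCII in RFC 3490
when UseSTD3ASCIIRules is set: ASCII code points must be letters,
digits or hyphen-minus, and a label must not start or end with a
hyphen. A minimal Python sketch of that check alone; the function
name is hypothetical, and the label-length limit is left to the
other ToASCII steps.

```python
def passes_std3_ascii_rules(label: str) -> bool:
    """Hypothetical helper: the UseSTD3ASCIIRules checks of RFC 3490,
    ToASCII step 3. Non-ASCII code points are not examined here;
    IDNA deals with them in its other steps."""
    for ch in label:
        # Only ASCII code points are restricted: letters, digits, '-'.
        if ord(ch) < 0x80 and not (ch.isalnum() or ch == "-"):
            return False
    # No leading or trailing hyphen-minus.
    return not (label.startswith("-") or label.endswith("-"))


for name in ["example", "*", "-abc", "a_b", "xn--bcher-kva"]:
    print(name, passes_std3_ascii_rules(name))
```

Whether such a check runs only at registration time or also in
clients is exactly the open question above; the sketch takes no
position on it.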

> >The above statement says that for all domain names (note that the term
> >"IDN" is defined to include the existing ASCII domain names)
> >one should apply nameprep. This might be fine, but it would make sense to
> >state this explicitly. The ToASCII in IDNA does not apply nameprep
> >to all-ASCII labels.
>
> The idea was simply to say: We RECOMMEND that you apply the IDNA rules
> already when you create a URI. What these rules are is up to IDNA.
> If IDNA says that their preparation of ASCII-only labels is the
> identity operation, then we recommend that you apply that (i.e. do
> nothing), and not something else. If you see a way to make this clearer,
> please tell me.

It is the identity operation for ASCII-only labels (plus some checks
that will reject certain labels).

If you want to do that part, but not apply the Punycode step,
then I think you need to state explicitly that one should
apply ToASCII without the Punycode step (and without any other
steps that don't make sense in that case, if there are any).

Yes, thanks, fixed.
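As an illustration, "ToASCII without the Punycode step" might look
like the following Python sketch. It assumes Python's standard
encodings.idna module (an IDNA 2003 implementation) for Nameprep;
the helper name is hypothetical. For ASCII-only labels it is the
identity plus the STD3 and length checks; for other labels Nameprep
is run only to detect prohibited characters, and the label is
returned unmapped.

```python
from encodings.idna import nameprep  # RFC 3491 Nameprep (CPython stdlib)


def check_label_without_punycode(label: str) -> str:
    """Hypothetical helper: the checks of ToASCII (RFC 3490) with
    UseSTD3ASCIIRules set, minus the Punycode step. Returns the label
    unchanged or raises ValueError."""
    if label.isascii():
        # ToASCII skips Nameprep for ASCII-only labels.
        prepared = label
    else:
        # Raises UnicodeError (a ValueError) on prohibited characters
        # or bidi violations; the mapped output is only checked.
        prepared = nameprep(label)
    for ch in prepared:
        if ord(ch) < 0x80 and not (ch.isalnum() or ch == "-"):
            raise ValueError(f"non-LDH ASCII character {ch!r} in {label!r}")
    if prepared.startswith("-") or prepared.endswith("-"):
        raise ValueError(f"leading or trailing hyphen in {label!r}")
    # The 63-octet limit applies to the ACE form; without Punycode it
    # can only be checked for ASCII-only labels.
    if label.isascii() and not 0 < len(label) < 64:
        raise ValueError(f"label {label!r} is empty or too long")
    return label
```

The 63-octet check is limited to ASCII-only labels because, for
other labels, the limit applies to the ACE form that this sketch
deliberately never computes.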

> >Which are the "any steps required as part of domain name resolution"
> >above? I can't figure out to what it might refer.
>
> That's the nameprep and related checking that the client has to
> do when it resolves a domain name. In IDNA terms, 'client' would
> be easy to understand. But using the word 'client' in a URI context
> doesn't work, so I tried to word around it. Any improved wording
> appreciated.

In that case I think referring to ToASCII would be best.

I have changed the parts in question to read:

>>>>
For domain names containing non-ASCII characters, the legal domain
names are those for which the ToASCII operation
([IDNA], [Nameprep]; using the
unescaped UTF-8 values as input), with the flags "UseSTD3ASCIIRules"
and "AllowUnassigned" set, is successful. The URI resolver MUST
apply any steps required as part of domain name resolution by
[IDNA], in particular the ToASCII operation, with the above-mentioned
flags set.
URIs where the ToASCII operation results in an error should be
treated as unresolvable.

For domain names containing non-ASCII characters, the Nameprep
specification ([Nameprep]) defines some mappings,
which mainly include normalization to NFKC and folding to lower case.
When encoding an internationalized domain name in a
URI, these mappings SHOULD NOT be applied. It should be assumed
that the domain name is already normalized as far as appropriate.
>>>>
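The resolver-side rule in the revised text (unescape to UTF-8, run
ToASCII with UseSTD3ASCIIRules and AllowUnassigned set, treat an
error as unresolvable) can be sketched in Python as below. Assumptions:
CPython's encodings.idna.ToASCII behaves as if UseSTD3ASCIIRules were
unset, so the STD3 check is added explicitly; as far as I know it
does not reject unassigned code points, which would match
AllowUnassigned being set, but that should be verified. The name
host_to_ascii is hypothetical.

```python
from encodings.idna import ToASCII, nameprep  # IDNA 2003, CPython stdlib
from typing import Optional
from urllib.parse import unquote


def _std3_ok(label: str) -> bool:
    # UseSTD3ASCIIRules (RFC 3490, ToASCII step 3).
    if any(ord(c) < 0x80 and not (c.isalnum() or c == "-") for c in label):
        return False
    return not (label.startswith("-") or label.endswith("-"))


def host_to_ascii(uri_host: str) -> Optional[str]:
    """Hypothetical resolver step: percent-decode the host as UTF-8,
    then apply ToASCII to each label. Returns None when the URI
    should be treated as unresolvable."""
    try:
        host = unquote(uri_host, encoding="utf-8", errors="strict")
        labels = host.split(".")
        if labels and labels[-1] == "":
            labels = labels[:-1]  # tolerate a trailing root dot
        out = []
        for label in labels:
            # Step 3 is applied to the Nameprep output, as in RFC 3490.
            prepared = label if label.isascii() else nameprep(label)
            if not _std3_ok(prepared):
                return None
            out.append(ToASCII(label).decode("ascii"))
        return ".".join(out)
    except ValueError:  # covers UnicodeError and UnicodeDecodeError
        return None


print(host_to_ascii("B%C3%BCcher.example"))  # prints xn--bcher-kva.example
print(host_to_ascii("%2A.example.org"))      # prints None
```

Note that the case folding of "B" happens here, inside Nameprep at
resolution time, not when the URI was written.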

I have explicitly changed it so that encoding does not require any
Nameprep steps, have clearly and repeatedly mentioned ToASCII,
and have added the necessary flags.

> >Finally, is the intent that nameprep always be applied before characters
> >are encoded in UTF-8? Then it would make sense to state that in the first
> >real paragraph on page 4.
>
> No. In the context e.g. of IRIs, the conversion from an IRI to a URI
> would not do nameprep.

OK. Then it would make sense to state that explicitly somewhere.

Done above. I'm submitting a new draft, it should show up in
the directory in a few days (due to backlog).
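For the IRI-to-URI direction, where no Nameprep is applied, the host
is simply percent-encoded as UTF-8 as it stands. A Python sketch
using urllib.parse.quote; the function name is hypothetical, and
real IRI processing would treat the other URI components separately.

```python
from urllib.parse import quote


def iri_host_to_uri_host(host: str) -> str:
    """Hypothetical IRI-to-URI step for the host: percent-encode the
    UTF-8 octets of non-ASCII characters, with no Nameprep mapping
    (no case folding, no NFKC)."""
    # quote() encodes str input as UTF-8; letters, digits and "_.-~"
    # are never escaped, so the dots between labels are kept.
    return quote(host, safe="")


print(iri_host_to_uri_host("B\u00fccher.example"))  # prints B%C3%BCcher.example
```

The "B" keeps its case; folding it to "b" would be a Nameprep
mapping, which the revised text says SHOULD NOT be applied when
encoding.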

Regards,    Martin.

Follow-Ups:
- Re: [idn] Re: idn-uri document
  - From: Erik Nordmark <Erik.Nordmark@sun.com>

References:
- Re: [idn] Re: idn-uri document
  - From: Erik Nordmark <Erik.Nordmark@sun.com>
