
Re: [idn] Versions of Nameprep



Patrik,

It appears that this email comes from the perspective that there is no need
for versioning in nameprep, for the reasons quoted below. Am I right?

I have seen the diagram Mark drew of AIO, AI, D and U. You are right on most
points, except that a problem may occur when new characters are assigned to
the "AI" zone.

You mention:
>    - Character X is normalized to character Q and therefore the character
>      is moved to AI. If the user enters character Q, he will end up at the
>      correct service, but if he enters X, he will not reach any service at
>      all, as X cannot exist in a registered domain name.

This would mean that the user would need to know exactly which version of
nameprep his application uses in order to know whether he can type X or must
type Q. A typical end user is not going to recognize the difference and will
probably start to wonder why it fails.

However, this does not mean I think we need to tag nameprep with a version in
the protocol or in the ACE. As you correctly pointed out, this could be done by
versioning in the software/applications, but a properly documented nameprep
version is still going to be useful.

-James Seng


----- Original Message -----
From: "Patrik Fältström" <paf@cisco.com>
To: <idn@ops.ietf.org>
Sent: Thursday, December 14, 2000 5:21 AM
Subject: [idn] Versions of Nameprep


> (This should probably have been discussed in the nameprep design team
> first, and that was my intention after the IETF, but since it has now come
> up here, I'll post it to the whole IDN mailing list instead...)
>
> Summary:
>
>      By introducing two different policies for the use of unassigned
>      code points in Unicode (i.e. code points that are unassigned according
>      to the newest version of the NamePrep specification), we do not
>      need versioning in the IDN protocol:
>
>     (a) Registries never register domains with unassigned
>         code points.
>     (b) Clients let unassigned code points pass through without
>         any modification.
>
> Now, here is the full rationale:
>
> Versions of Unicode.
>
> - Since Unicode will be updated fairly frequently, we also want to allow new
> characters to be used as soon as they are defined.
> - We do this by updating NamePrep.
> - We want to allow maximal compatibility between systems running
> different versions of NamePrep.
>
> Mechanism.
>
> In a particular version of NamePrep, the following lists of code points can
> be generated based on the version of Unicode and the mapping/prohibition
> tables.
>
> AIO. code points allowed in the input and in the output
> AI. code points allowed in the input, but not in the output
> D. assigned code points that are disallowed completely (input and output --
> includes noncharacters, unpaired surrogates, etc.)
> U. unassigned code points
>
> Note: the reason that AI exists is that some characters will disappear or be
> transformed in mapping or normalization, so they can appear in the input,
> but will never be in the output.
>
> In any subsequent version of NamePrep, because of updates to Unicode, code
> points from U will move to D, AI or AIO.
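The four classes, and the way code points leave U when a newer version is published, can be sketched in Python. This is a minimal toy model under stated assumptions, not the real NamePrep tables: the class `NamePrepTables`, the function `newly_assigned`, and the ASCII stand-in characters are all hypothetical.

```python
from enum import Enum


class Cls(Enum):
    AIO = "allowed in input and output"
    AI = "allowed in input only (removed by mapping/normalization)"
    D = "disallowed in input and output"
    U = "unassigned in this version"


class NamePrepTables:
    """Hypothetical, drastically reduced tables for one NamePrep version."""

    def __init__(self, assigned, mapped, prohibited):
        self.assigned = set(assigned)      # characters assigned in this Unicode version
        self.mapped = set(mapped)          # characters changed by mapping/normalization
        self.prohibited = set(prohibited)  # D, including noncharacters

    def classify(self, ch):
        # Prohibited is checked first so noncharacters land in D, not U.
        if ch in self.prohibited:
            return Cls.D
        if ch not in self.assigned:
            return Cls.U
        if ch in self.mapped:
            return Cls.AI
        return Cls.AIO


def newly_assigned(old, new, candidates):
    """Map each character that is U in `old` but not U in `new` to its new class."""
    return {ch: new.classify(ch) for ch in candidates
            if old.classify(ch) is Cls.U and new.classify(ch) is not Cls.U}


# Toy data (hypothetical): V2 assigns x, q and z, which V1 left unassigned.
V1 = NamePrepTables(assigned="acA", mapped="A", prohibited={"\ufffe"})
V2 = NamePrepTables(assigned="acAxqz", mapped="Aq", prohibited={"\ufffe", "z"})
```

Here `newly_assigned(V1, V2, "acAxqzn")` reports x moving to AIO, q to AI and z to D, while a, c and A keep their classes and n stays U.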
>
> Policies.
>
> Registrars are forbidden to register any IDNs containing code points outside
> of AIO for the latest version of Unicode / NamePrep. That is, they are
> forbidden to register any IDNs containing AI, D or U code points. (In
> addition, the allowable names must be in canonical order!)
>
> Clients should treat U code points as if they were AIO when they are
> processing IDNs as part of NamePrep. Certain applications might, however,
> be implemented to treat them as U, or as AIO after first warning the
> user that the character is of class U, depending on how the domain name
> is used in the specific application.
>
> Intermediaries may reject names that are not in canonical order, or that
> contain code points that are in their versions of AI or D, but must not
> reject names for containing U.
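A minimal sketch of the three policies, assuming the latest tables are small hand-written sets. `AIO`, `AI`, `D` and `CCC` are hypothetical stand-ins, not real NamePrep data; lowercase letters replace real characters, with y and z playing combining marks. The intermediary shown is one that uses its option to reject AI, D and non-canonical names, and canonical reordering in the client is omitted for brevity.

```python
# Hypothetical tables for the latest NamePrep version.
AIO = set("abcdewxyz")
AI = {"q": "c"}              # q is mapped/normalized to c, so it never appears in output
D = {"\ufffe"}               # e.g. a noncharacter
CCC = {"y": 220, "z": 230}   # combining classes of the two stand-in marks


def code_point_class(ch):
    if ch in D:
        return "D"
    if ch in AI:
        return "AI"
    if ch in AIO:
        return "AIO"
    return "U"  # anything this version does not know about


def in_canonical_order(label):
    # Out of order when a mark is followed by a mark of lower, nonzero class.
    return not any(CCC.get(a, 0) > CCC.get(b, 0) > 0 for a, b in zip(label, label[1:]))


def registrar_accepts(label):
    # Registrars: only AIO code points of the latest version, in canonical order.
    return all(code_point_class(ch) == "AIO" for ch in label) and in_canonical_order(label)


def intermediary_accepts(label):
    # Intermediaries may reject AI, D and non-canonical order, but must not reject U.
    return (all(code_point_class(ch) in ("AIO", "U") for ch in label)
            and in_canonical_order(label))


def client_prepare(label):
    # Clients treat U as if it were AIO: it passes through unchanged.
    out = []
    for ch in label:
        if code_point_class(ch) == "D":
            raise ValueError(f"disallowed code point U+{ord(ch):04X}")
        out.append(AI.get(ch, ch))  # AI is mapped; AIO and U are kept as is
    return "".join(out)
```

With these tables, "bnde" (n is class U) is refused by the registrar but accepted by the intermediary, and the client passes it through unchanged.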
>
>    - Character X is moved to AIO. By passing the characters through as is,
>      the client will end up at the correct service.
>    - Character X is normalized to character Q and therefore the character
>      is moved to AI. If the user enters character Q, he will end up at the
>      correct service, but if he enters X, he will not reach any service at
>      all, as X cannot exist in a registered domain name.
>    - Character X is moved into D. This cannot exist in any domain name
>      either, so entering X does not reach any service.
>    - The character sequence XY is specified to be ordered YX. If the user
>      enters YX, he will reach the correct service, but no domain name will
>      be registered with the characters in the order XY, so entering XY
>      will not make the user reach any service.
>
> As we see in the list above, if the client enters class U characters
> that are assigned in a newer version of the NamePrep document, the
> client either reaches the intended domain name or no domain at all
> (even in cases where it should reach the domain; this is because the
> normalization and ordering rules are not updated in the client).
>
> In no case will the client reach a domain name that is registered as
> a different domain from the one the client is attempting to reach.
>
> Scenarios.
>
> This will provide for compatibility in the following ways:
>
> A. Suppose that a client or intermediary is on Unicode 3.1 and the site is
> on Unicode 3.0. This case is simple: there will be no domains on the site
> that can't be accessed by the client, since the client uses a superset of
> the code points accepted by the site.
>
> B. Suppose that a client or intermediary is on Unicode 3.0 and the site is
> on Unicode 3.1. Because the client NamePrep passes through any unassigned
> character, the user can access domains on the site that use characters in
> Unicode 3.1. No domains on the site can have code points that are unassigned
> in 3.1, since that is illegal.
>
> The restrictions in case B are that the client has to type the characters in
> the right order, and has to use the post-mapped, post-normalized code
> points.
>
> Example1: domain is XYZW.com. Y and Z are combining marks, in canonical
> order. Y is not in Unicode 3.0. If the client is on 3.1, XZYW.com will
> normalize to XYZW.com, so the user can type either one. The 3.0 client must
> type precisely XYZW.com or the name will be rejected.
>
> Example2: domain is BCDE.com. C is the normalized form of Q. A Unicode 3.1
> client can type either BQDE.com or BCDE.com; both will work. The 3.0 client
> can only type BCDE.com.
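The two examples can be checked with a toy model in which lowercase letters stand in for the characters above (y and z are combining marks, q normalizes to c) and the `.com` suffix is dropped. `NamePrepVersion`, `lookup` and `never_wrong` are hypothetical names, not part of any specification. The "3.0" client simply does not know y or q, so it passes them through and cannot reorder around y. `never_wrong` brute-forces short inputs to check, for this toy data only, that the old client reaches either the same name as the new client or nothing.

```python
from itertools import product


class NamePrepVersion:
    """Hypothetical toy model of one NamePrep version (not the real tables)."""

    def __init__(self, mapping, ccc):
        self.mapping = dict(mapping)  # AI character -> replacement string
        self.ccc = dict(ccc)          # combining class of each known combining mark

    def prep(self, label):
        # Map AI characters; characters this version does not know (U) pass through.
        chars = "".join(self.mapping.get(ch, ch) for ch in label)
        # Canonical ordering: stably sort each run of known combining marks by class.
        # Unknown characters count as class 0, so they block reordering.
        out, run = [], []
        for ch in chars:
            if self.ccc.get(ch, 0) > 0:
                run.append(ch)
            else:
                out.extend(sorted(run, key=lambda c: self.ccc[c]))
                out.append(ch)
                run = []
        out.extend(sorted(run, key=lambda c: self.ccc[c]))
        return "".join(out)


V30 = NamePrepVersion(mapping={}, ccc={"z": 230})
V31 = NamePrepVersion(mapping={"q": "c"}, ccc={"y": 220, "z": 230})

# The registry follows the latest version: only AIO names in canonical order.
REGISTRY = {"xyzw", "bcde"}


def lookup(client, typed):
    """Return the registered name the client reaches, or None."""
    name = client.prep(typed)
    return name if name in REGISTRY else None


def never_wrong(old, new, alphabet, max_len):
    """True if, for every short input, `old` reaches nothing or what `new` reaches."""
    for n in range(1, max_len + 1):
        for chars in product(alphabet, repeat=n):
            typed = "".join(chars)
            hit = lookup(old, typed)
            if hit is not None and hit != lookup(new, typed):
                return False
    return True
```

For example, `lookup(V30, "xzyw")` returns `None` while `lookup(V31, "xzyw")` returns `"xyzw"`; likewise only the 3.1 client maps "bqde" to "bcde".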
>
> --
>