
Re: [idn] Versions of Nameprep



At 11.46 +0800 00-12-18, James Seng/Personal wrote:
>It appears that this email comes from the perspective that there is no need
>for versioning in nameprep because ...see below... am I right?

Correct. But as I said at the meeting, this probably should have been 
verified with the nameprep design team before posting it here. There 
might be a bug hiding that I cannot see...

>I have seen the diagram Mark drew of AIO, AI, D and U. You are right on most
>points, except that a problem may occur when new characters are assigned to
>the "AI" zone.
>
>You mention:
>>     - Character X is normalized to character Q and therefore the character
>>       is moved to AI. If the user enters character Q, he will end up at the
>>       correct service, but if he enters X, he will not reach any service at
>>       all, as X cannot exist in a registered domain name.
>
>This would mean that the user would need to know exactly which (version of?)
>nameprep his application uses in order to know whether he can type X or must
>be specific and type Q. A typical end user is not going to recognize the
>difference and will probably start to wonder why it fails.

The key point is that people who have previously been talking about 
versioning have said that characters in class U should be forbidden. 
That means that in the case above, neither X nor Q would work, 
regardless of which one the user typed (if both X and Q were in class U 
in previous versions of Unicode).

According to my proposal, the client doesn't have to keep track of 
versions, and the user will at least get a hit on the correct domain 
in 50% of the cases, which I claim is better than having the client 
keep track of versions and not getting any hit at all.

>However, this does not mean I think we need to tag nameprep with a version in
>the protocol or ACE. As you correctly pointed out, this could be done by
>versioning in software/applications, but a properly documented nameprep
>version is still going to be useful.

The key thing is that the nameprep RFC has to be updated or obsoleted 
every time a new version of Unicode is released, and applications 
have to say (as always) which RFCs (which version of Nameprep) they 
handle.

I wrote the proposal because I see a problem with versioning itself 
in the protocol, especially when using some ACE encoding. I think it 
is bad if an application using a version of nameprep based on, for 
example, Unicode 3.0 is prohibited from using the characters added 
by Unicode 3.1. If we have versioning, this will be the case.

At least that is how I see it, and so far I am not convinced that I am wrong.

     paf


>
>-James Seng
>
>
>----- Original Message -----
>From: "Patrik Fältström" <paf@cisco.com>
>To: <idn@ops.ietf.org>
>Sent: Thursday, December 14, 2000 5:21 AM
>Subject: [idn] Versions of Nameprep
>
>
>>  (This should probably have been discussed in the nameprep design team
>>  first, and that was my intention, after the IETF, but since it has now
>>  come up in discussion I'll post it to the whole IDN mailing list instead...)
>>
>>  Summary:
>>
>>       By introducing 2 different policies for the use of unassigned code
>>       points in Unicode, i.e. code points that are unassigned according to
>>       the newest version of the NamePrep specification, we do not
>>       need versioning in the IDN protocol.
>>
>>      (a) Registries never register domains with unassigned
>>          code points.
>>      (b) Clients let unassigned code points pass through without
>>          any modification.
>>
>>  Now, here is the full rationale:
>>
>>  Versions of Unicode.
>>
>>  - Since Unicode will be updated fairly frequently, we also want to allow new
>>  characters to be used as soon as they are defined.
>>  - We do this by updating NamePrep.
>>  - We want to allow the maximal compatibility between systems running
>>  different versions of NamePrep.
>>
>>  Mechanism.
>>
>>  In a particular version of NamePrep, the following lists of code points can
>>  be generated based on the version of Unicode and the mapping/prohibition
>>  tables.
>>
>>  AIO. code points allowed in the input and in the output
>>  AI. code points allowed in the input, but not in the output
>>  D. assigned code points that are disallowed completely (input and output --
>>  includes noncharacters, unpaired surrogates, etc.)
>>  U. unassigned code points
>>
>>  Note: the reason that AI exists is that some characters will disappear or be
>>  transformed in mapping or normalization, so they can appear in the input,
>>  but will never be in the output.
>>
>>  In any subsequent version of NamePrep, because of updates to Unicode, code
>>  points from U will move to D, AI or AIO.
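The four classes, and the claim that an upgrade only moves code points out of U, can be illustrated with a toy model. This is a minimal sketch, assuming a NamePrep version can be reduced to three hypothetical tables (assigned code points, a mapping table, a prohibition table); `NamePrepVersion`, `classify`, `only_u_moves` and the single letters standing in for code points are illustrative inventions, not part of any specification.

```python
from dataclasses import dataclass, field


@dataclass
class NamePrepVersion:
    """Hypothetical, heavily simplified tables for one NamePrep version."""
    assigned: set                                   # code points assigned in this Unicode version
    mapped: dict = field(default_factory=dict)      # code point -> result of mapping/normalization
    prohibited: set = field(default_factory=set)    # assigned, but disallowed everywhere (class D)


def classify(cp: str, v: NamePrepVersion) -> str:
    """Return 'AIO', 'AI', 'D' or 'U' for a single code point."""
    if cp not in v.assigned:
        return "U"
    if cp in v.prohibited:
        return "D"
    if v.mapped.get(cp, cp) != cp:
        return "AI"    # may appear in the input, never survives into the output
    return "AIO"


def only_u_moves(old: NamePrepVersion, new: NamePrepVersion, universe: str) -> bool:
    """True if every code point whose class changed between versions started out in U."""
    return all(
        classify(cp, old) == classify(cp, new) or classify(cp, old) == "U"
        for cp in universe
    )


# Toy versions: in the newer one, X is normalized to Q and Z is prohibited.
V30 = NamePrepVersion(assigned={"a", "b"})
V31 = NamePrepVersion(assigned={"a", "b", "X", "Q", "Z"}, mapped={"X": "Q"}, prohibited={"Z"})

for cp in "abXQZ":
    print(cp, classify(cp, V30), "->", classify(cp, V31))   # X: U -> AI, Q: U -> AIO, Z: U -> D
```

The check `only_u_moves` encodes the stability assumption behind the proposal: once a code point is in AIO, AI or D, later versions leave it there.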
>>
>>  Policies.
>>
>>  Registrars are forbidden to register any IDNs containing code points outside
>>  of AIO for the latest version of Unicode / NamePrep. That is, they are
>>  forbidden to register any IDNs containing AI, D or U code points. (In
>>  addition, the allowable names must be in canonical order!)
>>
>>  Clients should treat U code points as if they were AIO when they are
>>  processing IDNs as part of NamePrep. Some applications might, however, be
>>  implemented to treat them as U, or as AIO only after first warning the user
>>  that the character is of class U, depending on how the domain name is
>>  used in the specific application.
>>
>>  Intermediaries may reject names that are not in canonical order, or that
>>  contain code points that are in their versions of AI or D, but must not
>>  reject names for containing U.
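The three policies can be sketched as simple predicates over a per-version class table. This is a sketch under stated assumptions: the table `LATEST`, the mapping of the hypothetical code point P to Q, and all function names are illustrative; the canonical-order requirement and the per-application "warn the user" behaviour are deliberately left out.

```python
# Hypothetical class table for the latest NamePrep version; missing code points are in U.
LATEST = {"a": "AIO", "Q": "AIO", "P": "AI", "Z": "D"}
MAPPING = {"P": "Q"}    # hypothetical: P is normalized to Q


def cls(cp: str, table: dict) -> str:
    return table.get(cp, "U")


def registrar_may_register(name: str, latest: dict = LATEST) -> bool:
    """Registrars: only AIO code points of the latest version (canonical-order check omitted)."""
    return all(cls(cp, latest) == "AIO" for cp in name)


def client_prepare(name: str, table: dict = LATEST, mapping: dict = MAPPING) -> str:
    """Clients: map AI code points, refuse D, pass U through untouched (treated like AIO)."""
    out = []
    for cp in name:
        c = cls(cp, table)
        if c == "D":
            raise ValueError(f"disallowed code point {cp!r}")
        out.append(mapping.get(cp, cp) if c == "AI" else cp)
    return "".join(out)


def intermediary_may_forward(name: str, table: dict = LATEST) -> bool:
    """Intermediaries: may reject AI or D code points, but must not reject U."""
    return all(cls(cp, table) in ("AIO", "U") for cp in name)
```

Here "N" would be an unassigned code point: a registrar refuses "aN", while a client and an intermediary let it through unchanged.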
>>
>>  Consider a character X that is in class U for the client, but is assigned
>>  in a newer version of NamePrep:
>>
>>     - Character X is moved to AIO. Because the client passes the character
>>       through as is, the user will end up at the correct service.
>>     - Character X is normalized to character Q and therefore the character
>>       is moved to AI. If the user enters character Q, he will end up at the
>>       correct service, but if he enters X, he will not reach any service at
>>       all, as X cannot exist in a registered domain name.
>>     - Character X is moved into D. It cannot exist in any domain name
>>       either, so entering X means the user will not reach any service.
>>     - Characters XY are specified to be ordered YX. If the user enters
>>       YX, he will reach the correct service, but no domain name will be
>>       registered with the characters in the order XY, so entering XY
>>       will not make the user reach any service.
>>
>>  As the cases above show, when the client enters class U characters
>>  that are assigned in a newer version of the NamePrep document, the
>>  client either reaches the intended domain name or no domain at all
>>  (even though it should have; this is because the normalization and
>>  ordering rules are not updated in the client).
>>
>>  In no case will the client reach a domain name registered as a
>>  different domain from the one the client is attempting to reach.
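The AIO, AI and D cases, and the "intended domain or nothing" property, can be checked mechanically with a toy resolver (the reordering case is illustrated separately by Example1). This is a minimal sketch, assuming hypothetical class tables in which X becomes AIO, P plays the role of the character normalized to Q, and Z becomes D; the letters, tables and function names are all illustrative.

```python
# Hypothetical class tables; any code point missing from a table is in class U.
OLD = {"a": "AIO"}                                                # the client's version
NEW = {"a": "AIO", "X": "AIO", "P": "AI", "Q": "AIO", "Z": "D"}   # the registry's version
NEW_MAP = {"P": "Q"}                                              # in NEW, P is normalized to Q
REGISTRY = {"aX", "aQ"}                                           # only AIO names are registered


def prep(name, classes, mapping):
    """Client-side NamePrep: map AI, refuse D, pass AIO and U through unchanged."""
    out = []
    for cp in name:
        c = classes.get(cp, "U")
        if c == "D":
            return None                      # a name containing D cannot be looked up
        out.append(mapping.get(cp, cp) if c == "AI" else cp)
    return "".join(out)


def resolve(typed, classes, mapping):
    """Return the registered name reached by the typed input, or None."""
    prepared = prep(typed, classes, mapping)
    return prepared if prepared in REGISTRY else None


for typed in ("aX", "aP", "aQ", "aZ"):
    old_hit = resolve(typed, OLD, {})
    new_hit = resolve(typed, NEW, NEW_MAP)
    # The outdated client either agrees with an up-to-date client or reaches nothing.
    assert old_hit is None or old_hit == new_hit
    print(typed, "old client:", old_hit, "new client:", new_hit)
```

The outdated client misses only "aP" (the AI case), which an up-to-date client maps to "aQ"; it never lands on a different registered name.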
>>
>>  Scenarios.
>>
>>  This will provide for compatibility in the following ways:
>>
>>  A. Suppose that a client or intermediary is on Unicode 3.1 and the site is
>>  on Unicode 3.0. This case is simple: there will be no domains on the site
>>  that can't be accessed by the client, since the client uses a superset of
>>  the code points accepted by the site.
>>
>>  B. Suppose that a client or intermediary is on Unicode 3.0 and the site is
>>  on Unicode 3.1. Because the client NamePrep passes through any unassigned
>>  character, the user can access domains on the site that use characters in
>>  Unicode 3.1. No domains on the site can have code points that are unassigned
>>  in 3.1, since that is illegal.
>>
>>  The restrictions in case B are that the client has to type the characters in
>>  the right order, and has to use the post-mapped, post-normalized code
>>  points.
>>
>>  Example1: domain is XYZW.com. Y and Z are combining marks, in canonical
>>  order. Y is not in Unicode 3.0. If the client is on 3.1, XZYW.com will
>>  normalize to XYZW.com, so the user can type either one. The 3.0 client must
>>  type precisely XYZW.com or the name will not be found.
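Example1 comes down to canonical ordering: a code point the client does not know has combining class 0, so it acts as a barrier that stops reordering. A minimal sketch, assuming hypothetical combining classes Y=220 and Z=230 (only their relative order matters) and single letters standing in for code points:

```python
# Hypothetical canonical combining classes; letters stand in for code points.
CCC_30 = {"Z": 230}              # Y is unassigned in 3.0, so it defaults to class 0
CCC_31 = {"Y": 220, "Z": 230}
REGISTERED = {"XYZW.com"}


def canonical_order(name: str, ccc: dict[str, int]) -> str:
    """Stable-sort each run of combining marks (class > 0) by combining class."""
    out: list[str] = []
    run: list[str] = []
    for ch in name:
        if ccc.get(ch, 0) == 0:          # starter, or a code point unknown to this version
            out.extend(sorted(run, key=ccc.__getitem__))
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.extend(sorted(run, key=ccc.__getitem__))
    return "".join(out)


def reaches_site(typed: str, ccc: dict[str, int]) -> bool:
    return canonical_order(typed, ccc) in REGISTERED


for typed in ("XYZW.com", "XZYW.com"):
    print(typed, "3.1:", reaches_site(typed, CCC_31), "3.0:", reaches_site(typed, CCC_30))
```

Because the 3.0 client treats Y as a starter, it leaves "XZYW.com" unchanged and misses the site; only the already-canonical spelling works for it.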
>>
>>  Example2: domain is BCDE.com. C is the normalized form of Q. A Unicode 3.1
>>  client can type either BQDE.com or BCDE.com; both will work. The 3.0 client
>>  can only type BCDE.com.
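Example2 is the same effect for the mapping step. A minimal sketch, assuming a hypothetical one-entry 3.1 mapping table Q to C, and assuming Q is unassigned in 3.0 so the 3.0 client passes it through untouched:

```python
# Hypothetical mapping tables; in 3.0, Q is assumed unassigned, so it has no entry.
MAP_30: dict[str, str] = {}
MAP_31: dict[str, str] = {"Q": "C"}   # C is the normalized form of Q
REGISTERED = {"BCDE.com"}


def prepare(name: str, mapping: dict[str, str]) -> str:
    """Apply the mapping; code points without an entry (including U) pass through."""
    return "".join(mapping.get(ch, ch) for ch in name)


def reaches_site(typed: str, mapping: dict[str, str]) -> bool:
    return prepare(typed, mapping) in REGISTERED


for typed in ("BQDE.com", "BCDE.com"):
    print(typed, "3.1:", reaches_site(typed, MAP_31), "3.0:", reaches_site(typed, MAP_30))
```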
>>
>>  --
>>



--