
RE: [idn] Versions of Nameprep



At 21.30 -0500 00-12-13, Hollenbeck, Scott wrote:
>If "Unicode will be updated fairly frequently", are you suggesting that
>nameprep ultimately be documented in a base RFC and then a chain of "update"
>RFCs that track the Unicode updates?

That is correct: the NamePrep RFC would be either updated or obsoleted.

>Couldn't that become unwieldy over
>time, especially if the update rate is "frequent"?

I think we are talking about a frequency of about once a year.

>Could there be another
>way to track codepoint-specific nameprep updates, perhaps through a service
>provided by IANA?

Maybe, but precisely because the question of which codepoints are
allocated or not (and therefore what is possible to register as a
domain name) is such an important one, I would expect both IANA and
the IESG to suggest handling the registration through an RFC. This is
the case with many other registrations of similar importance that
IANA handles: the registration done by IANA must be documented by an
RFC on a certain track.

I am happy, and prepared, to have this discussion with the IANA,
IESG and IAB the day we think a scheme like this is viable. First,
though, I want people in the IDN wg to think about what I wrote
regarding the behaviour of clients and servers that use different
versions of NamePrep (because this will happen). The registration
issues should be discussed as a follow-on question, so don't forget
to bring this up (again) when it is time for it.

    paf


>
>Scott Hollenbeck
>VeriSign Global Registry Services
>
>-----Original Message-----
>From: Patrik Fältström [mailto:paf@cisco.com]
>Sent: Wednesday, December 13, 2000 4:21 PM
>To: idn@ops.ietf.org
>Subject: [idn] Versions of Nameprep
>
>
>(This should probably have been discussed in the nameprep design team
>first, after the IETF, as was my intention; but now that this has
>been discussed, I'll post it to the whole IDN mailing list instead...)
>
>Summary:
>
>      By introducing two different policies for the use of unassigned
>      codepoints in Unicode (i.e. codepoints that are unassigned
>      according to the newest version of the NamePrep specification),
>      we do not need versioning in the IDN protocol:
>
>     (a) Registries never register domains with unassigned
>         codepoints.
>     (b) Clients let unassigned codepoints pass through without
>         any modification.
>
>Now, here is the full rationale:
>
>Versions of Unicode.
>
>- Since Unicode will be updated fairly frequently, we also want to allow new
>characters to be used as soon as they are defined.
>- We do this by updating NamePrep.
>- We want to allow the maximal compatibility between systems running
>different versions of NamePrep.
>
>Mechanism.
>
>In a particular version of NamePrep, the following lists of code points can
>be generated based on the version of Unicode and the mapping/prohibition
>tables.
>
>AIO. code points allowed in the input and in the output
>AI. code points allowed in the input, but not in the output
>D. assigned code points that are disallowed completely (input and output --
>includes noncharacters, unpaired surrogates, etc.)
>U. unassigned code points
>
>Note: the reason that AI exists is that some characters will disappear or be
>transformed in mapping or normalization, so they can appear in the input,
>but will never be in the output.
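The four classes can be approximated with Python's standard `unicodedata` module, treating the Unicode version bundled with the interpreter (`unicodedata.unidata_version`) as "the" NamePrep version. This is a minimal sketch under loud assumptions: the mapping and normalization step is approximated by `str.casefold()` followed by NFKC, and D is approximated by controls, surrogates, private-use characters and noncharacters. The real NamePrep mapping and prohibition tables differ in detail, and the helper names are hypothetical.

```python
import unicodedata

# Assumption: D is approximated by these general categories plus noncharacters.
DISALLOWED_CATEGORIES = {"Cc", "Cs", "Co"}  # controls, surrogates, private use


def is_noncharacter(cp: int) -> bool:
    """Unicode noncharacters: U+FDD0..U+FDEF and the last two code points of every plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def prep(text: str) -> str:
    """Hypothetical stand-in for NamePrep mapping + normalization (casefold, then NFKC)."""
    return unicodedata.normalize("NFKC", text.casefold())


def classify(ch: str) -> str:
    """Return 'AIO', 'AI', 'D' or 'U' for a single code point."""
    cp = ord(ch)
    if is_noncharacter(cp):
        return "D"  # checked first: noncharacters also report category Cn
    category = unicodedata.category(ch)
    if category == "Cn":
        return "U"  # unassigned in this interpreter's Unicode version
    if category in DISALLOWED_CATEGORIES:
        return "D"
    # Assigned and accepted as input: it can appear in the output only if
    # mapping and normalization leave it unchanged.
    return "AIO" if prep(ch) == ch else "AI"
```

For example, `classify("a")` gives `AIO`, while `classify("A")` and the ligature `classify("\ufb01")` give `AI` because they are mapped to `a` and `fi`.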
>
>In any subsequent version of NamePrep, because of updates to Unicode, code
>points from U will move to D, AI or AIO.
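The rule that code points only ever leave U can be stated as a check over two versions' class tables. This sketch assumes each version is a plain dict from code point to class name, with anything absent treated as U; the table format and the function name `check_upgrade` are hypothetical.

```python
from typing import Dict, List, Tuple

ClassTable = Dict[int, str]  # code point -> "AIO" | "AI" | "D"; absent means "U"


def class_of(table: ClassTable, cp: int) -> str:
    return table.get(cp, "U")


def check_upgrade(old: ClassTable, new: ClassTable) -> List[Tuple[int, str, str]]:
    """Return every code point whose class changed in a disallowed way.

    Allowed: no change, or U -> D, U -> AI, U -> AIO. Any change to a code
    point that was already assigned would break clients on the old version.
    """
    violations = []
    for cp in sorted(set(old) | set(new)):
        before, after = class_of(old, cp), class_of(new, cp)
        if before != after and before != "U":
            violations.append((cp, before, after))
    return violations
```

For instance, adding a new `AI` entry such as `0x10400` to a table is accepted (empty result), whereas moving `0x61` from `AIO` to `D`, or dropping it back to U, is reported.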
>
>Policies.
>
>Registrars are forbidden to register any IDNs containing code points outside
>of AIO for the latest version of Unicode / NamePrep. That is, they are
>forbidden to register any IDNs containing AI, D or U code points. (In
>addition, the allowable names must be in canonical order!)
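A registrar-side check following this policy might look like the sketch below. It assumes a `code_class` lookup for the latest version (a hypothetical callable returning "AIO", "AI", "D" or "U") and approximates "canonical order" by requiring the label to be unchanged by Unicode NFC, which also tests composition; both choices are assumptions, not part of the proposal.

```python
import unicodedata
from typing import Callable


def may_register(label: str, code_class: Callable[[str], str]) -> bool:
    """Registrar policy: every code point must be AIO in the latest version,
    and the label must already be in canonical form."""
    if not label:
        return False
    if any(code_class(ch) != "AIO" for ch in label):
        return False  # AI, D and U code points are all forbidden
    # Assumption: "canonical order" is checked as "NFC leaves the label unchanged".
    return unicodedata.normalize("NFC", label) == label
```

With U+0323 (combining dot below, class 220) and U+0301 (combining acute, class 230) both treated as AIO, `"x\u0323\u0301"` passes but `"x\u0301\u0323"` is refused because its marks are out of canonical order.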
>
>Clients should treat U code points as if they were AIO when they are
>processing IDNs as part of NamePrep. Certain applications might,
>however, be implemented to treat them as U, or as AIO only after first
>warning the user that the character is of class U, depending on how
>the domain name is used in the specific application.
>
>Intermediaries may reject names that are not in canonical order, or that
>contain code points that are in their versions of AI or D, but must not
>reject names for containing U.
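The client and intermediary rules can be sketched as two small functions. Everything here is hypothetical: `code_class` is an assumed lookup for the local NamePrep version, `map_char` stands in for the local mapping table, normalization is omitted for brevity, and rejecting D on the client side is an assumption based on the prohibition tables.

```python
from typing import Callable, Optional

CodeClass = Callable[[str], str]  # hypothetical lookup returning "AIO", "AI", "D" or "U"


def client_prepare(label: str, code_class: CodeClass,
                   map_char: Callable[[str], str]) -> Optional[str]:
    """Client policy: U code points are treated as if they were AIO, i.e.
    passed through unmodified. Normalization is omitted in this sketch."""
    out = []
    for ch in label:
        cls = code_class(ch)
        if cls == "D":
            return None  # disallowed in the client's own version (assumption)
        out.append(ch if cls == "U" else map_char(ch))
    return "".join(out)


def intermediary_accepts(label: str, code_class: CodeClass,
                         canonical: Optional[Callable[[str], str]] = None) -> bool:
    """Intermediary policy: may reject AI or D code points (and, optionally,
    names not in canonical order), but never rejects a name for containing U."""
    if any(code_class(ch) in ("AI", "D") for ch in label):
        return False
    return canonical is None or canonical(label) == label
```

The key asymmetry is that a U code point is neither mapped by the client nor treated as grounds for rejection by the intermediary, since a newer version may already have assigned it.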
>
>Consider a character X that is in U for a client running an older
>version of NamePrep, but that a newer version has assigned:
>
>    - Character X is moved to AIO. By passing the character through as
>      is, the client will end up at the correct service.
>    - Character X is normalized to character Q and is therefore moved
>      to AI. If the user enters character Q, he will end up at the
>      correct service, but if he enters X, he will not reach any service
>      at all, as X cannot exist in a registered domain name.
>    - Character X is moved to D. It cannot exist in any domain name
>      either, so entering X does not reach any service.
>    - The character sequence XY is specified to be reordered as YX. If
>      the user enters YX, he will reach the correct service, but no
>      domain name will be registered with the characters in the order
>      XY, so entering XY will not make the user reach any service.
>
>As the list above shows, when a client enters class U characters
>that are assigned in a newer version of the NamePrep document, the
>client either reaches the intended domain name or no domain at all.
>(In the latter case it should have reached the domain; this happens
>because the normalization and ordering rules are not updated in the
>client.)
>
>In no case will the client reach a domain name that is registered as
>a different domain than the one the client is attempting to reach.
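The claim that a client never reaches the wrong domain can be checked exhaustively on a toy model. In this sketch the class `Version`, its tables and the single-letter characters are all hypothetical: `OLD` stands for the client's NamePrep and `NEW` for the latest one, in which `b`, `x`, `d` and `z` move from U to AIO, AI, D and a reordering combining mark respectively. Normalization is reduced to a mapping table plus canonical reordering by combining class.

```python
from itertools import product
from typing import AbstractSet, Dict, Optional, Set


class Version:
    """Toy NamePrep version (hypothetical tables, not the real ones).

    Characters outside `assigned` are class U: they are never mapped and
    have combining class 0, so they pass through untouched.
    """

    def __init__(self, assigned: AbstractSet[str], mapping: Dict[str, str],
                 ccc: Dict[str, int], prohibited: AbstractSet[str] = frozenset()):
        self.assigned, self.mapping = assigned, mapping
        self.ccc, self.prohibited = ccc, prohibited

    def prep(self, label: str) -> Optional[str]:
        if any(c in self.prohibited for c in label):
            return None  # class D: the whole name is rejected
        mapped = "".join(self.mapping.get(c, c) if c in self.assigned else c
                         for c in label)
        # Canonical ordering: stable-sort each run of non-zero combining classes.
        out, run = [], []
        for c in mapped:
            if self.ccc.get(c, 0) > 0:
                run.append(c)
            else:
                out += sorted(run, key=self.ccc.__getitem__) + [c]
                run = []
        return "".join(out + sorted(run, key=self.ccc.__getitem__))


def never_misroutes(old: Version, new: Version, registry: Set[str],
                    alphabet: str, max_len: int) -> bool:
    """True if, for every input up to max_len characters, the name sent by the
    old client is either unregistered or the one an up-to-date client reaches."""
    for n in range(1, max_len + 1):
        for chars in product(alphabet, repeat=n):
            typed = "".join(chars)
            sent = old.prep(typed)
            if sent in registry and new.prep(typed) != sent:
                return False
    return True


OLD = Version(assigned={"a", "A", "q", "y"}, mapping={"A": "a"}, ccc={"y": 220})
NEW = Version(assigned={"a", "A", "q", "y", "b", "x", "d", "z"},
              mapping={"A": "a", "x": "q"},  # x: U -> AI (normalized to q)
              ccc={"y": 220, "z": 230},      # z: U -> mark ordered after y
              prohibited={"d"})              # d: U -> D; b: U -> AIO
# Registrar policy: only AIO characters, and only names already in canonical form.
REGISTRY = {name for n in (1, 2, 3)
            for name in map("".join, product("aqybz", repeat=n))
            if NEW.prep(name) == name}
```

`never_misroutes(OLD, NEW, REGISTRY, "aAqybxdz", 3)` returns True. It only fails if the old version maps an already-assigned character differently from the new one, which is exactly what the versioning rules forbid.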
>
>Scenarios.
>
>This will provide for compatibility in the following ways:
>
>A. Suppose that a client or intermediary is on Unicode 3.1 and the site is
>on Unicode 3.0. This case is simple: there will be no domains on the site
>that can't be accessed by the client, since the client uses a superset of
>the code points accepted by the site.
>
>B. Suppose that a client or intermediary is on Unicode 3.0 and the site is
>on Unicode 3.1. Because the client's NamePrep passes through any unassigned
>character, the user can access domains on the site that use characters in
>Unicode 3.1. No domains on the site can have code points that are unassigned
>in 3.1, since that is illegal.
>
>The restrictions in case B are that the client has to type the characters in
>the right order, and has to use the post-mapped, post-normalized code
>points.
>
>Example1: domain is XYZW.com. Y and Z are combining marks, in canonical
>order. Y is not in Unicode 3.0. On a 3.1 client, XZYW.com will normalize
>to XYZW.com, so the user can type either one. The 3.0 client must type
>precisely XYZW.com or the name will not be found.
>
>Example2: domain is BCDE.com. C is the normalized form of Q. A Unicode 3.1
>client can type either BQDE.com or BCDE.com; both will work. The 3.0 client
>can only type BCDE.com.
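Both examples can be replayed with a toy model. The tables are hypothetical: the letters stand in for real code points, Y and Z get made-up combining classes 220 and 230 so that Y sorts before Z, and we assume Q, like Y, is unassigned in 3.0 (which is why BQDE.com fails there). In this model only the canonical spellings XYZW.com and BCDE.com work for the 3.0 client.

```python
from typing import Dict, Tuple

# A toy "version" is (mapping table, canonical combining classes).
# U characters simply have no entries: no mapping and combining class 0.
Version = Tuple[Dict[str, str], Dict[str, int]]

UNICODE_3_0: Version = ({}, {"Z": 230})                   # Y and Q unassigned
UNICODE_3_1: Version = ({"Q": "C"}, {"Y": 220, "Z": 230})  # Q -> C, Y is a mark
REGISTERED = {"XYZW.com", "BCDE.com"}                       # canonical forms only


def prep(label: str, version: Version) -> str:
    mapping, ccc = version
    mapped = "".join(mapping.get(c, c) for c in label)
    out, run = [], []
    for c in mapped:
        if ccc.get(c, 0) > 0:
            run.append(c)  # collect a run of combining marks
        else:
            out += sorted(run, key=ccc.__getitem__) + [c]  # stable canonical reorder
            run = []
    return "".join(out + sorted(run, key=ccc.__getitem__))


def reaches(typed: str, version: Version) -> bool:
    return prep(typed, version) in REGISTERED
```

With `UNICODE_3_1`, `reaches("XZYW.com", ...)` and `reaches("BQDE.com", ...)` are both True; with `UNICODE_3_0` the same inputs pass through unchanged and match nothing, so the user lands on no domain rather than a wrong one.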
>
>--



--