
Re: [idn] Versions of Nameprep



I'll use the name "Stable Nameprep" for Patrik's idea as elaborated in
email, just to have a name for it in discussion. The features are:

- All unassigned code points (for the version of Unicode used) are "passed
through" in nameprep on client software.
- All unassigned code points (for the version of Unicode used) are
prohibited in nameprep on servers (in registered names).
- Any new version of Stable Nameprep will *only* move code points from class
U to classes D, AI or AIO: it will not move code points between D, AI or
AIO.
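
A minimal sketch of the three rules above, assuming toy tables in which
plain Latin letters stand in for real code points. The names Cls,
client_prep, server_accepts, is_stable_upgrade and the TABLE_X / TABLE_Y
data are hypothetical and are not taken from any Nameprep draft.

```python
from enum import Enum

class Cls(Enum):
    AIO = "allowed in input and output"
    AI = "allowed in input only (mapped away)"
    D = "disallowed"
    U = "unassigned"

def classify(table, ch):
    # Anything a version's table does not list is unassigned (U) in that version.
    return table.get(ch, Cls.U)

def client_prep(label, table, mapping):
    """Client rule: reject D, map AI, pass AIO and U through unchanged."""
    out = []
    for ch in label:
        cls = classify(table, ch)
        if cls is Cls.D:
            raise ValueError(f"disallowed code point U+{ord(ch):04X}")
        out.append(mapping[ch] if cls is Cls.AI else ch)
    return "".join(out)

def server_accepts(label, table):
    """Server rule: a registered name may contain only AIO code points."""
    return all(classify(table, ch) is Cls.AIO for ch in label)

def is_stable_upgrade(old, new):
    """Upgrade rule: every code point assigned in `old` keeps its class in `new`."""
    return all(new.get(ch) is cls for ch, cls in old.items() if cls is not Cls.U)

# Hypothetical version X: "a" is AIO, "A" maps to "a", space is disallowed.
TABLE_X = {"a": Cls.AIO, "A": Cls.AI, " ": Cls.D}
MAP_X = {"A": "a"}
# Hypothetical version Y: adds "q" (AIO) and "Q" (AI, maps to "q").
TABLE_Y = {**TABLE_X, "q": Cls.AIO, "Q": Cls.AI}
MAP_Y = {**MAP_X, "Q": "q"}
```

With these tables, a version X client turns "Aq" into "aq" (the "q" is in
class U for that client and passes through), and a version Y server accepts
"aq" while a version X server does not.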

James, let's look at an example. Suppose that half-width and full-width
katakana had not been in Unicode version X, and are added in Unicode version
Y. (Remember that half-width katakana is normalized to full-width.) For
those not familiar with Unicode, this is *only* an illustration. Both
half-width and full-width are encoded, and have been for some time.

The implications of stable nameprep are:

- If the server is on Unicode version Y, a full-width katakana name which is
the result of nameprep Y can be registered. (If the server were on version
X, then neither half-width nor full-width would be allowed.)

- If your client is on Unicode version X, you can use the full-width IDN.
Even though those characters are in class U on your machine, they will 'pass
through' name prep and get to the server.

- If you are a web-page designer you can use a stable nameprep IDN on your
page, once you make sure it is accepted by the server. Since it is stable
once nameprepped, every client will be able to use it to access the server.

The only restriction is that on the version X client, you couldn't use the
half-width kana. They are in class U on that machine, and will not be
normalized to full-width. However, and this is important, there is a
representation that will correctly reach the server, using the full-width
characters! Once your client is upgraded to Y, then you could type either
one.
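
The katakana illustration can be tried out with a short sketch. It assumes
that NFKC normalization from Python's unicodedata module is an adequate
stand-in for the normalization step of nameprep Y, and it models the
version X client as a pure pass-through because the characters are in class
U there. The function names prep_version_x and prep_version_y are
hypothetical.

```python
import unicodedata

HALF = "\uFF76\uFF85"  # HALFWIDTH KATAKANA LETTER KA, NA
FULL = "\u30AB\u30CA"  # KATAKANA LETTER KA, NA

def prep_version_y(label):
    # Version Y knows the characters, so half-width folds to full-width.
    return unicodedata.normalize("NFKC", label)

def prep_version_x(label):
    # Version X treats the characters as class U and passes them through.
    return label

# A version Y client may type either form and gets the registered name.
typed_on_y = [prep_version_y(HALF), prep_version_y(FULL)]
# A version X client reaches the registered name only with the full-width form.
typed_on_x = [prep_version_x(HALF), prep_version_x(FULL)]
```

Here both entries of typed_on_y equal FULL, while only the second entry of
typed_on_x does.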

Notice also that this is a problem only when a character is not already in
Unicode *and* will be in class AI. Unicode already covers a huge
percentage of characters in use in the world. Any new ones will be
relatively infrequent, and the percentage of new ones that will be in class
AI (as opposed to AIO) will be even smaller. CJK characters, for example,
will be almost completely unaffected.

In light of this, I hardly think these are unreasonable restrictions on
version X software.

Mark

For information on the upcoming characters, see:
http://www.unicode.org/unicode/alloc/Pipeline.html

----- Original Message -----
From: "Patrik Fältström" <paf@cisco.com>
To: "James Seng/Personal" <James@Seng.cc>; <idn@ops.ietf.org>
Sent: Sunday, December 17, 2000 22:05
Subject: Re: [idn] Versions of Nameprep


> At 11.46 +0800 00-12-18, James Seng/Personal wrote:
> >It appears that this email comes from the perspective that there is no need
> >for versioning in nameprep because ...see below... am I right?
>
> Correct. But as I said at the meeting, this probably should have been
> verified with the nameprep design team before posting it here. It
> might be the case that there is a bug hiding which I can not see...
>
> >I have seen the diagram Mark drew on AIO, AI, D and U. You are right on most
> >points, except that a problem may occur when new characters are assigned in
> >the "AI" zone.
> >
> >You mention:
> >>     - Character X is normalized to character Q and therefore the character
> >>       is moved to AI. If the user enters character Q, he will end up at the
> >>       correct service, but if he enters X, he will not reach any service at
> >>       all, as X can not exist in a registered domainname.
> >
> >This would mean that the user would need to know exactly what (version of?)
> >nameprep his application uses in order to know whether he can type X or
> >should be specific and type Q. A typical end-user is not going to recognize
> >the difference and will probably start to wonder why it fails.
>
> The key point is that people who have previously been talking about
> versioning have said that characters in class U should be forbidden.
> That means that in the case above, neither X nor Q would work,
> regardless of which one the user typed (if both X and Q were in class U
> in previous versions of Unicode).
>
> According to my proposal, the client doesn't have to keep track of
> versions, and the user will at least get a hit on the correct domain
> in 50% of the cases, which I claim is better than if the client has
> to keep track of versions and does not get any hit at all.
>
> >However, this does not mean I think we need to tag nameprep with a version
> >in the protocol or ACE. As you correctly pointed out, this could be done by
> >versioning in software/applications, but a properly documented nameprep
> >version is still going to be useful.
>
> The key thing is that the nameprep RFC has to be updated / obsoleted
> every time a new version of Unicode is released, and applications
> have to say (as always) what RFCs (what version of Nameprep) they
> handle.
>
> I wrote the proposal because I see a problem with versioning itself
> in the protocol -- especially when using some ACE encoding, and I
> think it is bad if an application using one version of nameprep based
> on for example Unicode 3.0 is prohibited from using the characters added
> by Unicode 3.1. If we have versioning, this will be the case.
>
> At least in my mind, and I am so far not convinced that I am wrong.
>
>      paf
>
>
> >
> >-James Seng
> >
> >
> >----- Original Message -----
> >From: "Patrik Fältström" <paf@cisco.com>
> >To: <idn@ops.ietf.org>
> >Sent: Thursday, December 14, 2000 5:21 AM
> >Subject: [idn] Versions of Nameprep
> >
> >
> >>  (This should probably have been discussed in the nameprep design team
> >>  first, and that was my intention, after the IETF, but now when this
> >>  was discussed I'll post it to the whole IDN mailing list instead...)
> >>
> >>  Summary:
> >>
> >>       By introducing 2 different policies for use of unassigned
> >>       codepoints in Unicode, i.e. codepoints being unassigned according
> >>       to the newest version of the NamePrep specification, we do not
> >>       need versioning in the IDN protocol.
> >>
> >>      (a) Registries never register domains with unassigned
> >>          codepoints.
> >>      (b) Clients let unassigned codepoints pass through without
> >>          any modification.
> >>
> >>  Now, here is the full rationale:
> >>
> >>  Versions of Unicode.
> >>
> >>  - Since Unicode will be updated fairly frequently, we also want to allow
> >>  new characters to be used as soon as they are defined.
> >>  - We do this by updating NamePrep.
> >>  - We want to allow the maximal compatibility between systems running
> >>  different versions of NamePrep.
> >>
> >>  Mechanism.
> >>
> >>  In a particular version of NamePrep, the following lists of code points
> >>  can be generated based on the version of Unicode and the
> >>  mapping/prohibition tables.
> >>
> >>  AIO. code points allowed in the input and in the output
> >>  AI. code points allowed in the input, but not in the output
> >>  D. assigned code points that are disallowed completely (input and
> >>  output -- includes noncharacters, unpaired surrogates, etc.)
> >>  U. unassigned code points
> >>
> >>  Note: the reason that AI exists is that some characters will disappear
> >>  or be transformed in mapping or normalization, so they can appear in the
> >>  input, but will never be in the output.
> >>
> >>  In any subsequent version of NamePrep, because of updates to Unicode,
> >>  code points from U will move to D, AI or AIO.
> >>
> >>  Policies.
> >>
> >>  Registrars are forbidden to register any IDNs containing code points
> >>  outside of AIO for the latest version of Unicode / NamePrep. That is,
> >>  they are forbidden to register any IDNs containing AI, D or U code
> >>  points. (In addition, the allowable names must be in canonical order!)
> >>
> >>  Clients should treat U code points as if they were AIO when processing
> >>  IDNs as a part of NamePrep. Certain applications might, though, be
> >>  implemented to treat them as U, or as AIO after first warning the user
> >>  that the character is of class U -- all based on the
> >>  use of the domainname in the specific application.
> >>
> >>  Intermediaries may reject names that are not in canonical order, or
> >>  that contain code points that are in their versions of AI or D, but
> >>  must not reject names for containing U.
> >>
> >>     - Character X is moved to AIO. By passing the character through as
> >>       is, the client will end up at the correct service.
> >>     - Character X is normalized to character Q and therefore the character
> >>       is moved to AI. If the user enters character Q, he will end up at the
> >>       correct service, but if he enters X, he will not reach any service at
> >>       all, as X can not exist in a registered domainname.
> >>     - Character X is moved into D. This can not exist in any domainname
> >>       either, so a user entering X will not reach any service.
> >>     - Characters XY are specified to be ordered YX. If the user enters
> >>       YX, he will reach the correct service, but no domainname will be
> >>       registered with the characters in the order XY, so entering XY
> >>       will not make the user reach any service.
> >>
> >>  As we see in the list above, if the client enters class U characters
> >>  that are registered in a newer version of the NamePrep document, the
> >>  client either reaches the domainname that was intended, or reaches no
> >>  domain at all (even though it should; this is because normalization
> >>  and ordering rules are not updated in the client).
> >>
> >>  In no case will the client reach a domainname which is registered as
> >>  a different domain than the one which the client is attempting to
> >>  reach.
> >>
> >>  Scenarios.
> >>
> >>  This will provide for compatibility in the following ways:
> >>
> >>  A. Suppose that a client or intermediary is on Unicode 3.1 and the site
> >>  is on Unicode 3.0. This case is simple: there will be no domains on the
> >>  site that can't be accessed by the client, since the client uses a
> >>  superset of the code points accepted by the site.
> >>
> >>  B. Suppose that a client or intermediary is on Unicode 3.0 and the site
> >>  is on Unicode 3.1. Because the client NamePrep passes through any
> >>  unassigned character, the user can access domains on the site that use
> >>  characters in Unicode 3.1. No domains on the site can have code points
> >>  that are unassigned in 3.1, since that is illegal.
> >>
> >>  The restrictions in case B are that the client has to type the
> >>  characters in the right order, and has to use the post-mapped,
> >>  post-normalized code points.
> >>
> >>  Example1: domain is XYZW.com. Y and Z are combining marks, in canonical
> >>  order. Y is not in Unicode 3.0. If the client is on 3.1, XZYW will
> >>  normalize to XYZW.com, so the user can type either one. The 3.0 client
> >>  must type precisely XYZW.com or the name will be rejected.
> >>
> >>  Example2: domain is BCDE.com. C is the normalized form of Q. A Unicode
> >>  3.1 client can type either BQDE.com or BCDE.com; both will work. The 3.0
> >>  client can only type BCDE.com.
> >>
> >>  --
> >>
>
>
>
> --
>