
RE: [idn] First report from IDN nameprep design team



I repeat my request to ignore certain Hebrew characters, namely points and
accents, i.e. remove them during nameprep.

I think some Arabic experts have indicated that the same should apply to
Arabic.

Jony
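
To make the request above concrete, here is a minimal sketch of what
"remove them during nameprep" could look like. It assumes that "points and
accents" means the nonspacing marks in the range U+0591..U+05C7; the
function name strip_hebrew_marks is hypothetical, and the exact character
set would have to be settled by the WG.

```python
import unicodedata


def strip_hebrew_marks(label: str) -> str:
    """Drop Hebrew points and accents from a label.

    Assumption: the characters to ignore are the nonspacing marks (general
    category Mn) in U+0591..U+05C7. Punctuation in that range, such as
    U+05BE MAQAF, is not a combining mark and is kept.
    """
    return "".join(
        ch
        for ch in label
        if not ("\u0591" <= ch <= "\u05c7" and unicodedata.category(ch) == "Mn")
    )
```

With this sketch, a pointed and an unpointed spelling of the same word
produce the same string, so they would compare equal after nameprep.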

> -----Original Message-----
> From: owner-idn@ops.ietf.org [mailto:owner-idn@ops.ietf.org] On Behalf
> Of James Seng/Personal
> Sent: Thursday, December 07, 2000 6:33 AM
> To: idn@ops.ietf.org
> Subject: [idn] First report from IDN nameprep design team
>
>
> To the IDN WG:
>
> The IDN nameprep design team has been studying the nameprep document,
> and we propose the following changes. We are not finished with our
> work, but want to report our progress and hear input from the WG. Of
> course, this will be discussed heavily in San Diego next week, and a
> new version of the nameprep draft can be made ready before the end of
> December on the points for which there is general agreement.
>
> 1) It is difficult and probably not useful to try to prohibit
>    characters that might cause confusion because they look like other
>    characters or because they might be accidentally entered by users.
>    Therefore, the next list of prohibited characters will be
>    significantly smaller. For example, compatibility characters (which
>    are common for Arabic and Asian scripts) would be allowed on input.
>
> 2) The order of the steps for nameprep will be changed from
>      prohibit -> fold -> normalize
>    to
>      map -> normalize -> prohibit
>
>    This new order has many advantages. It allows many more characters to
>    be input to the nameprep process without returning errors because
>    those characters will get converted by the normalization step into
>    allowed characters. It also allows the mapping step to fix edge-case
>    problems before they get to the normalization step, as described in
>    the next point.
>
> 3) So far, the mapping step in nameprep only maps uppercase
>    characters to lowercase. The compatibility normalization step does
>    the work of converting compatibility characters into their normal
>    forms, but there are other sets of characters that the input
>    mechanisms on users' systems might enter that can be mapped to other
>    characters. For example, there are many different hyphen characters
>    (such as U+00AD, soft hyphen) that do not get normalized but can all
>    be mapped into the single hyphen character that is already allowed by
>    STD 13. Also, with the new order suggested above, there are some
>    special cases for case-mapping that need to be added so that all
>    characters case-map as expected. Some characters might be mapped to
>    nothing, meaning that they will simply be ignored on input; for
>    example, some of the non-displaying characters that are currently
>    prohibited might be mapped out of the input stream instead of
>    causing an error. The mapping step will be specified as a single
>    table of mappings so that implementors don't have to create the table
>    themselves from disparate sources.
>
> 4) Doing case-folding from the Unicode data table does not handle all
>    cases of folding. The mechanism for mapping to lowercase will
>    instead be derived from the CaseFolding.txt file. (See UTR 21 from
>    the Unicode Consortium for more details.)
>
> 5) Non-character codepoints will be listed as prohibited characters.
>
> 6) The question of where to do name preparation will be removed from
>    this document, but must be addressed in the eventual IDN protocol
>    document.
>
> 7) Change the word "canonicalize" to "normalize".
>
>
>
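
The proposed map -> normalize -> prohibit order from the report can be
sketched roughly as follows. This is an illustration under stated
assumptions, not the design team's table: MAP_TABLE holds only a few
hypothetical entries, Python's str.casefold() stands in for a mapping
derived from CaseFolding.txt, NFKC is assumed as the compatibility
normalization form, and the prohibit step checks only non-character
code points.

```python
import unicodedata

# Hypothetical, deliberately tiny mapping table. The report proposes a
# single published table; these entries only illustrate the two kinds of
# mapping it mentions.
MAP_TABLE = {
    0x00AD: "-",   # soft hyphen -> hyphen, following the report's example
    0x2010: "-",   # HYPHEN -> hyphen
    0x200B: None,  # zero width space: mapped to nothing (ignored on input)
}


def is_noncharacter(cp: int) -> bool:
    """True for Unicode non-character code points."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def nameprep_sketch(label: str) -> str:
    # Step 1: map (table lookups, then case folding).
    mapped = label.translate(MAP_TABLE).casefold()
    # Step 2: normalize (compatibility normalization, assumed to be NFKC).
    normalized = unicodedata.normalize("NFKC", mapped)
    # Step 3: prohibit (reject what is still not allowed after steps 1-2).
    for ch in normalized:
        if is_noncharacter(ord(ch)):
            raise ValueError(f"prohibited code point U+{ord(ch):04X}")
    return normalized
```

Because prohibition runs last, input such as fullwidth Latin letters is
converted by the earlier steps instead of being rejected. The sketch does
not cover the special cases for case-mapping mentioned in point 3.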