
Re: [idn] 1st stringprep issue: not answered and ignored



I read the following in http://www.unicode.org/Public/3.2-Update/CaseFolding-3.2.0.txt:

<quote>
0130; F; 0069 0307; # LATIN CAPITAL LETTER I WITH DOT ABOVE
0130; T; 0069; # LATIN CAPITAL LETTER I WITH DOT ABOVE

# F: full case folding, mappings that cause strings to grow in length. Multiple characters are separated by spaces.
# T: special case for uppercase I and dotted uppercase I
#    - For non-Turkic languages, this mapping is normally not used.
#    - For Turkic languages (tr, az), this mapping can be used instead of the normal mapping for these characters.
</quote>

If we choose F,
  <I dot above> and <I><dot above> are case-folded into the same <i><dot above>.

If we choose T, we get different outputs:
  <I dot above> --> <i>,  <I><dot above> --> <i><dot above>
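The difference between the two options can be checked with a small toy sketch. It is hypothetical and deliberately minimal: it uses only the two U+0130 lines quoted above plus the common (C) mapping 0049 -> 0069, folds character by character, and leaves every other character unchanged. The actual file may define further T mappings (for example for U+0049 itself) that this sketch leaves out.

```python
# Toy sketch: only the CaseFolding-3.2.0.txt lines quoted above, plus the
# common (C) mapping 0049 -> 0069. NOT a complete case-folding table.
COMMON = {"\u0049": "\u0069"}          # 0049; C; 0069  (I -> i)
FULL = {"\u0130": "\u0069\u0307"}      # 0130; F; 0069 0307
TURKIC = {"\u0130": "\u0069"}          # 0130; T; 0069


def fold(text: str, special: dict) -> str:
    """Fold character by character: the option table wins, then COMMON,
    otherwise the character is left unchanged."""
    return "".join(special.get(ch, COMMON.get(ch, ch)) for ch in text)


def code_points(text: str) -> list:
    """Render a string as a list of U+XXXX code point labels."""
    return [f"U+{ord(ch):04X}" for ch in text]


precomposed = "\u0130"        # <I dot above>
sequence = "\u0049\u0307"     # <I><dot above>

for name, table in (("F", FULL), ("T", TURKIC)):
    a, b = fold(precomposed, table), fold(sequence, table)
    print(name, code_points(a), code_points(b), "same" if a == b else "different")
# F ['U+0069', 'U+0307'] ['U+0069', 'U+0307'] same
# T ['U+0069'] ['U+0069', 'U+0307'] different
```

Python's built-in `str.casefold()` applies full (C+F) folding without the T option, so `"\u0130".casefold()` also yields `"i\u0307"`, matching the F row.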

Even option F causes trouble:

 In Turkic languages, <I dot above> and <i> form a bicameral (uppercase/lowercase) pair.
 If Turkish users enter an IDN ???<I dot above>.com, they cannot reach ???<i>.com.
 At this point, the locale-independence objective of stringprep case folding is not met.
 <i><dot above> and <i> should be unified into one or the other by any locale-independent
  case folding. The "locale-independent/non-contextual" objectives are described in UAX#21,
  in the example of the <I> -> <dotless small i> --> <i> case folding. <dotless i> and <i> are *NOT* a
  bicameral pair, but for transitive case-insensitive equivalence, the two are unified into <i> in UAX#21.
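The Turkish lookup failure can be illustrated with a short sketch. It assumes Python's `str.casefold()` as a stand-in for locale-independent full (C+F) folding without option T; Python ships a newer Unicode version than 3.2, but the U+0130 mapping is assumed unchanged. The labels "İstanbul"/"istanbul" are hypothetical examples of what a user types versus what was registered.

```python
import unicodedata


def fold_and_normalize(label: str) -> str:
    """Locale-independent full case folding, then NFKC.
    str.casefold() stands in for the C+F mappings (no Turkic T option)."""
    return unicodedata.normalize("NFKC", label.casefold())


# Hypothetical labels: what a Turkish user types vs. the registered name.
typed = "\u0130stanbul"   # starts with U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE
registered = "istanbul"

print(fold_and_normalize(typed) == "i\u0307stanbul")                  # True
print(fold_and_normalize(typed) == fold_and_normalize(registered))    # False
```

NFKC cannot merge <i><dot above> back into <i> because Unicode has no precomposed "small i with dot above", so the two labels stay distinct.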

Stringprep should address this issue, and
the next CaseFolding-3.2.?.txt should state more clearly which locale-independent/non-contextual
case foldings are required for <I dot above>.


Soobok Lee

----- Original Message -----
From: "Mark Davis" <mark@macchiato.com>
To: "Dan Oscarsson" <Dan.Oscarsson@trab.se>; <idn@ops.ietf.org>; <phoffman@imc.org>; <lsb@postel.co.kr>;
<david.hopwood@zetnet.co.uk>
Sent: Tuesday, May 07, 2002 12:37 AM
Subject: Re: [idn] 1st stringprep issue: not answered and ignored


> The Unicode Consortium recommends that the tables in StringPrep be
> updated to encompass Unicode 3.2, which was released in March.
>
> As a part of this release, there was one change (in addition to new
> characters) in case folding. The situation regarding the
> dotted/dotless I in the case foldings has been cleaned up by providing
> several options, one of which (full case folding without option T)
> preserves canonical equivalence (although not normalization forms --
> text still needs to be normalized after case folding).
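Mark's point that folding preserves canonical equivalence but not normalization forms can be sketched as follows. It again assumes `str.casefold()` approximates full folding without option T, and uses U+01F0 LATIN SMALL LETTER J WITH CARON, whose full folding expands to <j, combining caron>; `unicodedata.is_normalized` needs Python 3.8 or later. The helper `fold_then_nfc` is a hypothetical name.

```python
import unicodedata


def fold_then_nfc(text: str) -> str:
    """Full case folding (str.casefold, no Turkic option), then NFC."""
    return unicodedata.normalize("NFC", text.casefold())


# U+01F0 folds (status F) to <U+006A, U+030C>: canonically equivalent
# to the input, but no longer in NFC.
folded = "\u01F0".casefold()
print(folded == "j\u030C")                          # True
print(unicodedata.is_normalized("NFC", folded))     # False
print(fold_then_nfc("\u01F0") == "\u01F0")          # True: NFC recomposes it

# Canonically equivalent inputs still end up identical after fold + NFC.
print(fold_then_nfc("\u0130") == fold_then_nfc("I\u0307"))   # True
```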
>
> http://www.unicode.org/Public/3.2-Update/CaseFolding-3.2.0.txt
>
> Mark
> __________
>
> http://www.macchiato.com
>
> "Eppur si muove"
>
> ----- Original Message -----
> From: "Dan Oscarsson" <Dan.Oscarsson@trab.se>
> To: <idn@ops.ietf.org>; <phoffman@imc.org>; <lsb@postel.co.kr>;
> <mark@macchiato.com>; <david.hopwood@zetnet.co.uk>
> Sent: Monday, May 06, 2002 01:57
> Subject: Re: [idn] 1st stringprep issue: not answered and ignored
>
>
> >
> > The point Soobok Lee raises is a very serious one.
> > The requirement on the ACE form of IDNA is that the same
> > name must always result in the same ACE!!!!
> >
> > If doing case folding/mapping followed by NFKC results in a
> > different code point sequence than doing NFC, then case folding/mapping,
> > then NFKC, we will get DNS lookup failures because the names do not
> > match. While hopefully most data entered into stringprep will
> > be in NFC, some will not.
> >
> > If the above is true, stringprep/nameprep must be changed so that
> > the preparation steps for strings are:
> >
> > 1) Ensure that the input string is in NFC.
> >
> > 2) Apply all the steps in stringprep.
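The effect of adding NFC as a first step can be sketched with a hypothetical mapping table. The sketch assumes a table that maps U+0130 to a single <i> (as option T does) and lowercases everything else with `str.lower()`; whether the stringprep draft's own table behaves exactly like this is not established here. `map_case`, `prep_without_nfc` and `prep_with_nfc` are illustrative names only.

```python
import unicodedata


def map_case(text: str) -> str:
    """Hypothetical mapping: U+0130 -> 'i' (T-style), otherwise str.lower()."""
    return "".join("i" if ch == "\u0130" else ch.lower() for ch in text)


def prep_without_nfc(text: str) -> str:
    """Map, then NFKC (no normalization before mapping)."""
    return unicodedata.normalize("NFKC", map_case(text))


def prep_with_nfc(text: str) -> str:
    """Step 1: NFC; step 2: the same map + NFKC as above."""
    return prep_without_nfc(unicodedata.normalize("NFC", text))


precomposed = "\u0130"   # <I dot above>
decomposed = "I\u0307"   # <I><dot above>, canonically equivalent

print(prep_without_nfc(precomposed) == prep_without_nfc(decomposed))  # False
print(prep_with_nfc(precomposed) == prep_with_nfc(decomposed))        # True
```

Without the NFC step the two canonically equivalent inputs prepare to "i" and "i\u0307", so they would produce different ACE labels; NFC first composes <I><dot above> into U+0130, and both inputs then map to "i".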
> >
> >
> >     Dan
> >
> > --
> > Below is Soobok Lee's text:
> > >UTC case folding (UAX21) is designed for character-by-character case
> > >folding, not for combining sequences, but stringprep blindly applies
> > >UAX21 to them. That is not a problem with UAX21 but with stringprep.
> > >
> > >Applying NFC before case folding solves this problem, but this suggestion
> > >was also ignored or not discussed in depth.
> > >
> > >Without any modification to UAX21, NFKC or NFC, we could cure
> > >these <I><dot above> stringprep errors simply by adding NFC as
> > >step zero in stringprep.
> >
> >