
Re: [idn] 1st stringprep issue:




From now on, I would like to drop "not answered and ignored" from the subject line of this thread.

Unicode 3.2 came out in late March, and the IDNA drafts went to the IESG in early March.
The authors and this WG did not have enough time or opportunity to discuss the issues.
Casefolding standards are outside the IETF's control and are often out of sync with IETF WG activities.
That is the kind of answer I have heard.

But I would point out that the current draft set was pushed forward in haste
while the relevant UTC work was still in progress, without even mentioning the existence of the
serious issues described below. The *selection* of one version/candidate among the many versions/options
of casefolding/Unicode standards has always been under the control of the authors and this WG, and so has the
*time* at which to proceed.

Pressure from outside often results in haste and mistakes.


----- Original Message -----
From: "Soobok Lee" <lsb@postel.co.kr>
To: "Mark Davis" <mark@macchiato.com>; "Dan Oscarsson" <Dan.Oscarsson@trab.se>; <idn@ops.ietf.org>; <phoffman@imc.org>;
<david.hopwood@zetnet.co.uk>
Sent: Tuesday, May 07, 2002 1:18 AM
Subject: Re: [idn] 1st stringprep issue: not answered and ignored


> I read in http://www.unicode.org/Public/3.2-Update/CaseFolding-3.2.0.txt
>
> <quote>
> 0130; F; 0069 0307; # LATIN CAPITAL LETTER I WITH DOT ABOVE
> 0130; T; 0069; # LATIN CAPITAL LETTER I WITH DOT ABOVE
>
> # F: full case folding, mappings that cause strings to grow in length. Multiple characters are separated by spaces.
> # T: special case for uppercase I and dotted uppercase I
> #    - For non-Turkic languages, this mapping is normally not used.
> #    - For Turkic languages (tr, az), this mapping can be used instead of the normal mapping for these characters.
> </quote>
>
> If we choose F,
>   <I dotabove> and <I><dotabove> are both casefolded into the same <i><dotabove>.
>
> If we choose T, we get different outputs:
>   <I dotabove> --> <i>,  <I><dotabove> --> <i><dotabove>
>
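To make the two options concrete, here is a minimal Python sketch. It models only the two U+0130 lines quoted above. The names FOLD_F, FOLD_T and fold are hypothetical, and every other character simply goes through Python's built-in str.casefold(), which is an assumption standing in for the rest of the table.

```python
# Only the two quoted CaseFolding-3.2.0.txt lines for U+0130 are modelled here.
FOLD_F = {"\u0130": "\u0069\u0307"}  # 0130; F; 0069 0307
FOLD_T = {"\u0130": "\u0069"}        # 0130; T; 0069


def fold(text, special):
    """Fold char by char: use `special` for U+0130, str.casefold() otherwise."""
    return "".join(special.get(ch, ch.casefold()) for ch in text)


precomposed = "\u0130"   # <I dotabove>
decomposed = "I\u0307"   # <I><dotabove>

# Do the precomposed and decomposed spellings fold to the same string?
same_under_f = fold(precomposed, FOLD_F) == fold(decomposed, FOLD_F)
same_under_t = fold(precomposed, FOLD_T) == fold(decomposed, FOLD_T)
print("F:", same_under_f, " T:", same_under_t)
```

Under the F line both spellings end up as <i><dotabove>. Under the T line the precomposed form loses its dot and the decomposed form keeps it.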
> Even option F causes trouble:
>
>  In Turkic languages, <I dotabove> and <i> form a bicameral pair.
>  If Turkish users enter an IDN ???<I dotabove>.com, they cannot reach ???<i>.com.
>  At this very point, the locale-independence objective of Stringprep casefolding is not fulfilled.
>  <i><dotabove> and <i> should be unified into one of the two in any locale-independent
>   casefolding. The "locale-independent/non-contextual" objectives are described in UAX#21,
>   in the example of the <I> --> <dotless small i> --> <i> casefolding. <dotless i> and <i> are *NOT* a
>   bicameral pair, but for transitive case-insensitive equivalence, the two are unified into <i> in UAX#21.
>
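The mismatch for Turkish users can be seen with Python's built-in str.casefold(), which applies full case folding from whatever Unicode version the interpreter ships (not necessarily 3.2). The helper name same_label is hypothetical, and the comparison is only a rough stand-in for what a nameprep profile would do.

```python
import unicodedata


def same_label(a, b):
    """Compare two labels after full case folding followed by NFKC."""
    def prep(s):
        return unicodedata.normalize("NFKC", s.casefold())
    return prep(a) == prep(b)


upper = "\u0130"   # <I dotabove>, the Turkish uppercase of <i>
lower = "i"        # <i>

# Full folding keeps the dot as a combining mark, so the labels differ.
print(same_label(upper, lower))
print([hex(ord(c)) for c in upper.casefold()])
```

The folded uppercase form still carries the combining dot above, so a Turkish user who types the uppercase label does not arrive at the plain <i> label.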
> Stringprep should address this issue, and
> the next CaseFolding-3.2.?.txt should say more about the required locale-independent/non-contextual
> casefolding for <I dotabove>.
>
>
> Soobok Lee
>
> ----- Original Message -----
> From: "Mark Davis" <mark@macchiato.com>
> To: "Dan Oscarsson" <Dan.Oscarsson@trab.se>; <idn@ops.ietf.org>; <phoffman@imc.org>; <lsb@postel.co.kr>;
> <david.hopwood@zetnet.co.uk>
> Sent: Tuesday, May 07, 2002 12:37 AM
> Subject: Re: [idn] 1st stringprep issue: not answered and ignored
>
>
> > The Unicode Consortium recommends that the tables in StringPrep be
> > updated to encompass Unicode 3.2, which was released in March.
> >
> > As a part of this release, there was one change (in addition to new
> > characters) in case folding. The situation regarding the
> > dotted/dotless I in the case foldings has been cleaned up by providing
> > several options, one of which (full case folding without option T)
> > preserves canonical equivalence (although not normalization forms --
> > text still needs to be normalized after case folding).
> >
> > http://www.unicode.org/Public/3.2-Update/CaseFolding-3.2.0.txt
> >
> > Mark
> > __________
> >
> > http://www.macchiato.com
> >
> > "Eppur si muove"
> >
> > ----- Original Message -----
> > From: "Dan Oscarsson" <Dan.Oscarsson@trab.se>
> > To: <idn@ops.ietf.org>; <phoffman@imc.org>; <lsb@postel.co.kr>;
> > <mark@macchiato.com>; <david.hopwood@zetnet.co.uk>
> > Sent: Monday, May 06, 2002 01:57
> > Subject: Re: [idn] 1st stringprep issue: not answered and ignored
> >
> >
> > >
> > > The point that Soobok Lee shows is a very serious matter.
> > > The requirement on the ACE form of IDNA is that the same
> > > name must always result in the same ACE!!!!
> > >
> > > If doing casefolding/mapping followed by NFKC results in a
> > > different code point sequence than doing NFC, casefolding/mapping and
> > > NFKC again, we will get DNS lookup failures because the names do not
> > > match. While hopefully most data entered into stringprep will
> > > be NFC, some will not.
> > >
> > > If the above is true, stringprep/nameprep must be changed so that
> > > the preparation steps for strings are:
> > >
> > > 1) Ensure that the input string is in NFC.
> > >
> > > 2) Apply all the steps in stringprep.
> > >
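The proposed ordering can be sketched in Python as follows. This is an illustration under assumptions, not the stringprep algorithm: casefold_t is a hypothetical per-character fold in which U+0130 maps to U+0069 alone (the line labelled T in CaseFolding-3.2.0.txt), all other characters use str.casefold(), and normalization uses the Unicode 3.2 data that Python exposes as unicodedata.ucd_3_2_0.

```python
from unicodedata import ucd_3_2_0 as ucd  # Unicode 3.2 database shipped with Python


def casefold_t(text):
    # Hypothetical fold: U+0130 -> U+0069 only; everything else via str.casefold().
    return "".join("i" if ch == "\u0130" else ch.casefold() for ch in text)


def prep(text):
    """Casefold, then NFKC (no NFC step)."""
    return ucd.normalize("NFKC", casefold_t(text))


def prep_nfc_first(text):
    """Step 1: put the input in NFC. Step 2: casefold, then NFKC."""
    return prep(ucd.normalize("NFC", text))


precomposed = "\u0130"   # <I dotabove>
decomposed = "I\u0307"   # <I><dot above>

print(prep(precomposed) == prep(decomposed))
print(prep_nfc_first(precomposed) == prep_nfc_first(decomposed))
```

With this fold, the two spellings of the same name come out different when casefolding is applied directly, and identical once NFC is applied first, because NFC composes <I><dot above> into <I dotabove> before the per-character fold sees it.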
> > >
> > >     Dan
> > >
> > > --
> > > Below is Soobok Lee's text:
> > > >UTC casefolding (UAX21) is made for char-by-char casefolding, not for
> > > >combining sequences, but stringprep blindly applies UAX21 to them.
> > > >That is not a problem of UAX21 but rather of stringprep.
> > > >
> > > >Applying NFC before casefolding solves this problem, but this suggestion
> > > >was also ignored or not discussed in depth.
> > > >
> > > >Without any modifications to UAX21, NFKC, or NFC, we could cure
> > > >this <I><dot above> stringprep error simply by adding NFC as
> > > >step zero in stringprep.
> > >
> > >
>