
Re: [idn] Requirements I-D





John C Klensin wrote:

> --On Tuesday, 16 May, 2000 19:48 -0600 mark.davis@us.ibm.com
> wrote:
>
> > Perhaps you misunderstand me (or I you).
> >
> > The problem as stated to me was how to extend the current
> > caseless comparison of domain names to the full range of
> > Unicode characters. My answer assumes that DNS names are
> > *stored* in a case-folded manner. Whatever the user typed in
> > would also be case-folded by the time the strings are
> > compared for identity. I was not suggesting that every possible
> > match be tried!
>
> Understood.
>
> > I am also not at all arguing that there *should be*
> > country-specific case folding; entirely the contrary -- that
> > there be a single case-folding algorithm that is
> > language-independent.
>
> I don't believe, for reasons that others have explored on the
> list, that a "case-folding algorithm" (in the "drop one bit in
> ASCII" sense) is feasible.  Instead, one would need mapping

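Did I say "drop one bit"? No. Correct case-folding requires tables. Case
mappings are in general irreversible; they lose information. Look at the
word "McGowan": uppercasing gives "MCGOWAN" and lowercasing gives "mcgowan",
and neither can be turned back into the original. For more information, see
http://www.unicode.org/unicode/reports/tr21/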
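To make the difference concrete, here is a minimal sketch (an illustration only, not a scheme anyone in this thread has specified). It assumes Python's built-in str.casefold(), which applies Unicode case folding from the character database tables. The helpers ascii_bit_fold, fold, register, lookup and the registry dict are hypothetical names. It shows the ASCII bit trick failing on non-ASCII letters, a table-driven fold that can change string length, the loss of information on "McGowan", and the store-folded / compare-folded model: one comparison, no enumeration of alternatives.

```python
from typing import Dict, Optional


def ascii_bit_fold(s: str) -> str:
    """The "drop one bit" trick: set bit 0x20 on ASCII A-Z, leave all else alone."""
    return "".join(chr(ord(c) | 0x20) if "A" <= c <= "Z" else c for c in s)


def fold(s: str) -> str:
    """Language-independent, table-driven folding via Python's str.casefold()."""
    return s.casefold()


# Hypothetical name store: keys are kept in already-folded form.
registry: Dict[str, str] = {}


def register(name: str, address: str) -> None:
    registry[fold(name)] = address


def lookup(typed: str) -> Optional[str]:
    # The user's input is folded the same way before the identity comparison,
    # so exactly one comparison is made; no alternative spellings are tried.
    return registry.get(fold(typed))


if __name__ == "__main__":
    # The bit trick only touches ASCII; the accented capital is left as-is.
    print(ascii_bit_fold("ÉCOLE"))             # École
    print(fold("ÉCOLE"))                       # école
    # Folding can change length: U+00DF (ß) folds to "ss".
    print(fold("Straße") == fold("STRASSE"))   # True
    # Folding is lossy: all three spellings collapse to one string.
    print({fold(w) for w in ("McGowan", "MCGOWAN", "mcgowan")})  # {'mcgowan'}

    register("Straße.example", "192.0.2.1")
    print(lookup("STRASSE.EXAMPLE"))           # 192.0.2.1
```

The address 192.0.2.1 is from the documentation range and is only a placeholder.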

> tables and would need to understand that some of the "foldings"
> are not reversible (e.g., ones that remove accents or
> diacriticals when going from "lower case" to "upper case").
> Those mapping tables would presumably need to be updated each
> time a character was added to the code set, so they had probably
> best be embedded in the DNS files/tables or, at worst, in the
> servers -- if every client machine on the network needs to be
> updated each time new characters are added, I think we have a
> non-solution.

No, they would need to be updated only when *cased* characters are added.
Given how many cased characters have already been encoded, that will happen
very seldom, if ever.
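As a rough illustration of why only cased characters matter, the sketch below (a hypothetical helper, again assuming Python's str.casefold() and its bundled Unicode database) builds a folding table that keeps only code points whose fold differs from themselves. Uncased scripts such as CJK ideographs contribute no entries, so adding more of them would leave the table unchanged. The exact entry count depends on the Unicode version Python ships with, so none is claimed here.

```python
import sys
import unicodedata
from typing import Dict


def build_fold_table() -> Dict[int, str]:
    """Keep only code points whose case fold differs from the code point itself."""
    table: Dict[int, str] = {}
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        folded = ch.casefold()
        if folded != ch:
            table[cp] = folded
    return table


if __name__ == "__main__":
    table = build_fold_table()
    print("Unicode version:", unicodedata.unidata_version)
    print("entries:", len(table))
    # No CJK unified ideograph in U+4E00..U+9FFF needs an entry.
    print(any(cp in table for cp in range(0x4E00, 0xA000)))  # False
    print(table[ord("A")], table[0x00DF])                     # a ss
```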

>
> I'm also concerned about the problem that acceptable
> mapping/folding rules may differ from locale to locale with the
> same characters.  Perhaps that is in the "bad, but not terrible"
> category, but what this is supposed to be all about is letting
> people use their own characters and languages in natural ways.

From the scenarios I listed in my first message, I don't think the
drawbacks are particularly severe. Rather than discuss this in the
abstract, it'd be much better to propose scenarios where there is a
problem, and look to either resolve them, or decide that the consequences
can be lived with.
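One commonly cited scenario of this kind (offered as an illustration, not one from my first message) is the Turkish dotted and dotless i. The sketch assumes Python's str.casefold() as the single language-independent fold; turkish_fold is a hypothetical locale-specific variant written only for comparison. Under the language-independent fold, a Turkish user typing "DIŞ" for the stored name "dış" gets "diş", a distinct Turkish word, and the capital dotted İ folds to i plus U+0307 COMBINING DOT ABOVE, so "DİŞ" does not match "diş" either.

```python
def turkish_fold(s: str) -> str:
    """Hypothetical Turkish-aware fold: apply the Turkic mappings, then casefold."""
    # U+0049 I -> U+0131 dotless ı ; U+0130 İ -> U+0069 i
    return s.replace("I", "ı").replace("İ", "i").casefold()


if __name__ == "__main__":
    stored = "dış"   # contains U+0131 LATIN SMALL LETTER DOTLESS I
    typed = "DIŞ"    # the Turkish capitalization of the stored name

    # Language-independent fold maps I -> i, so the two no longer match.
    print(typed.casefold())                        # diş
    print(typed.casefold() == stored.casefold())   # False

    # Full folding maps U+0130 to "i" + U+0307, so this fails too.
    print("DİŞ".casefold() == "diş")               # False

    # The locale-specific variant matches both.
    print(turkish_fold(typed) == turkish_fold(stored))  # True
    print(turkish_fold("DİŞ") == "diş")                 # True
```

The catch is that turkish_fold is wrong outside Turkish: it would fold "INFO" to "ınfo". Whether the mismatch above is a consequence that can be lived with is exactly the kind of question such scenarios are meant to settle.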

>
> So I was looking for a way to deal with the problem without
> embedding the rules in code we'd have trouble changing and that
> was where the "put in everything" idea came from -- just an
> idea, although obviously not a very good one.
>
>     john
>
>
>