
[idn] Re: Agenda Item for next UTC: Normalizing Case Mapping



Brendan Murray/DUB/Lotus wrote:
> James Seng wrote:
> > Would it also be out of range if we consider case folding for Asian
> > languages?
> > Simplified-Traditional Chinese, Simplified-Traditional Japanese, or Hiragana,
> > Katakana, single full/half width, etc.
> 
> The above are, I believe, beyond the scope of casing: they are, however,
> admirable suggestions and should be addressed. The width and kana mappings
> should be pretty much given, although I suspect that the normalization of
> Han characters may prove to be somewhat more contentious.

Consider the following domain name.

U+7535 U+90AE '.' U+53F0 U+6E7E  (meaning email.taiwan in Chinese)

It can also be represented in the traditional form

U+96FB U+90F5 '.' U+81FA U+7063

To say U+7535 U+90AE '.' U+53F0 U+6E7E != U+96FB U+90F5 '.' U+81FA U+7063 is
as good as saying email.tw != EMAIL.TW.
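
To make the comparison concrete, here is a minimal sketch of what such a fold could look like. The table holds only the four pairs from the two names above; it is a hypothetical illustration, not an agreed mapping, and the names TRAD_TO_SIMP, fold_han and names_match are invented for this example.

```python
# Hypothetical folding table: only the four traditional -> simplified pairs
# taken from the two names in this message. A real table would be far larger,
# and its contents are exactly what is contentious.
TRAD_TO_SIMP = {
    "\u96fb": "\u7535",
    "\u90f5": "\u90ae",
    "\u81fa": "\u53f0",
    "\u7063": "\u6e7e",
}


def fold_han(name: str) -> str:
    """Replace each traditional character in the table with its simplified form."""
    return "".join(TRAD_TO_SIMP.get(ch, ch) for ch in name)


def names_match(a: str, b: str) -> bool:
    """Compare two names after the Han fold and ordinary case folding."""
    return fold_han(a).casefold() == fold_han(b).casefold()


simplified = "\u7535\u90ae.\u53f0\u6e7e"
traditional = "\u96fb\u90f5.\u81fa\u7063"

print(simplified == traditional)             # raw code point comparison
print(names_match(simplified, traditional))  # comparison after folding
print(names_match("email.tw", "EMAIL.TW"))   # the ASCII analogue
```

With a fold like this applied before comparison, the two spellings are treated the same way email.tw and EMAIL.TW already are.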

But why should UC bother with Chinese 'case' folding? After all, this is a
problem unique to DNS, and we should let it be handled in DNS aliasing via
DNAME and CNAME, e.g.

U+96FB U+90F5 '.' U+81FA U+7063 IN DNAME U+7535 U+90AE '.' U+53F0 U+6E7E

And let's try repeating that for each of the 2^4 permutations... I guess we are
lucky since we need to repeat it only 16 times for this name. Perhaps we might be
even luckier with other, longer names.
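
A rough sketch of how the aliasing approach multiplies, counting each of the four characters in the name as having two forms (so 2^4 = 16 spellings, one of which is the alias target). The record lines only mirror the notation used in this message and are not real zone-file syntax; PAIRS, spellings and dname_records are names made up for the illustration.

```python
from itertools import product

# (simplified, traditional) forms for each character position, taken from
# the two names in this message.
PAIRS = [
    ("\u7535", "\u96fb"),
    ("\u90ae", "\u90f5"),
    ("\u53f0", "\u81fa"),
    ("\u6e7e", "\u7063"),
]


def spellings(pairs, dot_after=2):
    """Return every mixed spelling, with a '.' after the first dot_after characters."""
    names = []
    for combo in product(*pairs):
        names.append("".join(combo[:dot_after]) + "." + "".join(combo[dot_after:]))
    return names


def dname_records(pairs):
    """Alias every non-canonical spelling to the all-simplified one."""
    all_names = spellings(pairs)
    canonical = all_names[0]  # product() yields the first form of every pair first
    return [f"{name} IN DNAME {canonical}" for name in all_names[1:]]


print(len(spellings(PAIRS)))      # number of spellings for this name
print(len(dname_records(PAIRS)))  # number of alias records needed
```

Each additional character with two forms doubles the count, which is why longer names make the aliasing approach worse, not better.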

I agree this is not 'case folding' as one would normally associate with the
meaning of 'case'. But the problems are as real as I = dotless i. It is going
to be very difficult to address this issue, especially with CJK unification
(what is considered equivalent in Chinese may not be so in Japanese or
Korean). But it is definitely better to address it in UC than in IDN-WG, IMHO.

-James Seng