
Re: [idn] Re: Agenda Item for next UTC: Normalizing CaseMapping

To: unicore@unicode.org
Subject: Re: [idn] Re: Agenda Item for next UTC: Normalizing CaseMapping
From: James Seng <jseng@pobox.org.sg>
Date: Sat, 19 Feb 2000 05:56:33 +0800
CC: idn@ops.ietf.org
Delivery-date: Fri, 18 Feb 2000 13:59:24 -0800
Envelope-to: idn-data@psg.com

Yes, now you are getting it, but not completely.

I agree that ISO 10646 defines characters, not semantic meaning.

But you can consider it 'accidental' that almost every word in Chinese has a
meaning, because that is the nature of the language. Ditto for kanji.

Secondly, I argue that U+7535 = U+96FB not because of their semantic meaning
but because of their traditional-simplified form relationship. There are
ideograms which have the same semantic meaning but must not be mapped. For
example, "wu" U+5C4B has a similar meaning to my surname "zhuang" U+838A, both
meaning "house", but obviously they should not be equivalent.

The argument that it is language-specific is also not fair. If there are
comparison rules for 'A' = 'a' (Latin) and I = dotless i (Turkish), then there
should be a comparison rule which defines U+7535 = U+96FB. It is just the same
problem as the Turkish dotless i, only 1000x more complicated for CJK.
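
As an illustration only, a comparison rule of this kind could be sketched in
Python as below. The names TRAD_TO_SIMP, fold and same_label are hypothetical,
the table holds just the one pair discussed in this thread, and nothing here
is defined by UTR21 or by Unicode itself. It assumes the traditional form is
mapped onto the simplified one; the opposite direction would serve equally
well for comparison.

```python
import unicodedata

# Hypothetical, deliberately tiny traditional -> simplified table.
# Only the pair discussed in this thread is listed; a real table would
# need far more entries and expert review.
TRAD_TO_SIMP = {
    "\u96fb": "\u7535",  # U+96FB (traditional) -> U+7535 (simplified)
}


def fold(label: str) -> str:
    """Build a comparison key: NFKC, default case folding, variant map."""
    text = unicodedata.normalize("NFKC", label).casefold()
    return "".join(TRAD_TO_SIMP.get(ch, ch) for ch in text)


def same_label(a: str, b: str) -> bool:
    """Two labels match when their comparison keys are identical."""
    return fold(a) == fold(b)


if __name__ == "__main__":
    print(same_label("A", "a"))            # Latin case folding
    print(same_label("\u96fb", "\u7535"))  # traditional vs simplified
    print(same_label("\u5c4b", "\u838a"))  # similar meaning, not mapped
    print(same_label("I", "\u0131"))       # Turkish rule is not in the default
```

Note that Python's default case folding is locale-independent, so 'I' and
dotless i do not compare equal here; the Turkish rule would need its own
tailoring, just as the CJK pair needs its own table.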

If UC wishes to define a generic comparison rule for its CES, then in fairness
it should do so for all languages, not just some. If it only covers some, then
it is as good as useless, since it cannot be used generically across the
board. For one, a UTR21 which only handles European/Latin languages is
definitely useless to Asians, especially for domain names.

Btw, traditional/simplified are not locale-based. Malaysia, where I was born,
uses both traditional and simplified characters, whichever is convenient to
us, although tending towards the traditional form. This is why I got to learn
both forms.

If given a choice to restart from scratch, one idea would be to have each code
point represent the ideogram 'form' and leave it to the client to render it as
a traditional or simplified glyph. Unfortunately, this is not the case...

-James Seng

Paul Hoffman / IMC wrote:
> It still sounds like you are describing pictographic synonyms. "Lightning"
> and "lightning with rain above" both *meaning* the same thing doesn't mean
> that there should be a way to convert from one to another automatically. As
> you have shown, that conversion is language- (and possibly locale-) specific.
> 
> The only place where I could imagine that this comes up is when you are not
> looking at the glyphs themselves. If someone says in Chinese "lightning dot
> com", then the person trying to use that domain name might not know which
> characters to use. But this is the same as saying "Dürst dot com" and the
> person not knowing whether that was spelled Dürst or Duerst.
> 
> 10646/Unicode define characters, not semantic meanings. There is some grey
> area there, but there may not be a good reason to expand that grey area. It
> sounds like an interesting and thorny research topic, but not one that has
> much chance of completing any time soon.
> 
> --Paul Hoffman, Director
> --Internet Mail Consortium
