
Re: Inputting mixed SC/TC (Re: [idn] A question...)



Hi All:
        For IDN, the TC/SC problem comes down to one question: do we need
to compare identifiers character by character without caring whether a
character is TC or SC?
The Total Listing Table of Simplified Characters (簡化字總表) defines
these one-way mappings:
        (U+53EA 只) <-- (U+96BB 隻)
        (U+53EA 只) <-- (U+8879 衹)
In an old book from before 1949 you can find "衹要..." and "一隻...", which
are the traditional usages of these characters. If you want to do the
reverse mapping from SC to TC, these 1-n pairs are very easy to handle,
provided you accept that the table is a one-way mapping:
        (U+53EA 只)  --> (U+53EA 只)
In Taiwan people write "只要..." and "一只...", and we all understand them.
Only an expert in Chinese literature will argue that "一只碗" and "一隻鳥"
are different; that distinction is used in literature, not in a simple
identifier or domain name.
       If (U+53EA, U+96BB, U+8879) are not accepted as an equivalent set,
why not remove them from the table? For an identifier, though, comparing
these 3 characters as the same is easy and simple.
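
As a rough illustration of comparing these three characters as the same,
here is a minimal sketch in Python. The table and the function names
(VARIANT_REPRESENTATIVE, fold_variants, same_identifier) are hypothetical
and cover only the three characters discussed above; no draft mentioned in
this thread defines them.

```python
# Hypothetical variant table: every member of the set maps to one
# representative. Only the three characters from this message are listed.
VARIANT_REPRESENTATIVE = {
    "\u53ea": "\u53ea",  # 只 stays 只
    "\u96bb": "\u53ea",  # 隻 -> 只
    "\u8879": "\u53ea",  # 衹 -> 只
}


def fold_variants(label: str) -> str:
    """Replace each character by its representative, if it has one."""
    return "".join(VARIANT_REPRESENTATIVE.get(ch, ch) for ch in label)


def same_identifier(a: str, b: str) -> bool:
    """Compare two labels character by character after folding."""
    return fold_variants(a) == fold_variants(b)


print(same_identifier("\u4e00\u53ea", "\u4e00\u96bb"))  # 一只 vs 一隻 -> True
print(same_identifier("\u8879\u8981", "\u53ea\u8981"))  # 衹要 vs 只要 -> True
print(same_identifier("\u53ea\u8981", "\u4e00\u53ea"))  # 只要 vs 一只 -> False
```

Folding to a representative keeps the comparison itself a plain string
equality test, which is the property an identifier needs.
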
       Because Japan and Korea also use these Han characters, the
(U+53EA 只) example is more complex than this. The Unihan.txt file of
Unicode 3.1.1 lists all the related variants. It records these characters
as follows:
       (U+8879)  is the traditional variant of (U+53EA)
       (U+6B62) (U+7957) (U+96BB) are all variants of (U+53EA)
       (U+53EA) is the simplified variant of (U+8879)
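
To show how such records could be turned into one equivalent set, here is
a small Python sketch. The SAMPLE text only restates the three records
above in Unihan's tab-separated layout; it is not copied from the file.
The use of kSemanticVariant for the second record is an assumption, since
the field is not named above, and the helper names are hypothetical.

```python
from collections import defaultdict

# Illustrative records only, restating the three lines quoted above.
# kSemanticVariant is an assumed field name for the second record.
SAMPLE = (
    "U+53EA\tkTraditionalVariant\tU+8879\n"
    "U+53EA\tkSemanticVariant\tU+6B62 U+7957 U+96BB\n"
    "U+8879\tkSimplifiedVariant\tU+53EA\n"
)


def parse_records(text):
    """Yield (source, target) pairs from variant records."""
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        source, field, value = line.split("\t", 2)
        if "Variant" not in field:
            continue
        for target in value.split():
            yield source, target


def variant_classes(text):
    """Group code points linked by any variant record (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in parse_records(text):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    classes = defaultdict(set)
    for code_point in list(parent):
        classes[find(code_point)].add(code_point)
    return list(classes.values())


for members in variant_classes(SAMPLE):
    print(sorted(members))
```

Treating every variant relation as symmetric and transitive is a design
choice of this sketch; it is exactly the choice that a literature expert
would dispute and that an identifier comparison might accept.
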

Getting a consistent variant equivalence table may need more time in the
UTC, but in the IDN WG we need a presentation method/scheme that lets a DNS
server do comparison without caring about variants.
        Whether the variant equivalence table is global or local, we all
need:
1. An interface or scheme that can pass the equivalence result and its
related information to IDNA; the current draft needs to clarify this.
2. A scheme or encoding mechanism that takes the equivalence result and
related information and converts them to an ACE string, so that the string
can be compared in an LDH DNS server without caring about variants.
3. To recognize that the current last-call drafts are not enough to solve
these variant problems for IDN identifiers. The IDN WG must be very careful
in announcing its result, to avoid an irreversible deployment of CJK IDN.
4. The CJK area should find a way/stage to do CJK IDN registration that
reduces the ambiguity of variant characters, and should try to arrive at
one or more final CJK variant equivalence table(s) to be used in domain
name identifiers.
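
Point 2 above could be sketched as follows in Python. This is only an
illustration under assumptions: the FOLD table is hypothetical and holds
just the two mappings from this message, and Python's built-in "punycode"
codec stands in for whichever ACE is finally chosen, since none is named
here. The function name comparison_key is also hypothetical.

```python
# Hypothetical folding table (two entries from this message only).
FOLD = {
    "\u96bb": "\u53ea",  # 隻 -> 只
    "\u8879": "\u53ea",  # 衹 -> 只
}


def comparison_key(label: str) -> str:
    """Fold variants first, then encode to an ASCII-only string.

    The "punycode" codec is used as a stand-in ACE; it yields only
    letters, digits, and hyphens, so the key can be compared byte by
    byte by a server that knows nothing about variants.
    """
    folded = "".join(FOLD.get(ch, ch) for ch in label.lower())
    return folded.encode("punycode").decode("ascii")


key_sc = comparison_key("\u4e00\u53ea")  # 一只
key_tc = comparison_key("\u4e00\u96bb")  # 一隻
print(key_sc, key_tc, key_sc == key_tc)
```

Because the folding happens before the ACE step, the LDH side needs no
variant table at all; the cost is that the original spelling is lost unless
it is carried separately, which is the "related information" of point 1.
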

L.M.Tseng
----- Original Message -----
From: "James Seng/Personal" <jseng@pobox.org.sg>

> Originally sent the following as a private mail to you, but heck, let's
> open this "profound" topic.
>
> U+53EA is both a TC and an SC character, depending on how it is used,
> with two tones, "zhi1" or "zhi3".
>
> OTOH, U+986F is a TC-only character which has only one tone, "zhi1".
>
> When you use U+53EA in "1 little bird" in Chinese, "一只小鸟" (SC) or
> "一隻小鳥" (TC), then U+53EA ("zhi1") is the SC form of U+96BB ("zhi1").
>
> In the case where you use U+53EA to mean 'Only' ("zhi3"), using
> U+96BB ("zhi1") is inappropriate.
>
> In my case, "只顯示BIG5字集" is a valid Traditional Chinese sentence.
>
> This is yet another example why TC-SC is not just a simple 1-to-1
> mapping.
>
> PS: If Chinese is "so profound" that "no one (human) can enumerate",
> then should we expect a computer (or DNS) to be able to do so?
>
> -James Seng
>
> ----- Original Message -----
> From: "xiang deng" <deng@cnnic.net.cn>
> To: "Kenneth Whistler" <kenw@sybase.com>
> Cc: <idn@ops.ietf.org>
> Sent: Saturday, February 09, 2002 11:13 AM
> Subject: Re: Inputting mixed SC/TC (Re: [idn] A question...)
>
>
> > Mr. Kenneth,
> >
> > Chinese culture is so profound that no one can enumerate
> > all the cases, neither you nor me.
> >
> > U+53EA is one case of the 1:(n+1) simplifications.
> > It is a simplified character.
> >
> > All variants of U+53EA are included in GBK, GB13000, and GB18030,
> > and U+7947 is a TongJiaZi of U+53EA.
> >
> > If you decide whether a character is simplified or traditional
> > by whether it is included in BIG5 or GB, you
> > don't understand the TC/SC variant issue.
> >
> > And yes, it's tiresome to keep denying a simple fact.
> > There are opposing arguments, and I respect them.
> > But I don't appreciate this way of denying facts.
> >
> > Regards
> > Deng xiang
> >
> > ----- Original Message -----
> > From: "Kenneth Whistler" <kenw@sybase.com>
> >
> > > Mr. Deng,
> > >
> > > > Hi James,
> > > >
> > > > > Maybe you do not realize that the following Chinese string you
> > > > > input is mixed TC/SC:
> > > > > the simplified is "只显示BIG5字集"
> > > > <u+53ea u+663e u+793a u+0042 u+0049 u+0047 u+0035 u+5b57 u+96c6>
> > > >
> > > > > the traditional is "隻顯示BIG5字集"
> > > > <u+96bb u+986f u+793a u+0042 u+0049 u+0047 u+0035 u+5b57 u+96c6>
> > > >
> > > > > your input is "只顯示BIG5字集"
> > > > <u+53ea u+986f u+793a u+0042 u+0049 u+0047 u+0035 u+5b57 u+96c6>
> > >
> > > > And perhaps *you* do not realize that U+53EA zhi3 is *also* a
> > > > traditional character, and that it contrasts in proper usage
> > > > with U+96BB zhi1, and that a normal *traditional* representation
> > > > of zhi3xian3 "Only show..." would be U+53EA U+986F.
> > >
> > > U+53EA is also one of the 1-n simplifications that screw up the
> > > SC/TC mapping in any case. In PRC orthography, it is not only the
> > > simplification of U+96BB, but also the simplification for U+7947
> > > > (and U+7957), which is the proper adverbial particle for "only,
> > > > merely".
> > >
> > > > > And do you mean a user will set a configuration that forbids
> > > > > him from using his familiar characters?
> > >
> > > U+53EA *is* in Big-5. 0xA575. Do your homework.
> > >
> > > This kind of intentionally misleading example and campaign of
> > > ad hominem argumentation directed at the chair is getting
> > > truly tiresome.
> > >
> > > --Ken
> > >
> > >
> >
>
>
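
The three code point sequences quoted in this thread, and the question of
which legacy charset contains which character, can be checked with a short
Python sketch. It assumes Python's "big5" and "gb2312" codecs as stand-ins
for the BIG5 and GB sets named in the thread; the dictionary keys and the
helper names are hypothetical labels, not terms from the messages.

```python
# Code point sequences as quoted in the thread (hex, without "u+").
SEQUENCES = {
    "labelled simplified": "53ea 663e 793a 0042 0049 0047 0035 5b57 96c6",
    "labelled traditional": "96bb 986f 793a 0042 0049 0047 0035 5b57 96c6",
    "as typed": "53ea 986f 793a 0042 0049 0047 0035 5b57 96c6",
}


def decode_points(hex_points: str) -> str:
    """Turn space-separated hex code points into a string."""
    return "".join(chr(int(h, 16)) for h in hex_points.split())


def encodable(text: str, codec: str) -> bool:
    """Report whether every character of text exists in the codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


for name, points in SEQUENCES.items():
    text = decode_points(points)
    print(name, text, "big5:", encodable(text, "big5"),
          "gb2312:", encodable(text, "gb2312"))

# U+53EA on its own, against both codecs.
print(encodable("\u53ea", "big5"), encodable("\u53ea", "gb2312"))
```

A character that encodes in both codecs cannot be classified as TC or SC
by charset membership alone, which is the point both sides of the quoted
exchange end up making about U+53EA.
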