
Re: [idn] Prohibit CDN code points



--On 2002-01-23 17.58 +0800 Erin Chen <erin@twnic.net.tw> wrote:

> In other words, if we do not prohibit CJK code points,
> the standard will also hurt CDN requirements.
> 
> So, we have no choice but to prohibit CJK code points temporarily,
> until a proper or better rule comes out which will not let CJK
> hurt each other.

The problem with CJK codepoints, and the SC/TC issues everyone talks about,
is the unification that is done in the Unicode character set.

Because in Unicode the same CJK character has the same codepoint regardless
of language, there is no way to compare two codepoints and say that they
are sometimes different and sometimes not.
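
As an illustration (my own Python sketch, not part of the original
discussion), the unification means a character shared by SC and TC has one
codepoint, while a bare codepoint comparison carries no language
information:

    # A character used by both SC and TC has a single unified codepoint:
    print(hex(ord("中")))   # 0x4e2d, regardless of which script it was typed in
    # A pair like SC "国" / TC "國" are separate codepoints:
    print(hex(ord("国")))   # 0x56fd
    print(hex(ord("國")))   # 0x570b
    # So a codepoint comparison keeps 国 and 國 apart, yet cannot say
    # whether "中" came from an SC or a TC string.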

The Unicode tables are approved by the Unicode Consortium and ISO, and we
cannot, in the IETF, "undo" that unification.

This is the reason why we rely on other standards organizations to do "the
right thing". They have defined that, when comparing two characters, a
given codepoint in a normalized Unicode string is equal (or not equal) to
another according to the matching rules Unicode defines.

This further means that if one enters a string in Unicode, one will _NOT_
know whether SC or TC was used.
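
In other words, codepoint-level matching after normalization amounts to
something like the following sketch (using Python's unicodedata and NFKC as
an example of a normalization form, not as the exact profile the protocol
mandates):

    import unicodedata

    def codepoints_equal(a, b):
        # Normalize both strings, then compare codepoint by codepoint.
        # The normalized form records nothing about SC vs TC input.
        return unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b)

    print(codepoints_equal("中文", "中文"))   # True
    print(codepoints_equal("国", "國"))       # False: distinct codepoints stay distinct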


So, given this unification, which _already_is_a_done_deal_, matching
algorithms that are more "clever" have to have metadata apart from the
query string itself, such as language, culture, original script used, etc.
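
A sketch of what such a "clever" matcher would need, with a purely
hypothetical SC/TC variant table (real tables would have to come from
registries or other standards bodies, not from this protocol):

    # Hypothetical variant folding table; the entries are illustrative only.
    VARIANTS = {"国": "國", "发": "發"}

    def context_match(a, b, language=None):
        # Only when the caller supplies context (here a language tag)
        # can the matcher decide to fold variants together.
        if language in ("zh-CN", "zh-TW"):
            a = "".join(VARIANTS.get(ch, ch) for ch in a)
            b = "".join(VARIANTS.get(ch, ch) for ch in b)
        return a == b

    print(context_match("国", "國"))                   # False: no context, no folding
    print(context_match("国", "國", language="zh-CN")) # True under this folding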

We can _not_ do this context-sensitive matching in the DNS, for reasons
which have been stated on this mailing list for the last year, over and
over and over again.

Instead, the IETF has started (in the Applications Area) to look into a
lookup system which _can_ handle context-sensitive matching.

So, given Unicode as the character set, context-sensitive matching has to
be done where one can pass around a context, and not only a query string.

And this is NOT the DNS.

    paf