Re: My prod at IDN requirements



--On 2000-01-04 11.04 +0800, James Seng <jseng@pobox.org.sg> wrote:

> this will be a problem if ISO10646 is used. because of the CJK unification
> (arggh who is the idiot?)

The BIG question, from my point of view, is whether CJK unification should
be used here or not (because it is bad). If it is not, can this group create
something better? See, though, columns 1 and 14 of
ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData.txt and
ftp://ftp.unicode.org/Public/UNIDATA/SpecialCasing.txt before answering...
:-)

Traditionally the IETF is better at (a) grandfathering something created
elsewhere and (b) forcing that "other" body to do a good job than at doing
something from scratch.

I am not saying that you should adopt everything already done, but think
twice about whether it is not (still) better to use something that already
exists...

The same goes for equality and casing. Being a Western European person who
has only worked with some Inuit charsets needed in Canada, and helped some
people at the Far Eastern library in Stockholm with Chinese character sets
on the Mac, I of course do not know enough about these things -- BUT, I see
that the Unicode Consortium has defined decomposition rules which make it
possible to write code that compares characters for equality (not
sorting).  (See columns 1 and 6 of
ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData.txt)
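
To make the column references concrete, here is a minimal Python sketch that
pulls columns 1, 6 and 14 (counting from 1) out of one semicolon-separated
UnicodeData.txt record. The sample line is what I believe the entry for
U+00C5 looks like; the helper names parse_record and to_code_points are
made up for illustration and are not part of any Unicode tool.

```python
# Sample record, believed to match the UnicodeData.txt entry for U+00C5
# (an assumption; check against the real file before relying on it).
SAMPLE = ("00C5;LATIN CAPITAL LETTER A WITH RING ABOVE;Lu;0;L;0041 030A;"
          ";;;N;LATIN CAPITAL LETTER A RING;;;00E5;")


def to_code_points(field):
    """Turn a field like '0041 030A' into [0x41, 0x30A].

    Compatibility decompositions start with a tag such as '<compat>';
    the tag is skipped here, only the code points are kept.
    """
    return [int(tok, 16) for tok in field.split() if not tok.startswith("<")]


def parse_record(line):
    """Extract the columns mentioned in the text (1-based numbering)."""
    fields = line.rstrip("\n").split(";")
    return {
        "code": int(fields[0], 16),                   # column 1: code value
        "decomposition": to_code_points(fields[5]),   # column 6: decomposition
        "lowercase": int(fields[13], 16) if fields[13] else None,  # column 14
    }


if __name__ == "__main__":
    rec = parse_record(SAMPLE)
    print(hex(rec["code"]), [hex(c) for c in rec["decomposition"]],
          hex(rec["lowercase"]))
```

Column 6 is what drives the "A with Ring Above" versus "A + COMBINING RING
ABOVE" comparison, and column 14 is what drives the upper/lower case
comparison.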

Are those rules too bad to use for equality definitions? Can we do anything 
better?

If those rules are used, I read the table from Harald the following way:

> For matching records, Choose One:
>
>    it matters whether matching is consistent across all servers

Yes

>    it doesn't matter whether matching is consistent across all servers

No

>    i18c Cyrillic A must compare equal to Latin A
>    i18c Cyrillic A must compare not equal to Latin A

U+0410 eq U+0041?

No

>    i18c A with Ring Above must compare equal to a with ring above
>    i18c A with Ring Above must compare not equal to a with ring above

U+00C5 eq U+00E5?

Yes

>    i18c ASCII A must compare equal to a
>    i18c ASCII A must compare not equal to a

U+0041 eq U+0061?

Yes

>    i18c A + COMBINING RING ABOVE must compare equal to A with Ring Above
>    i18c A + COMBINING RING ABOVE must not compare equal to A with Ring
>    Above

(U+0041 + U+030A) eq U+00C5?

Yes
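
The four answers above can be checked with a rough Python sketch. It uses
the standard unicodedata module (canonical decomposition, NFD) together with
str.casefold() as a stand-in for "decomposition rules plus case mappings";
that pairing, and the names match_key and names_equal, are my assumptions
for illustration, not rules anybody has agreed on here. Python also ships a
much newer version of the Unicode tables than the files referenced above.

```python
import unicodedata


def match_key(label):
    """Build a comparison key: decompose, fold case, decompose again.

    The second NFD pass is there because case folding can produce
    sequences that are no longer in decomposed form.
    """
    decomposed = unicodedata.normalize("NFD", label)
    return unicodedata.normalize("NFD", decomposed.casefold())


def names_equal(a, b):
    """Two labels match when their comparison keys are identical."""
    return match_key(a) == match_key(b)


if __name__ == "__main__":
    # Cyrillic A vs Latin A
    print(names_equal("\u0410", "\u0041"))
    # A with ring above vs a with ring above
    print(names_equal("\u00C5", "\u00E5"))
    # ASCII A vs a
    print(names_equal("\u0041", "\u0061"))
    # A + COMBINING RING ABOVE vs precomposed A with ring above
    print(names_equal("\u0041\u030A", "\u00C5"))
```

Because the key is computed by one fixed function, every server applying it
would give the same answer to the same query.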


I know we were supposed to come up with requirements, but I "just" wanted
to show what it is possible to define given _one_ specific set of rules
created somewhere else. Whether those rules are good or bad is a different
question. I personally think the questions from Harald were extremely good,
and once we have answers to them, we can see whether we need to create our
own rules or can use something created elsewhere.

Regarding what Martin wrote, the question of whether the server or the user
should define equality: I must say that the definition must be the same for
both. A user sends a query to a DNS server, and the server returns records
according to the equality definitions in the server. Unlike a more "normal"
white-pages query against a database, the user does not do post-processing
of any kind. I.e. getting false positives back from a server is not an
option in DNS -- although that method can be used very effectively in
white-pages services, where it is better to return some extra records than
to risk missing one. In the case of DNS, we need to define equality in such
a way that we get back the same result all the time. From all servers.

   paf