
Re: [idn] I-D ACTION:draft-ietf-idn-cjk-00.txt



First, thank you for your comments :-) I will see what I can do in the next
version.

Frank Ernens wrote:
> > Hence, almost all Han ideographs are associated with some meaning by
> > itself which is very different from most other scripts. This causes some
> > confusion that Han folding is a form of lexicon-substitution.
> 
> Well, a look-up table mapping those ideographs which are also
> words is also a lexicon. The test is: is the table the *same* for
> all possible languages using the characters?

I beg to differ. By that test, I->i is also a lexicon, since it depends on
the language you are using.
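
To make the I->i point concrete, here is a minimal sketch in Python. The
function name lower_for and the language tags are my own, made up for
illustration; the only fact relied on is that Turkish and Azerbaijani pair
I with dotless i (U+0131) and dotted capital I (U+0130) with i.

```python
def lower_for(lang: str, text: str) -> str:
    """Lowercase text using a language-dependent rule for the letter I.

    Hypothetical helper: 'tr' and 'az' are treated as languages that
    distinguish dotted and dotless i; everything else uses the default rule.
    """
    if lang in ("tr", "az"):
        # Apply the Turkic pairs first: I -> dotless i, dotted capital I -> i.
        text = text.replace("I", "\u0131").replace("\u0130", "i")
    # Python's str.lower() itself is language-independent.
    return text.lower()


if __name__ == "__main__":
    for lang in ("en", "tr"):
        print(lang, lower_for(lang, "I"))
```

The same input character folds to two different outputs depending on the
language, so by the "same table for all languages" test even case folding
would count as a lexicon.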

> If not, the table is a lexicon. I don't know enough about the use of written
> Chinese to know if the written languages used in the different language
> areas (Mandarin, Cantonese and half a dozen others) are just close or
> are identical enough to use the same table. But probably different tables
> would be needed for Japanese and ancient Vietnamese than are needed
> for Chinese; are we prepared to guarantee they are *not* for all
> possible languages, past, present and future?

That is the purpose of the I-D. It describes the problems of using CJK with
respect to domain names.
 
> 1. Languages other than Chinese may use these characters. There
> are probably a few such languages in mainland China which the
> PRC government is in no hurry to document. How can you guarantee
> that a TC character has not gained some separate meaning there?
> How do you know it won't in mainstream Chinese in the future?

For Chinese alone, there is no such problem. But it gets very complicated
when you put Japanese and Korean in together.
 
> 2. Why should writers of traditional Chinese lose shades of meaning?
> This is surely what is happening if the mapping is many-to-one. It
> would be equivalent to removing all the French-derived words from
> English just because they almost duplicate Anglo-Saxon ones. That
> would make me sad but not melancholy.

I am not discussing "losing shades of meaning". That is for generic text
processing and you are right there. But in domain names, we are more
interested in getting a match. How exact and accurate the match must be
depends on the situation in which you use it. Think of it like NFKC (where
round-trip is not preserved) vs NFC.
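
A small sketch of the NFKC vs NFC difference, using Python's standard
unicodedata module. The sample characters and the helper name
normalize_both are chosen here only for illustration.

```python
import unicodedata


def normalize_both(text: str) -> tuple[str, str]:
    """Return (NFC, NFKC) forms of text."""
    return (
        unicodedata.normalize("NFC", text),
        unicodedata.normalize("NFKC", text),
    )


if __name__ == "__main__":
    samples = [
        "\uFF21",  # FULLWIDTH LATIN CAPITAL LETTER A
        "\uFF76",  # HALFWIDTH KATAKANA LETTER KA
    ]
    for s in samples:
        nfc, nfkc = normalize_both(s)
        print(
            f"U+{ord(s):04X}: NFC keeps U+{ord(nfc):04X}, "
            f"NFKC folds to U+{ord(nfkc):04X}"
        )
```

NFC leaves the compatibility characters alone, while NFKC folds them to
their ordinary counterparts. After NFKC you can no longer tell which form
was typed, which is acceptable when all you want is a match.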
 
> This section would be more understandable to users of the Roman script
> if you simply said that the Jamo correspond to our letters, there being
> both vowels and consonants (but more than we have), and syllables
> can be written with their Jamo packed together in a square form called
> a Hangul, most of which also have Unicode code points.

Erm, actually I don't think they correspond to Roman letters. The etymology
is different. Describing them as similar is at best misleading and maybe
totally wrong.
 
> > Katakana is a mirror of hiragana with few more forms
> 
> I seem to remember seeing somewhere katakana forms for VA, VI, VO, VE,
> though, yes, they are not commonly used. They can be represented in
> Unicode using the voiced diacritic 0x3099 after the corresponding
> unvoiced syllable.

They are actually at U+30F7, U+30F8, U+30F9 and U+30FA. Btw, this is why we
say there are a few more forms.
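
For anyone who wants to check, a short sketch with Python's standard
unicodedata module lists the four code points and their canonical
decompositions. The function name katakana_v_forms is made up for this
example.

```python
import unicodedata


def katakana_v_forms() -> list[tuple[str, str, str]]:
    """Return (code point, character name, NFD decomposition) for U+30F7..U+30FA."""
    rows = []
    for cp in (0x30F7, 0x30F8, 0x30F9, 0x30FA):
        ch = chr(cp)
        # NFD splits the precomposed letter into base letter + combining mark.
        decomposed = unicodedata.normalize("NFD", ch)
        parts = " ".join(f"U+{ord(c):04X}" for c in decomposed)
        rows.append((f"U+{cp:04X}", unicodedata.name(ch), parts))
    return rows


if __name__ == "__main__":
    for code, name, parts in katakana_v_forms():
        print(f"{code}  {name}  =  {parts}")
```

Each precomposed letter decomposes canonically into WA, WI, WE or WO
followed by the combining voiced sound mark U+3099, so the sequence you
describe and the single code points are equivalent under normalization.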

> The main problem is that the kanji capture shades of meaning which
> are not present in spoken Japanese once you remove inflection,
> gestures, facial expressions etc. Allowing only the kana is not an option.

No one says we should allow only kana.
 
> SECTION 4 [bis], Vietnamese:
> 
> > While Vietnamese also adopted Chinese ideographs ('chu han') and created
> > their own ideographs ('chu nom'), they were now replaced by romanized
> > 'quoc ngu' today. Hence, this document does not attempt to address any
> > issues with 'chu han' or 'chu nom'.
> 
> A department of classics in a Vietnamese university (or even a department
> of Vietnamese in a US university) could reasonably expect to use these
> characters in its names. One cannot just dismiss an entire language
> because it happens to have fewer speakers than one's own or one
> doesn't know much about it.

Agreed. But none of the authors knows Vietnamese well enough to discuss it
in detail. Someone else should do the job, not us.

> SECTION 7, Mechanism
> 
> > c) Folding by Domain Name registration services for the purposes of
> >   preventing confusing allocations CJKV Domain Names which would,
> >   if transcoded, be the same
> 
> How does this differ from the question of US/UK spellings in English?
> We don't expect the software to detect the duplicates "colour" and
> "color" any more than we expect it to recognize "seven-up" as a
> trademark infringement of "7-Up" or to disallow certain naughty
> words (I am told the people who vet ".com" have a list of seven
> of those). These things are done by humans; they probably do use
> software to help them, but that software is not installed on every
> host, its knowledge was given to it by lawyers, not networking
> engineers, and it tracks community standards and laws, not RFC's.
> Such a system for Chinese would be useful and a fine thing to
> base a business on, but I don't believe we should be considering
> embedding these things into every host on the net.

A few things:

a) US/UK spellings in English are more correctly described as contextual
   conversion. Such a mechanism is not discussed in the I-D because it is
   irrelevant to domain names.

b) It *is* possible to have a machine do US-UK. If you go to a registrar
   today to register "color", chances are they are going to recommend
   "colour" to you as well.

c) You are confusing the 3 mechanisms. We agreed the first 2 are difficult,
   but at a minimum the registration process (i.e. the registrar) should
   be aware of these issues and handle them within its registration system,
   as in (b). (Thus, nothing has to be installed on every host on the
   Internet.)

> I think the draft is far too Sino-centric.

Bingo! It is *MEANT* to be. Please feel free to write one which is not
sino-centric :-)

> As I have argued in several places above, the kind of folding proposed
> by the draft depends on language, local laws and customs, and therefore
> each zone must develop and enforce its own rules. IMO, the semantic
> folding proposed in the draft should be declared out of the scope of this
> group. We should concentrate on providing a unique tag, and not try to
> guarantee unique meanings for those tags.

We did not propose any folding. We discussed it at length, including how it
could be done, but made no proposal. The draft is supposed to be
informational, to help those who do not understand Asian scripts understand
them, at least well enough for the domain name context.

-James Seng