
Re: [idn] I-D ACTION:draft-ietf-idn-cjk-00.txt



James Seng wrote:

> Frank Ernens wrote:
> > > Hence, almost all Han ideographs are associated with some meaning by
> > > itself which is very different from most other scripts. This causes some
> > > confusion that Han folding is a form of lexicon-substitution.
> >
> > Well, a look-up table mapping those ideographs which are also
> > words is also a lexicon. The test is: is the table the *same* for
> > all possible languages using the characters?
> 
> I beg to defer. In this case, I->i is also lexicon since it is dependent on
> the language you using.

Ignoring the terminology, we seem to agree that the folding depends on
the language. It was my understanding that Han folding such as you describe
was not included in Unicode because the Japanese objected that it
wouldn't work with their language. And the draft's own section 5
on Japanese says

% Hence, Han folding here is not recommended.
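
As a rough illustration of why a folding table is not the same for every
language, here is a minimal Python sketch. The table names and their
contents are hypothetical and chosen only for this example; they are not
taken from the draft or from nameprep. The Turkish dotted/dotless i is
used because it is the usual example behind the "I->i" remark above.

```python
# Hypothetical per-language folding tables (illustrative only).
DEFAULT_FOLD = {"I": "i"}
# Turkish: capital I folds to dotless i (U+0131),
# capital dotted I (U+0130) folds to plain i.
TURKISH_FOLD = {"I": "\u0131", "\u0130": "i"}

def fold(label, table):
    """Fold a label using a language-specific table, else plain lowercasing."""
    return "".join(table.get(ch, ch.lower()) for ch in label)

def same_name(a, b, table):
    """Two labels match if they fold to the same string under the table."""
    return fold(a, table) == fold(b, table)

# "KIM" and "kim" match under the default table but not the Turkish one.
print(same_name("KIM", "kim", DEFAULT_FOLD))
print(same_name("KIM", "kim", TURKISH_FOLD))
```

The same pair of strings compares equal or unequal depending on which
table is chosen, which is the sense in which the folding "depends on the
language".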

Ran Atkinson wrote:

> However, Vietnamese do NOT reasonably expect to use a totally
> dead, archaic written form in Domain Names, which is the quite
> narrow topic being discussed here.

Given that Unicode supports several archaic scripts, they will 
get used, at least for subdomains. It may well be reasonable to 
sacrifice ancient Vietnamese for smoother operation with modern 
Chinese. But AFAIK no-one has said we have decided to do such a 
thing, and the nameprep draft comes close to saying the opposite.

My technical objections to adding new universal folding rules, for
Han or any other characters, are (i) we aren't able to check that any 
folding rules are workable for all living languages in all 
scripts, (ii) a priority dispute could arise between languages 
which would delay the WG, and (iii) naive users assume existing 
domain names are case-sensitive, suggesting there is really no
need for "convenient" folding rules.

I can see the merits of folding for z-variants (assuming that doesn't
cause problems for personal names) and for some other non-ideographic
characters. Maybe it should be done, but first I think we should
find out why they aren't Unicode compatibility equivalents to start
with.
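
To make the compatibility-equivalence point concrete, the following
Python sketch checks whether two strings become identical under Unicode
compatibility normalization (NFKC). It relies on the standard
`unicodedata` module as shipped with current Python, so it reflects
today's Unicode data rather than the version under discussion here. The
pair U+8AAA / U+8AAC is used as an assumed example of a z-variant pair;
the helper names are made up for illustration.

```python
import unicodedata

def compat_fold(s):
    """Apply Unicode compatibility normalization (NFKC)."""
    return unicodedata.normalize("NFKC", s)

def folded_together(a, b):
    """True if NFKC maps both strings to the same result."""
    return compat_fold(a) == compat_fold(b)

FULLWIDTH_A = "\uff21"  # FULLWIDTH LATIN CAPITAL LETTER A
VARIANT_1 = "\u8aaa"    # assumed z-variant pair, both unified ideographs
VARIANT_2 = "\u8aac"

# A fullwidth letter is a compatibility equivalent of the ASCII letter.
print(folded_together(FULLWIDTH_A, "A"))
# The two ideograph forms carry no decomposition, so NFKC leaves them apart.
print(folded_together(VARIANT_1, VARIANT_2))
```

Any folding of such variant pairs would therefore need a table beyond
what normalization already provides.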

If East Asian scripts are to be supported immediately below
.com and .org (and it seems they are), then I think it is reasonable
for the managers of those zones to decide on priorities for languages there
and (e.g.) not cater for ancient Vietnamese, because all languages can
be catered for as subdomains or below the country TLDs. It is only
*universal* folding that I disagree with - the zone owner should
determine folding policies.
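
A per-zone policy could be sketched roughly as below. Everything here is
an assumption made for illustration: the policy table, the choice of
zones, and the one-entry Han variant map are hypothetical and do not
describe any existing registry or the draft's mechanism.

```python
# Hypothetical one-entry variant map: simplified U+56FD -> traditional U+570B.
HAN_VARIANTS = str.maketrans({"\u56fd": "\u570b"})

def fold_basic(label):
    """Case folding only, as for existing ASCII names."""
    return label.lower()

def fold_han(label):
    """Case folding plus the zone's chosen Han variant folding."""
    return label.lower().translate(HAN_VARIANTS)

# Hypothetical policies chosen by each zone owner.
ZONE_POLICY = {"com": fold_han, "jp": fold_basic}

def policy_for(zone):
    """Pick the policy of the longest matching zone suffix."""
    labels = zone.lower().split(".")
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in ZONE_POLICY:
            return ZONE_POLICY[suffix]
    return fold_basic  # no universal Han folding by default

def equivalent(a, b, zone):
    """Compare two labels under the policy of the zone they live in."""
    f = policy_for(zone)
    return f(a) == f(b)

print(equivalent("\u56fd", "\u570b", "com"))
print(equivalent("\u56fd", "\u570b", "co.jp"))
```

The point of the design is that the comparison rule is looked up per
zone, with no folding beyond case applied unless a zone opts in.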

> The more general issue that I have with most (all ?) of 
> your comments is that they evidence a fundamental misunderstanding 
> of the very limited way in which Domain Names are used.  You
> make arguments that would be reasonable with respect to a
> universal character set used for text processing -- those same
> arguments generally do NOT apply to the very narrow uses of
> characters for Domain Names.

Domain names are derived in all sorts of ways but two common ones
are the names of companies and of people, and those follow the
rules of the official language of the country. It would be unfortunate
if two strings were distinct according to the lawyers (and corresponded
to two different entities) but not according to folding rules invented
here. So yes, I do think general text processing behaviour is what
we want where we can get it. Of course we are stuck with existing
restrictions, but they are already well-known - unlike any new ones
we might introduce.

James Seng wrote:

> > SECTION 7, Mechanism
> [snipped]
> 
> Few things:
> 
> a) US/UK spelling in English are more correctly described as contextual
>    conversion. Such mechanism is not discussed in the I-D because it is
>    irrelevant to domain names.
> 
> b) It *is* possible to have a machine do US-UK. If you go to a registrar
>    today to register "color", chances are they are going to recommend
>    you "colour".

And as long as the language is English that will usually be fine.

> c) You are confusing the 3 mechanism. We agreed the first 2 are difficult
>    but at the minimum, the registration process (ie registrar) should
>    be aware of these issues and handle it within their registration system
>    like (b). (Thus, there is no such thing as installing at all host on
>    the Internet)

I agree with that. The draft did mention performance problems with its
mechanisms (a) and (b), but it didn't say that only its mechanism (c) was
possible without language tagging. (Those are the draft's labels, not the
a, b, c of your list above.) Back when the draft was still "han.pdf" I
think language tagging had not been ruled out.

> [FE]> I think the draft is far too Sino-centric.
> 
> Bingo! It is *MEANT* to be.

I took the title to mean it was for all names containing CJK characters,
not just such names as used in the Chinese-speaking area and other
zones (.com?) which choose to give priority to that. Maybe the scope
could be clarified in the abstract. A lot of people don't distinguish
well between informational and standards-track RFCs.

> We did not propose any folding.

The draft says

% The implicit proposal in this document is that CJKV ideographs may or
% may not be "folded" for the purposes of comparison of domain names.

and also

% In alphabetic scripts, there is also requirement to fold Latin, Greek,
% Hebrew, Cyrillic, Hebrew and Arabic together. There may be a stronger
% requirement for CJKV characters.