
Re: Inputting mixed SC/TC (Re: [idn] A question...)





On Mon, 11 Feb 2002 14:44:41 -0500 John C Klensin <klensin@jck.com>
writes:
> --On Monday, 11 February, 2002 09:34 -0800 liana Ye
> <liana.ydisg@juno.com> wrote:
> 
> > Dear John:
> > 
> > I appreciate your understanding of all the problems the IDN 
> > does not solve. The differences between the WG and the 
> > CDN group is not a political issue at its base.
> 
> I think I understand that, as do, I think, most of the
> participants in the WG.  If I correctly understand what they are
> saying when they make comments about political issues, those
> comments are based on three things:
> 
> (i) Whomever and whatever initiated the process, much of the
> IETF reacts very badly to attempts to influence a working group
> (or other effort) by letter-writing campaigns that involve a
> large number of people sending the same note or notes,
> especially when those people have not been participating in the
> WG.  Such note-sending campaigns are a political action, not an
> engineering one.  They are also considered in such bad taste
> that they may damage the position they are trying to advocate:
> the reaction is, more or less, "they have completely run out of
> technical arguments and  are falling back to creating a lot of
> noise; this action indicates that even they do not believe in
> their technical arguments".

Do you recall that the Chinese group tends to take a "forget about 
them, it is a waste of our time and effort to convince them" attitude? 
They would go back to doing their own thing, as happened with the 
MS-DOS source code licensing in the early 80's. 
I hope this WG does not repeat the mistake Microsoft made then.  

I think it is progress that the CDN group now cares about what is 
being argued in the IETF, and is trying hard to make the 
IETF understand its position.  They may not have a fully 
satisfactory solution for IDN, but they know the current 
proposal will not work for Han characters, which make up 75%
of UCS code points. 

> (ii) We have a continuing communications disconnect in which
> some members of the Chinese-speaking community are, apparently,
> so astonished that their positions are not immediately accepted
> by the WG that they assume that no one [else] in the WG cares
> about the impact of this work on names written in Chinese
> characters.  Again, that leads to statements being made that
> sound to others as if reasoning has been abandoned in favor of
> political actions.

You have misread the protest.  Please recall what the Martin Luther 
King, Jr. holiday is about.  I think everyone at that time 
understood that Black people were asking for equal rights in taking 
a bus, but not everyone thought it was an issue to put into law!

 
> (iii) There is also a perception that the CDNC group, and some
> others, are ignoring or dismissing the problems with Japanese
> and Korean that would be induced by special mappings for Chinese
> and, similarly, many careful explanations of why
> specialized TC<->SC handling is not analogous to case-mapping in
> alphabetic character sets.  Again, this is a failure to
> communicate (perhaps on both sides) at a very fundamental level
> that is leading to ill-feeling and beliefs that decisions are
> being made on a political or emotional basis, rather than with
> full understanding and acceptance of the engineering constraints
> and tradeoffs.
 
They are not ignoring Japanese and Korean!  They are not 
Japanese or Korean, so they don't think they are qualified
to argue on behalf of them.   They know there are plenty of 
problems in Chinese already, which they have dealt with
without the help of the IETF in the past.  They know the theoretical 
limits of digital computational technology, and are really concerned
about extending the current limit to include Japanese and 
Korean users. 


> > WG is trying 
> > to push TC/SC out of IDN.  The CDN and the others argue
> > for them to be dealt with in IDN. 
> 
> Again, let me try to explain this a little bit differently.
> There is a very strong conviction in the IETF that one of the
> issues, perhaps the key issue, in Internet protocol design is
> scaling.  The scaling concerns are usually thought of as applying
> to "size of the Internet" issues -- e.g., we try to avoid
> standardizing anything that can work only in a restricted
> environment or under other "small network" constraints.  But
> they also apply to partial solutions: if a particular aspect of
> a problem or design activity seems to require a comprehensive
> solution, but none is immediately available, the IETF will tend
> to avoid a partial solution until the comprehensive one and a
> migration strategy are well understood.
> 

The scaling you are thinking of is to put all UCS code points 
through a sequential filter, and let the Latin code block come out 
at the end of that filter.  That is the same thing the CDN 
has to do with TC/SC conversion, and the 1-1 TC/SC
mapping should come out of this filter at the end too!  


> The other difficulty with the "consumer confusion" aspects of
> the CDN discussion is that many of the comments seem to ignore a
> point that has been made repeatedly in the WG: these issues
> exist, in some form, for almost every language and script we
> have investigated. All of the "equivalent character"
> discussions, the similarity of some characters among, e.g.,
> "Latin", Cyrillic, and Greek scripts, and related issues point
> to considerable possibility of user confusion, misrouted
> queries, and so on.  One might even suggest that the CDNC
> complaints have been inappropriately focused on Chinese
> characters and that the statement should have been "this will
> cause problems all over the world, and the IDN effort should
> just stop".  The people in the WG who have put in huge amounts
> of effort trying to get _something_ to work probably would not
> accept that either, but the position would more nearly recognize
> the realities of likely IDN usage.
 
So every script or script group has to have some type of
sequential filtering, and should come out of [nameprep]
at the end in a form that is not confused with other code blocks.


> And that brings me to what I think it is the real problem here,
> and it is a problem that many of us, including those taking the
> lead on IDNA (and nameprep, etc) have understood for most of the
> time that the WG has been working on these problems.  There are
> many problems with the use of words in languages that cannot be
> dealt with in the DNS and some other solution will be needed.
> Full TC<->SC matching (of all combinations) appears to fall into
> that category.  So does the Greek-Cyrillic-Latin problem, and
> the ASCII 0<->O and 1<->l problems.  So does a really
> satisfactory solution to the "invisible character" problem, some
> issues with Arabic/Hebrew/Yiddish vowels, and so on.  Mark's
> notion of using presentation mechanisms to identify unusual
> character combinations is, I think, quite ingenious but it isn't
> part of the DNS either.  Your notions about phonetic mappings
> fall into this category as well.  But the two things that all of
> these problems and approaches have in common is the potential
> for user confusion (or fraud) --some of it serious-- and the
> fact that none of them can be dealt with by the simplistic
> matching rules of the DNS.  Instead, we need user choices, or
> likelihood functions, or fairly serious language recognition, or
> knowledge the DNS doesn't have, or heuristics of some [other]
> variety.  And the problems aren't limited to Chinese, even
> though they manifest themselves differently.
>

This is precisely the reason that I have proposed to use 
language tags, and to keep these tags transparent in DNS.
 
> Now I suggest that almost everyone who has been participating
> actively in this effort knows that by now.  We know that other,
> non-DNS, mechanisms are going to be needed to accomplish
> satisfactory internationalization.  Given that, the hard problem
> is whether it makes sense to tamper with the DNS at all.  And,
> if one is going to do that, what the right set of constraints is
> to prevent some of the worst possible damage.   My reading of
> the consensus in the WG is that they believe that eliminating
> (or postponing) every script that raises problems is not a
> solution: we would rapidly have to eliminate all scripts,
> possibly including the LDH subset of ASCII.

That is what I mean by the Latin code block being the only one 
that comes out of the filter at the end.  I think you are being more 
fair in recognizing the current direction of the discussion.

> 
> I think the WG believes that there are three options, and that
> most of its members have eliminated the third from consideration:
> 
> 	(a) Use Nameprep, or something very much like it, which
> 	deals with mappings that are complete (i.e., there is no
> 	controversy about them and no need for examination of
> 	surrounding characters, other context, or language
> 	information to do them properly) and which excludes
> 	characters that are problematic in any context in which
> 	they appear.

I think Nameprep can be extended in a parallel processing 
fashion.  Thus Latin is done, and the other code blocks are left open
to be handled by their separate preprocessing routines.
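
As a rough illustration only (it is not part of Nameprep or of any
draft), the parallel arrangement could be sketched in Python as below.
The two-entry TC->SC table, the handler names and the script test based
on Unicode character names are all assumptions made for this example.

```python
import unicodedata

# Hypothetical two-entry TC->SC table; a real table would be far larger
# and its contents are exactly what is under dispute.
TC_TO_SC = {"國": "国", "學": "学"}

def script_of(ch):
    """Crude script guess from the Unicode character name (assumption)."""
    name = unicodedata.name(ch, "")
    if name.startswith("LATIN"):
        return "latin"
    if name.startswith("CJK"):
        return "han"
    return "other"

def prep_latin(ch):
    # Latin handling: plain case folding to lower case.
    return ch.lower()

def prep_han(ch):
    # Han handling: apply a 1-1 TC->SC mapping where the table has one.
    return TC_TO_SC.get(ch, ch)

# One preprocessing routine per script group; others pass through untouched.
HANDLERS = {"latin": prep_latin, "han": prep_han}

def prepare_label(label):
    """Send each character to the routine for its own script group."""
    out = []
    for ch in label:
        handler = HANDLERS.get(script_of(ch))
        out.append(handler(ch) if handler else ch)
    return "".join(out)
```

Each routine sees only characters of its own script group, so a new
group can be added without touching the Latin one.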

> 	(b) Adopt a very strict "identifier" rule in which
> 	nameprep becomes unnecessary: characters from the
> 	permitted repertoire can be intermixed in any way,
> 	without restrictions, comparisons are made on those
> 	characters only, and the users will somehow learn to
> 	cope with the results.

This has to be used to sequentialize each script group. Thus,
it appears to users to be a complete, 100% solution at Layer
3, but in IDN they are very strict "identifiers".

> 	
> 	(c) Give it up, keep text labels in the DNS restricted
> 	to the LDH rules, and do _all_ internationalization
> 	"somewhere else".

Yes, this is what I am arguing for. 
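
For reference, the LDH rule named in (c) restricts a label to ASCII
letters, digits and hyphens.  A minimal Python sketch, assuming the
usual host name conventions (1 to 63 characters, no leading or trailing
hyphen); the function name is only illustrative:

```python
import re

# Letters, digits, hyphen; 1 to 63 characters; no hyphen at either end.
_LDH = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")

def is_ldh_label(label):
    """Return True if the label obeys the LDH rule as assumed above."""
    return _LDH.fullmatch(label) is not None
```

Under this rule any label containing a Han character is rejected
outright, which is why all internationalization would then have to be
done "somewhere else".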


> > If you consider IDN is a part of DNS, then the CDN group 
> > say "NO, NO, NO" with force for you and the WG to raise 
> > eyeballs.   And I hope they have accomplished it with 
> > all their protests.
> 
> I fear not.  Many of us were fully aware of the problems, and
> very concerned about them, before now, and even before Salt Lake
> City.    For some of us, the protests have been successful only
> in taking up time that should have gone into the "above DNS"
> work.  For others, they have distracted from the effort to
> carefully review and adjust the details of what nameprep should
> be doing in areas where it can constitute a nearly-100%
> solution. And, as I have suggested above and elsewhere, they
> have led many people to the conclusion (whether correct or
> incorrect) that the real goal of the CDNC protest was to disrupt
> the work of the WG without offering real alternatives or a way
> forward.
> 
> > If you consider IDN is above DNS then you are agreeing 
> > with CDN group. 
> 
> As I trust you know, I have been suggesting that much critical
> IDN work will have to be done "above DNS" since before either
> the WG or MINC got started.  But to agree on that principle does
> not necessarily imply agreement on details of what the WG should
> do.
> 
> > If you admit that you are not an expert in dealing with 
> > Han character processing then you should give a good 
> > and hard study regarding what they have been saying 
> > all along.
> > 
> > If you understand what Xiang Deng is saying then 
> > don't introduce political arguments.  It doesn't help the 
> > communication at all. 
> 
> I didn't raise any political arguments in the note of mine that
> you quoted, nor did I use that word.  "Policy" issues and
> decisions are another matter entirely.   And, while I assume
> neither you nor Xiang Deng did so either, I didn't set off the
> letter-writing campaign.
> 
> regards,
>     john

I think the last part of our conversation shows that the differences
result from the earlier technical issues.  I shall stay silent 
rather than pursue them further.

Regards,

Liana