Re: [idn] Re: IDNA: is the specification proper, adequate, and complete?



> > 2. There are no "non-Unicode coding systems" that unify beta
> > and eszett; the language issue is irrelevant.
>
> Sure there are.  We call some of them "books".  Transcription of
> a language into printed form involves a coding system.  And I
> have to assume, although I can claim no personal knowledge, that
> German schoolchildren, brought up looking at Eszett, have to be
> taught, when they encounter mathematical notation that uses
> Greek characters (if not sooner), that it is important to notice
> either the context or the descender -- that the two characters
> are not the same.

My point was that they probably have about the same level of confusion
that you had when you first saw a gamma as a schoolchild and confused
it with a y.

> These distinctions, including getting used to
> the variations and similarities in different fonts of I-l-1, are
> bits of pattern recognition that lay people --as distinct from
> font or character set experts-- rapidly learn, within their own
> language and script contexts, to distinguish from context or by
> relatively subtle clues.  I can't even spell out Arabic or Thai
> scripts because I don't have enough experience with the right
> set of clues -- my loss, but these are learned skills.

True.

>
> But this isn't the point, so whether there are, or are not,
> coded character sets that unify the two is not the point either
> (I'll defer to your knowledge and experience on this subject,
> since I haven't studied the question, but statements that sound
> like universal negatives always scare me).

Let me qualify my negative: I have never heard of any that do, and of
the 750-odd code page mapping tables that we have collected on major
platforms (http://oss.software.ibm.com/cvs/icu/charset/data/xml/),
none of them do. Of course, if you go down to Arkansas Bob's "Bait,
Tackle, and Character Encodings Shack", he can whip up a nice
character set in no time flat that'll unify them. You should
understand my phrase 'no "non-Unicode coding systems"' as meaning 'no
"non-Unicode coding systems" of any importance or impact', and I'll
now understand your phrase "non-Unicode coding systems" as meaning
"books" ;-)

..
> (ii) It is addressed to, and solves, a very narrow problem.  We
> (for some definition of "we") have not been explicit, in an
> Internet context, as to what that problem is.  I believe that we
> should be explicit.   Then, having carefully described that
> problem, we then need to carefully evaluate the question of
> whether the benefits of solving it outweigh the risks to the use
> of the DNS in the Internet community that it might pose.  If we
> conclude that we can't reasonably do that evaluation (e.g.,
> because it isn't an IETF problem), then I think we are still
> obligated to delineate the issues and risks to the best of our
> ability -- at least to the extent of writing down the
> implications of problems and issues we already know about.
>
> (iii) A number of items of knowledge and recommendations have
> surfaced in the working group -- of which your suggestion above
> is an excellent example -- that could be used to reduce or
> eliminate some of those risks to the DNS as a piece of usable
> Internet infrastructure.  I think they need to be written down
> as part of WG output, if only because "this risk can be
> ameliorated if one does so-and-so" is a much more satisfactory
> statement than "there is this horrible problem and we should
> consider stopping progress until someone has a solution".

I agree that in both of these cases capturing additional information
as guidelines for dealing with particular issues would be very
helpful.

> Of course, if "mixed scripts in domain names" are considered
> good things, warning when they occur won't help much.   But
> _that_ one, I would contend, is not an IETF problem although I
> think it would be wise and responsible for us to point out that
> mixed script labels pose challenges that homogeneous ones do not.

I agree.  Such UIs would certainly alert people to something very
fishy going on in the case of "Intel.com" spelled with a Cyrillic 'e',
without preventing legitimate registrations such as
"ABC<alpha><beta><gamma>.com". For the latter, the visual indication
would be there, and people would be alerted to the multiple scripts,
but it would not prevent usage.
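
To make the idea concrete, here is a rough sketch of the kind of check
such a UI might run on a label. It is only an illustration, not a
proposal: Python's standard library does not expose the Unicode Script
property, so the sketch guesses the script from the first word of each
character's Unicode name (an assumption that holds for Latin, Greek,
and Cyrillic letters but is not a substitute for real script data).
The function names scripts_in_label and is_mixed_script are
hypothetical.

```python
import unicodedata


def scripts_in_label(label):
    """Return the approximate set of scripts used by the letters in a label.

    Heuristic (an assumption of this sketch): the first word of the
    Unicode character name, e.g. "LATIN", "GREEK", "CYRILLIC", stands
    in for the Script property, which the standard library lacks.
    """
    scripts = set()
    for ch in label:
        if not ch.isalpha():
            continue  # ignore digits, hyphens, and dots
        name = unicodedata.name(ch, "")
        scripts.add(name.split(" ")[0] if name else "UNKNOWN")
    return scripts


def is_mixed_script(label):
    """True if the letters in the label come from more than one script."""
    return len(scripts_in_label(label)) > 1


if __name__ == "__main__":
    spoof = "Int\u0435l"                  # Cyrillic 'e' (U+0435) in place of Latin 'e'
    legit = "ABC\u03b1\u03b2\u03b3"       # ABC followed by alpha, beta, gamma
    for label in ("Intel", spoof, legit):
        # A UI would flag the mixed labels visually, not reject them.
        print(ascii(label), sorted(scripts_in_label(label)), is_mixed_script(label))
```

Note that both the spoofed and the legitimate label are flagged; the
check only gives the user the visual warning, and leaves the decision
about registration or use to policy elsewhere. The same heuristic also
reports eszett (U+00DF) as Latin and beta (U+03B2) as Greek.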

Regards,

Mark