Re: [idn] Fw: Cyrillics - Latin



James,

Same old problem.   Mr. Charikov's note assumes that there is
sufficient context to know whether to interpret COBET as
Cyrillic or Latin characters.  In almost any normal context -- a
sentence, a conversation, usually even a business card, that is
a reasonable assumption.  In the DNS, especially in a generic
TLD, where a name or name-reference (e.g., a URI) may exist with
no context at all, it often is not.  For the DNS, we have the
additional problem that, in many societies, we now have years of
experience with Latin-character DNS names/URLs in web and email
addresses on business cards that are otherwise exclusively in
other scripts.  So the _context_ of ivan@foo.COBET.bar.RU might
well cause one to predict a Latin-based interpretation for the
ambiguous characters, even if the business card is in Cyrillic.

We have been down this line of argument many times.  It is a
problem.  We don't have a solution within the DNS.  All of the
proposed solutions-by-extended-matching lead to absurdities or
to overly-severe constraints about how one of the languages
involved is used.  The alternatives are:

(i) to accept the problems, move forward, and hope that the
consequences are not as severe as many of us (for different
reasons, and with different languages or scripts as the focus
for our concerns) fear, or

(ii) to give it up and conclude that there are no reasonable
solutions within the DNS that do not cause unacceptable risks of
user confusion and identifier ambiguity.
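A minimal Python sketch of what "extended matching" might look like, assuming a registry that folds look-alike Cyrillic capitals to Latin ones before comparing names. The table CYRILLIC_TO_LATIN and the function fold_key are hypothetical illustrations, not a proposal made on this list. Under that assumption the Russian word COBET ("soviet") and the Latin string COBET collapse to the same key, so a registry using the key could admit only one of them, which is the kind of constraint on one language's use described above.

```python
# Hypothetical, deliberately partial table of Cyrillic capitals that print
# like Latin capitals; a real confusables table would be much larger.
CYRILLIC_TO_LATIN = {
    "\u0410": "A", "\u0412": "B", "\u0415": "E", "\u041A": "K",
    "\u041C": "M", "\u041D": "H", "\u041E": "O", "\u0420": "P",
    "\u0421": "C", "\u0422": "T", "\u0423": "Y", "\u0425": "X",
}


def fold_key(label: str) -> str:
    """Hypothetical matching key: map look-alike Cyrillic to Latin, then lowercase."""
    return "".join(CYRILLIC_TO_LATIN.get(ch, ch) for ch in label).lower()


latin = "COBET"
cyrillic = "\u0421\u041E\u0412\u0415\u0422"  # Cyrillic ES, O, VE, IE, TE: "soviet"

print(latin == cyrillic)                      # False: different code points
print(fold_key(latin) == fold_key(cyrillic))  # True: the folded keys collide
```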

And, James and Marc, you can consider that a comment on the Last
Calls.

    john


--On Monday, 04 February, 2002 20:58 +0800 "James Seng/Personal"
<jseng@pobox.org.sg> wrote:

> fyi
> 
> ----- Original Message -----
> From: "Sergey Charikov" <s.shar@regtime.net>
> To: <Elisabeth.Porteneuve@cetp.ipsl.fr>; <owner-idn@ops.ietf.org>
> Sent: Monday, February 04, 2002 4:34 PM
> Subject: Cyrillics - Latin
> 
> 
>> I believe there is absolutely no confusion between Latin and
>> Cyrillic for the Russian community.
>> 
>> We can see many familiar words spelled with another language's
>> characters, especially where the characters look alike.
>> 
>> The point is that when Russians see COBET (printed), they see
>> "Soviet" and nothing more.  They print COBET on a business card
>> for a website with Cyrillic content (for Russian and Cyrillic
>> markets), but they usually use SOVIET for web pages in English.
>> We should understand that Cyrillic domains are needed only by
>> people who read Cyrillic.
>> 
>> The PY (Paraguay) case is merely a curiosity.  It shows how
>> eager people are to use their native language, even in such a
>> way.
>> 
>> Best Regards,
>> Serguei Charikov
>> Chair of Russian Language WG of MINC
>> www.minc.org/WG/russian
>> 
>> > 
>> > If I may add a note on Latin-Cyrillic confusion, quoted from
>> > an explanation I have been providing to another group:
>> > 
>> > An aside: I learnt from Russian colleagues that some Russians
>> > prefer to register domain names under .PY (the ccTLD for
>> > Paraguay) rather than .RU (the ccTLD for Russia).  The reason
>> > is that "PY" is the beginning of the word "Russia" in
>> > Cyrillic, PYCCU[R].  The last character is Cyrillic "ya" (see
>> > below); every other character prints identically in Latin and
>> > Cyrillic, with different code points in Unicode but an
>> > identical code point in "LDH".
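A short Python sketch of how differently the two "PY" labels look to software, using the standard library's "idna" codec (an implementation of IDNA 2003, RFC 3490, which postdates this thread). The two labels are assumptions chosen for illustration: Latin P + Y, and Cyrillic ER + U, which print the same.

```python
latin_label = "PY"               # Latin P, Y (illustrative assumption)
cyrillic_label = "\u0420\u0423"  # Cyrillic ER, U: prints as "PY"

# An all-ASCII label passes through the codec unchanged; the Cyrillic label
# is nameprepped (lowercased) and Punycode-encoded into an "xn--" ACE label.
latin_ascii = latin_label.encode("idna")
cyrillic_ascii = cyrillic_label.encode("idna")

print(latin_ascii)     # b'PY'
print(cyrillic_ascii)  # a different label, beginning with b'xn--'
print(latin_ascii.lower() == cyrillic_ascii.lower())  # False
```

Even with DNS case-insensitivity taken into account, the two labels never compare equal, although a reader of a printed card sees one name.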
>> > 
>> > Best regards,
>> > Elisabeth Porteneuve
>> > --
>> > 
>> >    Let us take a glimpse at both the end-user and the
>> >    intellectual property perspectives with an example.
>> > 
>> >    The word "COBET" reads as written if one assumes it is in
>> >    the Latin alphabet, but spells "soviet" if one assumes it
>> >    is Cyrillic.  The Unicode code point for Cyrillic "C",
>> >    0x0421, differs from the code point for Latin "C", 0x0043,
>> >    but the two are identical on printed paper, business cards,
>> >    or a screen.  Consequently, using Unicode code points makes
>> >    it impossible to communicate with anybody without knowing
>> >    which language is _printed_ or, even worse, which letter or
>> >    sign is printed in which language.
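The code-point claim can be checked with Python's standard unicodedata module. This sketch assumes the Cyrillic reading of COBET is the five capitals ES, O, VE, IE, TE; its first printed row pairs U+0043 with U+0421, the two code points named above.

```python
import unicodedata

latin = "COBET"
cyrillic = "\u0421\u041E\u0412\u0415\u0422"  # assumed: Cyrillic ES, O, VE, IE, TE

# Print each position side by side: same glyph, different code point.
for lat, cyr in zip(latin, cyrillic):
    print(f"U+{ord(lat):04X} {unicodedata.name(lat):<24} "
          f"U+{ord(cyr):04X} {unicodedata.name(cyr)}")

print(latin == cyrillic)  # False
```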
>> > 
>> >    In the famous TOYS[R]US, the R in brackets is the Cyrillic
>> >    code point 0x042F, spelled "ya", which also happens to look
>> >    like the letter R seen in a mirror, spelled "are".  With the
>> >    exception of that letter [R], every other letter in
>> >    TOYS[R]US may be read either as a Latin or as a Cyrillic
>> >    code point: different spellings, different code points,
>> >    identical printing on paper or screen.  For a word of 6 code
>> >    points, each printing the same but having 2 possible
>> >    contents, there are 2**6 = 64 possible combinations.  That
>> >    is the number of times a 6-letter word would have to be
>> >    registered to preserve its full intellectual property rights
>> >    in 2 alphabets, Latin and Cyrillic.  It is also the maximum
>> >    number of tries an end user would have to make to reach a
>> >    website if she or he had only printed information.
>> >    I lack the competence to extend this example to other
>> >    alphabets or code points.  However, as far as I understand,
>> >    the problem of Chinese code points has some similarity.
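The 2**n arithmetic can be sketched in Python: each letter that has a look-alike in the other script doubles the number of distinct strings that print the same. The table LATIN_TO_CYRILLIC and the function lookalike_variants are hypothetical illustrations, and "TOPHAT" is an arbitrary 6-letter word assumed to consist only of letters in that table.

```python
from itertools import product

# Hypothetical, partial table: Latin capitals with a Cyrillic look-alike.
LATIN_TO_CYRILLIC = {
    "A": "\u0410", "B": "\u0412", "C": "\u0421", "E": "\u0415",
    "H": "\u041D", "K": "\u041A", "M": "\u041C", "O": "\u041E",
    "P": "\u0420", "T": "\u0422", "X": "\u0425", "Y": "\u0423",
}


def lookalike_variants(word: str) -> list[str]:
    """Every string that prints like `word`, picking Latin or Cyrillic per letter."""
    # Letters without a look-alike in the table contribute a single choice.
    choices = [
        (ch, LATIN_TO_CYRILLIC[ch]) if ch in LATIN_TO_CYRILLIC else (ch,)
        for ch in word
    ]
    return ["".join(combo) for combo in product(*choices)]


print(len(lookalike_variants("COBET")))   # 32 == 2**5
print(len(lookalike_variants("TOPHAT")))  # 64 == 2**6
```

Each variant is a separate string as far as Unicode (and hence the DNS) is concerned, which is why the count grows as 2**n rather than staying at one.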
>> > 
>> > --