
Re: [idn] Fw: Cyrillics - Latin

To: "John C Klensin" <klensin@jck.com>,"James Seng/Personal" <jseng@pobox.org.sg>
Subject: Re: [idn] Fw: Cyrillics - Latin
From: "Mark Davis" <mark@macchiato.com>
Date: Mon, 4 Feb 2002 08:01:51 -0800
Cc: <idn@ops.ietf.org>,<Elisabeth.Porteneuve@cetp.ipsl.fr>
References: <043b01c1ad7b$a815ace0$0d01000a@jamessonyvaio> <77344214.1012814690@localhost>
Reply-to: "Mark Davis" <mark@macchiato.com>

I agree with John that there is no solution in DNS. I would go further
in that I don't think a solution is possible in DNS.

On the other hand, I think it would be quite useful--as I suggested
some time ago--to add some text strongly recommending that browsers
and similar programs that display URLs indicate, with some sort of
visual mechanism, where a URL contains mixed scripts, e.g. by
highlighting the characters that are not in the script of the
majority of the characters (with a different color or background).
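
A minimal sketch of that majority-script rule, in Python. It is only
an illustration under stated assumptions: the script of a character is
guessed from the first word of its Unicode character name (the Python
standard library does not expose the Unicode Script property), ties go
to whichever script appears first, and the names script_of,
minority_positions and highlight are hypothetical.

```python
import unicodedata
from collections import Counter


def script_of(ch):
    """Rough script guess: the first word of the Unicode character name.

    Assumption: this stands in for the real Unicode Script property,
    which the Python standard library does not provide.
    """
    if not ch.isalpha():
        return None  # digits, dots and hyphens are not counted
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def minority_positions(label):
    """Return the indexes of letters outside the label's majority script."""
    scripts = [script_of(ch) for ch in label]
    counts = Counter(s for s in scripts if s is not None)
    if len(counts) < 2:
        return []  # one script (or no letters at all): nothing to flag
    majority = counts.most_common(1)[0][0]
    return [i for i, s in enumerate(scripts) if s is not None and s != majority]


def highlight(label):
    """Wrap minority-script letters in brackets, standing in for color."""
    flagged = set(minority_positions(label))
    return "".join(f"[{ch}]" if i in flagged else ch for i, ch in enumerate(label))


# Cyrillic ES (U+0421) followed by Latin "OBET": only the first letter is flagged.
print(highlight("\u0421OBET"))
```

A display program would replace the brackets with a color or background
change; an all-Latin or all-Cyrillic label is returned unchanged.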

Mark
—————

Πόλλ’ ἠπίστατο ἔργα, κακῶς δ’ ἠπίστατο πάντα — Ὁμήρου Μαργίτῃ
[For transliteration, see http://oss.software.ibm.com/cgi-bin/icu/tr]

http://www.macchiato.com

----- Original Message -----
From: "John C Klensin" <klensin@jck.com>
To: "James Seng/Personal" <jseng@pobox.org.sg>
Cc: <idn@ops.ietf.org>; <Elisabeth.Porteneuve@cetp.ipsl.fr>
Sent: Monday, February 04, 2002 06:24
Subject: Re: [idn] Fw: Cyrillics - Latin


> James,
>
> Same old problem.   Mr. Charikov's note assumes that there is
> sufficient context to know whether to interpret COBET as
> Cyrillic or Latin characters.  In almost any normal context -- a
> sentence, a conversation, usually even a business card -- that is
> a reasonable assumption.  In the DNS, especially in a generic
> TLD, where a name or name-reference (e.g., a URI) may exist with
> no context at all, it often is not.  For the DNS, we have the
> additional problem that, in many societies, we now have years of
> experience with Latin-character DNS names/URLs in web and email
> addresses on business cards that are otherwise exclusively in
> other scripts.  So the _context_ of ivan@foo.COBET.bar.RU might
> well cause one to predict a Latin-based interpretation for the
> ambiguous characters, even if the business card is in Cyrillic.
>
> We have been down this line of argument many times.  It is a
> problem.  We don't have a solution within the DNS.  All of the
> proposed solutions-by-extended-matching lead to absurdities or
> to overly-severe constraints about how one of the languages
> involved is used.  The alternatives are
>
> (i) to accept the problems, move forward, and hope that the
> consequences are not as severe as many of us (for different
> reasons, and with different languages or scripts as the focus
> for our concerns) fear   or
>
> (ii) to give it up and conclude that there are no reasonable
> solutions within the DNS that do not cause unacceptable risks of
> user confusion and identifier ambiguity.
>
> And, James and Marc, you can consider that a comment on the Last
> Calls.
>
>     john
>
>
> --On Monday, 04 February, 2002 20:58 +0800 "James Seng/Personal"
> <jseng@pobox.org.sg> wrote:
>
> > fyi
> >
> > ----- Original Message -----
> > From: "Sergey Charikov" <s.shar@regtime.net>
> > To: <Elisabeth.Porteneuve@cetp.ipsl.fr>;
> > <owner-idn@ops.ietf.org>
> > Sent: Monday, February 04, 2002 4:34 PM
> > Subject: Cyrillics - Latin
> >
> >
> >> Do believe there's absolutely no confusion of Latin vs Cyrillic
> >> for the Russian community.
> >>
> >> We can see many familiar words in combinations of another
> >> language's characters, especially if the characters are drawn
> >> similarly.
> >>
> >> The point is that when Russians see COBET (printed) they see
> >> "Soviet" and nothing more.
> >> And they print COBET on a business card for a website with
> >> Cyrillic content (for Russian and Cyrillic markets),
> >> but they usually take SOVIET for web pages in English.
> >> We should understand that Cyrillic domains are required for
> >> Cyrillic folks only.
> >>
> >> The PY (Paraguay) case is just a funny thing. It shows the rush
> >> to use the native language even in such a way.
> >>
> >> Best Regards,
> >> Serguei Charikov
> >> Chair of Russian Language WG of MINC
> >> www.minc.org/WG/russian
> >>
> >> >
> >> > If I may, I will add a note on Latin-Cyrillic confusion, quoted
> >> > from an explanation I have been providing to another group.
> >> >
> >> > As an aside, I learnt from Russian colleagues that some
> >> > Russians prefer to register domain names under .PY (ccTLD for
> >> > Paraguay) rather than .RU (ccTLD for Russia). The reason is
> >> > that "PY" is the beginning of the word "Russia" in Cyrillic -
> >> > PYCCU[R]. The last character is Cyrillic "ya" (see below);
> >> > every other letter prints identically in Latin and Cyrillic,
> >> > with different code points in Unicode but an identical code
> >> > point in "LDH".
> >> >
> >> > Best regards,
> >> > Elisabeth Porteneuve
> >> > --
> >> >
> >> >    Let us have a glimpse at both the end-user and the
> >> >    intellectual property perspectives with an example.
> >> >
> >> >    The word "COBET" reads as it is if one assumes it is in
> >> >    the Latin alphabet, but spells "soviet" if one assumes it
> >> >    is Cyrillic. The Unicode code point for Cyrillic "C",
> >> >    0x0421, is different from the code point for Latin "C",
> >> >    0x0043, but the two are identical on printed paper,
> >> >    business cards or a screen. Given the above, the use of
> >> >    Unicode code points makes it impossible to communicate a
> >> >    name to anybody without knowing which language is
> >> >    _printed_, or, even worse, which letter or sign is
> >> >    printed in which language.
> >> >
> >> >    In the famous TOYS[R]US, the R in brackets is the Cyrillic
> >> >    code point 0x042f, pronounced "ya", which also happens to
> >> >    look like the letter R (pronounced "are") seen in a
> >> >    mirror. With the exception of that letter [R], every other
> >> >    letter in TOYS[R]US may be read as either a Latin or a
> >> >    Cyrillic code point: different spellings, different code
> >> >    points, identical printing on paper or screen. For a word
> >> >    of 6 code points, each with the same printed form but 2
> >> >    different contents, there are 2**6 = 64 possible
> >> >    combinations. That is the number of times a 6-letter word
> >> >    would have to be registered to preserve its full
> >> >    intellectual property rights in 2 alphabets, Latin and
> >> >    Cyrillic. It is also the maximum number of tries an
> >> >    end-user would have to make to get to a website, given
> >> >    only printed information.
> >> >    I do not have the competence to extend this example to
> >> >    other alphabets or code points. However, as far as I
> >> >    understand, the problem of Chinese code points has some
> >> >    similarity.
> >> >
> >> > --
> >> >
> >> >
> >> >
> >>
> >
> >
> >
> >
>
>
>
>
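
The 2**6 = 64 count in the quoted example follows from each lookalike
letter having two possible code points. A small Python sketch of the
same counting for COBET (five letters, so 2**5 = 32 variants), assuming
a hand-written lookalike table; LOOKALIKES and variants are
illustrative names, not taken from any standard or library.

```python
from itertools import product

# Latin capitals mapped to the Cyrillic capitals that print identically.
# Assumption: a hand-picked table covering only the letters of "COBET".
LOOKALIKES = {
    "C": "\u0421",  # CYRILLIC CAPITAL LETTER ES
    "O": "\u041E",  # CYRILLIC CAPITAL LETTER O
    "B": "\u0412",  # CYRILLIC CAPITAL LETTER VE
    "E": "\u0415",  # CYRILLIC CAPITAL LETTER IE
    "T": "\u0422",  # CYRILLIC CAPITAL LETTER TE
}


def variants(word):
    """List every code-point spelling of word that prints the same way."""
    choices = [(ch, LOOKALIKES[ch]) if ch in LOOKALIKES else (ch,) for ch in word]
    return ["".join(combo) for combo in product(*choices)]


all_forms = variants("COBET")
print(len(all_forms))  # number of distinct code-point sequences
for form in all_forms[:3]:
    # Show the code points, since the printed forms look the same.
    print(" ".join(f"U+{ord(ch):04X}" for ch in form))
```

Every string in the list prints as the same five glyphs, which is why
printed information alone cannot say which one was registered.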
