
Re: [idn] Why IDNA breaks copy-and-paste



Liana Ye wrote on 2002-02-15 17:24 UTC:
> Thanks for your input into the discussion. Although
> I am not an expert in termcap myself, your experience
> in making "xterm preserves as much of the original UCS
> plaintext" is what I expected to hear from someone.
>
> This is part of my reasoning for treating Latin script at the
> same level as other scripts, rather than making it a special
> case to squeeze into DNS.

Ah, now that I have understood what the discussion is about, I can
comment better on what sort of corruption you should expect:

  - xterm is likely to leave only text in Unicode Normalization Form C
    untouched. Combining sequences (a base character followed by
    combining characters) will be replaced with precomposed characters.

  - xterm is likely to significantly mess up any text in right-to-left
    scripts. The current plan is that bidi might be handled in an extra
    filter stage or in libraries such as ncurses.

  - If Arabic shaping is done not in xterm but in the intermediate
    filter or ncurses layer, then you will also find Arabic presentation
    forms in the cut&paste buffer, rather than the underlying alphabetic
    characters.
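To make the three kinds of corruption above concrete, here is a minimal
sketch in Python, using only the standard unicodedata module, of a
hypothetical helper paste_risks() that flags a label likely to change on
a cut&paste round trip through such a terminal. The helper name and the
exact checks are my own assumptions for illustration, not part of xterm
or of any IDNA draft.

```python
import unicodedata


def paste_risks(label: str) -> list[str]:
    """Hypothetical check: ways a terminal cut&paste might alter `label`."""
    risks = []
    # A terminal that keeps only NFC will precompose combining sequences,
    # so any label that is not already NFC comes back as different code points.
    if unicodedata.normalize("NFC", label) != label:
        risks.append("not NFC")
    # Characters with bidi class R (e.g. Hebrew) or AL (e.g. Arabic) may be
    # reordered by a bidi filter stage before they reach the paste buffer.
    if any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in label):
        risks.append("right-to-left")
    # Arabic Presentation Forms-A (U+FB50..U+FDFF) and -B (U+FE70..U+FEFC,
    # stopping before U+FEFF) indicate shaped glyph forms, not alphabetic letters.
    if any(0xFB50 <= ord(ch) <= 0xFDFF or 0xFE70 <= ord(ch) <= 0xFEFC
           for ch in label):
        risks.append("presentation forms")
    return risks


print(paste_risks("cafe\u0301"))                # decomposed e + acute accent
print(paste_risks("\u05e9\u05dc\u05d5\u05dd"))  # Hebrew letters
print(paste_risks("\ufeb3"))                    # ARABIC LETTER SEEN INITIAL FORM
```

Note that NFKC maps the presentation form U+FEB3 back to the letter
U+0633 (ARABIC LETTER SEEN), so shaping is partly reversible, but no
normalization form can undo a bidi reordering of the characters.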

So I would be extremely careful about using cut&paste to transfer
unique identifiers from a terminal emulator to an application.
This can open an endless can of worms if Hebrew, Arabic or Syriac are
involved, and we haven't even begun to understand how to handle that,
even after scratching our heads for a year in heated discussions on the
linux-utf8 and i18n@xfree86.org mailing lists.

The other critical DNS-related security issue is, of course,
homoglyphs. The same glyph appears in Unicode many times as separate
characters that belong to different alphabets or usage contexts; the
Latin, Cyrillic, and Greek capital "A" are just one example.
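A small Python sketch of the "A" example: the three letters are distinct
code points, and normalization (even NFKC) does not fold one into another,
so labels spelled with them are different labels. The mixed-script check
below is a hypothetical, deliberately crude heuristic of my own that uses
the first word of each character's Unicode name as a stand-in for its
script (Python's unicodedata does not expose the Unicode Script property),
so digits or hyphens would also count as a separate "script".

```python
import unicodedata

# Latin A, Cyrillic A and Greek Alpha: usually rendered with the same glyph.
HOMOGLYPHS = ["\u0041", "\u0410", "\u0391"]

for ch in HOMOGLYPHS:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")


def name_prefixes(label: str) -> set[str]:
    """Hypothetical, crude script guess: first word of each character name."""
    return {unicodedata.name(ch, "UNKNOWN").split()[0] for ch in label}


def looks_mixed_script(label: str) -> bool:
    # A label drawing letters from more than one alphabet is suspicious.
    return len(name_prefixes(label)) > 1


print(looks_mixed_script("paypal"))       # all Latin letters
print(looks_mixed_script("p\u0430ypal"))  # contains CYRILLIC SMALL LETTER A
```

A real registry policy would need the Unicode Script property and
per-language rules rather than a name prefix, but the sketch shows why
homoglyphs survive every normalization step unchanged.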

Markus

--
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>