
RE: [idn] Why IDNA breaks copy-and-paste



Dan,

	I hesitate to say, but you misunderstand.  Grossly.

	For "copy and paste" involving a terminal emulator, the
"application" of interest is the terminal emulator itself.  

	Now, converting from a stream of characters to a stream of
glyphs is in general a non-trivial step.  For a very restricted
subset of strings, the mapping **MAY** be made (though it usually
isn't, since that is a bad idea and does not generalise well) by
mapping the stream of input characters to a stream of characters
representing the glyphs that are to be displayed.  Sometimes, in
these restricted cases, the mapping may be the identity mapping,
but even then it may take context into account (for contextual
shaping of Arabic, for instance) or do BiDi mapping to "visual
order" (i.e. left-to-right order, a misnomer).

	But in general this approach does not work: there are
ligatures that have no corresponding characters, contextual
shapes that have no corresponding characters, and so on.  All in
all, much of this depends on the font chosen in the "front UI
application".
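
	As a concrete illustration (a minimal Python sketch using the
standard unicodedata module; nothing a terminal emulator actually
runs), a few glyph-like forms do exist in Unicode as compatibility
characters, but the mapping from real characters to them is
neither total nor safely reversible:

    import unicodedata

    # A ligature glyph that happens to have a compatibility code point;
    # most ligatures have no code point at all.
    fi = "\uFB01"
    print(unicodedata.name(fi))                # LATIN SMALL LIGATURE FI
    print(unicodedata.normalize("NFKC", fi))   # "fi" -- two characters

    # An Arabic contextual (final) shape with a compatibility code point;
    # again, most contextual shapes have no such code point.
    alef_final = "\uFE8E"                      # ARABIC LETTER ALEF FINAL FORM
    print(unicodedata.normalize("NFKC", alef_final) == "\u0627")   # True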

	In either case, it is *wrong* to take the sequence of "glyph
indices" (the result of these processes, whether or not they
correspond to allocated characters) as the object of "cut and
paste".  The object of "cut-and-paste" must be the sequence of
characters that was the input to that process.
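
	A toy example of the failure (a Python sketch; the naive
reordering below stands in for a "visual order" mapping, real BiDi
is more involved):

    # Logical-order characters, as a background application would emit them.
    logical = "abc \u05D0\u05D1\u05D2"     # "abc " + Hebrew alef, bet, gimel

    # A naive front end might reorder the right-to-left run for display
    # ("visual order").  If cut-and-paste handed out this visual string,
    # pasting it elsewhere would insert the characters in the wrong order.
    visual = "abc " + "\u05D0\u05D1\u05D2"[::-1]

    print(visual == logical)               # False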

	Now if a background application does the mapping from characters
to glyph indices (masquerading as characters), cutting and pasting
at the terminal emulator application level can't be expected to
work.  But this is a degenerate situation; the background application
should not attempt to do this mapping.  A properly constructed
"front UI application" should get a stream of characters from 
the "background application"; the mapping to glyph indices (and
where to place the glyphs) should be done by the "front UI
application" (normally via some system library; and the actual
mapping often depends on the font used); cutting (and pasting)
should involve only the "front UI application" and should work on
the characters received, not on the glyph indices (even if they
are masquerading as characters).
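
	A minimal sketch of that division of labour (the class and the
names in it are hypothetical, purely to illustrate the principle):

    # Hypothetical "front UI application" line buffer: it keeps the
    # characters received from the background application, maps them to
    # glyph indices only when drawing, and copies characters on selection.
    class TerminalLine:
        def __init__(self, font_cmap):
            self.font_cmap = font_cmap   # char -> glyph index; font-dependent
            self.chars = []              # what the background application sent

        def write(self, text):
            self.chars.extend(text)

        def glyph_indices(self):
            # Used only for rendering; the result varies with the font.
            return [self.font_cmap.get(c, 0) for c in self.chars]

        def copy_selection(self, start, end):
            # Cut-and-paste operates on the characters, never on the
            # glyph indices.
            return "".join(self.chars[start:end])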

	In a further step, the glyphs, or rather their indices, may
be turned into a set of pixels on a screen (the true glyphs, if
you like, but it is tedious to say "glyph indices" all the time,
so they are informally referred to as "glyphs").  I have not
talked about that step, and it is not expected to be reversed by
some kind of OCR process for cut-and-paste (where did you get
that idea from?); in short: the pixels are irrelevant here.

	And yes, I do copy-and-paste from web pages quite often,
I think I know how it works, in principle at least... On rare
occasions I also use terminal emulators, and copy-and-paste
there too.  I think I know how that works, or should work, in
principle too...  Xterm, under Linux, does some odd things where
it maps characters to characters (glyph indices really). I'm not
sure if, or from which version, it does correct copy-and-paste.

	Whichever side ("background" or "front") gets a Punycoded
string must "know" that it is Punycoded in order to be able to
convert it to a Unicode string.  That is very hard to do if
*both* sides are "unaware" of Punycode, and of where it may be
applied and where it may not be applied...
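
	For illustration, a small Python sketch (the "idna" and
"punycode" codecs, and the "xn--" ACE prefix, are taken from the
final RFC 3490/3492 specifications; assume them here):

    label = "bücher"
    ace = label.encode("idna")        # b"xn--bcher-kva" -- plain ASCII
    raw = label.encode("punycode")    # b"bcher-kva"     -- also plain ASCII

    # Decoding works only because the receiver *knows* these are
    # ACE/Punycode strings; nothing in the bytes themselves says so,
    # short of the out-of-band convention of an ACE prefix.
    print(ace.decode("idna"))         # bücher
    print(raw.decode("punycode"))     # bücher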

		Kind regards
		/kent k





> -----Original Message-----
> From: owner-idn@ops.ietf.org [mailto:owner-idn@ops.ietf.org]
> On Behalf Of D. J. Bernstein
> Sent: 13 February 2002 17:31
> To: idn@ops.ietf.org
> Subject: [idn] Why IDNA breaks copy-and-paste
> 
> 
> Kent Karlsson writes:
> > What is displayed is glyphs.  In many cases there is no (uniquely)
> > corresponding character.
> 
> You misunderstand.
> 
> Look at the copy-and-paste supported by the UNIX xterm program.
> Does it read dots off the display and attempt to convert them to
> characters? Of course not. It obtains characters directly from xterm.
> 
> This does _not_ mean, however, that it obtains characters directly
> from ``the application,'' such as the MH/NMH ``show'' program that
> Keith uses to read his mail. The system has been engineered so that
> 
>    * programs such as ``show'' simply print characters, without
>      worrying about dots on the display, copy-and-paste, etc.;
> 
>    * programs such as ``xterm'' read characters and manage the
>      display, without worrying about where the characters came from.
> 
> This modularity speeds software development; let me again recommend
> Gancarz's ``UNIX Philosophy'' book. The system doesn't have, and
> doesn't need, copy-and-paste in programs such as ``show''.
> 
> Web-mail systems are engineered the same way:
> 
>    * The web server converts your mail to HTML, without worrying about
>      your display.
> 
>    * The browser displays the HTML, without worrying about how the
>      HTML was generated.
> 
> This modularity again speeds software development---but it again means
> that there's no copy-and-paste support in the web-mail system.
> 
> We seem to have many novice programmers here, people under the
> delusion that copy-and-paste is always handled by ``the
> application,'' which can provide characters in some dorky 7-bit
> encoding if it wants to. But the real world has, in many cases,
> advanced beyond that primitive model.
> 
> ---D. J. Bernstein, Associate Professor, Department of Mathematics,
> Statistics, and Computer Science, University of Illinois at Chicago
>