[idn] Re: Document Status?



Patrik Fältström <paf@cisco.com> writes:

> (a1) The email program understands IDNA, but the address book program does
> not. As it understands IDNA, it will display (if the script and font exist)
> the correct Unicode characters, and not the ACE encoded string. Now the copy
> operation happens, and if I were the email programmer I would put two (2)
> things in the paste buffer: one "email address", which is the ACE encoded
> string, the same thing as what is passed in SMTP or POP, and one which is
> the address in Unicode (or local script, which will be named as part of the
> tag). The address book, which fetches data from the paste buffer, gets the
> string, notices it is ACE encoded, and can choose to decode that if it
> can/knows how, etc.

At least in X11, cut'n'paste works by transferring charset tagged but
otherwise opaque character arrays.  What you are proposing seems to
require a cut'n'paste protocol to be implemented in both the MUA and
the address book application.  The protocol must specify how the
structure containing the raw string and the ACE encoded string is
encoded and identified by both applications.  Will IDNA define this
protocol for X11, MacOS, Windows etc?
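
To make concrete what such a protocol would have to pin down, here is a
minimal sketch in Python.  Everything in it is an assumption for
illustration: the ClipboardItem structure, the target names
"x-ace-address" and "x-raw-address", and the use of Python's built-in
"idna" codec (IDNA2003, "xn--" prefix) as the ACE.  None of this is
defined by IDNA or by any windowing system.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class ClipboardItem:
    """One representation offered in the paste buffer (hypothetical)."""
    target: str    # type tag both applications would have to agree on
    charset: str   # charset tag for the otherwise opaque bytes
    data: bytes


def offer_address(local: str, domain: str) -> List[ClipboardItem]:
    """Sending side: offer both the ACE form and the raw Unicode form."""
    # Assumption: Python's "idna" codec stands in for the ACE.
    ace_domain = domain.encode("idna").decode("ascii")
    return [
        ClipboardItem("x-ace-address", "US-ASCII",
                      f"{local}@{ace_domain}".encode("ascii")),
        ClipboardItem("x-raw-address", "UTF-8",
                      f"{local}@{domain}".encode("utf-8")),
    ]


def pick(items: Iterable[ClipboardItem],
         understood: Iterable[str]) -> Optional[ClipboardItem]:
    """Receiving side: take the first offered item whose tag is known."""
    known = set(understood)
    for item in items:
        if item.target in known:
            return item
    return None
```

An address book that knows neither target name gets nothing back from
pick(), which is exactly the interoperability gap: the tags only help
if both applications implement the same convention.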

Assuming IDNA will limit itself to not require modifications to
cut'n'paste operations in various operating systems, you will only be
able to cut'n'paste charset tagged but opaque text strings.  Whether
the strings are to be ACE encoded or raw encoded is not specified
anywhere as far as I can tell, and different implementations will
choose different strategies.  If the application is running in a
Unicode environment, it might (only might!) make sense to transfer the
raw Unicode encoding, but if it is running in a non-Unicode environment
the IDNA specification leaves you in the cold as to how to implement
anything.  The result won't be pleasing.
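
As an illustration of what the receiving application is left to do, the
sketch below guesses whether a pasted domain is ACE encoded or raw.
The helper names are hypothetical, and the "xn--" prefix is the one
used by Python's built-in "idna" codec (IDNA2003); treat both as
assumptions rather than anything the specification tells a paste
target to do.

```python
ACE_PREFIX = "xn--"  # assumption: prefix used by Python's "idna" codec


def looks_ace(domain: str) -> bool:
    """Guess: does any label of the pasted domain carry the ACE prefix?"""
    return any(label.lower().startswith(ACE_PREFIX)
               for label in domain.split("."))


def normalize_pasted(domain: str) -> str:
    """Return the ACE form no matter which form the sender chose."""
    if looks_ace(domain):
        return domain.lower()
    # Raw form: the receiver has to do the ACE encoding itself.
    return domain.encode("idna").decode("ascii")


def display_form(domain: str) -> str:
    """Return the Unicode form no matter which form the sender chose."""
    if looks_ace(domain):
        return domain.encode("ascii").decode("idna")
    return domain
```

Both normalize_pasted("bücher.example") and
normalize_pasted("xn--bcher-kva.example") give the same result, but
only because this receiver happens to implement the guess; a receiver
that doesn't will store whichever form the sender picked.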

In general, cut'n'paste of IDNA in the real world is not well defined,
since IDNA only solves the IDNA problem for Unicode, and the real
world isn't running Unicode everywhere.  Even if you dodge that
problem, there are other issues (like whether to send raw or ACE
encoded data).  IMHO, let's limit the scope of IDNA to exclude the
cut'n'paste problem because there are dragons there.

> (a2) The email program does not understand IDNA. It will only see the ACE
> encoded string, and  will just like today place the ACE encoded string into
> the paste buffer. See (a1) for rest of story.
>
> (b) I can type some weird codepoints in my email application, but the
> address book can not handle it. Also in this case the safest way of moving
> forward is to place the ACE encoded string in the paste buffer.

There are other scenarios as well.

(c) The email address was located in the message body, and thus not
    ACE encoded.  If the message body was non-Unicode, see (d) for the
    rest of the story; if the message body was Unicode, it is not
    clear which application, if any, will ACE encode it, and you
    have the situation in (a) again.

(d) Email program understands IDNA but is running in a non-Unicode
    environment.  The address is tagged and is transferred to the
    address book application using e.g. ISO-8859-1.  IDNA doesn't
    handle or care about this scenario, but it does exist in the real
    world (e.g. my machine).
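
Scenario (d) can be sketched as follows.  The function names are
hypothetical, and falling back to the ACE form when the local charset
cannot represent the name is just one possible strategy, not something
IDNA specifies.  Python's built-in "idna" codec (IDNA2003) again stands
in for the ACE.

```python
from typing import Tuple


def export_tagged(domain: str, charset: str) -> Tuple[str, bytes]:
    """Sender in a non-Unicode environment: emit (charset tag, bytes)."""
    try:
        return charset, domain.encode(charset)
    except UnicodeEncodeError:
        # Assumed fallback: the name does not fit the local charset,
        # so send the ACE form, which is plain ASCII.
        return "US-ASCII", domain.encode("idna")


def import_tagged(charset: str, data: bytes) -> str:
    """Receiver: honour the charset tag first, then ACE encode."""
    text = data.decode(charset)
    return text.encode("idna").decode("ascii")
```

For "bücher.example" tagged as ISO-8859-1 the sender emits the byte
0xFC for the u-umlaut, and the receiver only arrives at the right ACE
string if it decodes with the tagged charset before encoding.  A name
outside ISO-8859-1 cannot be transferred raw at all in this
environment.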