
Fw: [idn] Re: Legacy charset conversion in draft-ietf-idn-idna-08.txt (in ksc5601-1987)



Wherever there is mislabeling of character sets, that can cause
problems, of course. If ms1252 text is mislabeled as iso8859-1 text,
for example, then all sorts of problems can ensue. Even worse, of
course, is if SJIS text is mislabeled as 8859-1!
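
To make the mislabeling cases concrete, here is a small Python sketch. It is not part of the original thread: the sample strings are illustrative, and Python's codec names stand in for the charset labels. It suggests why this kind of mislabeling is hard to notice: decoding ms1252 or SJIS bytes as iso8859-1 raises no error and simply yields the wrong characters.

```python
# Bytes produced under one charset, then decoded under the wrong label.
euro_cp1252 = "€".encode("cp1252")            # single byte 0x80 in ms1252
as_latin1 = euro_cp1252.decode("iso8859-1")   # decodes without any error

# Under ISO 8859-1, byte 0x80 is a C1 control character, not the euro sign.
print(hex(ord(as_latin1)))                    # 0x80

sjis_bytes = "日本".encode("shift_jis")
mojibake = sjis_bytes.decode("iso8859-1")     # no error: every byte is valid Latin-1
print(len("日本"), len(mojibake))             # 2 4
```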

In many circumstances of domain name conversion, I think these
problems will be less likely. Take a browser, for example. In the URL
field, someone will be typing in a given URL, in whatever code page
the browser supports. The synchronization between the code page and
the input keyboard will be pretty close, and it is unlikely that
there will be a mismatch unless the OS is completely clueless. (This
will also be much less of a problem in the very near future, as most
client machines will be native Unicode [UTF-8, UTF-16, or UTF-32]
anyway.) In addition, if transcoding produces an unassigned character,
or detects an illegal sequence, then it would fail.
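
The fail-on-bad-input behavior described above could look roughly like the following Python sketch. The helper name `transcode_or_fail` is hypothetical, and treating Unicode general category "Cn" as "unassigned" is an assumption made for illustration, not something the draft specifies.

```python
import unicodedata

def transcode_or_fail(raw: bytes, charset: str) -> str:
    """Decode strictly; reject illegal sequences and unassigned code points."""
    try:
        text = raw.decode(charset, errors="strict")
    except UnicodeDecodeError as exc:
        raise ValueError(f"illegal sequence for {charset}: {exc}") from exc
    for ch in text:
        # General category "Cn" marks code points with no assigned character.
        if unicodedata.category(ch) == "Cn":
            raise ValueError(f"unassigned code point U+{ord(ch):04X}")
    return text

print(transcode_or_fail(b"example", "ascii"))

# 0x81 is undefined in Python's cp1252 table; b"\xe2\x82" is truncated UTF-8.
for raw, charset in [(b"\x81", "cp1252"), (b"\xe2\x82", "utf-8")]:
    try:
        transcode_or_fail(raw, charset)
    except ValueError as err:
        print("rejected:", err)
```

Note that this only catches input that is invalid under the declared charset. Mislabeled input that happens to be valid under the wrong label still passes.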

I don't say that problems can't happen, but it would be useful to
examine some detailed scenarios to see whether, where, and how any
problems would realistically arise.

Mark
__________

http://www.macchiato.com

 “Eppur si muove”
----- Original Message -----
From: "Soobok Lee" <lsb@postel.co.kr>
To: "Soobok Lee" <lsb@postel.co.kr>; "Mark Davis"
<mark@macchiato.com>; "Roozbeh Pournader" <roozbeh@sharif.edu>; "IDN"
<idn@ops.ietf.org>
Sent: Tuesday, May 28, 2002 08:23
Subject: Re: [idn] Re: Legacy charset conversion in
draft-ietf-idn-idna-08.txt (in ksc5601-1987)


> One concrete example: there is one hangul syllable character that is
> found only in KSC5601-1992.
>
> "똯"  0x8C70   U+B62F  #HANGUL SYLLABLE SSANGTIKEUT WA KIYEOKSIOS
>
> I typed this character "똯" into this message from my Outlook Express 6.0.
> If you look at the MIME header of this message,
> you can find the erroneous MIME charset name: "KS_C_5601-1987".
> Stupid and wrong versioning!
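
The effect of that label can be sketched in Python (this is not part of the original message). The sketch assumes that Python's "cp949" codec approximates the extended repertoire referred to here as KSC5601-1992, and that its "euc_kr" codec approximates a strict reading of the 1987 label; neither correspondence is claimed by the thread.

```python
raw = b"\x8c\x70"  # the byte pair quoted in the message

# Python's "cp949" codec covers the extended hangul syllable repertoire.
ch = raw.decode("cp949")
print(f"U+{ord(ch):04X}")  # compare with the U+B62F given in the message

# Python's "euc_kr" decoder does not accept 0x8C as a lead byte,
# so a receiver trusting the 1987-style label cannot convert the text.
try:
    raw.decode("euc_kr")
except UnicodeDecodeError as err:
    print("euc_kr decoder rejects it:", err)
```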
>
> Legacy encodings and their versioning conventions were not designed to
> be used in rigorous identifier contexts, but rather in textual
> applications like word processors or HTML pages.
> But IRI (IDN) is about to use the legacy encodings in its own
> extended way regardless of this caveat.
>
> Soobok Lee
>
> ----- Original Message -----
> From: "Soobok Lee" <lsb@postel.co.kr>
> To: "Mark Davis" <mark@macchiato.com>; "Roozbeh Pournader"
> <roozbeh@sharif.edu>; "IDN" <idn@ops.ietf.org>
> Sent: Tuesday, May 28, 2002 11:59 PM
> Subject: Re: [idn] Re: Legacy charset conversion in
> draft-ietf-idn-idna-08.txt
>
>
> > What do you think about the variations between KSC5601-1987 and
> > KSC5601-1992? The latter has thousands of new hangul syllable
> > characters (johab), but has often been tagged with the shorter
> > "KSC5601" or interchangeably as "EUC-KR". Obviously, this loose
> > versioning convention, which is found everywhere from Outlook, MSIE,
> > Hotmail, and Netscape Navigator, would cause unnoticeable failures
> > and security problems.
> > For example, if the sender encloses a hangul IRI including such a
> > hangul legacy-encoded IDN in his outgoing email message,
> > the recipient with old s/w may be unable to convert it into Unicode.
> > What if that happens between machine clients and servers?
> >
> > Soobok Lee
> >
> > ----- Original Message -----
> > From: "Mark Davis" <mark@macchiato.com>
> > >
> > > However, all this being said and done, the variations are usually a
> > > very small percent of the total, and usually restricted to a few
> > > punctuation marks or symbols. And to the best of my knowledge,
> > > people do not vary in how they interpret the ISO 8859 series. Thus
> > > while the document wants to point to the problem, it would be
> > > misleading to give people the impression that a large number of
> > > characters will cause security problems, when it is really
> > > restricted to a very small number of cases.
> > >
> > > What would be productive and useful would be to identify and list
> > > those characters that could have problems in practice.
> > >
> > > Mark
> > > __________
> > >
> > > http://www.macchiato.com
> > >
> > >  “Eppur si muove”
> > > ----- Original Message -----
> > > From: "Roozbeh Pournader" <roozbeh@sharif.edu>
> > > To: "IDN" <idn@ops.ietf.org>
> > > Sent: Tuesday, May 28, 2002 05:22
> > > Subject: Re: [idn] Re: Legacy charset conversion in
> > > draft-ietf-idn-idna-08.txt
> > >
> > >
> > > >
> > > > > The basic attack: Alice runs on a host that uses Latin-1 for
> > > > > input/output and enters www.µbank.com (where µ is 8859-1 0xB5).
> > > > > The domain is registered using U+00B5, but Alice's application
> > > > > transcodes the string using U+03BC. Either Alice can't connect
> > > > > (if the other domain doesn't exist) or she ends up talking to
> > > > > someone else (if the other domain does exist).
> > > >
> > > > I'm sorry, but your example doesn't work. In nameprep, when doing
> > > > Unicode Normalization, U+00B5 is mapped to U+03BC. So these will be
> > > > the same domain name, and have the same ACE label.
> > > >
> > > > roozbeh
> > > >
> > > >
> > > >
> > >
> >
>
>
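
Roozbeh's point about nameprep in the quoted exchange above can be checked with a short Python sketch. This is not part of the thread: it relies on Python's `unicodedata` module and its built-in "idna" codec, which implements the later RFC 3490 version of IDNA rather than the draft discussed here, so the exact ACE strings may differ from what draft-08 would have produced.

```python
import unicodedata

micro = "\u00b5"   # MICRO SIGN, byte 0xB5 in ISO 8859-1
mu = "\u03bc"      # GREEK SMALL LETTER MU

# NFKC, the normalization form nameprep uses, folds the micro sign into mu.
print(unicodedata.normalize("NFKC", micro) == mu)   # True

# The "idna" codec applies nameprep before producing the ACE form,
# so both spellings are expected to give the same label.
ace_micro = (micro + "bank").encode("idna")
ace_mu = (mu + "bank").encode("idna")
print(ace_micro == ace_mu)
```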