
Re: [idn] Re: Legacy charset conversion in draft-ietf-idn-idna-08.txt (in ksc5601-1987)



One concrete example: here is a Hangul syllable character that is found only in KSC5601-1992.

"똯"  0x8C70   U+B62F  #HANGUL SYLLABLE SSANGTIKEUT WA KIYEOKSIOS
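The Unicode code point above can be cross-checked with the standard Unicode Hangul composition formula, S = 0xAC00 + (L * 21 + V) * 28 + T. Below is a minimal Python sketch that assumes the conventional jamo ordering (initial SSANGTIKEUT = 4, medial WA = 9, final KIYEOKSIOS = 3). The name compose() is only illustrative. This checks only the Unicode side. The legacy byte value 0x8C70 comes from the legacy encoding's mapping table, not from this formula.

```python
import unicodedata

# Constants of the Unicode Hangul syllable composition algorithm.
S_BASE = 0xAC00  # U+AC00, the first precomposed syllable
V_COUNT = 21     # number of medial vowels
T_COUNT = 28     # number of finals, including index 0 = "no final"

def compose(l_index: int, v_index: int, t_index: int) -> str:
    """Return the precomposed Hangul syllable for the given jamo indices."""
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)

# SSANGTIKEUT is initial 4, WA is medial 9, KIYEOKSIOS is final 3.
syllable = compose(4, 9, 3)
print(f"U+{ord(syllable):04X}", unicodedata.name(syllable))
# prints: U+B62F HANGUL SYLLABLE DDWAGS
```

Unicode's formal name uses short jamo names (DD + WA + GS). It denotes the same syllable as the descriptive name given above.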

I typed this character "똯" into this message in Outlook Express 6.0.
If you look at the MIME header of this message,
  you will find the erroneous MIME charset name "KS_C_5601-1987". Stupid and wrong versioning!
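To see what a receiver that trusts the label might do with these bytes, here is a small Python sketch. It assumes something the message does not state: that Outlook Express actually emitted Microsoft code page 949 (Unified Hangul Code), where 0x8C70 is the code for U+B62F. Python's standard codec registry treats the label "KS_C_5601-1987" as an alias of its strict euc_kr codec. That codec therefore stands in here for "old software" that knows only the 1987 repertoire.

```python
RAW = bytes([0x8C, 0x70])  # the two bytes for "똯" as sent in this message

# 1. Strict decoding per the declared label. In Python's codec registry,
#    "KS_C_5601-1987" is an alias of the strict euc_kr codec, which only
#    covers the KS C 5601-1987 repertoire, so these bytes are rejected.
try:
    RAW.decode("KS_C_5601-1987")
    strict_result = "decoded"
except UnicodeDecodeError:
    strict_result = "UnicodeDecodeError"

# 2. Lenient decoding: no exception is raised, but the lead byte becomes
#    U+FFFD and the trailing byte may be misread as plain ASCII, so the
#    identifier silently changes.
lossy = RAW.decode("KS_C_5601-1987", errors="replace")

# 3. Decoding as code page 949 (assumed to be what the sender really used)
#    recovers the intended character.
intended = RAW.decode("cp949")

print(strict_result)             # UnicodeDecodeError
print(ascii(lossy))
print(f"U+{ord(intended):04X}")  # U+B62F
```

For identifiers, the lenient path is arguably the worse one. Nothing fails visibly, but the resulting string is no longer the name the sender typed.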

Legacy encodings and their versioning conventions were not designed for rigorous identifier contexts,
but rather for textual applications such as word processors or HTML pages.
Yet IRI (IDN) is about to use legacy encodings in its own extended way, regardless of this caveat.

Soobok Lee

----- Original Message -----
From: "Soobok Lee" <lsb@postel.co.kr>
To: "Mark Davis" <mark@macchiato.com>; "Roozbeh Pournader" <roozbeh@sharif.edu>; "IDN" <idn@ops.ietf.org>
Sent: Tuesday, May 28, 2002 11:59 PM
Subject: Re: [idn] Re: Legacy charset conversion in draft-ietf-idn-idna-08.txt


> What do you think about the variations between KSC5601-1987 and KSC5601-1992?
> The latter has thousands of new Hangul syllable characters (Johab), but it
> has often been labeled with the shorter "KSC5601" or, interchangeably, "EUC-KR". Obviously,
> this loose versioning convention, which is found everywhere from Outlook, MSIE, and Hotmail to Netscape
> Navigator, would cause unnoticeable failures and security problems.
> For example, if the sender includes a Hangul IRI containing such a legacy-encoded Hangul IDN in an outgoing email message,
> a recipient with old software may be unable to convert it into Unicode.
> What if that happens between machine clients and servers?
>
> Soobok Lee
>
> ----- Original Message -----
> From: "Mark Davis" <mark@macchiato.com>
>  >
> > However, all this being said and done, the variations are usually a
> > very small percent of the total, and usually restricted to a few
> > punctuation or symbols. And to the best of my knowledge, people do not
> > vary in how they interpret the ISO 8859 series. Thus while the
> > document wants to point to the problem, it would be misleading to give
> > people the impression that a large number of characters will cause
> > security problems, when it is really restricted to a very small number
> > of cases.
> >
> > What would be productive and useful would be to identify and list
> > those characters that could have problems in practice.
> >
> > Mark
> > __________
> >
> > http://www.macchiato.com
> >
> >  “Eppur si muove”
> > ----- Original Message -----
> > From: "Roozbeh Pournader" <roozbeh@sharif.edu>
> > To: "IDN" <idn@ops.ietf.org>
> > Sent: Tuesday, May 28, 2002 05:22
> > Subject: Re: [idn] Re: Legacy charset conversion in draft-ietf-idn-idna-08.txt
> >
> >
> > >
> > > > The basic attack: Alice runs on a host that uses Latin-1 for
> > > > input/output and enters www.µbank.com (where µ is 8859-1 0xB5). The
> > > > domain is registered using U+00B5, but Alice's application transcodes
> > > > the string using U+03BC. Either Alice can't connect (if the other
> > > > domain doesn't exist) or she ends up talking to someone else (if the
> > > > other domain does exist).
> > >
> > > I'm sorry, but your example doesn't work. In nameprep, when doing Unicode
> > > Normalization, U+00B5 is mapped to U+03BC. So these will be the same
> > > domain name, and will have the same ACE label.
> > >
> > > roozbeh
> > >
> > >
> > >
> >
>
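Roozbeh's point in the quoted reply can be checked with Python's built-in "idna" codec. It implements IDNA2003 (RFC 3490, with nameprep from RFC 3491), the specifications that grew out of the drafts discussed in this thread. The sketch below relies on that codec's behavior. In nameprep, U+00B5 is mapped to U+03BC by the case-folding table, and NFKC folds it the same way, so both spellings of Alice's name should produce the same ACE label.

```python
import unicodedata

MICRO_SIGN = "\u00b5"  # what Latin-1 byte 0xB5 decodes to
GREEK_MU = "\u03bc"    # GREEK SMALL LETTER MU

# NFKC, the normalization form nameprep applies, folds MICRO SIGN into mu.
nfkc_folds = unicodedata.normalize("NFKC", MICRO_SIGN) == GREEK_MU

# Alice's keyboard input, decoded as Latin-1 (0xB5 -> U+00B5).
typed_name = b"www.\xb5bank.com".decode("latin-1")
# The same name spelled with U+03BC directly.
greek_name = "www." + GREEK_MU + "bank.com"

# Python's "idna" codec applies nameprep + Punycode (IDNA2003) per label.
ace_typed = typed_name.encode("idna")
ace_greek = greek_name.encode("idna")

print(nfkc_folds)              # True
print(ace_typed == ace_greek)  # True
print(ace_typed)
```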