
Re: [idn] New I-D for Internationalized Resource Identifiers




<quote>
4.7 Transportation of URI/IRIs in document formats and protocols

   Document formats that transport URIs may need to be upgraded to allow
   the transport of IRIs.  In those cases where the document as a whole
   has a native character encoding, IRIs SHOULD also be encoded in this
   encoding, and converted accordingly by a parser or interpreter.  IRI
   characters that are not expressible in the native encoding SHOULD be
   escaped according to Section 2.2, or MAY be escaped in another way if
   the document format provides a way to do this.  For example, in HTML,
   XML, or SGML, numeric character references can be used.  If a
   document as a whole has a native character encoding, and that
   character encoding is not UTF-8, then IRIs MUST NOT be placed into
   the document in the UTF-8 character encoding.
</quote>
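The quoted text names two ways to carry an IRI character that the document's native encoding cannot express. Here is a minimal Python sketch of both. It assumes a document whose native encoding is EUC-KR, and it assumes that "escaped according to Section 2.2" means %-escaping the character's UTF-8 bytes. The helper name to_native and the sample path are illustrative only and do not come from the draft.

```python
# Sketch: carrying IRI characters that the native encoding (assumed EUC-KR)
# cannot express, by %-escaping or by numeric character references.
from urllib.parse import quote

NATIVE = "euc_kr"            # assumed native encoding of the document
iri_path = "/\uac00\u0e01"   # HANGUL SYLLABLE GA (in EUC-KR) + THAI KO KAI (not in EUC-KR)


def to_native(text: str) -> bytes:
    """Hypothetical helper: encode natively, and %-escape the UTF-8 bytes of
    any character the native encoding cannot express (never raw UTF-8)."""
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode(NATIVE)
        except UnicodeEncodeError:
            out += quote(ch, safe="").encode("ascii")
    return bytes(out)


# Alternative for HTML/XML/SGML documents: numeric character references.
ncr_form = iri_path.encode(NATIVE, errors="xmlcharrefreplace")

print(to_native(iri_path))  # b'/\xb0\xa1%E0%B8%81'
print(ncr_form)             # b'/\xb0\xa1&#3585;'
```

The %-escaped form works in any ASCII-compatible native encoding. The numeric character reference only works where the document format defines one, as the quoted text says.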

There may be other cases and aspects to consider:

  1) Some native-encoded IRI characters are not expressible in
     a given version of UCS, because

    a) new native characters are added to the local character encoding, and either

        a.1)  the new characters are already defined in the version of UCS,
                but their local-to-UCS mappings are not defined or not implemented
        a.2)  the new characters are not yet defined in the version of UCS

    b) native character sets often have loose versioning, and so do the applications that implement them.

     For example, in an email message labelled with the old "ks_c_5601-1987"/"euc-kr" Korean character set,
      newly added KSX100{1,2,3,4,5,6} Hangul syllables and Chinese characters can be inserted by many
      email clients (Outlook Express 6.0, or webmail services such as hotmail.com and yahoo.com).
      If a receiving application supports only the original "ks_c_5601-1987",
      it will fail to map these new native characters to UTF-8.
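A minimal Python sketch of case 1 follows. It assumes Python's "cp949" codec stands in for what such clients actually emit, and its "euc_kr" codec stands in for a receiver that knows only the original KS C 5601-1987 repertoire. U+B620 is one of the Hangul syllables outside the 2,350 precomposed syllables of KS C 5601-1987. The strict_decode helper is hypothetical.

```python
# Case 1: a syllable outside the KS C 5601-1987 repertoire, sent by a
# CP949 (Unified Hangul Code) client and read by a strict EUC-KR decoder.
syllable = "\ub620"               # HANGUL SYLLABLE DDOM
sent = syllable.encode("cp949")   # bytes an extended-charset client emits


def strict_decode(data: bytes, charset: str):
    """Hypothetical helper: decoded text, or None if the charset can't map it."""
    try:
        return data.decode(charset)
    except UnicodeDecodeError:
        return None


print(strict_decode(sent, "cp949") == syllable)             # True: extended table maps it
print(strict_decode(sent, "euc_kr"))                        # None: the old table cannot
print("\ufffd" in sent.decode("euc_kr", errors="replace"))  # True: lossy fallback
```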

  2) Due to differences in expressiveness between UCS and local character sets,
      round-trip conversions (local->UCS->local, or UCS->local->UCS) may lose
      the original form of native strings. Sending parties often cannot know whether
      the receiving ones will perform such conversions.
      Examples are Hangul compatibility vs. conjoining jamo sequences, and the Turkish dotted and dotless 'i'.
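The following is a sketch of the UCS->local->UCS case. It assumes a hypothetical converter that precomposes conjoining jamo with NFC before encoding to EUC-KR, whose Unicode mapping uses compatibility jamo rather than conjoining jamo. The converter function is illustrative, not the behaviour of any particular product.

```python
import unicodedata

NATIVE = "euc_kr"           # assumed native encoding
original = "\u1100\u1161"   # conjoining jamo KIYEOK + A (decomposed spelling of GA)


def to_native_precomposing(text: str) -> bytes:
    """Hypothetical converter: precompose with NFC first, because EUC-KR's
    Unicode mapping has no conjoining jamo, then encode natively."""
    return unicodedata.normalize("NFC", text).encode(NATIVE)


round_tripped = to_native_precomposing(original).decode(NATIVE)

print(round_tripped == original)  # False: the original UCS form is lost
print(round_tripped == "\uac00")  # True: the sender's jamo sequence came back as GA
```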


  3) Native character encodings/sets often have no NFKC/NFC-like normalization.
     How can we enforce some kind of normalization on native-encoded IRIs
      in order to prevent comparison failures or spoofing in receiving applications?
     Should we convert native strings into UCS, normalize them, and convert them back into
      native ones before transmission to other applications?
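The last question can be tried directly. In this Python sketch, which assumes the "euc_kr" codec as the native encoding, the KS X 1001 compatibility jamo KIYEOK passes through NFC unchanged. Under NFKC it becomes a conjoining jamo, and that no longer converts back to the original native bytes.

```python
import unicodedata

NATIVE = "euc_kr"              # assumed native encoding
native_bytes = b"\xa4\xa1"     # KS X 1001 HANGUL LETTER KIYEOK

ucs = native_bytes.decode(NATIVE)              # U+3131, compatibility jamo
nfc = unicodedata.normalize("NFC", ucs)        # unchanged: no canonical decomposition
nfkc = unicodedata.normalize("NFKC", ucs)      # U+1100, conjoining jamo
back = nfkc.encode(NATIVE, errors="replace")   # try converting back to native

print(nfc == ucs)            # True
print(nfkc == "\u1100")      # True
print(back == native_bytes)  # False: NFKC output does not return to the original bytes
```

So the answer depends on the normalization form: NFC is harmless here, while NFKC changes what can be sent in the native encoding.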


We have the following options:
 
  1) Introduce strict, precise versioning and warnings for UCS and local character sets/encodings
     in applications.

  2) Otherwise, discourage local character encodings in formats and protocols.

  3) Please suggest others.
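The following is a rough sketch of how options 1 and 2 might combine at a gateway, assuming the sender's charset label is available. It decodes strictly with the declared label. When the bytes fall outside that label, it warns instead of silently guessing, then passes the IRI on as UTF-8. native_iri_to_utf8 is a hypothetical name, and Python's codecs stand in for whatever charset tables an application actually ships.

```python
import warnings


def native_iri_to_utf8(data: bytes, declared_charset: str) -> bytes:
    """Hypothetical gateway helper: decode a native-encoded IRI strictly with
    its declared charset label and re-encode it as UTF-8. If the label does
    not cover the bytes, warn and fall back to U+FFFD replacement."""
    try:
        text = data.decode(declared_charset)  # strict by default
    except UnicodeDecodeError as exc:
        bad = exc.object[exc.start:exc.end]
        warnings.warn(
            f"bytes {bad!r} are not valid {declared_charset}; the sender may "
            "use a newer or extended version of this charset"
        )
        text = data.decode(declared_charset, errors="replace")
    return text.encode("utf-8")


# A label that matches the bytes converts cleanly ...
print(native_iri_to_utf8(b"/\xb0\xa1", "euc_kr"))  # b'/\xea\xb0\x80'
# ... while CP949 bytes labelled as EUC-KR trigger a warning.
print(native_iri_to_utf8("\ub620".encode("cp949"), "euc_kr"))
```

The warning does not fix anything by itself. It only makes the version mismatch visible, which is the point of option 1.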

 
Soobok Lee
     

----- Original Message ----- 
From: "Martin Duerst" <duerst@w3.org>
To: <idn@ops.ietf.org>
Sent: Wednesday, April 17, 2002 4:43 PM
Subject: [idn] New I-D for Internationalized Resource Identifiers


> Dear IDN Working Group members,
> 
> I have just submitted draft-w3c-i18n-iri-00.txt to the Internet Drafts
> editor. This draft replaces draft-masinter-url-i18n-08.txt. It should be
> published in a few hours/days. In the mean time it is available at
> http://www.w3.org/International/2002/draft-w3c-i18n-iri-00.txt.
> 
> Based on discussions at the W3C Technical Plenary in February, and in
> particular on input from Larry Masinter, we have made some changes in
> the responsibilities for the Internationalized Resource Identifiers
> (IRI) draft, as follows:
> 
> - The W3C I18N WG is taking on responsibility for carefully
>    reviewing the current draft and bringing it to maturity for
>    submission to the IESG.
> 
> - Larry is glad to step down as a co-editor, and Michel Suignard
>    has volunteered to become a new co-editor. Many thanks to Larry
>    for his work as co-author of many earlier versions of this document.
> 
> This has resulted in the name change. The document will still be handled
> as an individual submission from the point of view of the IETF. We hope
> to take this document to IETF/W3C Last Call in May, after some more work.
> 
> Please review draft-w3c-i18n-iri-00.txt and send comments to
> w3c-i18n-comments@w3.org (publicly archived at
> http://lists.w3.org/Archives/Public/www-i18n-comments/).
> 
> 
> Regards,    Martin.
>