
[idn] surrogates in draft-ietf-idn-nameprep

To: idn@ops.ietf.org
Subject: [idn] surrogates in draft-ietf-idn-nameprep
From: Frank Ernens <fgernens@enternet.com.au>
Date: Wed, 16 Aug 2000 06:38:31 +1000
Delivery-date: Tue, 15 Aug 2000 19:42:00 -0700
Envelope-to: idn-data@psg.com

Section 3.7.2 says

> So far, all proposals for binary encodings of internationalized name
> parts have specified UTF-8 as the encoding format. In such an encoding,
> surrogate characters MUST NOT be used. Therefore, for UTF-8 encodings,
> the following are prohibited:
>
> D800-DFFF   [SURROGATE CHARACTERS]

This is incorrect. A surrogate pair corresponds to a single character in
the 31-bit ISO 10646 code space, and according to RFC 2044 any value
up to 2**31 - 1 can be encoded in UTF-8. Simply convert the pair from
UCS-2 (strictly, UTF-16) to a single UCS-4 value and then encode that
value in UTF-8.
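A minimal Python sketch of that two-step conversion, assuming the pair arrives as two 16-bit code units (high surrogate first). The function names are hypothetical. The encoder follows the 1- to 6-byte forms of RFC 2044, which cover the full 31-bit range:

```python
def surrogate_pair_to_ucs4(high, low):
    """Combine a high/low surrogate pair into one UCS-4 value."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    # 10 bits from each surrogate, offset past the BMP.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def ucs4_to_utf8_rfc2044(cp):
    """Encode a UCS-4 value using the 1- to 6-byte forms of RFC 2044."""
    if not 0 <= cp <= 0x7FFFFFFF:
        raise ValueError("outside the 31-bit UCS-4 range")
    if cp < 0x80:
        return bytes([cp])
    # (exclusive upper bound, total bytes, lead-byte marker)
    for limit, n, lead in ((0x800, 2, 0xC0), (0x10000, 3, 0xE0),
                           (0x200000, 4, 0xF0), (0x4000000, 5, 0xF8),
                           (0x80000000, 6, 0xFC)):
        if cp < limit:
            break
    out = []
    # Emit 6-bit continuation bytes from the low end upward.
    for _ in range(n - 1):
        out.append(0x80 | (cp & 0x3F))
        cp >>= 6
    # Whatever bits remain go into the lead byte.
    out.append(lead | cp)
    return bytes(reversed(out))


# The pair D800 DC00 becomes U+10000, i.e. UTF-8 F0 90 80 80.
print(ucs4_to_utf8_rfc2044(surrogate_pair_to_ucs4(0xD800, 0xDC00)).hex())
```

For values up to U+10FFFF this produces the same bytes as today's UTF-8. RFC 3629 later dropped the 5- and 6-byte forms.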

What might have been meant is that some current UTF-8 implementations
mishandle surrogates. In fact, the most likely near-term use for them
is in user-defined ideographs (e.g. obscure Chinese and Japanese
personal names), so it is reasonable to disallow them, just not for
the stated reason. Put another way: since all ISO 10646 characters in
the range reachable by surrogate pairs are currently undefined (except
for private use characters), and the document elsewhere prohibits
undefined characters, we don't need this section at all.
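To make "the range representable by pairs of surrogates" concrete, here is a small Python sketch, assuming the standard UTF-16 pairing arithmetic. The names are hypothetical. It computes the lowest and highest values a pair can reach and tests for the private use ranges of planes 15 and 16. Whether a given code point is "undefined" depends on the version of the standard, so the sketch does not try to encode that.

```python
def pair_to_scalar(high, low):
    """Standard UTF-16 pairing: 10 bits from each surrogate, plus 0x10000."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# Extremes of the space reachable by a high/low surrogate pair.
LOWEST = pair_to_scalar(0xD800, 0xDC00)
HIGHEST = pair_to_scalar(0xDBFF, 0xDFFF)


def in_private_use_plane(cp):
    """True for the private use ranges of planes 15 and 16."""
    return 0xF0000 <= cp <= 0xFFFFD or 0x100000 <= cp <= 0x10FFFD


print(hex(LOWEST), hex(HIGHEST))  # 0x10000 0x10ffff
```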
