
RE: [idn] BOM in draft-hoffman-stringprep-07



The double use of U+FEFF as BOM and ZWNBSP has been disunified.
U+FEFF is now (from Unicode 3.2) used only as BOM, though it
retains the name ZWNBSP (spelled out in full, of course...).  The
function of "true" ZWNBSP has been taken over by U+2060 WORD JOINER.

A BOM is not part of any text, only of some byte serialisation of that text.
Stringprep is applied to text, not to the (byte) serialisation of the text,
which is a lower level; this holds regardless of whether UTF-8, UTF-16,
UTF-32 or something else is used as the processing code.  After stringprep,
the text can again be serialised, so (byte-oriented) protocols can use a BOM
with (byte-serialised) UTF-16 or UTF-32 (or even UTF-8) if desired.
(Though I would prefer fixing the byte order over using a BOM
wherever possible.)
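
To illustrate the layering, here is a minimal Python sketch.  The names
prep_text and prep_utf16_bytes are hypothetical, and the mapping step is
only a stand-in for stringprep: it handles just the "map to nothing" of
U+FEFF and U+2060, not the full tables.  The point is only that the BOM
is consumed when decoding and written again when encoding, so the
mapping step never sees it.

```python
import codecs

# Hypothetical stand-in for the stringprep "map to nothing" step.
# Only U+FEFF and U+2060 are covered here, not the full tables.
MAP_TO_NOTHING = {0xFEFF: None, 0x2060: None}


def prep_text(text: str) -> str:
    """Operate on text (code points), never on bytes."""
    return text.translate(MAP_TO_NOTHING)


def prep_utf16_bytes(data: bytes) -> bytes:
    """Deserialise, prepare the text, then serialise again."""
    # The "utf-16" codec uses a leading BOM to pick the byte order
    # and does not pass it on as part of the decoded text.
    text = data.decode("utf-16")
    prepped = prep_text(text)
    # Encoding with "utf-16" writes a fresh BOM in native byte order.
    return prepped.encode("utf-16")


if __name__ == "__main__":
    # Big-endian input with a BOM, plus a non-BOM U+FEFF inside the text.
    wire = codecs.BOM_UTF16_BE + "a\ufeffb".encode("utf-16-be")
    out = prep_utf16_bytes(wire)
    print(out.decode("utf-16"))
```

A protocol that fixes the byte order instead would use "utf-16-be" or
"utf-16-le" for both steps and no BOM at all.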

So as long as it is ok to remove WORD JOINER (which is what non-BOM
uses of U+FEFF should be turned into), ZWNBSP should be removed too.

	/Kent Karlsson


> -----Original Message-----
> From: owner-idn@ops.ietf.org 
> [mailto:owner-idn@ops.ietf.org]On Behalf Of
> Simon Josefsson
> Sent: den 8 oktober 2002 15:59
> To: idn@ops.ietf.org
> Subject: [idn] BOM in draft-hoffman-stringprep-07
> 
> 
> (If there is a more appropriate mailing list for stringprep, 
> let me know.)
> 
> I see that draft-hoffman-stringprep-07 maps U+FEFF (ZWNBSP/BOM) to
> nothing (older versions did too) and also prohibits the character in
> the output.  Why?
> 
> U+FEFF used as a byte order mark has its uses in some Unicode
> transformation formats (UTF-16 and UTF-32), and I don't see stringprep
> requiring the use of UTF-8, where I would agree that it makes sense to
> prohibit it.
> 
> My interpretation is that any protocol that uses UTF-16 and UTF-32 in
> byte order independent mode will thus have to modify the stringprep
> tables, to have BOM signatures work, before they can use stringprep.
> Only UTF-8 and byte order tagged UTF-16 and UTF-32 can use stringprep
> tables as is.  Is this a correct interpretation?  In that case, the
> text in section 5 should make it more clear that it is allowed to use
> partial tables (right now it says all or some of the tables may be
> used), and also note that byte order independent UTF-16 and UTF-32
> applications need to remove U+FEFF from the tables.  An alternate
> solution would be to only define stringprep for UTF-8.
> 
> Thanks.
> 
>