
[idn] BOM in draft-hoffman-stringprep-07



(If there is a more appropriate mailing list for stringprep, let me know.)

I see that draft-hoffman-stringprep-07 maps U+FEFF (ZWNBSP/BOM) to
nothing (older versions did too) and also prohibits the character in
the output.  Why?

U+FEFF has its uses as a byte order mark in some Unicode
transformation formats (UTF-16 and UTF-32).  I would agree that
prohibiting it makes sense for UTF-8, but I don't see stringprep
requiring the use of UTF-8.

My interpretation is that any protocol that uses UTF-16 or UTF-32 in
byte order independent mode will have to modify the stringprep tables
before it can use stringprep, so that BOM signatures keep working.
Only UTF-8 and byte order tagged UTF-16 and UTF-32 can use the
stringprep tables as is.  Is this a correct interpretation?  If so,
the text in section 5 should make it clearer that using partial
tables is allowed (right now it says all or some of the tables may be
used), and should also note that byte order independent UTF-16 and
UTF-32 applications need to remove U+FEFF from the tables.  An
alternative solution would be to define stringprep only for UTF-8.
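
To illustrate the concern, here is a minimal Python sketch.  It is not
taken from the draft: map_to_nothing is a hypothetical stand-in for
the mapping step that handles U+FEFF only, and it assumes the
application applies the mapping to the decoded text with the signature
still in place.

```python
import codecs

BOM = "\ufeff"  # U+FEFF, ZWNBSP / byte order mark


def map_to_nothing(text: str) -> str:
    """Hypothetical stand-in for the stringprep mapping step.

    Only the U+FEFF entry is modelled here; the real tables map
    many more characters.
    """
    return text.replace(BOM, "")


# UTF-16 data with no external byte order tag: the BOM signature is
# the only indication of the byte order.
raw = codecs.BOM_UTF16_BE + "example".encode("utf-16-be")

# Decoding with an explicit byte order keeps the signature as a
# leading U+FEFF character.
decoded = raw.decode("utf-16-be")

# After the mapping step the signature is gone, so the re-encoded
# bytes no longer say which byte order they use.
prepared = map_to_nothing(decoded)
reencoded = prepared.encode("utf-16-be")
```

In this sketch reencoded no longer starts with the BOM, so a receiver
that relies on the signature has nothing left to detect the byte
order from.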

Thanks.