Re: [idn] space-like unicode char
Erik van der Poel wrote:
> Soobok Lee wrote:
>
>> U+1160 is a space-like char and even stringprep/nameprep does not
>> filter it out because the char is not for punctuational purpose.
>
>
> U+1160 is HANGUL JUNGSEONG FILLER and it is used to transform
> nonstandard syllables into standard ones (Unicode 3.0 section 3.11
> (RFC 3454 refers to Unicode 3.2.0)). However, this transformation is
> one of the additional transformations not considered part of Unicode
> normalization (3.2.0's UAX #15 Annex 10).
Exactly. U+1160 is not "touched" by Unicode normalization (NFC).
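This is easy to check. Below is a minimal sketch using Python's standard
unicodedata module (my choice for illustration; note that it follows a newer
Unicode version than the 3.2.0 that RFC 3454 refers to). The helper name
survives_normalization is hypothetical.

```python
import unicodedata

FILLER = "\u1160"  # HANGUL JUNGSEONG FILLER


def survives_normalization(text: str) -> bool:
    """Return True if all four normalization forms leave `text` unchanged."""
    return all(
        unicodedata.normalize(form, text) == text
        for form in ("NFC", "NFD", "NFKC", "NFKD")
    )


# The filler alone, and the filler between Latin letters.
print(survives_normalization(FILLER))
print(survives_normalization("a" + FILLER + "b"))
```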
> So this character is not generated by Stringprep/Nameprep. However, it
> is not prohibited either, so it may occur in the input to (and output
> from) Stringprep/Nameprep.
Yes, it may occur.
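To illustrate, here is a rough sketch that looks the character up in the
RFC 3454 tables via Python's standard stringprep module. The list of
prohibited tables is my reading of the Nameprep profile (RFC 3491), and the
helper nameprep_status is a hypothetical name. It only inspects single
characters against the tables; it does not run the full Nameprep steps
(mapping, NFKC, bidi checks).

```python
import stringprep

FILLER = "\u1160"  # HANGUL JUNGSEONG FILLER

# Prohibited-output tables assumed for Nameprep: C.1.2, C.2.2, C.3 to C.9.
NAMEPREP_PROHIBITED = (
    stringprep.in_table_c12,
    stringprep.in_table_c22,
    stringprep.in_table_c3,
    stringprep.in_table_c4,
    stringprep.in_table_c5,
    stringprep.in_table_c6,
    stringprep.in_table_c7,
    stringprep.in_table_c8,
    stringprep.in_table_c9,
)


def nameprep_status(ch: str) -> str:
    """Classify one character by table lookup only."""
    if stringprep.in_table_b1(ch):
        return "mapped to nothing"
    if any(in_table(ch) for in_table in NAMEPREP_PROHIBITED):
        return "prohibited"
    if stringprep.in_table_a1(ch):
        return "unassigned"
    return "allowed"


print(nameprep_status(FILLER))    # U+1160
print(nameprep_status("\u00a0"))  # NO-BREAK SPACE, for comparison
print(nameprep_status("\u00ad"))  # SOFT HYPHEN, for comparison
```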
> I read some of the sections on Hangul in the Unicode book and Web
> site, but I did not see any rules regarding repeated occurrences of
> U+1160 (as you had in your example, not quoted above). I also did not
> see any rules about what to do when a filler is not followed by a
> Hangul jamo. It would be nice to have these rules in Unicode or in
> Stringprep.
The U+1160 problem was raised 3.5 years ago, along with some additional
Hangul jamo problems (you can search this huge idn list archive for the
keywords "1160" or "filler"). I submitted a draft (you may find it at
www.i-d-n.net) to filter out these invalid character sequences, but the
draft was discarded. Someone argued that such filtering *complicates* the
stringprep algorithms with context-sensitive filtering/prohibiting, and
that the problem is up to the UTC/NFC, not the IETF. Of course, I couldn't
accept that. Anyway, we can't go back to December 2002 without giving up
the backward compatibility promise of stringprep.
>
> I tried U+1160 followed by a Latin character in MSIE with i-Nav and in
> Firefox with IDN turned on, and it was displayed as a wide space. It
> is unfortunate that both implementations chose to display it as a
> space instead of deleting it.
Yes. Plugins MUST filter out U+1160 from validated labels returned by
ToUnicode(), whether or not IDNA requires that.
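A minimal sketch of such a display-side filter, reading "filter out" as
deletion (as Erik suggests). The function names are hypothetical, and the
label is assumed to be the already validated ToUnicode() output; this is
not part of IDNA itself.

```python
FILLER = "\u1160"  # HANGUL JUNGSEONG FILLER


def contains_filler(label: str) -> bool:
    """Return True if the label contains U+1160 anywhere."""
    return FILLER in label


def strip_jungseong_filler(label: str) -> str:
    """Delete every U+1160 from a label before it is displayed."""
    return label.replace(FILLER, "")


label = "a" + FILLER + FILLER + "b"
if contains_filler(label):
    label = strip_jungseong_filler(label)
print(label)
```

A plugin that prefers to reject such labels outright could use
contains_filler() alone and fall back to showing the ACE form.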
Soobok