
[idn] questions about new Unicode 3.2 addition for Hangul compatibility jamo handling



Congratulations on the birth of Unicode 3.2. :-)

I found the following addition, which is useful for the correct handling of Hangul compatibility jamo.
However, I suspect it does not help IDN nameprep, although it will clearly help ordinary Hangul text applications.

A U+200B or U+2060 inside an ACE-encoded label should *NOT* be submitted to the
nameprep process again! In most cases, however (for example, copy & paste of a URL),
IDN-aware applications would apply the nameprep map-out step (the 1st step) to the U+200B or U+2060 in the decoded
ACE label, then apply the following NFKC step to the bare Hangul conjoining jamo sequence (now without those
zero-width separators), producing the same unintended composed syllables as if this new Unicode 3.2 addition did not exist.
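
The two steps described above can be sketched in a few lines of Python. This is only a rough illustration, not a full nameprep implementation: toy_nameprep is a hypothetical helper name, the standard stringprep module's table B.1 is assumed to stand in for the map-out step, and unicodedata uses whatever Unicode version ships with the interpreter rather than 3.2 specifically.

```python
import stringprep
import unicodedata


def toy_nameprep(label: str) -> str:
    """Hypothetical two-step sketch: map out, then NFKC (no prohibit/bidi checks)."""
    # Step 1: drop characters that stringprep table B.1 maps to nothing
    # (this table includes U+200B, U+2060 and U+FEFF).
    kept = "".join(ch for ch in label if not stringprep.in_table_b1(ch))
    # Step 2: compatibility normalization; adjacent conjoining jamo now compose.
    return unicodedata.normalize("NFKC", kept)


# Conjoining jamo kept apart by ZERO WIDTH SPACE, as in a decoded ACE label.
separated = "\u1100\u200b\u1161"
print([f"{ord(c):04X}" for c in toy_nameprep(separated)])  # ['AC00']
```

Once the separator is mapped out, nothing stops NFKC from composing U+1100 U+1161 into the syllable U+AC00.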

In summary, if nameprep adopts this addition, it may allow correctly rendered Hangul jamo labels
to be encoded, rendered, and registered, but those labels may be corrupted by the
ACE decode-and-re-encode step of copy & paste operations in IDN-aware applications.
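
A hedged sketch of that copy & paste round trip follows. It assumes, purely for illustration, that a label was registered with its separator intact, and it uses Python's built-in punycode codec as a stand-in for whatever ACE is finally chosen. The names reprep, ace and unace are hypothetical helpers, not part of any IDN library.

```python
import stringprep
import unicodedata


def reprep(label: str) -> str:
    # Assumed re-preparation: map out table B.1 characters, then NFKC.
    kept = "".join(ch for ch in label if not stringprep.in_table_b1(ch))
    return unicodedata.normalize("NFKC", kept)


def ace(label: str) -> str:
    # Stand-in ACE encoder (bare punycode, no prefix).
    return label.encode("punycode").decode("ascii")


def unace(text: str) -> str:
    # Stand-in ACE decoder.
    return text.encode("ascii").decode("punycode")


# Hypothetical registered label that kept its U+200B separator.
registered = "\u1100\u200b\u1161"
ace_registered = ace(registered)

# Copy & paste: decode the ACE form, run nameprep again, re-encode.
pasted = unace(ace_registered)
ace_after_paste = ace(reprep(pasted))

print(ace_registered, ace_after_paste, ace_registered == ace_after_paste)
```

Under these assumptions the two ACE strings differ, so the pasted name no longer matches the registered one.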

I read the following document just now. I don't know whether stringprep/nameprep will adopt Unicode 3.2.
If I have misunderstood something in the above analysis, please correct me.

Soobok Lee



http://www.unicode.org/unicode/reports/tr28/#10_4_hangul
10.4 Hangul (addition)
Hangul Compatibility Jamo
When Hangul compatibility jamo are transformed with a compatibility normalization form, NFKD or NFKC, the characters are converted
to the corresponding conjoining jamo characters. Where the characters are intended to remain in separate syllables after such
transformation, they may require separation from adjacent characters. This can be done by inserting any non-Korean character.

  - U+200B ZERO-WIDTH SPACE is recommended where the characters are to allow line-break.
  - U+2060 WORD JOINER can be used where the characters are not to break across lines.
For example, the table below illustrates how two Hangul compatibility jamo can be separated in display, even after transforming with
NFKD or NFKC.

  Separating Jamo Characters

  Original          NFKD              NFKC              Display
  ----------------  ----------------  ----------------  -------
  3131 314F         1100 1161         AC00              가
  3131 200B 314F    1100 200B 1161    1100 200B 1161    ㄱㅏ
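
The code point columns of the table above can be checked with Python's unicodedata module. This is a small sketch under the assumption that the interpreter's Unicode data agrees with Unicode 3.2 for these characters; table_row and hexes are hypothetical helper names.

```python
import unicodedata


def hexes(text: str) -> str:
    # Render a string as space-separated four-digit code points.
    return " ".join(f"{ord(ch):04X}" for ch in text)


def table_row(original: str) -> tuple[str, str, str]:
    # Return the (Original, NFKD, NFKC) cells for one table row.
    nfkd = unicodedata.normalize("NFKD", original)
    nfkc = unicodedata.normalize("NFKC", original)
    return hexes(original), hexes(nfkd), hexes(nfkc)


# Row 1: bare compatibility jamo; row 2: separated by ZERO WIDTH SPACE.
for original in ("\u3131\u314f", "\u3131\u200b\u314f"):
    print(" | ".join(table_row(original)))
# 3131 314F | 1100 1161 | AC00
# 3131 200B 314F | 1100 200B 1161 | 1100 200B 1161
```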

