
RE: [idn] stringprep comment 5: hangul conjoining sequence





> -----Original Message-----
> From: Soobok Lee
...
>  1. When old trailing Hangul jamos are included in conjoining jamo
>     sequences, UAX15 (NFC) performs partial combinations to produce
>     "a modern Hangul syllable (LV) + a standalone old Hangul jamo (oT)",
>     and that form satisfies a ridiculous syllable break condition (X.T).

There is no syllable break there.  There may be a sequence of L before
a Hangul syllable character, and a sequence of T after it, without any syllable
break in between.  For some of the Hangul syllable characters (the LV ones,
which have no final consonant), there may even be both Vs and Ts after,
without there being a syllable break.
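For illustration, the first case can be checked with Python's standard `unicodedata` module plus a small boundary test. This is a minimal sketch, not part of any specification: `hangul_type` and `has_break` are hypothetical helpers, the rules are the Hangul syllable boundary rules as later written up in UAX #29 (GB6 to GB8), only the U+1100 jamo block is classified (the later extended jamo blocks are ignored), and any pair not covered by those rules is simply treated as a break.

```python
import unicodedata

S_BASE, S_COUNT, T_COUNT = 0xAC00, 11172, 28

def hangul_type(ch: str) -> str:
    """Classify ch as L, V, T, LV, LVT, or '' (not Hangul). U+1100 block only."""
    cp = ord(ch)
    if 0x1100 <= cp <= 0x115F:
        return "L"
    if 0x1160 <= cp <= 0x11A7:
        return "V"
    if 0x11A8 <= cp <= 0x11FF:
        return "T"
    if S_BASE <= cp < S_BASE + S_COUNT:
        # Precomposed syllables without a trailing consonant are LV.
        return "LV" if (cp - S_BASE) % T_COUNT == 0 else "LVT"
    return ""

def has_break(a: str, b: str) -> bool:
    """True if a syllable boundary falls between adjacent characters a and b."""
    ta, tb = hangul_type(a), hangul_type(b)
    if ta == "L" and tb in ("L", "V", "LV", "LVT"):
        return False  # GB6: L x (L | V | LV | LVT)
    if ta in ("LV", "V") and tb in ("V", "T"):
        return False  # GB7: (LV | V) x (V | T)
    if ta in ("LVT", "T") and tb == "T":
        return False  # GB8: (LVT | T) x T
    return True       # simplification: everything else counts as a break

# <L, V, old T>: NFC composes only the modern L+V into an LV syllable.
s = unicodedata.normalize("NFC", "\u1100\u1161\u11C3")
print([f"U+{ord(c):04X}" for c in s])  # ['U+AC00', 'U+11C3']
print(has_break(s[0], s[1]))           # False: LV x T, no break
```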

>  2. UAX15 also tries ridiculous partial combinations when it meets a
>     combining sequence of two or more leading Hangul jamos followed by
>     Hangul vowel jamos, and produces a syllable break condition (L.LV).

There is no syllable break there.  [Skipping syllable characters for the moment]
Note that a Hangul syllable consists of a non-empty **sequence** of L, followed
by a non-empty **sequence** of V, followed by a (possibly empty) **sequence** of T.
Note that in many cases compatibility equivalents with regard to these were
(erroneously) made non-equivalent between Unicode 2.1 and Unicode 3.0. 
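A minimal sketch of the L+ V+ T* shape, assuming the same U+1100 block ranges (L = U+1100..U+115F, V = U+1160..U+11A7, T = U+11A8..U+11FF) and testing the NFD form so that precomposed syllables are expanded back into jamo; `is_one_syllable` is a hypothetical helper. It shows that the NFC result <L, LV> for the input <L, L, V> is still a single syllable.

```python
import re
import unicodedata

# One syllable = L+ V+ T* over conjoining jamo (U+1100 block only).
SYLLABLE = re.compile("[\u1100-\u115F]+[\u1160-\u11A7]+[\u11A8-\u11FF]*")

def is_one_syllable(text: str) -> bool:
    """True if text, once decomposed, is exactly one L+ V+ T* sequence."""
    return SYLLABLE.fullmatch(unicodedata.normalize("NFD", text)) is not None

# <L, L, V>: NFC composes only the last L with the V, giving <L, LV>.
nfc = unicodedata.normalize("NFC", "\u1100\u1100\u1161")
print([f"U+{ord(c):04X}" for c in nfc])  # ['U+1100', 'U+AC00']
print(is_one_syllable(nfc))              # True: still a single syllable
```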

>  3. Compatibility Hangul jamos are mapped into conjoining jamos
>     without any fillers.

Compatibility (non-conjoining) Hangul letters are best prohibited.  Doing
the correct mapping is not expressible via nameprep without adding a new
mechanism special to Hangul.  Would there be any major problems with just
prohibiting them (allowing only the conjoining jamo and the syllable characters)?
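The problem with the plain NFKC mapping can be seen with Python's standard `unicodedata` module (the `show` helper is only for printing code points): two separate compatibility letters collapse into one syllable, and a lone letter becomes a bare leading jamo with no filler.

```python
import unicodedata

def show(s: str) -> list[str]:
    """Return the code points of s as 'U+XXXX' strings."""
    return [f"U+{ord(c):04X}" for c in s]

# U+3131 KIYEOK and U+314F A: two standalone compatibility letters
# become <U+1100, U+1161> under NFKD, which then composes to one syllable.
print(show(unicodedata.normalize("NFKC", "\u3131\u314F")))  # ['U+AC00']

# A lone compatibility letter becomes a bare conjoining L, with no filler.
print(show(unicodedata.normalize("NFKC", "\u3131")))        # ['U+1100']
```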

...
>  Now I propose UTC make a new normalization (call it NFN) to correct
>  such errors and faults, and let stringprep include it after
>  casefolding: that is, NFKC(NFN(casefold(x))).

I have suggested a solution that involves only additions to the tables in
"nameprep" (these include prohibiting non-conjoining compatibility Jamo
as well as the Hangul filler characters).  Table available upon request.
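A sketch of a prohibition check of the kind suggested here. The ranges below are an assumption for illustration, not the table offered above: the choseong and jungseong fillers, the Hangul Compatibility Jamo block (which includes U+3164 HANGUL FILLER) and the halfwidth Hangul letters (which include U+FFA0 HALFWIDTH HANGUL FILLER). `first_prohibited` is a hypothetical helper. It is applied to the raw input, since NFKC rewrites the compatibility and halfwidth letters into conjoining jamo before a later check could see them.

```python
# Hypothetical prohibition ranges (NOT the table offered in the message).
PROHIBITED = [
    (0x115F, 0x1160, "Hangul choseong/jungseong fillers"),
    (0x3131, 0x318E, "Hangul compatibility jamo (incl. U+3164 HANGUL FILLER)"),
    (0xFFA0, 0xFFDC, "Halfwidth Hangul letters (incl. U+FFA0 filler)"),
]

def first_prohibited(label: str):
    """Return (index, code point, reason) of the first prohibited char, else None."""
    for i, ch in enumerate(label):
        cp = ord(ch)
        for lo, hi, reason in PROHIBITED:
            if lo <= cp <= hi:
                return i, cp, reason
    return None

print(first_prohibited("\uAC00\u11A8"))  # None: syllable + conjoining T is allowed
print(first_prohibited("\uAC00\u3131"))  # reports index 1 (U+3131)
```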

I agree that it is very unfortunate that "letter sequence" equivalent strings,
like [gg] (SSANGKIYEOK) and [g][g] (<KIYEOK, KIYEOK>), are not
formally equivalent in Unicode.  Indeed, these should have been canonically
equivalent, but the normal forms are by now frozen, and I don't think anyone
wants yet another Unicode formal equivalence or normal form.
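The lack of equivalence is easy to confirm with Python's standard `unicodedata` module: U+1101 has no decomposition at all, so no normalization form relates it to <U+1100, U+1100>, while the compatibility letter U+3132 does map to U+1101 under NFKC.

```python
import unicodedata

ssang = "\u1101"       # HANGUL CHOSEONG SSANGKIYEOK, one letter [gg]
pair = "\u1100\u1100"  # <KIYEOK, KIYEOK>, the letter sequence [g][g]

for form in ("NFC", "NFD", "NFKC", "NFKD"):
    same = unicodedata.normalize(form, ssang) == unicodedata.normalize(form, pair)
    print(form, same)  # False for every form

# The compatibility letter U+3132 SSANGKIYEOK maps to the single jamo U+1101.
print(unicodedata.normalize("NFKC", "\u3132") == ssang)  # True
```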

		Kind regards
		/kent k