
Re: [idn] hangul question



The string in question is:

0xd000 0xd032 0xd000 0xd030 0xd000 0xd030 0xd000 0xd032 0xd000 0xd066
0xd000 0xd069 0xd000 0xd066 0xd000 0xd061 0xd0b6 0xd0d4 0xd0b4 0xd0d4
0xd0ce 0xd0f5 0xd0ce 0xd074

There's no valid Korean word or phrase represented by this string; in
fact, it includes several syllables that are not normally used in
contemporary Korean.

However, it seems to me that this is not even a valid UTF-8 encoding,
because the string has a pattern:

1. Every 16-bit value has 0xd0 as its most significant byte.
2. The sequence of least significant bytes is (without the leading 0x
   prefixes):

00 32 00 30 00 30 00 32 00 66 00 69 00 66 00 61 b6 d4 b4 d4 ce f5 ce 74

Grouping each pair of adjacent bytes from the beginning, we get:

0032 0030 0030 0032 0066 0069 0066 0061 b6d4 b4d4 cef5 ce74

I don't have any idea about the last four, but when the first eight are
interpreted as a UCS-2 (UTF-16) string, they spell "2002fifa". : )
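
The regrouping above can be checked mechanically. The following is only
a rough Python sketch of the steps described in this message; the
variable names (units, low_bytes, pairs, text) are my own, and it
assumes the regrouped pairs are big-endian 16-bit code units.

```python
# The 24 16-bit values exactly as quoted in the message.
units = [
    0xd000, 0xd032, 0xd000, 0xd030, 0xd000, 0xd030, 0xd000, 0xd032,
    0xd000, 0xd066, 0xd000, 0xd069, 0xd000, 0xd066, 0xd000, 0xd061,
    0xd0b6, 0xd0d4, 0xd0b4, 0xd0d4, 0xd0ce, 0xd0f5, 0xd0ce, 0xd074,
]

# Step 1: every value carries 0xd0 in its high byte.
assert all(u >> 8 == 0xD0 for u in units)

# Step 2: keep only the low byte of each value.
low_bytes = bytes(u & 0xFF for u in units)

# Group each pair of adjacent bytes into one 16-bit value (big-endian).
pairs = [
    (low_bytes[i] << 8) | low_bytes[i + 1]
    for i in range(0, len(low_bytes), 2)
]

# Read the first eight values as UCS-2 code units.
text = "".join(chr(p) for p in pairs[:8])

print(" ".join(f"{p:04x}" for p in pairs))
print(text)  # 2002fifa
```

The last four values (b6d4 b4d4 cef5 ce74) are left undecoded here,
since it is not clear what encoding, if any, they belong to.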

Hope this helped,
Eugene

-- 
Eugene M. Kim <ab@astralblue.com>

"Is your music unpopular?  Make it popular; make music which people
like, or make people who like your music."