Re: [idn] case folding



At 10:42 11/06/00 , GIM Gyeongseog-KIM Kyongsok wrote:
>On Sun, 11 Jun 2000, RJ Atkinson wrote:
>
> > At 04:08 11/06/00 , GIM Gyeongseog-KIM Kyongsok wrote:
> > The above idea breaks other Romanised languages, such
> > as Vietnamese, so I think it's really not possible to adopt.
>
>i don't know much about vietnamese.
>could you please give one or two concrete examples?

I did so previously on this IDN WG list.  Here is another try,
this time with more background detail...

Background:     
         - Vietnamese uses only Romanised letters and never uses
           ideogrammatic characters.  Although spoken Vietnamese
           has some cognates with Mandarin or Cantonese (due to
           roughly 1000 years of Chinese imperialism; ask anyone
           who is Vietnamese), it is a distinctly different language
           from Chinese and never uses Chinese characters in its
           written form.  (For example, the word "ma~" in Vietnamese
           is pronounced the same as the Mandarin word for "horse",
           but has the narrower meaning "chess piece named horse",
           whereas a different Vietnamese word is used for the
           animal.)
         - Vietnamese has 6 tones, written with 5 diacritical tone
           marks plus the unmarked form (roughly: acute accent ',
           grave accent `, horizontal squiggle ~ (tilde), vertical
           squiggle (hook above), dot underneath, no mark), and it
           uses at least one non-tonal diacritical vowel modifier
           (circumflex).  In the northern pronunciation, all 6 tones
           are distinct in everyday speech, while in the south two of
           the tones (horizontal squiggle ~ and vertical squiggle) tend
           to be blurred together in spoken form.  The regional
           Vietnamese pronunciations are mutually intelligible, perhaps
           more so than Yorkshire English and Southern US English are.
         - Vietnamese also has some letters not found in the basic
           Latin alphabet (e.g. the "D-" letter, D with stroke, which
           is different from, yet similar in appearance to, the Nordic
           letter Eth).
         - All Vietnamese letters are recognisably Roman, not
           in any way ideograms.  
         - The Vietnamese language has normally been written in this
           Romanised (Quốc Ngữ) form for more than 200 years, and
           it is this form that is used everywhere (signs, newspapers,
           and elsewhere).  If one goes back to perhaps the year 1400
           of the Western calendar, the Vietnamese were using an
           ideogrammatic written form derived from Chinese characters,
           but that was never widely used outside the educated elite
           and has been "dead" for more than 200 years.
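The letters described above are all distinct, cased characters in Unicode, which is what any case-mapping rule would actually operate on. A minimal Python sketch, assuming names are handled as Unicode code points (this message does not specify an encoding), checks a few of these properties with the standard `unicodedata` module:

```python
import unicodedata

# Vietnamese D with stroke and the Nordic Eth are separate code points,
# even though their capital forms can look alike in many fonts.
for ch in ("\u0110", "\u0111", "\u00d0", "\u00f0"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# Each letter has its own case pair in the Unicode character database.
assert "\u0110".lower() == "\u0111"  # D with stroke -> d with stroke
assert "\u00d0".lower() == "\u00f0"  # Eth -> eth
assert "\u0111" != "\u00f0"          # d with stroke is not eth

# A Vietnamese vowel can carry both a vowel modifier and a tone mark:
# U+1ED1 (o with circumflex and acute) decomposes into three code points.
decomposed = unicodedata.normalize("NFD", "\u1ed1")
print([unicodedata.name(c) for c in decomposed])
assert decomposed == "o\u0302\u0301"

# D with stroke has no decomposition: the stroke is part of the letter,
# not a separable diacritic, so it cannot be reduced to a plain "d".
assert unicodedata.normalize("NFD", "\u0111") == "\u0111"

# The stacked vowel also has its own single-code-point uppercase form.
assert "\u1ed1".upper() == "\u1ed0"
```

Because each of these letters has its own uppercase/lowercase pair outside [a-zA-Z], an ASCII-only rule leaves every one of them unmapped.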

One could imagine a URL:

         http://www.d-o^ng.vn

and its capitalised equivalent:

         http://WWW.D-O^NG.VN

Users would consider it broken if those two URLs did not
go to the same web page.  If we only case-map [a-z, A-Z],
then we would not case-map "d-" with "D-" or "o^" with "O^",
so the 2 fictional URLs above would map to different web pages.
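To make the failure concrete, here is a minimal Python sketch comparing ASCII-only case mapping with full Unicode case folding. The helper names `ascii_fold` and `unicode_fold` are hypothetical, the domain strings spell the fictional URLs in Unicode (đ is U+0111, ô is U+00F4), and `str.casefold` applies Unicode's default case folding, which is one reasonable choice of mapping rather than one this message specifies. The NFC normalisation step is likewise an added assumption.

```python
import string
import unicodedata

# Translation table for ASCII-only case mapping: [A-Z] -> [a-z], nothing else.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def ascii_fold(name: str) -> str:
    """Case-map only the 26 ASCII letters; every other character is untouched."""
    return name.translate(_ASCII_LOWER)


def unicode_fold(name: str) -> str:
    """Case-fold every cased letter using Unicode default case folding.

    Normalising to NFC before and after folding (an added assumption) makes
    precomposed and decomposed spellings of the same letter compare equal.
    """
    return unicodedata.normalize("NFC", unicodedata.normalize("NFC", name).casefold())


lower = "www.\u0111\u00f4ng.vn"  # the fictional www.d-o^ng.vn
upper = "WWW.\u0110\u00d4NG.VN"  # the fictional WWW.D-O^NG.VN

print(ascii_fold(upper))                           # www.ĐÔng.vn
print(ascii_fold(upper) == ascii_fold(lower))      # False
print(unicode_fold(upper) == unicode_fold(lower))  # True

# A decomposed spelling (O followed by a combining circumflex) still matches.
upper_nfd = unicodedata.normalize("NFD", upper)
print(unicode_fold(upper_nfd) == unicode_fold(lower))  # True
```

Under the ASCII-only rule the capitalised name folds to a third spelling, "www.ĐÔng.vn", which matches neither original; only folding the Vietnamese letters as well makes the two names compare equal.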

It is not reasonable to require the content provider to
register every domain name in all of its myriad case combinations
and manually map them to the same content, though I can see
that some would-be DNS registries would very much like the
chance to charge N times for the same domain name.
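The scale of those "myriad case combinations" is easy to estimate: each letter with distinct upper- and lowercase forms doubles the number of spellings. The Python sketch below (`count_case_variants` and `case_variants` are hypothetical helper names) counts and enumerates them for the fictional name www.d-o^ng.vn, assuming each cased letter has exactly one single-character form in each case. That holds for the letters here but not for every letter (German ß uppercases to "SS", for instance).

```python
from itertools import product


def count_case_variants(name: str) -> int:
    """Count spellings of `name` that differ only in letter case.

    Assumes each cased letter has exactly one single-character uppercase
    form and one lowercase form, so each such letter doubles the count.
    """
    cased = sum(1 for ch in name if ch.lower() != ch.upper())
    return 2 ** cased


def case_variants(name: str) -> set[str]:
    """Enumerate those spellings explicitly, under the same assumption."""
    choices = [
        (ch.lower(), ch.upper()) if ch.lower() != ch.upper() else (ch,)
        for ch in name
    ]
    return {"".join(p) for p in product(*choices)}


name = "www.\u0111\u00f4ng.vn"  # the fictional www.d-o^ng.vn
print(count_case_variants(name))                        # 512
print(len(case_variants("\u0111\u00f4ng")))             # 16
print("WWW.\u0110\u00d4NG.VN" in case_variants(name))   # True
```

Even this short name has 9 cased letters and therefore 2^9 = 512 case spellings; a rule that treated them as distinct names would multiply registrations by that factor.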

My bottom line is that we MUST do case-mapping for at least 
all of the Romanised letters.  Again, I am not discussing 
any CJK issues in this email.

Ran
rja@inet.org