Re: [idn] IDNA interoperability failures, once again



On Feb 13, "D. J. Bernstein" <djb@cr.yp.to> wrote:

 >Let's try another example: the ``show'' program in the UNIX MH/NMH
 >mail-handling system. Yes or no: Should this program convert domain
 >names from your special-purpose 7-bit character set to the local
 >character set? Again assume LANG=en_US.UTF-8, so this can be done.
This assumption is wrong. Most people whose primary language is not
English do not use Unicode and UTF-8 as their native character set, but
a national charset such as ISO-8859-1 or KOI8-R.
You may not like it, but they will not switch to UTF-8 just to use IDN,
because they would then be unable to see accented letters in their plain
text documents, in email or Usenet articles generated by broken Microsoft
software, and anywhere else an unidentified charset is used.

I just looked at /usr/share/doc/locales/SUPPORTED.gz on my Debian
GNU/Linux workstation, and the only locales for which UTF-8 is the
primary choice rather than a secondary one are:

ar_IN
en_IN
fa_IR.UTF-8
hi_IN.UTF-8
mr_IN.UTF-8
ta_IN
te_IN
ur_PK

md@wonderland:pts/0:~$zcat /usr/share/doc/locales/SUPPORTED.gz | sed -e 's/[\. @].*//' | sort | uniq | wc -l
    132
md@wonderland:pts/0:~$

Some locales may be missing, but 8 of 132 locales are not enough to
justify your assertion that everybody should just use UTF-8.
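
For anyone who wants to repeat the count on another system, here is a
rough sketch. It assumes the file holds one "locale charset" pair per
line (for example "fa_IR.UTF-8 UTF-8"), which may not match every
version of the locales package, and the function names count_locales
and utf8_only_locales are made up for this example.

```shell
# Count distinct locale names: strip the ".charset", "@modifier" and
# charset column, then count unique names (same idea as the sed above).
count_locales() {
    sed -e '/^#/d' -e '/^$/d' -e 's/[. @].*//' | sort -u | wc -l | tr -d ' '
}

# List locales for which every listed charset is UTF-8.
# Assumes the charset is the second whitespace-separated field.
utf8_only_locales() {
    awk '
        /^#/ || NF == 0 { next }            # skip comments and blank lines
        {
            name = $1
            sub(/[.@].*/, "", name)         # drop ".charset" and "@modifier"
            seen[name] = 1
            if ($2 != "UTF-8") other[name] = 1
        }
        END {
            for (n in seen) if (!(n in other)) print n
        }
    ' | sort
}

# Usage (path as on my workstation):
#   zcat /usr/share/doc/locales/SUPPORTED.gz | count_locales
#   zcat /usr/share/doc/locales/SUPPORTED.gz | utf8_only_locales
```

The second function is stricter than reading the list by eye: a locale
is reported only if no non-UTF-8 charset is listed for it at all.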

-- 
ciao,
Marco