Re: [idn] homograph attacks
Hi Michel,
I don't think we are so far off. My concern is that many people are abusing the term 'language' for these tables. I am just saying that creating an exclusive subset of Latin characters in a European context is not necessarily a bad idea, but it will result in future problems because they will always discover that a few characters are missing from the subset.
I do not think that a few characters being missing from a particular
version of a language table is that big a deal. After all, characters
are still being added to Unicode. If the registry finds that there is a
real demand for the missing characters, it can revise the table to
include them with little effort (as compared to shrinking a table).
It is reasonably easy for .de to establish a table as they did, and again that is OK. It is much more challenging for a worldwide TLD such as .com to establish registration rules.
Agreed. However, I think it is worthwhile for gTLDs and relevant
language experts to work together on language tables that are
appropriate for use in a gTLD context and can be shared with other
registries. In some cases it may not be so difficult, as ccTLDs that
share a common language often work together when establishing the set
of characters to be allowed in their own IDN rollouts (e.g.
.at/.ch/.de for German and .ca/.ch/.fr for French).
Typically script is a much better selector than language to establish those tables and associated rules.
A script is a very convenient selector for publishing tables and in
other applications such as localization. I don't agree that it applies
"typically", although in some cases it does work quite well (.museum and
.pl are fine examples.)
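
To make the script-as-selector idea concrete, here is a rough sketch. It
assumes that a character name beginning with "LATIN" is a good enough
stand-in for the Unicode Script property, which the Python standard
library does not expose; is_latin_label is a hypothetical helper, not
any registry's actual rule.

```python
import unicodedata

def is_latin_label(label: str) -> bool:
    """Return True if every character is a digit, a hyphen, or a Latin letter.

    Assumption: the Unicode character name prefix "LATIN " approximates
    the Script=Latin property closely enough for illustration.
    """
    for ch in label:
        if ch in "0123456789-":
            continue  # digits and hyphen are shared across scripts
        if not unicodedata.name(ch, "").startswith("LATIN "):
            return False
    return True

print(is_latin_label("m\u00fcnchen"))   # u with diaeresis is a Latin letter
print(is_latin_label("p\u0430ypal"))    # U+0430 is CYRILLIC SMALL LETTER A
```

A filter like this accepts every Latin letter, including letters that a
given language never uses, which is why I would not rely on script alone.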
Even among the 92 characters that DENIC allows, all of which belong to
the Latin script, there is a visual resemblance between U+00D0 (the
capital version of U+00F0) and U+0110 (the capital version of U+0111),
as Roozbeh pointed out at the last ICANN IDN workshop. This would be
an intra-script conflict (though the characters belong to different code
blocks), and compared to the IDN-posing-as-ASCII-domain attack it is
arguably less urgent. This serves to illustrate the importance of
restricting the characters to the smallest practicable subset in IDN
implementations. For a gTLD, the language tag comes in handy, so we
could apply different language tables to IDNs according to their
intended language. Using a conservative subset of characters for each
language table is part of the deal, in order to reduce the
opportunities for phishing attacks.
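
As a minimal sketch of what I mean by applying a table per language
tag: the tables below are illustrative only (they are not DENIC's or
any other registry's published table), and LANGUAGE_TABLES and
is_label_allowed are hypothetical names.

```python
import unicodedata

# Letters, digits and hyphen allowed in every table.
ASCII_LDH = set("abcdefghijklmnopqrstuvwxyz0123456789-")

# Hypothetical, deliberately small per-language tables.
LANGUAGE_TABLES = {
    "de": ASCII_LDH | set("\u00e4\u00f6\u00fc"),              # a, o, u with diaeresis
    "fr": ASCII_LDH | set("\u00e0\u00e2\u00e7\u00e9\u00e8\u00ea"),  # a small French subset
}

def is_label_allowed(label: str, language_tag: str) -> bool:
    """Check a label against the table selected by its language tag."""
    table = LANGUAGE_TABLES.get(language_tag)
    if table is None:
        return False  # no table published for this language
    # Compose and lowercase first so the label is compared in one form.
    normalized = unicodedata.normalize("NFC", label).lower()
    return all(ch in table for ch in normalized)

print(is_label_allowed("m\u00fcller", "de"))  # allowed by the "de" table
print(is_label_allowed("caf\u00e9", "de"))    # e with acute is not in "de"
print(is_label_allowed("caf\u00e9", "fr"))    # but it is in "fr"
```

The smaller each table is, the fewer look-alike pairs it can contain;
for example, U+0111 is rejected under both tables above.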
Best regards,
wil.