
RE: [idn] homograph attacks



No language used in the former Soviet Union should require a mix of Latin and Cyrillic in a single DNS label.
Unicode contains many Latin homographs in the Cyrillic block for exactly that reason: to avoid mixing the two scripts in a single word. It is unfortunate that the exact visual match is now haunting us. However, it should not be used as a rationale for TLD registries to accept registration of mixed Cyrillic/Latin labels.

To answer another message in this thread, there is no definitive answer about which Unicode characters are allowed for a given language. But in all languages that have a reasonable concept of 'words', you should never need to allow mixed scripts in a word, at least in the context of an IDN label. There are exceptions to this rule, such as in South and East Asia (Japanese comes to mind), but these languages can be detected reasonably well using the Unicode script property.
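As a rough illustration of the mixed-script check, here is a minimal Python sketch. It is only an approximation: the Python standard library does not expose the Unicode script property, so the sketch infers a script from the first word of each character's Unicode name. The function names are hypothetical, and digits and the hyphen are treated as script-neutral.

```python
import unicodedata

# Characters treated as script-neutral in a label (assumption for this sketch).
NEUTRAL = set("0123456789-")

def scripts_in_label(label):
    """Return the set of scripts in a label, guessed from Unicode character names.

    This is a stand-in for the real Unicode script property, which the
    standard library does not provide.
    """
    scripts = set()
    for ch in label:
        if ch in NEUTRAL:
            continue
        name = unicodedata.name(ch, "")
        # e.g. "CYRILLIC SMALL LETTER A" -> "CYRILLIC"
        scripts.add(name.split(" ")[0] if name else "UNKNOWN")
    return scripts

def is_mixed_latin_cyrillic(label):
    """True if the label contains both Latin and Cyrillic letters."""
    return {"LATIN", "CYRILLIC"} <= scripts_in_label(label)
```

For example, `is_mixed_latin_cyrillic("p\u0430ypal")` flags a label in which the second letter is Cyrillic U+0430 rather than Latin "a", while a purely Cyrillic or purely Latin label passes. A real implementation would use proper script data and would need explicit allowances for writing systems such as Japanese that legitimately combine scripts.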

Michel 

-----Original Message-----
From: owner-idn@ops.ietf.org [mailto:owner-idn@ops.ietf.org] On Behalf Of Kane, Pat

VeriSign does prevent domains with the Russian language tag from commingling A-Z with the Cyrillic characters. It does permit 0-9 and the dash to be used. This filter also applies to other Cyrillic-based languages such as Belarusian, Ukrainian, Serbian, Macedonian and Bulgarian.
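A filter of the kind described above might look roughly like the following Python sketch. This is not VeriSign's implementation. The function name, the use of ISO 639-1 codes as language tags, and the case-insensitive treatment of A-Z are all assumptions made for illustration.

```python
import re

# ISO 639-1 codes for the Cyrillic-based languages named above.
# Using these codes as the "language tag" is an assumption of this sketch.
CYRILLIC_TAGS = {"ru", "be", "uk", "sr", "mk", "bg"}

# Any ASCII letter; digits and the dash are deliberately not matched.
ASCII_LETTER = re.compile(r"[A-Za-z]")

def violates_cyrillic_policy(label, language_tag):
    """True if a label tagged with a Cyrillic-based language contains A-Z."""
    if language_tag not in CYRILLIC_TAGS:
        return False  # the filter only applies to the listed languages
    return ASCII_LETTER.search(label) is not None
```

Under these assumptions, a Russian-tagged label made of Cyrillic letters, digits and dashes is accepted, while the same label with a single Latin letter substituted in is rejected.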

There are other languages listed within ISO 639-2 that today use a combination of Latin and Cyrillic, as they were originally Latin based (Tajik was Arabic prior to being Latin based), migrated to Cyrillic during the Soviet era, and today are migrating back to Latin. It is common to use Latin and Cyrillic characters in Tajik, from what I understand, not being a native speaker. Granted, there are not a lot of registrations in com and net that are Tajik, but this is just the point of an IDN.

Pat Kane