Re: [idn] homograph attacks



Hi Martin et al,

Martin Duerst wrote:

Very much agreed. Except for registries with very special
policies (such as the blocking used by some East Asian
registries), the language association doesn't make too
much sense.

Indeed, for registries that wish to support CJK languages by following RFC 3743, it makes sense to find out the intended language of a label in order to decide whether to apply traditional/simplified Chinese bundling. Take for example a label that contains only Han characters. When it is used in a Japanese context, there probably wouldn't be any variant that the registrant would care about. But when the same label is used in a Chinese context, assuming that it does mean something, the registrant might want both the traditional and simplified variants of the label.
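
To make the point concrete, here is a minimal sketch of how the declared language could change what gets bundled. It is not the RFC 3743 procedure, which is considerably more involved; the two-entry variant table, the tags "zh" and "ja", and the function name bundle_for are all assumptions made up for illustration.

```python
# Hypothetical simplified -> traditional pairs. A real registry would load
# a complete variant table; these two entries are for illustration only.
TO_TRADITIONAL = {"国": "國", "学": "學"}
TO_SIMPLIFIED = {trad: simp for simp, trad in TO_TRADITIONAL.items()}


def bundle_for(label: str, language_tag: str) -> set[str]:
    """Return the set of labels that would be registered together.

    Assumption: only the tag "zh" triggers traditional/simplified
    bundling; any other tag registers the label on its own.
    """
    bundle = {label}
    if language_tag == "zh":
        bundle.add("".join(TO_TRADITIONAL.get(ch, ch) for ch in label))
        bundle.add("".join(TO_SIMPLIFIED.get(ch, ch) for ch in label))
    return bundle


print(bundle_for("国学", "ja"))  # only the label itself
print(bundle_for("国学", "zh"))  # the label plus its traditional form
```

The same all-Han label yields a single registration under the Japanese tag and a two-label bundle under the Chinese tag, which is the only thing the tag is being used for here.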


Quoting John in draft-klensin-reg-guidelines:

  ...and with different geographical and political locations
  and languages having requirements for different collections of
  characters, the optimal registration restrictions became, not a
  global matter, but ones that were different in different areas and,
  hence, in different DNS zones.

It is my belief that CJK is not the only "problematic" or special case. As more languages are researched for their implications in IDN, more language-dependent rules may need to be accommodated.


Imagine that a gTLD registry had a few hundred language tables, and imagine that a registrant wanted to register a particular sequence of characters. It would be very easy for a registrar to set up a service that figured out a language (don't care which) that worked, and register the name with that language.

There is nothing wrong with this approach, technically. The way I see it, the language tag is simply a hint: it is administrative information rather than something that concerns the operation of the DNS. It is used only at registration time, to let the registry make certain decisions and apply rules to the label. If we could drop the word "language" and simply stick with "tag", then that tag could be a script, a subset of a script, or simply a list of characters, labeled by a suitable term such as "Characters used in Germany", "Nordic languages", or "List of allowable characters for Japanese".


Take the .PL IDN program, for example: as long as the desired label contains only characters from one of the allowable tables, the registration will go through. The tables are named "Latin set", "Cyrillic set", "Greek set", etc. They could jolly well launch Chinese and introduce a rule that says: if your label fits into the Chinese table, we will reserve any variants of it as well. One could see that as associating an IDN with a table, and the table may cover a single language or several languages.

IMO, "language association" and "table association" are one and the same concept.

Regards,
wil.