
RE: [idn] homograph attacks

To: John C Klensin <klensin@jck.com>, idn@ops.ietf.org
Subject: RE: [idn] homograph attacks
From: Martin Duerst <duerst@w3.org>
Date: Thu, 17 Feb 2005 18:46:16 +0900
Cc: dam@icann.org
In-reply-to: <BA0B04987920471308DCD902@scan.jck.com>
References: <A313262806CD3341AC59AB108137FC8F2F74E5@dul1wnexmb01.vcorp.ad.vrsn.com> <6.1.2.0.2.20050216143507.030d1cd0@mail.jefsey.com> <BA0B04987920471308DCD902@scan.jck.com>

Hello John,

Some points below.

At 02:17 05/02/17, John C Klensin wrote:
>Responding to several messages together, rather than sending a
>series of fragmented messages...

>--On Wednesday, 16 February, 2005 15:09 +0100 "JFC (Jefsey)
>Morfin" <jefsey@jefsey.com> wrote:

>Actually, that is not the ICANN/IANA requirement, nor is it
>clear where their requirement applies (e.g., such registrations
>are optional).  Their requirement is unclear and, IMO, needs
>updating.  I hope that they will get to that updating process
>RSN.  On the other hand, I have had that hope for well over a
>year.  Progress is unlikely to be made on those subjects by
>further discussion here.

I'd be glad to hear about what you think needs updating
(I'm thinking there is a need for updating, too), in
any appropriate venue. (I subscribed to the ICANN
list you mentioned)


>In a separate note, at Tuesday, February 15, 2005 7:44 PM, you
>wrote, in part...
>
>>> Should it not be supported on the IANA server and common to
>>> all the gTLDs?

>	(ii) An interesting distinction has been identified
>	between the needs of a domain that must serve the
>	requirements of a particular country and a domain that
>	supports the language commonly associated with that
>	country.  For the first case, of which .DE is the
>	best-worked-out example, there is a legitimate
>	requirement for registration of common names, company
>	names, street names, etc., in Germany.  Given history,
>	that list will include strings and characters that don't
>	exist in the German language.  It may include strings
>	that contain combinations of characters that do not appear
>	together in any contemporary language that uses
>	Roman-based characters.   By contrast, if a gTLD creates
>	a language table defined around the German language,
>	many of the characters needed by .DE are simply invalid.
>	That contrast (which Martin has identified in the form
>	of the difference between the "German" tables used in
>	the TLDs for Austria and Switzerland relative to those
>	used in Germany) may require taking a different look at
>	the rules and guidelines (and table registration models)
>	than we have heretofore taken: either for rather
>	different guidelines for ccTLDs than for gTLDs, or for
>	rethinking the registration model, or both.

I think the idea of a gTLD serving German language needs
is not necessarily helpful. Of course they need to serve
German language needs, as well as the needs of many other
languages. But do they have to say "oh, well, .DE developed
this table for the needs in Germany, including the needs of
the German language as a subset, so we should try and find
that subset"? I'd claim that's mostly investing work
in the wrong place. In many cases, a gTLD can just create
tables that cover a number of languages at the same time,
maybe even a whole script as proposed by Michel. But this
has to be done bottom-up, so it may take time.
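To make the ccTLD/gTLD contrast concrete, here is a minimal sketch of validating a label against a character table. Both tables are hypothetical illustrations invented for this example, not the actual .DE table or any table registered with IANA; the country-style table simply adds a few accented letters that may appear in names of people, companies or streets but not in German words.

```python
from __future__ import annotations

import unicodedata

# HYPOTHETICAL tables, invented for illustration only. They are not the
# real .DE table or any language table registered with IANA.
ASCII_LDH = frozenset("abcdefghijklmnopqrstuvwxyz0123456789-")

# A "German language" table: letters, digits, hyphen, umlauts and sharp s.
GERMAN_LANGUAGE_TABLE = ASCII_LDH | frozenset("äöüß")

# A "country" table must also cover names of people, companies and
# streets, so it adds letters that are not part of German orthography.
COUNTRY_TABLE = GERMAN_LANGUAGE_TABLE | frozenset("áàâçéèêëíîïñóôšúûýž")


def disallowed_chars(label: str, table: frozenset[str]) -> set[str]:
    """Return the characters of `label` that `table` does not permit."""
    # Compose first, so that "u" + combining diaeresis is checked as "ü".
    normalized = unicodedata.normalize("NFC", label).lower()
    return {ch for ch in normalized if ch not in table}


def is_registrable(label: str, table: frozenset[str]) -> bool:
    return not disallowed_chars(label, table)


if __name__ == "__main__":
    for label in ("müller", "françois"):
        print(label,
              is_registrable(label, GERMAN_LANGUAGE_TABLE),
              is_registrable(label, COUNTRY_TABLE))
```

A gTLD table covering several languages, as suggested above, would just be a larger union of such sets; the hard part is agreeing on the contents, not the check.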

>	The issue that Pat identifies with Tajik is another
>	piece of the same puzzle: many of us may believe that
>	there is no possible reason to mix the three scripts in
>	which that language can be written in a single label,
>	and I certainly trust Roozbeh's knowledge and experience
>	in that area.

Very much agreed.

>      Certainly, it would make things safer to
>	prohibit any mixing (note that IDNA's BIDI restrictions
>	essentially prohibit mixing an Arabic-derived script
>	with anything other than itself, another Arabic-derived
>	script, or Hebrew).

Good point, but a small correction:
These restrictions only prevent mixing RTL scripts (Hebrew,
Arabic, Syriac,...) with LTR scripts (all the others). They
don't prevent mixing RTL scripts among each other.
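The correction can be checked against the bidi rule that IDNA inherits from Nameprep, i.e. RFC 3454 section 6. The sketch below implements requirements 2 and 3 of that section (requirement 1, prohibiting the characters of section 5.8, is left out) using Python's Unicode 3.2 database; the Hebrew and Arabic sample words are arbitrary choices for the example.

```python
import unicodedata

# Stringprep (RFC 3454) is defined against Unicode 3.2, which Python
# exposes as unicodedata.ucd_3_2_0.
UCD = unicodedata.ucd_3_2_0


def is_rand_al(ch: str) -> bool:
    # RandALCat (RFC 3454 Table D.1): bidi class R or AL.
    return UCD.bidirectional(ch) in ("R", "AL")


def is_l(ch: str) -> bool:
    # LCat (RFC 3454 Table D.2): bidi class L.
    return UCD.bidirectional(ch) == "L"


def passes_bidi_rule(label: str) -> bool:
    """Check requirements 2 and 3 of RFC 3454 section 6 (not requirement 1)."""
    if not any(is_rand_al(ch) for ch in label):
        return True  # no RTL characters, so the rule does not apply
    if any(is_l(ch) for ch in label):
        return False  # RTL and LTR characters may not be mixed
    # The first and last characters must both be RTL.
    return is_rand_al(label[0]) and is_rand_al(label[-1])


HEBREW = "\u05e9\u05dc\u05d5\u05dd"  # "shalom", bidi class R
ARABIC = "\u0633\u0644\u0627\u0645"  # "salam", bidi class AL

if __name__ == "__main__":
    print(passes_bidi_rule(HEBREW + ARABIC))  # RTL + RTL: allowed
    print(passes_bidi_rule(HEBREW + "abc"))   # RTL + LTR: rejected
```

Nothing in the rule distinguishes Hebrew from Arabic: both are simply RTL, so mixing them passes while any LTR letter fails.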

And of course, exceptions such as Japanese as well as
languages that e.g. use a few Greek characters in addition
to the Latin alphabet have to be addressed (such languages
of course use those Greek characters that look different
from Latin characters, so there should be little or no
phishing issue).

>      However, we have a long history of
>	DNS labels that could not possibly be words in any
>	language.  Whether or not to permit mixed-script labels
>	is presumably an issue that the registry for .TJ will
>	need to sort out (I have been told, for example, that
>	mixed Cyrillic and Latin-character labels are likely to
>	be a requirement in Serbia and Montenegro, although this
>	illustration might give them pause).  And the best
>	answer for them might or might not be the best answer
>	for a gTLD.

I'm a bit sceptical about the need for true mixed-script labels
for Serbian (i.e. something like latin-CYRILLIC). One main
issue here is that mixed-script labels are more work to type.
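An application-side "is this label mixed-script?" check is easy to sketch, and the sketch also shows its limits. This rough version uses the first word of each letter's Unicode name as a stand-in for its script, which is an assumption, not the Unicode Script property. A Latin-Cyrillic label of the Serbian kind is flagged, but so is ordinary Japanese (one of the exceptions mentioned above), while an all-Cyrillic look-alike such as the Cyrillic "HP" example from this thread is not flagged at all.

```python
from __future__ import annotations

import unicodedata


def scripts(label: str) -> set[str]:
    """Roughly approximate the set of scripts used by the letters of `label`.

    ASSUMPTION: the first word of a letter's Unicode character name
    ("LATIN", "CYRILLIC", "HIRAGANA", ...) stands in for its script. This
    is only a heuristic; a real check would use the Unicode Script property
    (UAX #24), which Python's standard library does not expose.
    """
    found = set()
    for ch in label:
        if unicodedata.category(ch).startswith("L"):  # letters only
            found.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return found


def is_mixed_script(label: str) -> bool:
    return len(scripts(label)) > 1


if __name__ == "__main__":
    print(scripts("h\u0440"))             # Latin h + Cyrillic er: mixed
    print(scripts("\u043d\u0440"))        # Cyrillic "нр" (uppercase "НР"): one script
    print(scripts("\u65e5\u672c\u3054"))  # Japanese kanji + hiragana: also "mixed"
```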

>In addition, as Hotta-san's very helpful note points out, one
>could considerably reduce the scope of the identified
>confusion/phishing problems by aggressively applying a variant
>model across scripts, restricting the registration of homographs
>to the same registrant.  I personally suspect that will not
>prove practical, from a policy standpoint, in the collection of
>alphabetic scripts that share Old Semitic origins, but that is,
>IMO, just another argument for giving different registries the
>flexibility to develop their own policies and take
>responsibility for the consequences of those policies.

I share your suspicions. The 'variant reservation model'
was carefully worked out to address concerns about
simplified/traditional Chinese. Even the Japanese, who
participated in working out this model, haven't adopted
it for themselves, although there are a considerable
number of simplified/traditional variants in Japanese,
too (though fewer than in Chinese).
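The cross-script reservation idea can be sketched by collapsing look-alike labels into a shared key and letting only one registrant hold each key. This is a minimal sketch, not the CJK variant procedure of RFC 3743 itself: the look-alike table is a hypothetical handful of Cyrillic and Greek letters, and the registry class is invented for illustration. The policy difficulty anticipated above lies in building that table for scripts with common origins, not in the code.

```python
from __future__ import annotations

import unicodedata

# HYPOTHETICAL, deliberately tiny look-alike table for illustration only.
# A real registry would need a far larger, reviewed table.
LOOKALIKE_TO_LATIN = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0445": "x",  # CYRILLIC SMALL LETTER HA
    "\u03bf": "o",  # GREEK SMALL LETTER OMICRON
}


def skeleton(label: str) -> str:
    """Map a label to a key shared by all of its look-alikes."""
    folded = unicodedata.normalize("NFKC", label).lower()
    return "".join(LOOKALIKE_TO_LATIN.get(ch, ch) for ch in folded)


class VariantRegistry:
    """Toy registry: all labels with the same skeleton form one variant
    bundle, and only the bundle's first registrant may register in it."""

    def __init__(self) -> None:
        self._bundle_owner: dict[str, str] = {}

    def register(self, label: str, registrant: str) -> bool:
        owner = self._bundle_owner.setdefault(skeleton(label), registrant)
        return owner == registrant


if __name__ == "__main__":
    reg = VariantRegistry()
    print(reg.register("example", "alice"))
    print(reg.register("\u0435x\u0430mple", "bob"))    # Cyrillic ie and a
    print(reg.register("\u0435x\u0430mple", "alice"))  # same owner: allowed
```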


>--On Wednesday, 16 February, 2005 08:07 +0100 "\"Martin v.
>Löwis\"" <martin@v.loewis.de> wrote, responding to Soobok Lee:
>
>>> All-Cyrillic label "HP" (.com) can be registered even in
>>> Russian language pack.
>>>
>>> Cyrillic "HP".COM in its uppercase form looks the same as
>>> all-ASCII "HP.COM".
>>>
>>> Any Registration Process should filter out these "HP"-like
>>> combinations.
>
>But the only way to do that would require that a domain that
>permits Cyrillic characters must ban ASCII characters and vice
>versa.   I would predict that will just never happen, if only
>because every domain that exists today has a lot of all-ASCII
>labels in it.  It is not a very good example (see below), but I
>note that this particular example is only one of the reasons why
>"identify a mixed-script label in the application" may be a
>useful tool, but is not a solution -- this is not a mixed-script
>label.

There is one possible scenario where this might happen.
If there is e.g. a Cyrillic equivalent to .ru, it is very
probable that the Cyrillic registrations will go into that
equivalent rather than into .ru, because a series of
Cyrillic labels looks better, is easier to read, is easier
to remember, and is easier to type than mixed labels.
If such a tendency gets firmly established, it might at some
point be possible to simply say that Cyrillic registrations
go into the Cyrillic TLD, and Latin registrations go into
the Latin TLD. Of course, there are a lot of things that need
to happen before we could get to such a state, but it's at
least a scenario.

And please notice that spoofing in TLDs isn't a problem,
because TLDs can simply be designed to avoid homographs;
the Cyrillic equivalent of .ru would be chosen so as not to look
like .py (Paraguay), but maybe something like Cyrillic r
(looks like p) followed by a short i (looks like (small)
mirrored N with a hook on top).
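Because the set of TLDs is small and centrally managed, a proposed TLD string can be screened against every existing one. A minimal sketch under stated assumptions: the look-alike table and the sample TLD list are illustrative, not an official screening procedure. It reproduces the example above: Cyrillic er followed by Cyrillic u would read as .py, whereas er followed by short i does not.

```python
# HYPOTHETICAL look-alike table: a few lowercase Cyrillic letters whose usual
# glyphs resemble Latin letters. Illustrative only, not a complete list.
CYRILLIC_LOOKALIKES = {
    "\u0430": "a", "\u0435": "e", "\u043e": "o", "\u0440": "p",
    "\u0441": "c", "\u0443": "y", "\u0445": "x",
}

# A small sample of existing ASCII TLDs (not the full root zone).
EXISTING_TLDS = {"com", "org", "net", "py", "ru"}


def ascii_lookalike(label: str) -> str:
    """Replace each look-alike Cyrillic letter with its Latin twin."""
    return "".join(CYRILLIC_LOOKALIKES.get(ch, ch) for ch in label.lower())


def collides_with_existing(candidate: str) -> bool:
    """True if `candidate` could be mistaken for an existing ASCII TLD."""
    return ascii_lookalike(candidate) in EXISTING_TLDS


if __name__ == "__main__":
    print(collides_with_existing("\u0440\u0443"))  # "ру": reads as .py
    print(collides_with_existing("\u0440\u0439"))  # "рй": short i has no Latin twin
```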


>And I hope we can all figure out a way to work together to make
>this work.  It is important; "don't use IDNs" isn't an answer
>now and never has been, and the alternatives are just a choice
>among ways to fragment the Internet.

Yes, this is the most important point!

Regards, Martin.
