[idn] dichotomies



Adam M. Costello wrote:
> That (or something very similar) was a principle that went into the
> IDNA spec.  I personally was inclined to define both internationalized
> domain names and internationalized host names, where the former would
> be completely general (allowing *all* Unicode characters, even the
> invisible ones), and the latter would be much narrower (excluding most
> punctuation and symbols).  This would be an analogy to traditional
> domain names (which allow all ASCII characters, even control characters)
> and traditional host names (which allow only the ASCII letters, digits,
> and one punctuation mark, the hyphen-minus).
>
> On the other hand, there was an argument that the traditional
> distinction between domain names and host names was the source of
> endless confusion and debate, and was a mistake that should not be
> repeated with IDNs.  I have some sympathy for that argument.
>
> In any case, we ended up with just one set of non-ASCII characters for
> IDNs, between the two extremes: only invisible characters are excluded.
> (I think there's one exception--a visible space character that is also
> excluded).
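
The traditional host-name rule mentioned in the quote (ASCII letters, digits and hyphen-minus only) is simple enough to state in code. This is a minimal sketch, assuming the RFC 952 label syntax as relaxed by RFC 1123 (a label may begin with a digit) and the 63-character label limit from RFC 1035; `is_ldh_label` is a hypothetical helper name, not an existing library function.

```python
import re

# Hypothetical helper: checks one label against the traditional host-name
# ("LDH") rule: only ASCII letters, digits and hyphen-minus, 1 to 63
# characters, and no hyphen at the start or end of the label.
_LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")

def is_ldh_label(label: str) -> bool:
    return _LDH_LABEL.fullmatch(label) is not None

print(is_ldh_label("paypal"))         # True
print(is_ldh_label("xn--pypal-4ve"))  # True: ACE labels are plain LDH
print(is_ldh_label("-paypal"))        # False: leading hyphen
print(is_ldh_label("p\u0430ypal"))    # False: U+0430 is Cyrillic, not ASCII
```

ACE labels pass this check, which is what lets IDNs travel through software that only understands traditional host names.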

Another bifurcation that could be considered somewhat analogous is http vs. https. We might even want to bring the topic of security into the ACE prefix discussion. One could imagine a world where two ACE prefixes co-exist: a new prefix for "secure" domain labels, and the old prefix for less secure ones. The secure prefix would use similar encoding and decoding rules, but would drop the sometimes-confusing mappings currently found in nameprep, and would prohibit a rather large number of Unicode characters and/or character types (reserving them for possible future expansion).


We might then choose "xn--s" as the prefix, so that even the raw Punycode form would be somewhat safer: an 's' would sit directly before whatever follows, rather than a hyphen, which looks like a delimiter. E.g. xn--spypal-4ve instead of xn--pypal-4ve (the ACE form of "paypal" spelled with a Cyrillic 'а' (U+0430) in place of the first 'a'). Note that "spypal" looks quite different from "pypal". Of course, this example isn't very good, since the beginning of "pypal" doesn't closely resemble the beginning of "paypal". A better example would be one where the second 'a' of "paypal" was the homograph: Punycode keeps the ASCII letters in their original order, so the visible part of the label would then be "paypl".
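
To make the comparison concrete, the sketch below builds ACE labels under the existing "xn--" prefix and under the hypothetical "xn--s" prefix discussed above. It assumes Python's built-in "punycode" codec (RFC 3492) and deliberately skips nameprep, which does not change this all-lowercase example; `to_ace` and `SECURE_ACE_PREFIX` are illustrative names, not part of any specification.

```python
ACE_PREFIX = "xn--"          # the IDNA ACE prefix
SECURE_ACE_PREFIX = "xn--s"  # hypothetical "secure" prefix, illustration only

def to_ace(label: str, prefix: str = ACE_PREFIX) -> str:
    """Return an ASCII label: non-ASCII labels become prefix + Punycode.

    Sketch only: nameprep is NOT applied before encoding.
    """
    if label.isascii():
        return label
    return prefix + label.encode("punycode").decode("ascii")

spoof = "p\u0430ypal"  # Cyrillic small a (U+0430) in place of the first 'a'
print(to_ace(spoof))                     # xn--pypal-4ve
print(to_ace(spoof, SECURE_ACE_PREFIX))  # xn--spypal-4ve

better = "payp\u0430l"  # homograph for the second 'a' instead
print(to_ace(better))   # ASCII part is "paypl", which resembles "paypal"
```

The only difference between the two outputs is the inserted 's', which is the whole point of the proposal: it breaks up the hyphen that otherwise makes the encoded tail read like a separate word.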

However, a second ACE prefix might be fraught with difficulties. For starters, an FQDN with three or more labels could end up mixing three different encodings: labels under each of the two ACE prefixes, plus the pure-ASCII TLD name. There would also be the question of *which* ACE prefix to choose when encoding. We might have to specify that *all* the labels in a name use the same ACE prefix (or pure ASCII, e.g. for the TLD). This would be consistent with RFC 1591 and current conventions (except for TLDs that allow just about anything underneath them). E.g. the .jp registry might have a rule that *all* domain labels use either one prefix or the other, together with pure ASCII for the final ".jp" part (or any other part).
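
A registry-side check for the "same ACE prefix throughout" rule might look like the sketch below. The function and constant names are hypothetical. It also assumes, purely for the sketch, that any label beginning with "xn--s" belongs to the secure prefix. Existing "xn--" labels whose ASCII part starts with 's' also begin with "xn--s", so a real scheme would need an unambiguous prefix.

```python
ACE_PREFIX = "xn--"
SECURE_ACE_PREFIX = "xn--s"  # hypothetical

def label_kind(label: str) -> str:
    """Classify a label by prefix, testing the longer prefix first."""
    lower = label.lower()
    if lower.startswith(SECURE_ACE_PREFIX):
        return "secure"
    if lower.startswith(ACE_PREFIX):
        return "ace"
    return "ascii"

def uses_one_ace_prefix(fqdn: str) -> bool:
    """True unless the name mixes labels from both ACE prefixes.

    Pure-ASCII labels (e.g. the TLD) may appear alongside either prefix.
    """
    kinds = {label_kind(lbl) for lbl in fqdn.rstrip(".").split(".")}
    return not {"secure", "ace"} <= kinds

print(uses_one_ace_prefix("www.xn--spypal-4ve.jp"))            # True
print(uses_one_ace_prefix("xn--pypal-4ve.xn--spypal-4ve.jp"))  # False
```

Checking the longer prefix first matters here, because every "xn--s" label also starts with "xn--".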

Co-existence is quite different from transition. Although migration typically requires the co-existence of the old and the new during the transition period, people normally intend to complete the transition by getting rid of the old (entirely or almost entirely). However, there are probably many examples of migrations that started with good intentions but ended up with rather long periods of co-existence. One that comes to mind is HTML vs XHTML. I don't know whether we will ever be able to exterminate HTML, regardless of our "good" intentions.

Erik