
[idn] One profile for domain names, or many?



Let me try to sum up the issues in the ongoing debate over whether
there should be a single standard Stringprep profile for all IDNs, or
whether different profiles should be allowed for different contexts
(like different DNS resource record types).

[By the way, I think concerns that IDNA's definition of IDNs might be
overly restrictive and might cause problems in the future are very much
on-topic for this working group, and such concerns should be aired.  I
thank Eric for raising this issue and working hard to make us understand
his side of it.]

I'll start by recapping the two models.

In the single-profile model (which is used by the IDNA draft), there is
one standard profile (Nameprep) that is used for all comparisons between
textual domain labels, and for all conversions of textual domain labels
between their ASCII and non-ASCII forms.  A textual domain label is one
composed of specific well-defined characters, not an arbitrary sequence
of bytes.  The profile is intimately involved in the definitions of
which labels are equivalent and which labels are valid.  Additional
restrictions can be imposed (some valid labels might still be disallowed
for particular purposes).  IDNA would govern all comparisons between
textual domain labels, and all conversions of textual domain labels
between ASCII and non-ASCII form, but it would have nothing to say
about conversions between domain labels and other data types; those
conversions are out of scope of IDNA, and would need to be defined
by whatever specs define those other data types.  The 8-bit labels
contemplated by RFC 1035 would also be out of scope of IDNA, because
there is no standard interpretation of the bytes 80..FF as characters.

In the many-profile model (as I understand it), domain names are really
sequences of bytes, and their interpretation as characters is merely
a view created by applications for the benefit of human users.  All
comparisons are really exact comparisons except for ASCII letters; the
illusion of case insensitivity for non-ASCII letters is created by
applications applying case folding when they convert user-entered text
into domain-names-which-are-byte-sequences.  Different text-to-byte
conversions, defined by different Stringprep profiles, can be applied in
different contexts, and this will appear to users as different kinds of
sensitivities.

I think the single-profile model is inspired by the existing model of
ASCII domain names as they appear to users.  I think the many-profile
model is inspired by the existing model of 8-bit domain names as they
appear to DNS implementors.

I've had my doubts about whether the many-profile model could really
work, but Eric is confident that it would, and my doubts have lessened
somewhat.  I have no doubt that the single-profile model would work,
and I don't remember anyone claiming that it wouldn't; Eric has argued
that it would make some things less convenient or less efficient.  Let's
assume for the sake of argument that both models work, and we just have
to decide which one we prefer.

Now let's examine an implication of the many-profile model, using a
concrete example scenario (please bear with me, the punchline will be
interesting).  Suppose there is an XXX resource record type, which
uses xxxprep for the first label of its owner name, and there is a YYY
resource record type, which uses yyyprep for the first label of its
owner name.  One registrant asks the administrator of a zone to create
an XXX record with owner label x, and another registrant asks for a YYY
record with owner label y, in the same zone.

Now let's plug in the details:

    x = fullwidth Latin small letter a
    y = Latin capital letter A

    xxxprep = case-fold, then apply NFC
    yyyprep = apply NFKC, then prohibit all lowercase letters

What does the zone administrator do?  First, verify that the labels are
properly prepared:

    x == xxxprep(x) ?   Yes.
    y == yyyprep(y) ?   Yes.

Next, check for any collisions:

    xxxprep(x) == x == fullwidth Latin small letter a
    yyyprep(y) == y == Latin capital letter A
    xxxprep(y) ==      Latin small letter a
    yyyprep(x) ==      error

That's all combinations, and they yield all different answers, so there
are no collisions here in the prepared space where DNS servers live.
The two domains are created for the two registrants.

What we failed to detect, and what in general could be extremely
difficult to detect (because Stringprep is not reversible), is the
existence of a collision in the unprepared space where users live and
think and type.  Consider:

    z = fullwidth Latin capital letter A

A user could use this same string z to refer to both of these domains.
If the user asks for an XXX record, z will match one domain; if the
user asks for a YYY record, the very same string z will match the other
domain.
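
The scenario above can be checked mechanically.  What follows is a
minimal sketch in Python, not a real Stringprep implementation: xxxprep
and yyyprep are the hypothetical profiles from the example, "case-fold"
is assumed to mean Python's str.casefold(), "lowercase letter" is
assumed to mean str.islower(), and normalization uses whatever Unicode
version the Python runtime ships with.

```python
import unicodedata


def xxxprep(label: str) -> str:
    """Hypothetical profile: case-fold, then apply NFC."""
    return unicodedata.normalize("NFC", label.casefold())


def yyyprep(label: str) -> str:
    """Hypothetical profile: apply NFKC, then prohibit lowercase letters."""
    prepared = unicodedata.normalize("NFKC", label)
    if any(ch.islower() for ch in prepared):
        raise ValueError("lowercase letter prohibited by yyyprep")
    return prepared


x = "\uFF41"  # fullwidth Latin small letter a
y = "\u0041"  # Latin capital letter A
z = "\uFF21"  # fullwidth Latin capital letter A

# Both registered labels are already in prepared form.
assert xxxprep(x) == x
assert yyyprep(y) == y

# No collision in the prepared space: the cross combinations differ.
assert xxxprep(y) == "a"
try:
    yyyprep(x)
    yyyprep_x_is_error = False
except ValueError:
    yyyprep_x_is_error = True
assert yyyprep_x_is_error

# Collision in the unprepared space: the one string z reaches both labels.
assert xxxprep(z) == x
assert yyyprep(z) == y
```

Finding z here was easy only because the profiles are tiny.  In general
one would have to search the preimages of both profiles, which is the
hard part, since the mappings are many-to-one.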

In other words, from the point of view of users, there is not a single
domain name space in which all domain names live, there are multiple
domain name spaces (one corresponding to each profile), and the same
name can refer to different domains in different spaces.

This is unlike ASCII domain names, which all occupy a single name space.
An ASCII name always refers to the same domain regardless of the context
in which it is typed.

The single-profile model retains this single name space characteristic,
while the many-profile model introduces a new concept of multiple
name spaces.  In this sense, the many-profile model is a more radical
departure from the existing ASCII domain name model, while the
single-profile model is more conservative.
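
For contrast, here is the same trio of strings under a single profile.
This is only a sketch: simplified_nameprep is a made-up stand-in that
applies NFKC and case folding, and it omits the mapping and prohibition
tables of the real Nameprep profile.

```python
import unicodedata


def simplified_nameprep(label: str) -> str:
    """Rough stand-in for Nameprep (an assumption, not the real profile):
    compatibility-normalize, case-fold, then normalize again."""
    folded = unicodedata.normalize("NFKC", label).casefold()
    return unicodedata.normalize("NFKC", folded)


x = "\uFF41"  # fullwidth Latin small letter a
y = "\u0041"  # Latin capital letter A
z = "\uFF21"  # fullwidth Latin capital letter A

# With one profile, every spelling lands on the same prepared label.
prepared = {simplified_nameprep(s) for s in (x, y, z)}
assert prepared == {"a"}
```

With one profile all three strings name the same domain, so the zone
administrator can see at registration time that the two requests concern
a single owner name.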

(By the way, it's not at all clear to me what should happen if the user
asks for ANY records matching z.)

A simple/conservative approach has both advantages and disadvantages
compared to a more complex/flexible approach.  Let's look at a couple of
those.

Suppose an application has some non-domain-label data type that it
wants to map onto domain labels.  It would be convenient if the trivial
mapping could be used (that is, do nothing, just call the thing a
domain label).  This is easier in the many-profile model, because the
application can define a Stringprep profile well suited to its data
type.  In the single-profile model, the trivial mapping works only if
the original data can tolerate being nameprepped.  If it can't, the
application will have to use a non-trivial mapping from its data type
to domain labels.  IDNA is no help there (remember that IDNA applies
only to conversions of domain labels between their ASCII and non-ASCII
forms, not to conversions between domain labels and other data types),
so the application will always have to do the conversion at both ends;
it cannot benefit from any automatic IDNA conversions built into the
infrastructure (like a new resolver interface).

So that's an example of something that's more difficult (but not
impossible) in the single-profile model.

Now suppose an application simply wants to start using IDNA to allow
users to enter and see non-ASCII characters.  In the many-profile model,
you cannot put a user-entered name into a domain name slot unless you
know the profile to be used for that slot.  Eric has been talking about
resource record types, but DNS is not the only protocol with domain name
slots; we also need to assign profiles to domain name slots in other
protocols, and until we do, IDNs cannot be used in those slots.  For
example, before an email application can add IDN support, it needs to
wait for profiles to be assigned to header fields, SMTP commands, POP
commands, and IMAP commands.  Before a web browser can add IDN support,
it needs to wait for profiles to be assigned to the various URL schemes,
cookies, and SSL certificates.  With the single-profile model, there is
no question of which profile to use, so applications can add IDN support
the day after IDNA is published.

So that's an example of something that's less convenient (but not
impossible) in the many-profile model.

In conclusion, there may very well be more than one right way to
define IDNs, with tradeoffs among the options.  Based on my current
understanding of the issues (which I think is better now than ever),
I would still choose the single-profile model.  I can understand that
other people might choose differently.

AMC