Re: [idn] stringprep comment 2



This message argues both sides of the issue.  :)

Soobok Lee <lsb@postel.co.kr> wrote:

> The latter can be as catastrophic as the former.

I assume you meant that false negatives (where names don't match when
they should) can be as catastrophic as false positives (where names
match when they shouldn't).  Can you back up that claim?

> if each application vendor adopts its own different nameprep profile,
> applications behaviors may be unpredictable across applications for
> end users.

Do you have a suggestion?  What should happen when an application
encounters a name that uses code points newer than the application's
version of nameprep?  If the application prohibits unassigned code
points, then the name will never match anything, because ToASCII will
fail.  If the application allows unassigned code points, then the name
will never match the wrong thing, and might sometimes match the right
thing (in practice, I think it usually will work).  Which is preferable?
The conservative approach (never match) is more predictable, but the
other approach (match if you're lucky) might make users happier.
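
As a rough illustration of the two choices, here is a minimal Python
sketch.  The table ASSIGNED_V1 and the names prepare and PrepError are
made up for this example; they are a toy stand-in, not the real
nameprep tables or any real library API.

```python
# Toy model: a tiny "version 1" table, not the real nameprep tables.
ASSIGNED_V1 = set("abcdefghijklmnopqrstuvwxyz0123456789-")


class PrepError(ValueError):
    """Raised when preparation rejects a name."""


def prepare(name, assigned, allow_unassigned):
    """Lowercase known characters; reject or pass through unknown ones."""
    out = []
    for ch in name:
        folded = ch.lower()
        if folded in assigned:
            out.append(folded)
        elif allow_unassigned:
            # Loose: this version knows nothing about ch, so leave it alone.
            out.append(ch)
        else:
            # Strict: an unassigned code point makes the whole name fail.
            raise PrepError(f"unassigned code point U+{ord(ch):04X}")
    return "".join(out)


NEWER_NAME = "bj\u00f8rn"  # U+00F8 plays the role of a "newer" code point
```

With these definitions, prepare(NEWER_NAME, ASSIGNED_V1, False) raises
PrepError, so the name can never match anything, while
prepare(NEWER_NAME, ASSIGNED_V1, True) returns the name unchanged and
leaves the matching to whatever it is compared against.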

Wait, I just realized why we need to avoid comparing two strings
that have both been prepared using loose stringprep.  If they both use
unassigned code points that turn out to be prohibited in future versions
of nameprep, then they might match even though they are both invalid
names.  That's a false positive, which is bad.  So we do indeed need to
avoid such comparisons.
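
The false positive can be sketched in a few lines of Python.  Everything
here is hypothetical: the tables, the choice of U+2603 as the code point
that is unassigned in version 1, and the assumption that version 2 ends
up prohibiting it.

```python
# Toy tables, for illustration only.
ASSIGNED_V1 = set("abcdefghijklmnopqrstuvwxyz0123456789-")
NEW_CHAR = "\u2603"  # stands in for a code point unassigned in version 1
ASSIGNED_V2 = ASSIGNED_V1 | {NEW_CHAR}
PROHIBITED_V2 = {NEW_CHAR}  # assumption: version 2 assigns it and prohibits it


def loose_v1(name):
    """Version 1, loose: lowercase, let unassigned code points through."""
    return name.lower()


def strict_v2(name):
    """Version 2, strict: return the prepared name, or None if invalid."""
    prepared = name.lower()
    for ch in prepared:
        if ch in PROHIBITED_V2 or ch not in ASSIGNED_V2:
            return None
    return prepared


a = "Shop" + NEW_CHAR
b = "shop" + NEW_CHAR

# Both sides used loose preparation, so the comparison succeeds...
loose_match = loose_v1(a) == loose_v1(b)
# ...even though neither name is valid once version 2 is the reference.
both_invalid = strict_v2(a) is None and strict_v2(b) is None
```

Here loose_match and both_invalid are both true: two invalid names have
been declared equal, which is exactly the comparison to avoid.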

Disregard my suggestion from my last message.

Perhaps the stringprep spec should say that applications may use loose
stringprep only if they know for sure that the name will never be
compared against a name that was also prepared using loose stringprep.
If there's no way to know, then you must use strict stringprep.

In the case of DNS, if the IDNA spec requires authoritative servers
to use strict nameprep, then clients are free to prepare queries
using loose nameprep.  Other protocols could in principle use similar
methods--requiring strict nameprep at "one end" (whatever that means for
that protocol) so that the "other end" can use loose nameprep.
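
A small Python sketch of the "strict at one end, loose at the other"
arrangement follows.  It is a toy model under stated assumptions: the
server runs a newer (hypothetical) version 2 table that knows U+00F8 and
its uppercase form U+00D8, the client runs version 1 and knows only
ASCII, and ZONE and lookup are invented names, not part of any DNS API.

```python
# Toy tables, for illustration only.
ASSIGNED_V1 = set("abcdefghijklmnopqrstuvwxyz0123456789-")
ASSIGNED_V2 = ASSIGNED_V1 | {"\u00f8"}  # version 2 adds one letter


def strict_v2(name):
    """Server side: full case folding, reject anything outside version 2."""
    prepared = name.lower()
    if any(ch not in ASSIGNED_V2 for ch in prepared):
        raise ValueError("name not valid under version 2")
    return prepared


def loose_v1(name):
    """Client side: fold ASCII only, pass unknown code points through."""
    return "".join(ch.lower() if ch.isascii() else ch for ch in name)


# Stored names are always prepared strictly by the authoritative side.
ZONE = {strict_v2("Bj\u00d8rn"), strict_v2("example")}


def lookup(query):
    """A query matches only if it equals some strictly prepared name."""
    return loose_v1(query) in ZONE


lucky = lookup("bj\u00f8rn")    # client happened to send the folded form
unlucky = lookup("BJ\u00d8RN")  # client cannot fold U+00D8: no match
```

In this model lucky is true and unlucky is false.  The unlucky case is a
false negative, but a query can only ever match a name that passed
strict preparation, so the loose end cannot produce a false positive by
itself.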

But how practical is that?  Take email headers for example.  Who has any
idea what will be done with domain names that appear in email headers?

Maybe it would be a lot simpler and safer just to prohibit unassigned
code points always.  If you want to use new characters, you'll just have
to upgrade your software to the new nameprep, sorry.

Can we get some more people involved in this thread?  I think Soobok is
right that the existing wording in stringprep about "stored strings" and
"query strings" is going to be very difficult to interpret in practice,
and something needs to be done about it, but I don't know what.

AMC