
Re: [idn] WG last call documents



"Eric A. Hall" <ehall@ehsco.com> wrote:

> [this is part 1 of a multi-part response]
> [this is part 2 of a multi-part response]

Do I have the last part yet?  I don't know, I'll just respond now...

> > We recommend, we don't demand.
> 
> SHOULD defines *default* behavior, it is mandatory UNLESS there is a
> reason NOT to do it.

Yes, I understand.  We don't demand that implementors apply ToUnicode,
we recommend it.  Or seen another way, we demand that they either apply
ToUnicode or think of a good reason not to.

I can see your point that maybe even that is a little too strong.

> Consider the following DEFAULT mandatory interpretation:
> 
>   From: <support>
>   To: <customer>
> 
>   Please add zz--gobbledygook to your sendmail.cf file
> 
> which is converted BY DEFAULT into
> 
>   Please add <IDN> to your sendmail.cf file

It was never our intention that applications should go digging into text
directed at humans (as opposed to protocol messages, which are directed
at the application itself) and convert things that they guess might be
domain names.

This is probably not clear from the current wording.

>   1) Connection identifiers (as used for forward lookups by
>      application clients, or as generated by reverse lookups)
>      MAY be transliterated for display purposes, although
>      careful consideration is to be given to any effects such a
>      conversion may have on any secondary applications.
> 
>   2) Domain names which are provided as protocol data MUST NOT
>      be transliterated, except where the governing
>      specification  explicitly permits and describes such usage.
> 
>   3) Domain names which are provided as structured data (such
>      as email addresses, URLs, and other common data-formats)
>      MUST NOT be transliterated, except where the governing
>      specification explicitly permits and describes such usage.
> 
>   4) Domain names which are provided as unstructured data in
>      application output MUST NOT be transliterated, except where
>      the governing  specification explicitly permits and
>      describes such usage.
>
> I think we all agree on 2 and 3.

Assuming "transliterated" means "converted from ACE to Unicode" (as
opposed to the other way around), then yes I agree on points 2 and 3,
and in fact the IDNA spec already implies points 2 and 3 in requirement
1 of section 3 (domain labels inserted into generic domain name slots
must contain only ASCII).

Point 4 could be taken care of by limiting requirement 2 to ACE labels
obtained from domain name slots (as opposed to ACE labels guessed at in
plain text).  Point 1 could be taken care of by explicitly allowing for
exceptions.

What do you think of this wording:

    2) ACE labels obtained from domain name slots SHOULD be hidden
    from users except when the use of the non-ASCII form would cause
    problems or when the ACE form is explicitly requested.  Given an
    internationalized domain name, an equivalent domain name containing
    no ACE labels can be obtained by applying the ToUnicode operation
    (see section 4) to each label.  When requirements 1 and 2 both
    apply, requirement 1 takes precedence.

I could live with that.  I don't know about Patrik and Paul.
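
For illustration only, here is a minimal Python sketch of "apply ToUnicode
to each label".  It relies on Python's built-in "idna" codec, which
implements the later published IDNA (RFC 3490) with the "xn--" ACE prefix,
not the draft prefix discussed in this thread.  The function names are
hypothetical.  As with ToUnicode, a label that cannot be converted is
returned unchanged.

```python
def to_unicode_label(label: str) -> str:
    """Return the display form of one label, or the label itself on failure."""
    try:
        # The built-in "idna" codec decodes ACE labels (assumed "xn--" prefix).
        return label.encode("ascii").decode("idna")
    except UnicodeError:
        # ToUnicode never fails: fall back to the original label.
        return label


def display_name(domain: str) -> str:
    """Apply the per-label conversion to every label of a domain name."""
    return ".".join(to_unicode_label(label) for label in domain.split("."))


if __name__ == "__main__":
    print(display_name("xn--mnchen-3ya.de"))
    print(display_name("www.example.com"))
```

Note that this is only meant for names taken from domain name slots; it
makes no attempt to find domain names in free text.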

> > That would defeat the whole point of IDNA, which is to allow
> > applications to use internationalized domain names with legacy
> > protocols and interfaces, without having to revise those protocols
> > and interfaces.
>
> I'm sorry that you are operating under that belief.  As I have pointed
> out multiple times on the mailing list, IDNA serves a single objective
> of a backwards-compatible encoding format.  It does not bring i18n
> domain names to legacy applications, protocols or data-formats by
> itself.

It does not bring i18n to legacy applications, but it *does* bring i18n
to legacy protocols and data formats.  One of the main examples driving
the design of IDNA was this one:  We should be able to update mail
user agents (like Pine) and immediately be able to use IDNs in email
addresses without having to touch mail transfer agents (like sendmail)
or DNS servers (like bind), and without having to wait for any updates
to protocol specifications (like the DNS spec and the SMTP spec) or data
format specifications (like RFC 2822).
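
A rough sketch of what the updated mail user agent would do before handing
an address to the unmodified transport: convert only the domain part to its
ACE form and leave everything else alone.  This uses Python's built-in
"idna" codec, which follows the later published RFC 3490 ("xn--" prefix),
so the exact output differs from the drafts under discussion.  The function
name is hypothetical, and the local part is passed through untouched.

```python
def to_wire_address(address: str) -> str:
    """Convert the domain part of an email address to its ASCII (ACE) form."""
    local, sep, domain = address.rpartition("@")
    if not sep:
        raise ValueError("not an email address: %r" % address)
    # ToASCII is applied label by label inside the codec; the result is
    # plain ASCII that a legacy MTA or DNS server can carry unchanged.
    ascii_domain = domain.encode("idna").decode("ascii")
    return local + "@" + ascii_domain


if __name__ == "__main__":
    print(to_wire_address("user@münchen.de"))
```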

> > ...the IDNA spec defines what is a valid internationalized domain
> > label.  Nameprep alone cannot possibly tell you whether the label is
> > too long; you can know that only after applying Punycode.
>
> This co-dependency needs to be called out at the top of the i18n
> domain name profile.

The last sentence of the nameprep abstract is:

    This profile of the stringprep protocol is used as part of a suite
    of on-the-wire protocols for internationalizing the DNS.

I agree that there should be a citation to IDNA at the end of the
abstract.  Nameprep is useless without IDNA.  I'd also like to see "the
DNS" changed to "domain names", since we're not actually doing anything
to the DNS.
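
To make the length dependency concrete, here is a small Python sketch.  It
assumes the label has already been through nameprep, uses the "xn--" prefix
from the later published RFC 3490 rather than the draft prefix, and applies
the 63-octet label limit from RFC 1035.  The helper names are hypothetical.
The point is that the character count of the Unicode label says little
about whether the encoded label fits.

```python
MAX_LABEL_OCTETS = 63  # DNS label limit (RFC 1035)
ACE_PREFIX = "xn--"    # assumption: prefix from the final IDNA spec


def ace_form(label: str) -> str:
    """Return the ACE form of an already-nameprepped label."""
    if label.isascii():
        return label  # pure ASCII labels are not encoded
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def fits_in_dns_label(label: str) -> bool:
    """True if the label's ACE form is within the DNS length limit."""
    return len(ace_form(label)) <= MAX_LABEL_OCTETS


if __name__ == "__main__":
    # 31 widely spaced CJK ideographs: short in characters, long once encoded.
    label = "".join(chr(0x4E00 + 100 * i) for i in range(31))
    print(len(label), len(ace_form(label)), fits_in_dns_label(label))
```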

> > Perhaps you think the term "generic domain name slot" is
> > counter-intuitive and should be renamed (but not redefined)?
>
> The use of "generic" is what bothers me, because it implies that the
> slot (another word I dislike) can hold any domain name.

Some domain name slots are explicitly internationalized and are thus
special; all other domain name slots, being non-special, are "generic".
That's the way I intended the word to be understood.  Other words that
could serve the same purpose are "vanilla" and "ordinary".  Would either
of those be better?  Or maybe "non-internationalized".

"RFC-1035 slot" is not inclusive enough.  If a protocol/interface
spec says that a field/argument is a domain name, and says nothing
else (makes no mention of RFC 1035 or any other standard for domain
names), then that is a generic domain name slot.  Or if it cites various
standards that govern the slot, but IDNA is not one of them, then it's a
generic domain name slot.

As for the word "slot", would you like to suggest some other word?  Here
is the concept that the term "domain name thingamabob" needs to stand
for: "a protocol element or a function argument or a return value (and
so on) explicitly designated for carrying a domain name."

> So you have an example of another WG specification where only half
> the opinions are provided, and that they provide value towards
> understanding the protocol or service? Citations Please

Not an IETF spec, but the PNG spec includes a rationale appendix, and
we've received comments saying that it was helpful and it would be nice
if other specs did the same.  I don't think we ever received comments
saying it should not have been included.

> It is also factually in error.

If that's true, it should be fixed.

> A prohibition against combining characters is not feasible. See
> <200111212258.OAA23757@birdie.sybase.com> from K Whistler.

He said that we could not simply prohibit combining characters, and
I agree.  I was suggesting that we might want to prohibit *initial*
combining characters.  Combining characters combine with the character
preceding them, so a label beginning with a combining character is a
strange beast, and might cause bizarre and surprising behavior.
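
As a sketch of what such a check might look like (this is only the idea
floated above, not anything in the current drafts), the following Python
treats any character in a Unicode "Mark" general category as combining.
The function name is hypothetical.

```python
import unicodedata


def starts_with_combining_mark(label: str) -> bool:
    """True if the first character of the label is a combining mark."""
    if not label:
        return False
    # General categories Mn, Mc and Me all begin with "M".
    return unicodedata.category(label[0]).startswith("M")


if __name__ == "__main__":
    print(starts_with_combining_mark("\u0301abc"))  # leading COMBINING ACUTE ACCENT
    print(starts_with_combining_mark("a\u0301bc"))  # accent follows a base letter
```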

AMC