host names and nameprep (was: Re: [idn] IRIs ought to use internationalized *host* names)



Hello Adam,

Sorry for the delay.
I'm splitting my answer into two. This one is on the
host name vs. domain name question.

At 03:30 02/03/27 +0000, Adam M. Costello wrote:
>James Seng/Personal <jseng@pobox.org.sg> wrote:
>
> > The discussion of the how URL is to be encoded and how Host: field are
> > to be handled is probably more relevant so lets get back to that.

Just to make sure that I don't get something wrong:

- Domain names are whatever can be used on the lookup side
   of a DNS query. This includes all kinds of current and potential
   uses besides the core use that people usually equate with
   the DNS.

- Host names are the names of machines. They are a subset of
   domain names, used in certain queries/records (e.g. A records).


>Okay.  Eventually this message will arrive at the following proposal:
>
>     Proposed repertoire for internationalized *host* labels:  All
>     characters in classes L (letter), M (mark), and N (number) are
>     allowed, and U+002D (hyphen-minus) is also allowed.  Everything else
>     is forbidden.

This is a very good first shot. Some things have to be checked
carefully, e.g. whether some M (marks) have to be excluded, or
whether some signs corresponding to the hyphen-minus should be
allowed. Two examples I know of are the zero-width space, which
could be desirable for Farsi, and the (ideographic) middle dot;
several people in Japan have complained that it's not available
in XML names.
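
As a rough sketch of how the proposed repertoire could be checked
(Python, using the standard unicodedata module; the function names
are made up for illustration, and the answer for any given character
depends on the Unicode version that ships with the interpreter):

```python
import unicodedata

HYPHEN_MINUS = "\u002D"

def is_host_code_point(ch: str) -> bool:
    """True if ch is U+002D or its general category is L*, M*, or N*."""
    if ch == HYPHEN_MINUS:
        return True
    # General categories are two letters, e.g. "Lu", "Mn", "Nd";
    # the first letter is the major class.
    return unicodedata.category(ch)[0] in ("L", "M", "N")

def is_host_label(label: str) -> bool:
    """True if the label is non-empty and uses only host code points."""
    return len(label) > 0 and all(is_host_code_point(ch) for ch in label)

if __name__ == "__main__":
    for label in ["example", "ex-ample1", "a_b", "\u65e5\u672c\u8a9e",
                  "a\u200Bb", "a\u30FBb"]:
        print(ascii(label), is_host_label(label))
```

With this check, both examples mentioned above are rejected: the
zero-width space (U+200B) and the middle dot (assuming U+30FB
KATAKANA MIDDLE DOT is the one meant) fall outside L, M, and N, so
they would need explicit exceptions if they were to be allowed.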

>Which characters should be allowed in internationalized host labels?
>This is an interesting question in its own right, and it's possible that
>the IESG will demand an answer.

>Notice that there is no conflict with Nameprep, because Nameprep does
>not prohibit any characters in classes L, M, or N.

I guess that if there were a conflict, the host names would
just have to satisfy conditions on both sides.


>If we were to adopt this definition of internationalized host name, it
>would best be understood as an amendment of ToASCII step 3 (which checks
>host name restrictions if applicable), tightening substep 3a from:
>
>          (a) Verify the absence of non-LDH ASCII code points; that is,
>              the absence of 0..2C, 2E..2F, 3A..40, 5B..60, and 7B..7F.
>
>to:
>
>          (a) Verify that the sequence contains only host code points;
>              that is, U+002D (hyphen-minus) and code points classified
>              as L (letter), M (mark), or N (number).  See appendix ? for
>              an enumeration of host code points.
>
>Or maybe the enumeration would go in Nameprep, or in a separate document
>that defines internationalized host names.
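
The difference between the two versions of substep 3a quoted above
can be sketched like this (Python; the function names are
hypothetical, and this covers only substep 3a, not the rest of
ToASCII):

```python
import unicodedata

# Non-LDH ASCII ranges from the current substep 3a:
# 0..2C, 2E..2F, 3A..40, 5B..60, and 7B..7F (inclusive, hexadecimal).
NON_LDH_ASCII = [(0x00, 0x2C), (0x2E, 0x2F), (0x3A, 0x40),
                 (0x5B, 0x60), (0x7B, 0x7F)]

def current_step_3a(label: str) -> bool:
    """Current check: pass if no non-LDH ASCII code point is present.

    Non-ASCII code points are not examined at all by this check.
    """
    return not any(lo <= ord(ch) <= hi
                   for ch in label
                   for lo, hi in NON_LDH_ASCII)

def proposed_step_3a(label: str) -> bool:
    """Proposed check: pass only if every code point is U+002D or is
    classified as L (letter), M (mark), or N (number)."""
    return all(ch == "\u002D" or unicodedata.category(ch)[0] in ("L", "M", "N")
               for ch in label)

if __name__ == "__main__":
    for label in ["abc-1", "a_b", "caf\u00e9", "a\u263ab"]:
        print(ascii(label), current_step_3a(label), proposed_step_3a(label))
```

For ASCII-only labels the two checks agree. They differ on non-ASCII
symbols: a label containing U+263A (a symbol, category So) passes the
current check but fails the proposed one.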

Looking back on my work on nameprep as a member of the design
team, I think the distinction between host names and domain names
wasn't clear, at least to me, and probably to several other
participants. At some point, I started to worry that allowing all
the symbols might not have been the best choice. Of course,
if it's for domain names, then that's a bit different.

Regards,    Martin.