
Re: [idn] Re: An idn protocol for consideration in making the requirements



At 11:42 PM 2/10/00 +0900, Martin J. Duerst wrote:
>At 09:57 00/02/09 -0800, Paul Hoffman / IMC wrote:
> > At 04:43 PM 2/9/00 +0900, Martin J. Duerst wrote:
> > >The main problem I have with this draft is that as far as I understand,
> > >it introduces some restrictions on the current DNS, and/or a flag day.
> >
> > If it does, it is an error. The third paragraph of section 2.1 says:
> >
> > Note that a zone administrator can still choose to use "ph6" at the
> > beginning of a domain part even if that part does not contain
> > internationalized characters. Zone administrators SHOULD NOT create
> > domain part names that begin with "ph6" unless those names are post-
> > converted names. Creating domain part names that begin with "ph6" but
> > that are not post-converted names may cause display systems that
> > conform to this document to display the name parts in a possibly-
> > confusing fashion to users. However, creating such names will not cause
> > any DNS resolution problems; it will only cause display problems (and
> > possibly entry problems) for some users.
>
>So you mean that if a 'real' domain name starts with 'ph6', it
>will be converted according to the rules to something (longer)
>that also starts with 'ph6'? That would make sense, but could
>be said clearer in the draft.

No, it doesn't make sense. What the draft says is that a domain name part 
that begins with "ph6" (or whatever identifier is eventually chosen) is 
still perfectly valid and presents no problem except display to users who 
have idn-enabled displays.
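
To illustrate that point, here is a minimal sketch (the function names are
hypothetical, not from the draft, and the case-insensitive match is an
assumption). Only a display system looks at the prefix; DNS resolution never
does:

```python
PREFIX = "ph6"  # placeholder identifier; the value finally chosen may differ


def looks_post_converted(part: str) -> bool:
    """True if a domain name part starts with the prefix.

    Assumption: the match is case-insensitive, as DNS names are.
    This cannot tell a real post-converted part from an ordinary
    name that merely happens to begin with the prefix.
    """
    return part.lower().startswith(PREFIX)


def split_for_display(name: str) -> list[tuple[str, bool]]:
    """Pair each part of a dotted name with its display hint."""
    return [(part, looks_post_converted(part)) for part in name.split(".")]
```

A part flagged here would be a candidate for internationalized display;
every part, flagged or not, resolves exactly as before.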

>One main problem here is that it's very difficult to see whether
>something starting with ph6 is pre- or postconverted. Much
>more definitely than with =????=. Maybe that can be improved.
>And the chance that you have something with  ph6 is much higher
>than having to encode something with =????=.

Well, other than the fact that the characters you proposed are illegal in
domain name parts, I fully agree that this would make the prefix clearer
and reduce the chance that someone would want to use that prefix for
something other than an internationalized part. The downside of a longer
prefix is that it leaves fewer characters for the domain part itself. I'm
open to making the prefix longer (say, 5 characters instead of 3), which
would reduce the number of characters usable in name parts by about one.

> > > From there on, domain names with that prefix
> > >cannot be registered anymore.
> >
> > Not correct. They are explicitly allowed and explained in that paragraph.
>
>Okay, they can be registered, but not used the same way any
>other registration could unless all the infrastructure is deployed.

Sorry, that's still wrong. They could be used the same way as they always 
were. Some (hopefully most) display devices would show them differently, 
but they could be used in just the same way.

> > >Apart from the above, some minor comments:
> > >
> > >- The compression algorithm should not use a big-endian UTF-16
> > >   octet stream as input, but a stream of 16-bit values.
> >
> > Can you explain why? The compression scheme is tailored for UTF-16 and
> > could easily become an "expansion scheme" for arbitrary 16-bit input.
>
>Sorry, I should have written 'but a stream of UTF-16 16-bit values'.
>It doesn't make sense to describe an algorithm in terms of bytes
>if this just complicates things and implementation will use 16-bit
>units anyway.

Good point; I'll fix that.
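
For reference, the two views of the input carry the same information. A
rough sketch (the helper name `utf16_units` is made up for illustration)
that turns the big-endian octet stream into the stream of 16-bit values:

```python
import struct


def utf16_units(text: str) -> list[int]:
    """Return the UTF-16 code units of text as 16-bit integers."""
    data = text.encode("utf-16-be")  # big-endian octets, no byte order mark
    # Reinterpret each pair of octets as one unsigned 16-bit value.
    return list(struct.unpack(f">{len(data) // 2}H", data))
```

A character outside the BMP shows up as two units (a surrogate pair), so an
algorithm described on 16-bit values sees exactly what the octet-based
description saw, two octets at a time.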

> > >- Table 2 is missing.
> >
> > Nope, just hard to read because it only has one line in it.
> >
> > Table 2: Ranges of the first octet of input that use two-octet mode
> > 0x34 through 0xdf
>
>Then don't call it a table.

Fixed.

>  And after a first check, I suggest
>to start at 0x32, not 0x34. I don't know how that affects
>the rest of the algorithm, but the 0x32 and 0x33 areas are not
>really appearing in sequence, if they appear at all.

The characters in 0x32 and 0x33 are mostly symbol-like. This is a minor
detail that can be addressed later; it doesn't affect the protocol in any
significant way.
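
As a sketch of the check under discussion (names are hypothetical, and it
assumes the "first octet" is the high octet of each big-endian UTF-16
value), the lower bound can be left as a parameter so either 0x34 or the
suggested 0x32 can be tried:

```python
TWO_OCTET_FIRST = 0x34  # current draft value; 0x32 is the suggested alternative
TWO_OCTET_LAST = 0xDF


def uses_two_octet_mode(unit: int,
                        first: int = TWO_OCTET_FIRST,
                        last: int = TWO_OCTET_LAST) -> bool:
    """True if the high octet of a 16-bit value falls in the two-octet range."""
    high_octet = (unit >> 8) & 0xFF
    return first <= high_octet <= last
```

With the default bounds, a value such as 0x4E00 is in range and 0x3300 is
not; passing first=0x32 brings the 0x32 and 0x33 rows into range as well.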

>One problem with the compression scheme is of course that
>Japanese hiragana crosses a block boundary.

True. Even with that (which will force hiragana name parts to do some 
switching between windows), the encoding comes out shorter than for UTF-8 
or UTF-5.
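
To make the switching concrete, here is a small sketch. It assumes, purely
for illustration, that a window covers 128 consecutive 16-bit values; the
window size and the function name are assumptions, not taken from the draft.
Under that assumption the hiragana block (0x3040 through 0x309F) straddles
the boundary at 0x3080:

```python
WINDOW_SIZE = 128  # assumed window size, for illustration only


def count_window_switches(text: str, window_size: int = WINDOW_SIZE) -> int:
    """Count how often consecutive BMP characters fall in different windows."""
    # ord() equals the UTF-16 value only for BMP characters, which is
    # enough for hiragana; supplementary characters are out of scope here.
    windows = [ord(ch) // window_size for ch in text]
    return sum(1 for prev, cur in zip(windows, windows[1:]) if prev != cur)


# "hiragana" written in hiragana: U+3072 U+3089 U+304C U+306A
sample = "\u3072\u3089\u304c\u306a"
switches = count_window_switches(sample)  # 2: into the upper window and back
```

Only the second character of the sample lies at or above 0x3080, so the
encoder would switch windows twice for a four-character part.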

> > >- In 2.4 12), check whether you have a valid window.
> >
> > Not sure what you mean here (maybe we can do this offline).
>
>I don't have the draft, but I think I meant that if the window
>doesn't conform to table 2, reject.

Got it.

--Paul Hoffman, Director
--Internet Mail Consortium