
Re: [idn] Newbie's questions implementing the [IDNA]



On Mon, Dec 09, 2002 at 10:47:18AM +0900, Seungho Lee wrote:
> While implementing the [IDNA] spec, I found some parts I could not understand,
> so I am asking for your help...

This list is not meant for tutorial sessions, but I'll answer your
questions as a fellow Korean.

> 
> Question1.
> What kind of sequence of code points can be the input of ToUnicode 
> Operation?
> Only punycoded one or all the other ones?

Any legitimate label: LDH labels or PREFIX--*** labels.
UTF-8 and legacy-encoded labels are for presentation only, and not for
the wire format, according to IDNA.

> If in case of only punycoded one, it seems that the [NAMEPREP] operation 
> is unnecessary...
> (They will be in the ASCII range...)

It is necessary for verification: the output of ToUnicode must not contain
un-nameprepped, prohibited, or (unassigned) code points. If the output contains any,
the input label should be displayed as it is, in its punycode-encoded form.
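
A minimal sketch of that verification in Python, assuming the ACE prefix
"xn--" that was later assigned in place of the PREFIX-- placeholder used in
this thread, and using the standard library's encodings.idna helpers (an
IDNA 2003 implementation). display_label is a hypothetical name; it
round-trips the decoded label through ToASCII and falls back to the punycode
form on any mismatch or error:

```python
import encodings.idna as idna2003  # stdlib IDNA 2003 (RFC 3490) helpers

ACE_PREFIX = "xn--"  # assumption: the prefix eventually assigned for IDNA


def display_label(label: str) -> str:
    """Return the Unicode form of one label, or the label unchanged if unsafe."""
    if not label.lower().startswith(ACE_PREFIX):
        return label  # plain LDH label: nothing to decode
    try:
        decoded = label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
        # Re-encode with ToASCII (which runs nameprep) and compare
        # case-insensitively with the original label.
        round_trip = idna2003.ToASCII(decoded).decode("ascii")
    except UnicodeError:
        return label  # bad punycode or prohibited code points
    return decoded if round_trip.lower() == label.lower() else label
```

A label whose decoded form would change under nameprep (for example, one
containing a soft hyphen) fails the round trip and is shown in its encoded form.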

> And if all the other ones are permitted, isn't it that the [NAMEPREP] 
 
What do you mean by "all the other ones"? Only LDH and PREFIX--** labels
are allowed as valid inputs to ToUnicode according to IDNA.

ToUnicode does not perform "legacy to Unicode conversion"; that is done
before ToASCII is performed.
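
To illustrate the ordering, here is a rough sketch in Python, assuming the
input arrives as EUC-KR bytes and using the standard library's "idna" codec
(IDNA 2003, with the later-assigned "xn--" prefix rather than the PREFIX--
placeholder). to_wire and the sample name are hypothetical:

```python
def to_wire(raw: bytes, legacy_encoding: str) -> bytes:
    """Decode legacy bytes to Unicode first, then apply ToASCII per label."""
    name = raw.decode(legacy_encoding)  # legacy -> Unicode (outside IDNA)
    return name.encode("idna")          # stdlib codec: nameprep + punycode


# Sample input: a Korean name as an application might receive it in EUC-KR.
wire = to_wire("한글.com".encode("euc_kr"), "euc_kr")
print(wire)
```

The charset decoding step belongs to the application; only the second step is
an IDNA operation.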

> can't convert it to punycoded one?
> (There is no punycode encoding process in Step2,
> and From Step3, it is solid that the input is the ACE code points 
> starting with ACE prefix.)

I can't understand your question.


> 
> Question2.
> What is the return value of DNS?
> I think it is the IP address just like the usual DNS return value...

Right.

> For example, when I send the query 'PREFIX-089a.com'(this is a Korean 
> character) to the resolver,
> the DNS will return the IP address for 'PREFIX-089a.com', won't it?

Yes. PREFIX--** works like a usual LDH label.
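
As a small illustration, an ACE label passes the same syntax check as any
other LDH label. The sketch below assumes the usual letter-digit-hyphen rule
(no leading or trailing hyphen, at most 63 characters) and uses "xn--" in
place of PREFIX--; is_ldh_label is a hypothetical helper:

```python
import re

# Letters, digits, hyphen; no leading/trailing hyphen; 1 to 63 characters.
LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_ldh_label(label: str) -> bool:
    """True if the label is a syntactically valid LDH label."""
    return LDH_LABEL.fullmatch(label) is not None


print(is_ldh_label("xn--bcher-kva"))  # ACE label: plain ASCII on the wire
print(is_ldh_label("bücher"))         # native form: not an LDH label
```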

> 
> But the [IDNA] spec says in page 3,
> - The ToUnicode operation is used when displaying names to users,
> for example "names obtained from a DNS zone".
> 
> Does the DNS return a name?

"DNS zone" does not mean the DNS server's query response.
It just means a visual representation of zone file contents.

You should read IDNA again. ToASCII is performed before DNS resolution.
ToUnicode is applied later, to display and verify punycode-encoded
labels.

> 
> Question3.
> I came to think that the ToUnicode operation is not included in the major 
> flow of 'sending query and receiving the return values'.
> This is the flow I pictured... (only when a person is to use IDN for web 
> surfing...)
> 1) An user inputs an IDN to the browser... ( Let's suppose the input was 
> '?.com' )
> 2) The IDNA client program takes the input.
> 3) Through the conversion operation ( the Step 4 would be ToASCII ), the 
> input is changed to a punycoded one.
> ( '?.com' is converted to 'zq--089a.com' ; suppose the prefix is 'zq--' )
> 4) The IDNA client program sends the query to the resolver.
> 5) The resolver sends the query to the designated name server.
> 6) The name server sends the query to the .com name server.
> ( Suppose the user-designated name server has no cache for the query and 
> also is not the authoritative name server)
> 7) .com name server sends the query to the zq--089a.com name server.
> 8) The zq--089a.com name server returns the IP address.
> 9) The client's resolver receives the return value (IP address)
> 
> So, even though ToUnicode is an indispensable process of [IDNA], (because the 
> spec requires that the users should not see
> the incomprehensible punycoded names), I think it is outside the major 
> flow of 'sending query and receiving the return values'...
> 
> Is this an acceptable thought?
> 
> And where exactly the ToUnicode operation should be done?
> Between 3) and 4)?

Sorry, nowhere in your steps. When a user inputs an IDN label, he/she
already sees the native IDN label, not the punycode-encoded one. Why would
ToUnicode be needed there? If the pre-nameprep form and the nameprepped form have
different glyphs (uppercase input?), a step 10) could be added to perform ToUnicode
and display the nameprepped (lowercase?) form. But even in this case,
I think respecting the user input in the display/location bar is preferable.
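
The uppercase case can be seen with a short Python sketch, assuming the
standard library's encodings.idna helpers (IDNA 2003, with the later-assigned
"xn--" prefix). The sample label is hypothetical:

```python
import encodings.idna as idna2003  # stdlib IDNA 2003 (RFC 3490) helpers

typed = "BÜCHER"                      # what the user entered
prepped = idna2003.nameprep(typed)    # nameprep folds case
wire = idna2003.ToASCII(typed)        # what is sent in the DNS query
shown = idna2003.ToUnicode(wire)      # optional "step 10" display form

print(typed, prepped, wire, shown)
```

ToUnicode on the wire form yields the nameprepped spelling, not the spelling
the user typed, so an application that prefers the original input has to keep
its own copy.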


Soobok Lee