Re: My prod at IDN requirements



At 11:04 04.01.00 +0800, James Seng wrote:

>I think we need to properly define 3 cases.
>
>I18N of Domain Names as represented on the client.
>I18N of Domain Names as represented in DNS packet.
>I18N of Domain Names as represented as DNS record/zones.
>
>They may be the same, or they may not be. We do not know.

I think "I18N of DNs as represented in DNS packets" is part of the 
implementation, not the requirements - if there are requirements that 
constrain the solution, we need to list them before we decide on the solution.

Doubly so for the zonefile formats.

comments on some comments:


> >    i18c in a name field of a Response or in content of a RR must be
> > uniquely identifiable as such YES/NO
>
>this is sort of related to the matching problem. but i think yes.

this is where we decide between solutions that say "ship binary gunk and 
let the recipient decide whether it's charset<x>, IPaddress, key or 
Something Else" and "ship special label that means this is i18c".

> >    it must be possible to DNSSEC sign i18c records DNS server to client YES/NO
>
>yes. we should not change the existing dns system.

if we sign them, it means no conversions in intermediate resolvers - any 
change to the signed bytes makes the signature fail to verify.
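
To make that constraint concrete, here is a minimal sketch. It is not DNSSEC: 
an HMAC from the Python standard library stands in for the real public-key 
signature, and the key, the label and the function names are all made up for 
illustration. It only shows that a well-meaning conversion on the way (here, 
Unicode NFC normalization) changes the bytes and so breaks verification.

```python
import hashlib
import hmac
import unicodedata

# Hypothetical key; real DNSSEC uses public-key signatures, not an HMAC.
KEY = b"hypothetical-zone-key"

def sign(data: bytes) -> bytes:
    """Stand-in signature over the exact bytes of a record."""
    return hmac.new(KEY, data, hashlib.sha256).digest()

def verify(data: bytes, sig: bytes) -> bool:
    """True only if the bytes are identical to what was signed."""
    return hmac.compare_digest(sign(data), sig)

# Name as the zone owner signed it: "A" + COMBINING RING ABOVE, in UTF-8.
original = "A\u030ang.example".encode("utf-8")
signature = sign(original)

# An intermediate resolver "helpfully" converts to the precomposed form.
converted = unicodedata.normalize("NFC", original.decode("utf-8")).encode("utf-8")

untouched_ok = verify(original, signature)
converted_ok = verify(converted, signature)
print(untouched_ok, converted_ok)
```

The same would hold for any other conversion (charset, case, encoding) done 
between the signer and the verifier.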


> > More in the solution space:
> >
> >    iso 10646 characters will be enough forever for DNS purposes YES/NO
>
>UCS-4 should cover all languages, including all variations in time to come.
>However, it also has a lot of problems, including the fact that it changes
>from time to time :P

UCS-4 is one representation - UTF-8 and UTF-16 are other representations of 
the same charset (ISO 10646). They have promised (ugh) that it is now only 
growing, not changing.
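
As an illustration of "same charset, different representations", a small 
sketch using Python's built-in codecs (the choice of character is mine, and 
"utf-32-be" is used here as the UCS-4 form):

```python
# One character from the charset: U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE
ch = "\u00c5"

utf8 = ch.encode("utf-8")       # variable-length form
utf16 = ch.encode("utf-16-be")  # 16-bit units, big-endian, no byte order mark
ucs4 = ch.encode("utf-32-be")   # 32-bit units, big-endian, no byte order mark

for name, data in (("UTF-8", utf8), ("UTF-16", utf16), ("UCS-4", ucs4)):
    print(name, data.hex())

# All three byte strings decode back to the same single character.
round_trip = {
    utf8.decode("utf-8"),
    utf16.decode("utf-16-be"),
    ucs4.decode("utf-32-be"),
}
```

So choosing ISO 10646 as the charset and choosing which of these goes on the 
wire are two separate decisions.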


> >    a single representation for i18c must be chosen YES/NO
>
>maybe? i think different proposals will have answer to this. i think we should
>leave it open, and not limit to only iso10646 or some other encodings.
>
>
> > For matching records, Choose One:
> >
> >    it matters whether matching is consistent across all servers
> >    it doesn't matter whether matching is consistent across all servers
>
>I think obviously we need to make sure matching is consistent across all
>servers.
>
> >    i18c Cyrillic A must compare equal to Latin A
> >    i18c Cyrillic A must compare not equal to Latin A
> >    i18c A with Ring Above must compare equal to a with ring above
> >    i18c A with Ring Above must compare not equal to a with ring above
> >    i18c ASCII A must compare equal to a
> >    i18c ASCII A must compare not equal to a
> >    i18c A + COMBINING RING ABOVE must compare equal to A with Ring Above
> >    i18c A + COMBINING RING ABOVE must not compare equal to A with Ring Above
>
>case-folding is not a simple problem, even for european languages, as it may
>vary with context. http://www.unicode.org/unicode/reports/tr21/ is a good
>report on the case mapping problem, at least for european languages.
>
> > Others are MUCH better than me in compiling example cases and requirements
> > for Korean, Japanese, Thai, Arabic, Hebrew.....
>
>in addition, there are also languages which have other problems with folding.
>chinese, for example, has simplified & traditional glyphs which mean the same
>thing and are used in the same way, but are given different code points.
>
>japanese kanji also have traditional & simplified glyphs but it is usually
>considered differently. or at least that is what i have been told.
>
>this will be a problem if ISO10646 is used. because of the CJK unification
>(arggh who is the idiot?), japanese & chinese fall under the same U+4E00 
>code space. if one folds and the other does not, i think it is fairly obvious 
>how messy it is going to be.

Is this a fact or a "maybe a problem"?
I think we need to be as specific as possible here....for each folding 
problem, name a glyph that has the problem, if possible.
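
To show the level of detail I mean, here is a rough sketch that names code 
points for the matching cases quoted above, plus one simplified/traditional 
pair. The pair (U+56FD / U+570B) is my own pick for illustration, and the two 
helper functions are just one possible matching rule built on Python's 
unicodedata module, not a proposal.

```python
import unicodedata

LATIN_A = "\u0041"      # LATIN CAPITAL LETTER A
CYRILLIC_A = "\u0410"   # CYRILLIC CAPITAL LETTER A
A_RING = "\u00c5"       # LATIN CAPITAL LETTER A WITH RING ABOVE
SMALL_A_RING = "\u00e5" # LATIN SMALL LETTER A WITH RING ABOVE
A_PLUS_RING = "A\u030a" # A followed by COMBINING RING ABOVE
SIMPLIFIED = "\u56fd"   # simplified form of "country"
TRADITIONAL = "\u570b"  # traditional form of "country"

def nfc(s: str) -> str:
    """Canonical composition only; no case folding."""
    return unicodedata.normalize("NFC", s)

def fold(s: str) -> str:
    """Hypothetical matching rule: compose, then case-fold."""
    return nfc(s).casefold()

results = {
    "Cyrillic A == Latin A": fold(CYRILLIC_A) == fold(LATIN_A),
    "A ring == a ring (case-folded)": fold(A_RING) == fold(SMALL_A_RING),
    "A ring == a ring (NFC only)": nfc(A_RING) == nfc(SMALL_A_RING),
    "A + combining ring == A ring": nfc(A_PLUS_RING) == nfc(A_RING),
    "simplified == traditional": fold(SIMPLIFIED) == fold(TRADITIONAL),
}

for case, equal in results.items():
    print(f"{case}: {equal}")
```

Under this particular rule the simplified and traditional forms stay 
different, and so do Cyrillic A and Latin A; whether that is what we want is 
exactly the requirements question.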

>korean hangul if i am not wrong does not suffer from this problem :-) it is a
>very clean and well-designed language.

perhaps we should all write Korean, then :-)

                        Harald

--
Harald Tveit Alvestrand, EDB Maxware, Norway
Harald.Alvestrand@edb.maxware.no