
Re: [idn] NSI Multilingual Testbed Information (fwd)



At 12:32 AM -0400 8/27/00, J. William Semich wrote:
>  I was
>strongly informed by John Klensin in Pittsburgh that the issue of patent
>disclosure WRT potential IETF standards was *very* important to the IETF.

Disclosure of both patents and pending patents, yes. Anyone who knows 
of either should certainly bring them to the group's attention.

>This WG is proposing ACE as a potential standard.

No, it is not. There are some proposals in front of the WG that use 
ACEs, and some that do not. The WG isn't proposing anything yet.

>  >So far, your case against ACE includes:
>
>Sorry, I have not been making a "case against ACE" in my previous comments.
>I have been providing information.

You have been providing it and giving legal advice about its 
applicability. Others disagree with your legal advice for various 
reasons.

>  Take it or ignore it. If that
>information works against ACE, so be it.

And if it doesn't, so be it as well.

>  >a) Software Patent by Jason which our (idns) lawyers have said it won't
>>    stand up in a claim due to prior public works. (Jason filed in
>>    Australia a few months after Martin's -00 I-D.)
>
>Good to hear. Will I-DNS accept liability for any future claims by Pouflis
>against any company, user or software developer who adopts an ACE as a
>solution?

This type of comment is completely out of line for an IETF WG.

>OK, here are some of the technical reasons ACE is not preferred to UTF-8:

And here is a list of responses showing that these "reasons" are mostly wrong.

>1. Usability problems: any domain part that has any non-ASCII character
>anywhere in it is transformed end to end to a form that is completely
>unreadable, causing problems at leakage points.

That's assuming that there are leakage points. You haven't shown any yet.

>2. The proposed ACE solutions are not ASCII-transparent at all.  In
>contrast, UTF-8 is completely ASCII-transparent.

They don't need to be ASCII transparent. This WG has a requirement 
that there be exactly one way to represent a host name part. If you 
can represent a name part using the RFC 1035 rules, then you MUST NOT 
also represent it as an ACE. This is the same as the requirement that 
strings encoded in UTF-8 MUST use the shortest encoding.
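
To illustrate the UTF-8 half of that comparison, here is a minimal
Python sketch. It assumes only that Python 3's built-in UTF-8 decoder
rejects overlong (non-shortest) sequences, which it does; the function
name is made up for this example and comes from no draft.

```python
def is_shortest_form_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8.

    Python's decoder treats overlong (non-shortest) sequences as
    errors, so a successful decode implies shortest-form encoding.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# U+0061 ("a") in its one-octet form, and the same code point padded
# out to an overlong two-octet sequence (0xC1 0xA1).
print(is_shortest_form_utf8(b"a"))
print(is_shortest_form_utf8(b"\xc1\xa1"))
```

The analogous check for an ACE would be to refuse any ACE-encoded name
part that could have been written directly under the RFC 1035 rules.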

>3. Significant new functionality must be added to every client that is to
>use an ACE IDN system.  Not so for UTF-8.

You will have to add it to most clients. Of the few that take UTF-8 
as input for URLs now, many do it wrong. The fact that one or two Web 
browsers do it some of the time is not significant.

>4. One ACE proposal says "With internationalized names, the user
>application MUST convert the pre-converted name into a post-converted
>name so that is acceptable to resolvers."  This is not the case with
>UTF-8.

It's good to hear that most resolvers handle UTF-8 correctly. It 
would be helpful if you listed the resolvers that are known to work 
with UTF-8 input, and those that are known not to.

>5. The ACE encoding schemes described in the ACE proposals are nearly
>always less efficient than the UTF-8 encoding scheme - often much less
>efficient - even in those that utilize compression schemes.

Sorry, but this is simply not true for RACE. Please give examples of 
names for which RACE is "much less efficient" for typical names in 
any particular script. In many cases, RACE is more efficient than 
UTF-8, and even when it is less efficient, it adds fewer than 5 
octets for typical names.

>Example: UTF-8 itself exhibits a space advantage with the first 2048 code
>points, relative to higher codepoints. Since most common alphabets are in
>this region, in practice the overwhelming majority of alphabetic
>characters are encoded as only two bytes in the fully encapsulated UTF-8
>version.

For most common names in scripts that use these codepoints, RACE has 
about the same compression as UTF-8.
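
For anyone who wants to check the UTF-8 side of these numbers, a small
Python sketch follows. The sample labels are arbitrary strings chosen
for this example, not names from any registry, and the helper name is
made up. It measures UTF-8 only; RACE lengths would have to be computed
from the draft itself.

```python
def utf8_octets(label: str) -> int:
    """Number of octets the label occupies when encoded as UTF-8."""
    return len(label.encode("utf-8"))


# Arbitrary sample labels, written with escapes so the source is ASCII.
samples = {
    "ASCII": "example",
    "Latin with one non-ASCII letter": "m\u00fcnchen",
    "Cyrillic (U+0400 block, below 2048)": "\u043f\u0440\u0438\u043c\u0435\u0440",
    "CJK (above 2048)": "\u4f8b\u3048",
}

for script, label in samples.items():
    print(f"{script}: {len(label)} characters, {utf8_octets(label)} octets")
```

ASCII characters take one octet each, other code points below 2048
take two, and the rest of the Basic Multilingual Plane takes three.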

>In fact the manner in which this is achieved with UTF-8 is
>inherently better, since it is insensitive to changes in
>the most significant byte per se - the codepoint just has
>to stay below 2048, and it will unless the DN contains Asian,
>Cyrillic, or African script or ideograms.  With RACE, it appears
>all the characters must have the same most significant byte for
>compression to work.

What you say here is flat-out wrong. Please reread the RACE draft. 
For names in non-Latin scripts, RACE is usually more efficient than 
UTF-8, or about as efficient.

>A quick look at the UNICODE character dictionary shows one
>reason why this reliance by RACE is problematic: Latin variant
>characters are spread across four 256 codepoint blocks, in such a way
>that most Latin-variant names with non-ASCII content will not
>compress.

Wrong again. They will compress fine, and will often be about the 
same length as they would in UTF-8. Since you said that you were 
going to give technical reasons, could you please justify what you 
say? For example, it would be great if you ran a comparison of UTF-8 
vs. RACE for the European names that you have registered in .nu. 
Again, though, please re-read the RACE draft before you do, because 
you clearly have not understood it.

>   This is unavoidable; there are just too many Latin variant
>characters to fit into a single 256 codepoint block.

So what? RACE takes this into account.

In short, the arguments above seem to be based on an incorrect 
reading of the RACE document, or on confusing RACE with the other ACEs 
that have been proposed. Either way, your statements only weakly 
support the use of UTF-8 for on-the-wire use in the DNS.

--Paul Hoffman, Director
--Internet Mail Consortium