
Re: [idn] WG last call summary



> This doesn't mean that the whole box has to use UTF-8. Frankly, I don't
> even think that's relevant. Instead, it is a question of whether or not
> the related components (searching, as in the previous example) will be
> likely to deal with UTF-8, rather than having to selectively graft an
> exraneous encoding into select portions of that service in order to
> provide simple functionality (as with 2047 and searching, again).

again, the effort to get the ASCII encoding right is trivial compared
to the effort required to do string comparisons in a way that's effective 
for searches. 
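
to make this concrete, here's a rough sketch in Python (purely
illustrative - the stdlib "idna" codec stands in for the ACE step).
the encoding is one stateless call; the comparison machinery is what
actually costs effort, and it's needed no matter what's on the wire:

    import unicodedata

    def to_ace(label):
        # the ACE step: a single, stateless, easily-tested call
        return label.encode("idna")

    def search_equal(a, b):
        # effective comparison for searching: normalize, then
        # case-fold (real searches also want locale-aware collation).
        # this is needed regardless of the transfer encoding.
        na = unicodedata.normalize("NFKC", a).casefold()
        nb = unicodedata.normalize("NFKC", b).casefold()
        return na == nb

    print(to_ace(u"b\u00fccher"))                         # b'xn--bcher-kva'
    print(search_equal(u"B\u00fccher", u"bu\u0308cher"))  # True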

> But this is also entirely irrelevant. By your argument that the transfer
> encoding is irrelevant, I would like to hear your arguments as to how,
> say, using EBCDIC to pass ASCII data around could possibly be seen as
> reasonable design. 

it would be quite reasonable for applications that already had a 
deeply-wired assumption that character strings were in EBCDIC.
Which is why on BITNET and related networks, the applications 
generally exchanged data in EBCDIC even between ASCII-native hosts.

> Of course the native encodings are always best. 

utter hogwash.  first, there is no single "native" encoding of UCS; 
second, there's no encoding of IDNs that is native to the vast majority 
of deployed applications.  
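
to illustrate the first point, one UCS character has several
equally-"native" byte encodings (a trivial Python demonstration):

    s = u"\u00e9"   # one UCS character: LATIN SMALL LETTER E WITH ACUTE
    for codec in ("utf-8", "utf-16-be", "utf-32-be"):
        print(codec, s.encode(codec).hex())
    # utf-8     c3a9
    # utf-16-be 00e9
    # utf-32-be 000000e9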

> The fact that most of the apps are heading towards UTF-8 should tell us 
> that we should be designing for a long-term support infrastructure that
> provides the data in the format it is going to be used in. 

again, that's hogwash, because it completely ignores issues associated 
with transitioning existing apps and/or with maintaining multiple 
interfaces to DNS.

> Furthermore,
> whenever the remaining services get upgraded or replaced, they should be
> able to use something a little better than the best technology that 1968
> money can buy.

that's like saying that now that we have automobiles, we should completely 
rearrange all of those cities that were designed on a walking scale, that
we should change the width of roads so that they're more appropriate for
motor-driven vehicles than horse-driven vehicles, etc.  the fact is that 
once conventions get established, it's often more efficient overall to 
stick with a convention that is sub-optimal than it is to rebuild everything 
to be optimized for current conditions - particularly when those conditions
will probably change again anyway.  (e.g. it's much easier to build mass 
transit for cities designed on a walking scale...commuting by automobile 
sucks!)


if and when the rest of the protocol uses UTF-8, then the worst that 
happens is an extra layer of encoding before a DNS query is sent.  it's 
isolated to a single routine.  it's not going to hurt enough to matter.
and having the client do the encoding prior to lookup is less complex 
than the extra cruft required to support two kinds of DNS queries 
without introducing a significant new source of failures or delays.

like I said, blind faith in cleanliness is no substitute for analysis.

> > Second, the portion of IDNA that does ASCII encoding is such a trivial
> > bit of code that the number of failures introduced by that code will
> > pale in comparison to those introduced by the other code needed to
> > handle 10646 (normalization, etc) which would be needed no matter what
> > encoding were used.
> 
> Getting new problems in addition to shared problems is hardly an argument
> in your favor. 

no, but you're ignoring the set of new problems that comes with either
a) providing multiple DNS query interfaces or 
b) failing to provide an ascii-compatible encoding of IDNs that can be
   tolerated by existing apps
but sure enough, if you close your eyes, you can pretend that the room 
is empty.
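
for the record, here's what path (a) costs the client (a hypothetical
sketch - the resolver functions are made up for illustration; the
point is the branching, not the names):

    def lookup_dual(name, utf8_resolve, ascii_resolve):
        # hypothetical dual-query client: try the new UTF-8 interface
        # first, then guess - on failure - whether the server is old,
        # the packet was lost, or the name really doesn't exist.
        try:
            return utf8_resolve(name)
        except TimeoutError:
            pass   # old server, or packet loss?  the client can't tell.
        except LookupError:
            pass   # NXDOMAIN, or a server that doesn't grok the query?
        # fall back to the ASCII-compatible form, doubling the
        # worst-case latency.
        return ascii_resolve(name.encode("idna").decode("ascii"))

every branch above is a new way to fail or to stall - exactly the
cruft that a single ascii-compatible encoding avoids.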

> 
> > Numerous examples demonstrate that transition issues are often
> > paramount in determining whether a new technology can succeed.
> 
> I agree that transitional services are important. I also think that the
> evidence shows that end-station layering works well when existing formats
> are used as carriers for *subsets*, and when it is targeted to a specific
> usage. That isn't what's being done here, though. Instead, well-known and
> commonly-used data-types will get *extended* into broader and
> incompatible forms by default, and it will happen purposefully and
> accidentally. This is not transitional probing, it is knowing that stuff
> will break and doing it anyway.

the alternative is to say that all apps have to be rewritten to take
advantage of IDNs, and the likely result is to fragment existing apps
into >> 2 non-interoperable enclaves.  the IDNA path causes less harm.

> Cripes, why do we have to do it all in a big-bang? Can't we start with the
> transfer encoding (no required upgrades for anything), incrementally add
> transliteration where we know it will be safe and robust (some upgrades),
> and then add UTF-8 for those newer services that can make use of it (some
> more upgrades)? What is the problem with this?

it completely ignores the forces that drive deployment of upgrades. 

> The complexity required for a direct UTF-8 name-resolution service in
> conjunction with simple passthru-everywhere is minor in comparison to the
> complexity of transliterate-everywhere.

only if you have the luxury of starting from scratch, before any
applications are written.

Keith