
Re: [idn] Re: An idn protocol for consideration in making the requirements



"Martin J. Duerst" wrote:
> Also, one potential failure from the old days is that the eighth bit
> is lost. I would like to do some checks on how many of the currently
> registered domain names could be interpreted as legal UTF-8 names
> that had their 8th bit taken off (other than the trivial identity
> case, which is of course UTF-8). If somebody can point me to some
> data, or tell me how to get at it, or otherwise collaborate on this,
> please tell me.
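
A rough way to run this check on a single label is sketched below in Python. The function name `utf8_preimages` is hypothetical, and the brute-force search over every combination of high bits is only practical for short labels. It is meant as an illustration of the idea, not as a tool for scanning a whole registry.

```python
from itertools import product


def utf8_preimages(label: bytes) -> list:
    """Return the non-ASCII strings whose UTF-8 bytes collapse to `label`
    when the eighth (high) bit of every byte is cleared."""
    results = []
    # Try every way of setting or leaving the high bit on each byte.
    for mask in product((0x00, 0x80), repeat=len(label)):
        if not any(mask):
            continue  # skip the trivial identity case (pure ASCII)
        candidate = bytes(b | m for b, m in zip(label, mask))
        try:
            # Python's strict decoder rejects malformed and overlong UTF-8.
            results.append(candidate.decode("utf-8"))
        except UnicodeDecodeError:
            pass
    return results
```

For example, the ASCII label `C0` has exactly one such reading, U+00F0 (bytes C3 B0), while `com` has none.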

I am also interested to know how many UTF-8 domain names would turn into a
currently registered domain name if their eighth bits were stripped. I can
foresee people asking "Why is my XYZ.com going to sex.com?" :-)
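
To make the failure concrete, here is a minimal Python sketch of what a channel that clears the eighth bit would do to a UTF-8 name. The helper names `strip_eighth_bit` and `stripped_name` are made up for this illustration.

```python
def strip_eighth_bit(data: bytes) -> bytes:
    """Clear the high-order bit of every octet, as a 7-bit channel might."""
    return bytes(b & 0x7F for b in data)


def stripped_name(name: str) -> str:
    """Return the ASCII name that results when the UTF-8 encoding of
    `name` loses its eighth bits."""
    # Every byte is below 0x80 after stripping, so decoding as ASCII is safe.
    return strip_eighth_bit(name.encode("utf-8")).decode("ascii")
```

With this sketch, `stripped_name("\u00f0.com")` gives `C0.com`, while a pure ASCII name such as `example.com` comes back unchanged.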
 
Not every network or protocol is 8-bit clean. For example, RFC 821 implies that
mail headers should have their eighth bit stripped off. See below.

-James Seng

Maynard Kang wrote:

James,

Spent tonight going through the e-mail RFCs in detail.

One interesting thing I noticed: since RFC 821 has never been superseded by
any other RFC, I examined it for any specification of character set
restrictions. In the words of the original RFC 821:

   "Commands and replies are composed of characters from the ASCII
   character set [1].  When the transport service provides an 8-bit byte
   (octet) transmission channel, each 7-bit character is transmitted
   right justified in an octet with the high order bit cleared to zero."
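
Read literally, the quoted rule can be sketched in Python as below. The names `to_smtp_octets` and `is_seven_bit_clean` are hypothetical, and this is only an interpretation of the quoted sentence, not an SMTP implementation.

```python
def to_smtp_octets(text: str) -> bytes:
    """Place each 7-bit ASCII character in an octet with the high bit zero.
    Raises UnicodeEncodeError if `text` contains non-ASCII characters."""
    return text.encode("ascii")


def is_seven_bit_clean(data: bytes) -> bool:
    """True if every octet has its high-order bit cleared to zero."""
    return all((b & 0x80) == 0 for b in data)
```

Under this reading, any UTF-8 encoded non-ASCII character fails `is_seven_bit_clean`, because each of its bytes has the high bit set.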

[snip]
...RFC 821 explains how 7-bit characters should be represented in an 8-bit
environment (pad the high bit with 0)...
[snip]