
Re: [idn] Figuring out what can be displayed



Gentlemen,
I have discussed the IDNA issue here. I have corresponded with Adam Costello on and off the list, and I thank him for that. I have heard the various remarks of many. I will now tell you how I will document and implement "IDNA" support in my access engine system.

I offer this so that it can be analyzed as a real project. I do not expect to be criticized on religious/theoretical grounds, but to be helped towards better end-user and OPES support, in a way that creates minimal or no discrepancy with the standard you propose.

I do this because I am fighting for my own business survival in a real international world, not as a corporate manager or a distinguished academic. I am paid, if at all, only by real customers with real users.

My premises:


1. I have lived in the international data communications world for 25 years (since 1977). I have always earned my family's living there, except since I came into the IP world, which I still accept as a long investment period.


2. Since the very beginning I have been involved with the international namespace (INS). It was interconnected with the DNS in 1984 (John Klensin, who was not involved at the time but has a command of Internet history, has noted some differences in perspective, probably because of the passage of time, and because when you build a gateway there are by nature male and female perceptions). This is of little interest. What is of interest is the experience: we must take advantage of the pros of both sides and avoid the cons of both sides (I understand that is what he means). This gives me unique experience of the way the real world behaves and, from dealing on this matter with governments, operators, the market, corporations, lawmakers, end users, local tech support and end-application developers, a fairly intuitive knowledge of their needs and constraints. Sorry to mention this: it is only to explain why I suppose this mail might be of use.


3. For 20 years, the only thing that has really worked, and that made the "Internet" in most people's minds, is the DNS part of the namespace. It works because it is stable, simple and, most of all, consistent with the intuitive feel of addressing.

I co-created the namespace in a form that developed all over the world, but with semantics that would NOT have had the same impact as the DNS.

We started with the root names; then, from LEFT to RIGHT (as in sorts, disk directories, etc.), we allowed operators to add hostnames as an extension, then sub-hosts. We created the hierarchy, but it was a mess.

That was because we had no zones (as in IP addresses). What the DNS brought was an intuitive answer to the users' need to sort out that mess clearly. The reversal from right to left at the gateway, and the use of "." as the reversal indicator, did it all. In RFC 920, the "prefix" of the time was the suffix ".arpa".
(When you start with a unique thing, you start with two: it and not it. Two is by nature a family, and a family grows.)
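A minimal sketch of the gateway reversal described above. It assumes, hypothetically, that an INS-side name is held as a list of fields, root first, and that the gateway does nothing but reverse the order and join the fields with "."; the real gateway's field syntax is not described here.

```python
def ins_to_dns(fields: list[str]) -> str:
    """Left-to-right INS fields (root first) -> right-to-left DNS name."""
    return ".".join(reversed(fields))


def dns_to_ins(name: str) -> list[str]:
    """Right-to-left DNS name -> left-to-right INS fields (root first)."""
    return list(reversed(name.split(".")))


fields = ["arpa", "host", "subhost"]
name = ins_to_dns(fields)
print(name)                        # subhost.host.arpa
assert dns_to_ins(name) == fields  # the reversal is lossless in both directions
```

Because reversal is its own inverse, both sides keep their own reading order and nothing is lost at the gateway.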

It did it all because the resulting addressing logic was brainware-consistent with postal addressing and EDI rules: country last, and you can keep adding details in front. As on an envelope, the name (the e-mail name) comes first and, as in old French ("vieille France") etiquette, the "@" is used in the proper "à Monsieur" place, which is what it was created for in the Middle Ages (@ stands for the Latin "ad"). This makes it a brainware-consistent system with a habitus of more than eight centuries.

We kept BOTH systems working. We interfaced a few people to ARPA in the right-to-left hierarchy and supported the X.121 numeric-name hierarchy left to right. From 1982 to 1986 I carried out the same task as IDNA, but the other way round: instead of expanding from 38 characters to billions, I reduced from 36 to 10. For the same public.

From experience, the reason it worked (IMHO) is that we adapted to the least capable technology. We were using names and had to support digit-only addresses, so we used "numeric names". But we had all the capabilities of names, so while Transpac was only using an IP-address-like scheme, we had the flexibility of a complete, real-time, powerful database. So we proposed value-added services, not a new system.
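The 36-to-10 reduction can be illustrated with a toy bijective code. This is not the actual 1982-1986 scheme, which is not described here; it simply assumes a hypothetical fixed two-digit code per character (0-9 as 00-09, A-Z as 10-35), so any name maps to a digit string and back.

```python
import string

ALPHABET = string.digits + string.ascii_uppercase  # the 36 name characters
CODE = {ch: f"{i:02d}" for i, ch in enumerate(ALPHABET)}  # hypothetical code
DECODE = {digits: ch for ch, digits in CODE.items()}


def to_numeric(name: str) -> str:
    """Name -> digit-only 'numeric name' (raises KeyError on other characters)."""
    return "".join(CODE[ch] for ch in name.upper())


def from_numeric(digits: str) -> str:
    """Digit-only 'numeric name' -> name."""
    if len(digits) % 2:
        raise ValueError("a numeric name has an even number of digits")
    return "".join(DECODE[digits[i:i + 2]] for i in range(0, len(digits), 2))


print(to_numeric("JFC"))       # 191512
print(from_numeric("191512"))  # JFC
```

The fixed width keeps the mapping unambiguous at the cost of doubling the length; any real scheme would trade that off differently.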

I think the only way IDNA can succeed is the same way. Unicode is the powerful system; the DNS is the less powerful one. I am a seaman: the speed of a convoy is the speed of the slowest vessel, less the zig-zags. IDNA must be considered a DNS service, subject to the constraints of a dual system.
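One way to read the convoy rule: whatever script the user types, the form that actually travels through the DNS must satisfy the classic host-name (letter-digit-hyphen) syntax of RFC 952/1123, i.e. labels of 1 to 63 characters that neither start nor end with a hyphen, and a total length of at most 253 characters. A minimal checker, as a sketch:

```python
import re

# One LDH label: 1-63 chars, letters/digits/hyphens, no hyphen at either end.
LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_dns_compatible(name: str) -> bool:
    """True if `name` can travel through the DNS as-is (the slowest vessel)."""
    if name.endswith("."):
        name = name[:-1]  # ignore a single trailing root dot
    if not name or len(name) > 253:
        return False
    return all(LDH_LABEL.fullmatch(label) for label in name.split("."))


print(is_dns_compatible("eleve.fr"))     # True
print(is_dns_compatible("élève.fr"))     # False: must be transcoded first
print(is_dns_compatible("iesg--fr.fr"))  # True: "--" is legal inside a label
```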

IDNA can only be a DNS value-added service, not an alternative system, even an embedded one. The only alternative would be to study, specify, develop and experiment with a new DNS. IMHO it cannot be done in two minutes: it will take years and, due to the political implications, it is a big, big task that will involve universities, telcos, governments and communities in 190 countries. I also think it cannot be done in one shot. This is why I suggest looking into DNS.2 (improving and stabilizing the system, its operational architecture, its political insertion, etc., in a way compatible with the current DNS; I do not even know whether we need to extend the character set). This is also why I work, on top of it, on extended DNS services such as IDNA, authentication, access engines, etc. (DNS+). This is the rationale of the Dot-Root proposition ( http://dot-root.com ), still in its infancy and only partly documented in English, but gaining interest.

So I only consider serious and conforming use of the DNS: the rule and the bible. Errors and tricks in using it are of no concern to me. Errors and tricks do not form the basis for a rule (this is a basis of Roman law and of social life).

There is a hierarchy in addressing information, and each level is necessary to get to the next. There is no use knowing someone's forename, in ASCII or in Unicode, if you do not know the surname, the city and the country. The way you write that information does not change this.
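The point that each level is needed to reach the next can be sketched as a walk from the rightmost label inward, over a toy, hypothetical namespace in which each level knows only its direct children:

```python
# Hypothetical toy namespace: country -> city -> surname -> forename.
NAMESPACE = {"fr": {"paris": {"doe": {"john": "delivery point of John Doe"}}}}


def resolve(name: str, tree: dict = NAMESPACE):
    """Resolve right to left; stop as soon as a parent level is unknown."""
    node = tree
    for label in reversed(name.lower().split(".")):
        if not isinstance(node, dict) or label not in node:
            return None  # without the parent, the child cannot be found
        node = node[label]
    return node


print(resolve("john.doe.paris.fr"))  # delivery point of John Doe
print(resolve("john.fr"))            # None: surname and city are missing
```

Note that the lookup is the same whatever script the labels are written in; only the order of the levels matters.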

Question: are there additional hierarchies in postal (brainware) addressing, i.e. something the mailman needs in order to choose between different mail destinations and/or mail paths?

- not the title (Mr, Mrs, Dr, M. Mme, Snr, M.M., Her, Esq. etc...)
- not the type of information: a nickname is accepted as long as the layer below accepts it. I mean that "Jack" is accepted for "John" in an Irish family, and "Jefsey Morfin" will be accepted in the "Morfin" mailbox.
- partly the service: is it a hierarchy, or added information in the naming hierarchy? Postal services all over the world use "Port Payé" (postage paid) or "Port dû" (postage due), just as all DNS managers use "MX". Likewise the routing: "AirMail" or "Par Avion". (Note: these hierarchies are not in the naming. They are applied before, or in parallel with, the routing.)
- the script is an intuitive (real) hierarchy: is the script local to the sender, local to the addressee, or international? Usually only international and addressee scripts are accepted. The discrimination is made at the first point where the script is no longer understood. In some cases the mail can go beyond that point through an alternative path, by adding "c/o some go-between" who will translate.


4. Access engines. To understand how I am going to use IDNA+, and why, one has to understand the access engine concept. I call "access engines" the DNS resolvers I develop that use an extended resolution strategy. For example, my domain name is http://utel.net . No access engine is implemented there, but an engine acting as the default for utel.net could accept a call for http://jefsey.morfin.utel.net and resolve the 3LD and 4LD as jefsey.morfin.utel.net, morfin.jefsey, jefsey, morfin, vacations, etc.

This kind of OPES provides a directory service, but also an easy ULD (upper level domain) management system, using an LDAP-like system or more advanced directory solutions.
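A minimal sketch of the extended resolution strategy for the utel.net example. It assumes, hypothetically, that the engine is the default (wildcard) answer for utel.net, that it looks the labels to the left of utel.net up in a directory, and that it tries the labels in the given order, then reversed, then one by one. The directory contents and host names are invented for illustration.

```python
# Hypothetical directory behind the access engine that answers for utel.net.
DIRECTORY = {
    "jefsey.morfin": "host-a.example",
    "vacations": "host-b.example",
}
ZONE = "utel.net"


def access_engine_lookup(fqdn: str):
    """Return the directory target for a name under ZONE, or None."""
    fqdn = fqdn.lower().rstrip(".")
    if not fqdn.endswith("." + ZONE):
        return None  # not ours: ordinary DNS resolution applies
    labels = fqdn[: -len(ZONE) - 1].split(".")
    # Extended strategy: as given, reversed, then each label on its own.
    candidates = [".".join(labels), ".".join(reversed(labels)), *labels]
    for key in candidates:
        if key in DIRECTORY:
            return DIRECTORY[key]
    return None


print(access_engine_lookup("morfin.jefsey.utel.net"))  # host-a.example
```

The directory, not the DNS zone, carries the many variant names; the DNS only has to hold the stable ULD.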

This means that I have no problem introducing Punycode support transparently in my existing names: names like http://U+xxxU+yyyyU+zzzz.utel.net are OK.

The access engine resolves in two ways: either as a reroute or as a normal DNS server, depending on the economics and on the user's system architecture. Both are transparent: http://jefsey.utel.net can resolve one day as a reroute to http://jefsey.com and the next day as a CNAME for http://jefsey.com. I note that the test for a French application supports 1,500,000 names, with a target of 150,000,000 and some added-value variations (abbreviations, accents, etc.), which corresponds to billions of names. The DNS could not support all this, but it provides stable, simple, existing support for the ULDs.
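The two answering modes can be sketched as follows. The `Route` type and its `mode` flag are hypothetical, and "reroute" is assumed here to mean an application-level HTTP redirect, which is only one possible way to reroute; the CNAME answer is shown in master-file syntax.

```python
from dataclasses import dataclass


@dataclass
class Route:
    target: str  # e.g. "jefsey.com"
    mode: str    # hypothetical policy flag: "cname" or "reroute"


def answer(name: str, route: Route) -> str:
    """Describe how the access engine would answer for `name` under `route`."""
    if route.mode == "cname":
        # Behave as a normal DNS server: hand out an alias record.
        return f"{name}. IN CNAME {route.target}."
    if route.mode == "reroute":
        # Resolve to the engine itself and redirect at the application layer.
        return f"HTTP/1.1 301 Moved Permanently\r\nLocation: http://{route.target}/"
    raise ValueError(f"unknown mode: {route.mode!r}")


route = Route(target="jefsey.com", mode="reroute")
print(answer("jefsey.utel.net", route))
route.mode = "cname"  # next day: same name, different mechanism
print(answer("jefsey.utel.net", route))  # jefsey.utel.net. IN CNAME jefsey.com.
```

Switching modes changes only the engine's policy entry; the user keeps typing the same name.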

These access names are either free or cheap, and will be legally enforced one day (we will support national IDs, building-automation ("immotique") telematics, etc.). We cannot plan to spend much energy supporting billions of names for millions of people.

Obviously these names must be transparent to the different accepted writings: upper case, lower case, accented. "eleve" must be bijective with "élève".
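A minimal sketch of case and accent transparency using only Python's standard `unicodedata` module: decompose with NFD, drop combining marks, then case-fold. The function name is illustrative. The key itself is many-to-one; it is the directory, storing the registered spelling (here "élève") against the key, that lets "eleve" and "élève" find each other in both directions.

```python
import unicodedata


def match_key(name: str) -> str:
    """Fold case and accents: 'ÉLÈVE', 'Élève' and 'eleve' share one key."""
    decomposed = unicodedata.normalize("NFD", name)  # é -> e + U+0301
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base.casefold()


registered = {match_key("élève"): "élève"}  # key -> registered spelling
print(registered[match_key("ELEVE")])       # élève
```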

That engine can, to some extent, accept the most common typos and sound-alike wordings. But it will not easily accept a French "O" replaced by a Greek "o".
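Rejecting a Greek "o" inside a French name amounts to a mixed-script check. Python's standard library does not expose the Unicode Script property, so this sketch approximates each letter's script by the first word of its Unicode character name ("LATIN", "GREEK", "CYRILLIC", ...). That is a crude heuristic (CJK ideographs, for instance, report "CJK"), not a full script classification.

```python
import unicodedata


def scripts_used(label: str) -> set[str]:
    """Approximate each letter's script by the first word of its Unicode name."""
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in label
        if ch.isalpha()  # digits and hyphens carry no script here
    }


def is_single_script(label: str) -> bool:
    return len(scripts_used(label)) <= 1


print(scripts_used("élève"))             # {'LATIN'}
print(is_single_script("d\u03bfmaine"))  # False: Greek omicron among Latin letters
```

Typo and sound-alike tolerance can then operate only on labels that pass this check.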


5. Legal issues. Because the IETF currently lacks documentation distinguishing the software domain name (an alphanumeric pointer to an IP address) from the brainware mnemonic canonical name or alias for a domain name, there are out-of-context laws and jurisprudence, such as the ACPA, Whois, etc., that we must live with.

I do not want to run into endless UDRPs and "à la Joe Sims" contractual issues because a Chinese way of writing something prints, on all the French non-IDNA displays and printers, as "iesg-ibm.fr" or as a racist text.

I will not spend a minute or a single penny helping others work out which of the two forms will count legally.


Now for "Jefsey's solution".

6. From all this (and other economic, marketing, social, etc. points), my implementation and explanation will be as follows.

a) there is a unique DNS naming sequence. That sequence uses the International Domain Names System character set, named "IDNScode". Today IDNScode includes 0-9, A-Z, dash and dot. Nothing prevents the DNS designers from extending it.

b) the extended Unicode set (e-Unicode) includes the current Unicode version plus the current IDNScode version.

c) support of natural Internet names is organized through specialized sub-domains, whose charter defines the supported e-Unicode subset and the bijective e-Unicode/IDNScode reading/writing function. Their ULD (upper level domain) will be of the form ".prefix--suffix.tld", where

- the prefix indicates the conversion system;
- the suffix indicates the e-Unicode restriction used.

The prefix will be "iesg--" for the IDNA system. A null suffix defaults to "IDNScode".

d) registrations will use current NIC management systems, with transcoding by a Punycode routine after a sub-namespace character filter.

e) abbreviated name presentation will be supported, to hide or insert the script sub-domain as needed. This will be ensured by "prefix--suffix" plug-ins, loaded or embedded on the system, or by OPESes.
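Items c) to e) can be sketched end to end for the "iesg--fr" sub-domain. All of the following are assumptions for illustration: the French charter is the hypothetical `FRENCH_SET` below; labels that are plain ASCII stay directly under the TLD (as with http://eleve.fr in section 7); other labels are transcoded with the standard Punycode algorithm (RFC 3492, via Python's built-in "punycode" codec) and placed under the sub-domain with no further prefix; and the display function hides the sub-domain again.

```python
import string

# Hypothetical charter of the "iesg--fr" sub-domain: the allowed e-Unicode subset.
FRENCH_SET = set(string.ascii_lowercase + string.digits + "-àâæçéèêëîïôœùûüÿ")
SUBDOMAIN, TLD = "iesg--fr", "fr"


def register(label: str) -> str:
    """Filter a label against the charter, then transcode it to a DNS name."""
    label = label.lower()
    rejected = sorted({ch for ch in label if ch not in FRENCH_SET})
    if rejected:
        raise ValueError(f"outside the {SUBDOMAIN} charter: {rejected}")
    if label.isascii():
        return f"{label}.{TLD}"  # nothing to transcode
    ace = label.encode("punycode").decode("ascii")
    return f"{ace}.{SUBDOMAIN}.{TLD}"


def display(fqdn: str) -> str:
    """Abbreviated presentation: decode the label and hide the script sub-domain."""
    suffix = f".{SUBDOMAIN}.{TLD}"
    if not fqdn.endswith(suffix):
        return fqdn
    ace = fqdn[: -len(suffix)]
    return f"{ace.encode('ascii').decode('punycode')}.{TLD}"


stored = register("Élève")
print(stored)           # the Punycode label under iesg--fr.fr
print(display(stored))  # élève.fr
```

The character filter runs before transcoding, so a Greek "o" is refused at registration rather than silently encoded.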


Discussion


7. I will use the French word "élève" (pupil) for this discussion, since it is supported by my keyboard and probably by all of yours.

a) support of international registration by AFNIC: http://eleve.fr

b) support of French registration by AFNIC:
- upper-case writing: http://ELEVE.FR, existing as http://eleve.fr
- accented lower case: http://élève.iesg--fr.fr
entered also as:
- http://eleve.fr (decision to be taken by AFNIC to comply with French law)
- http://eleve.iesg--fr.fr (until IDNScode == Unicode)

c) update of the e-Unicode character set
- one-shot parallel registration of the new script equivalent
- management of possible conflicts
- deregistration of the obsolete e-Unicode scripts once every user supports the new version.

d) work to be done to support this service
- creation of the "iesg--fr" domain name
- addition of a French character set list to the Punycode C code.
- time to market: 24 hours after the release of the "iesg--" characters.

e) user implementation
- this is transparent to the current situation
- e-Unicode support can be provided by anyone who wants to, and different natural character sets can be implemented depending on the user's keyboard.

f) legal implications
- none
- e-Unicode names are by nature 3LD+ names, outside the WIPO area of influence.
- jurisprudence will probably develop on brainware mnemonic second-level names, but it should relate to the documented good rather than to a good in itself. If the good is the domain name, the standard UDRP applies to the registered DNS entry: nothing changes. If it applies to any other good, domain names are no longer even considered as such.

g) extensibility
- nothing is added to the DNS, which can be extended as freely as before
- there is unlimited extension of the natural character set through the suffix, unlimited capacity to extend the processes through the prefix, and quasi-unlimited capacity of extension through other semantics.

h) load
- the load imposed on the DNS by the current IETF proposal may be tremendous: a TLD may become a new worldwide ".com" with hundreds of millions of entries, on the whim of a fashion. The sub-domain approach obviously permits the transparent load of billions of Internet names.
- however, the load shifts toward sub-domain information management, within the framework of the new loads imposed on the root server system and on the DNS. This load has to be considered together with the load imposed by Microsoft's dynamic DNS management, portables, DNS lookup leaks, some resolvers racing multiple root queries for speed, ENUM support, security concerns, etc.



Summary.

The whole recommendation is that the Internet community agree that a domain name including a double dash "--" denotes a domain name that applications may hide or create, provided the corresponding set of recursive processes it describes (defined by its charter) has been applied.


Some examples:

- iesg-- : this domain specializes in IDNScode scripts
- iesg--fr: this domain specializes in "French names".
- fm-- : this domain specializes in telephone follow-me services
- iesg--fm-fr : this domain specializes in telephone follow-me services in French accentuated names.
- enum--fr: ENUM names: tel://139500510.enum--.fr
- tel--fr: ENUM through French names: tel://eleve.tel--fr.fr, entered on my mobile as "éléve.fr", may be transcoded into tel://eleve.fr, tel://élève.tel--.fr.fr, http://eleve.iesg-tel--fr, etc., depending on the services organized by the operator.
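An application that wants to honor this convention first has to spot the service label. A minimal sketch, with hypothetical function names: it splits a label on its first "--" into (prefix, suffix), and scans a name from the right for the first such label. Per 6 c), a null suffix under "iesg--" defaults to IDNScode; this sketch returns `None` for an empty suffix and leaves that default to the charter.

```python
def parse_service_label(label: str):
    """Split a 'prefix--suffix' label into (prefix, suffix); None if no '--'."""
    if "--" not in label:
        return None
    prefix, _, suffix = label.partition("--")
    return prefix, suffix or None  # a null suffix is left to the charter


def find_service(fqdn: str):
    """Return (label, (prefix, suffix)) for the first service label from the right."""
    for label in reversed(fqdn.lower().rstrip(".").split(".")):
        parsed = parse_service_label(label)
        if parsed is not None:
            return label, parsed
    return None


print(find_service("eleve.tel--fr.fr"))  # ('tel--fr', ('tel', 'fr'))
print(parse_service_label("iesg--"))     # ('iesg', None)
```

Scanning from the right picks the service label closest to the TLD, which matches the ".prefix--suffix.tld" form; deeper nesting would need a rule from the charter.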


I certainly accept that it is rather different from the current wording and concerns, but I think it is compatible. I hope it may help. I will keep posting on the implementation ... as I find the funding.

jfc