...assuming we can make the language tag available via some DNS tricks or some API...
I don't see that happening. The IDN working group decided quite deliberately that domain names would not contain any meta-info like language tags; they're just text strings.
Right. If you want to re-engineer the IDN bits-on-the-wire protocol in ways that were considered and rejected, feel free to submit a new Internet Draft and see if there is community interest.
First, I cannot speak for Eric here, but it seems to me that "DNS tricks" could include having a (new?) DNS record for the language(s). This would not embed the language tag in the domain label itself. The language tag could be looked up in DNS. (However, I don't like this whole language business anyway, as I indicated in other emails.)
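To make the "separate record" idea concrete, here is a minimal sketch in Python. It is only an illustration: the "LANG" record type is hypothetical (no such DNS record type is defined), the zone data is a made-up in-memory table rather than a real resolver, and the function name is mine. The point is that the language information lives beside the name, not inside the label.

```python
# Simulated zone data keyed by (owner name, record type).
# ASSUMPTION: "LANG" is a hypothetical record type invented for this sketch.
# The first owner name is the ACE (Punycode) form of an IDN under the
# reserved "example" TLD.
ZONE = {
    ("xn--bcher-kva.example", "LANG"): ["de"],
    ("example.com", "A"): ["192.0.2.1"],
}


def lookup_languages(name: str) -> list[str]:
    """Return the language tags published for a name, or [] if none."""
    # DNS names compare case-insensitively; drop any trailing root dot.
    key = (name.rstrip(".").lower(), "LANG")
    return list(ZONE.get(key, []))


print(lookup_languages("xn--bcher-kva.example"))  # name with a LANG record
print(lookup_languages("example.com"))            # name without one
```

The label on the wire is unchanged in this sketch; a client that does not know about the extra record simply never asks for it.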
Second, with all due respect, Paul, Adam et al, stringprep and nameprep are two amazing pieces of work, but I feel they did not go far enough. I think it is probably worth exploring whether we might map homographs to "base" characters in a new "prep", perhaps another profile of stringprep, to combat the IDN spoofing problem.
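As a rough sketch of what such a "prep" step might look like, here is some Python. The mapping table is a tiny hand-picked set of my own, not taken from any standard or from the Unicode Consortium's work, and the function names are hypothetical. It applies NFKC and case folding first (loosely in the spirit of nameprep, not a reimplementation of it), then folds each listed homograph to a "base" character.

```python
import unicodedata

# ASSUMPTION: illustrative table only; a real profile would need a far
# larger, carefully reviewed mapping.
HOMOGRAPH_TO_BASE = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u03bf": "o",  # GREEK SMALL LETTER OMICRON
}


def homograph_prep(label: str) -> str:
    """Normalize a label, then map listed homographs to base characters."""
    folded = unicodedata.normalize("NFKC", label).casefold()
    return "".join(HOMOGRAPH_TO_BASE.get(ch, ch) for ch in folded)


def collides(a: str, b: str) -> bool:
    """True if two labels become identical after homograph folding."""
    return homograph_prep(a) == homograph_prep(b)


# "paypal" written with CYRILLIC SMALL LETTER A in place of both Latin a's
spoof = "p\u0430yp\u0430l"
print(collides("paypal", spoof))     # the two labels fold to the same string
print(collides("paypal", "paypa1"))  # digit 1 is not in the table
```

A registry could run a check like `collides` against existing registrations before accepting a new label. Note that blindly folding Cyrillic or Greek letters to Latin would damage legitimate names in those scripts, so a real profile would have to deal with that.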
Third, there seems to be some resistance to continuing IDN work in the IETF space, so an Internet Draft may not be the appropriate vehicle.
Michel has indicated that the Unicode Consortium is working on the homograph problem, so maybe we can place our bets there.
Alternatively, if some registries (e.g. CJK) wish to have their own rules (i.e. RFC 3743 "JET"), then they can of course do so. I certainly do not want to have a balkanized net, but the degree of balkanization is what matters. That is, if we have only two methods (CJK vs. the rest), that is better than having numerous methods.
--Paul Hoffman, Director
--Internet Mail Consortium