Minutes from INTLOC BOF

To: intloc-discuss@ops.ietf.org
Subject: Minutes from INTLOC BOF
From: Harald Tveit Alvestrand <harald@alvestrand.no>
Date: Fri, 14 Dec 2001 22:01:00 -0700
Those of you who were present - please check if these minutes fit with your 
memory of the occasion.

Thanks to David Lawrence for taking these notes.

------------------------------------------------------------------
Scribe's Notes for the intloc BOF at IETF 52 in Salt Lake City
David Lawrence <tale@nominum.com>

In these notes, when "(?)" appears it means I was not entirely clear
on what was said.  Some personal names might also be misspelled; my
apologies.

Note that although notes were taken on things that are in existing
IETF documents, they are not entirely complete and I left out much
that can be found in those documents.

================================================================
Monday, 10 December 2001, 1530-1730

Agenda:
* Introduction - What problem are we trying to solve?  What problem are
  we NOT trying to solve here?
* Internationalization HOWTO
* Sorting and matching
* Relations to other bodies
  - ISO, Unicode Consortium, W3C, others
* Discussion -- areas not covered, areas not required
* Further work -- WG charter, design team volunteers, other?
* Discussion on further work


What problems are we trying to solve?
     - Consistent behavior across IETF protocols
     - Appropriate behavior depending on what behavior is needed
       (Fuzzy vs strict matching)

Can we even avoid this problem?  Like by mandating all IETF protocols
can only use numbers?  (Clearly said in jest, but the basic question
still needs to be considered.)

Can we push this off to another organization, thereby avoiding the
problem?  Not likely.

Document list:
         draft-alvestrand-i18n-howto (i18n handbook)
         draft-hoffman-i18n-terms (terminology)
         draft-faltstrom-i18n-sorting (sorting and searching issues)
         I18N Case Studies (not present -- not yet addressed)

James Seng: Both NFC and NFKC are inadequate.  Sometimes they overdo
what is desired and other times they underdo it.  Regardless, it is
not for IETF to change NFs, that is a Unicode issue.
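
A minimal Python sketch of the difference between the two
normalization forms mentioned here, using the standard unicodedata
module.  The helper names and the sample strings are illustrative
assumptions, not anything shown at the BOF, and which behavior counts
as "overdoing" or "underdoing" depends on the application.

```python
import unicodedata

def nfc(text: str) -> str:
    """Canonical composition only (NFC)."""
    return unicodedata.normalize("NFC", text)

def nfkc(text: str) -> str:
    """Compatibility decomposition, then canonical composition (NFKC)."""
    return unicodedata.normalize("NFKC", text)

# Hypothetical sample strings chosen for illustration.
decomposed = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT
ligature = "\ufb01le"    # LATIN SMALL LIGATURE FI + "le"
squared = "x\u00b2"      # 'x' followed by SUPERSCRIPT TWO

# Both forms compose the accented letter into a single code point.
print(nfc(decomposed) == "\u00e9", nfkc(decomposed) == "\u00e9")

# NFC leaves the ligature alone, so it will not match plain "file".
print(nfc(ligature) == "file")

# NFKC folds the ligature, but it also flattens the superscript,
# which loses a distinction some applications care about.
print(nfkc(ligature) == "file", nfkc(squared) == "x2")
```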

Bill Manning: I see that there is an attempt in the two documents I
have read to characterize many of the problems being with equivalent
visual string representation.  I fear we are going to run into
serious problems with this, since in English we have some of them and
other languages have similar issues but worse.

James Seng: (?)

The Handbook
* Goal: Advice on how to internationalize
  - things like case conversion, knowing what characters you are
  sending, knowing what language if needed
* Avoid common pitfalls
* Be aware of common tools
* Not the answer to everything

If you don't have a good reason not to do so, current IETF rules say
to use UTF-8.

Examples
* Separate the identifiers from the text
* Case conversion: downcase ok, round-trip case conversion bad because
  you might not end up with same string you started with
* Charsets: multiple charsets are a mess -- use one if you can
* Comparison: Binary, NFKC, casefolding
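
The case-conversion and comparison examples above can be sketched in
Python roughly as follows.  The function names and the sample strings
are assumptions made for illustration; the handbook draft may define
the comparison steps differently.

```python
import unicodedata

def binary_equal(a: str, b: str) -> bool:
    """Strict comparison of the UTF-8 encoded bytes."""
    return a.encode("utf-8") == b.encode("utf-8")

def nfkc_equal(a: str, b: str) -> bool:
    """Comparison after NFKC normalization."""
    return unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b)

def caseless_equal(a: str, b: str) -> bool:
    """Fuzzy comparison: normalize, case fold, then normalize again."""
    def fold(s: str) -> str:
        folded = unicodedata.normalize("NFKC", s).casefold()
        return unicodedata.normalize("NFKC", folded)
    return fold(a) == fold(b)

def round_trip(text: str) -> str:
    """Upper-case then lower-case; may not return the original string."""
    return text.upper().lower()

composed = "caf\u00e9"       # precomposed e-acute
decomposed = "cafe\u0301"    # 'e' plus combining acute accent
street = "Stra\u00dfe"       # contains LATIN SMALL LETTER SHARP S

print(binary_equal(composed, decomposed))   # False: the bytes differ
print(nfkc_equal(composed, decomposed))     # True: same after NFKC
print(caseless_equal(street, "STRASSE"))    # True: case folding maps sharp s to "ss"
print(round_trip(street) == street.lower()) # False: the round trip yields "strasse"
```

The three comparison functions run from strict to fuzzy, which is the
choice a protocol has to make for each kind of string it handles.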

Next steps:
* Mailing list: intloc-discuss@ops.ietf.org
* Use intloc-discuss-request to join
* Do we need a working group?
* What do we need to work on?
* Do we have volunteers to continue work?

Rick Wesson: Harald, if you could have a group that would provide some
sort of wide, forward-looking input to other groups like idn about
whether things are within their scope, that would be invaluable.

Harald: Like a standing advice directorate?

Rick: As long as we are going to have these issues keep coming up in
working groups, having some body, whether a WG or committee or
whatever, would be very valuable, particularly if the other WGs are
compelled to listen to it.

Pete: I think a standing WG is a bad idea, since we have had many bad
experiences with them in the IETF.  Would be better to have something like
how the Security AD pops up in your group if you have a
security-related issue.

Keith Moore: (?)

Harald: The comments I am hearing are that having documents would be
nice, but having a real live body would be nicer.

Rick Wesson: If you look at some of the proposals put before IDN you
see that they required a lot of learning to even understand the issues
they were trying to address.  This area is just too vague to expect
each group to try to do it on its own, would be much better to come to
a group which had the expertise to provide useful advice.

Ohta: Recommendation to IETF to people trying to internationalize
protocols: don't do that.  The problem is too hard because it is tied
too closely to localization.  Unicode is ok for the United States but
it is not good for the rest of the world.

Harald: I don't really understand what you mean by
internationalization and localization in the sense that you use them,
since I have never seen any systems like the pie in the sky way you
seem to describe them.

Michel Suignard: Something was said that was just so untrue, Unicode is not
just a US thing because it is incorporated in the US, it was created
by ISO with people from all over the world and is just not a US-only
thing.

Dave Crocker: Solve the specific problems first so we can then worry
about the more general ones.

Harald: That's what we're trying to do, that's why we have a case
studies document in the works.  We have already tried to solve
specific problems.

Keith Moore: In a sense, I do not think internationalization is as
difficult as security is, even though there are some subtle points to
consider.  But I think that reusing i18n tech will be easier than
reusing security tech.  The advice will come one way or another, the
real question is do we keep roping people into it ad hoc or do we have
a standing committee.

Paul Hoffman: The person who said we don't need to internationalize is
just being disingenuous, because while it is true that we don't
physically need to do this we do need to do it if we don't want to
create islands of lack of interoperability.  Most people do have the
desire to at least have the text they write appear correctly to other
people, even if it can't be understood.  But it just isn't true that
you can just use one character set and it will just work for people.

James Seng: Want to clarify that what Ohta-san apparently believes is
i18n and l10n is not the same way most other people think about it.
Based on past discussion, he seems to think that i18n is using English
because that is available around the world and using other scripts is
localization.

Ohta: (?)

Harald: So are you saying that i18n does not exist and should not
exist?

Ohta: It could exist if you do it wisely, but that does not seem possible.

Harald: I will have to think about that.

Paul Hoffman: This is kind of like Mike O'Dell, "if we can't really,
really do it right, we really, really shouldn't do it."

Rob Austein: The real definition appears to be that l10n is what each
individual user implicitly wants to happen but can't really describe,
and i18n is our bad, one-size-fits all of an approximation of a
solution to this problem.

Harald:  What I would like to see work is this example: if someone
gets an instant message in Hebrew and then pastes that into email then
sends it back to the first person, when the first person searches for
that string in the email they should be able to find it.

Pete: This is an operating system thing on each end and we at the IETF
just can't handle this, we only handle what is in between them.

Harald: I don't disagree.

Dave Crocker: I am struck that this exchange finally happened so far
on in to this meeting.  It was excellent, and a better way to look at
this since for most of the last half hour the problem has been "what
problem is trying to be solved" but focused around the definition of
terms.  The problem is very confused unless we agree on terms and that
exchange was better because it didn't even use the terms that people
are confused on.

Ohta: IETF protocols just cannot be internationalized.

Harald: They can be improved; recognizing that they will not be
perfect is useful but does not obviate improving them.

Marc Blanchet: We really need to publicize the mailing list IETF-wide
where people can come if they have i18n/l10n questions.

Harald (asking people in the room to vote): Should we create such a list?
[Very many people in room say yes.]

Pete: I am concerned that such a mailing list will just become an
uninformative rathole with completely conflicting advice on the issues
that arise.

...

Ohta: (?)

Harald: Getting the information from one end to another is really the
easy part of the problem.  But the overall problem is much more
difficult than that; it encompasses making sure the information is
properly understood by both the machine and user on each end.

James Seng: This is just a red herring, because Ohta-san just means
something totally different than everyone else does with his
terminology.

Ohta-san once again tried to make some point about how Latin
characters are what people use internationally, such as on passports,
and thereby used a definition at odds with whatever everyone else
wants to be talking about.

Pete: Suggestion: in this room, let "dog" mean "cat" and
"internationalization" mean "the ability to do multiple localizations
over the same protocol."

Then there was an extended discussion of the i18n-guidelines
document's descriptions of the various protocol elements that need to be
considered for internationalization, such as managed namespace
identifiers (DNS, URLs), local scope identifiers (login names,
filenames) and text fields (email messages).

Some things we should not even think about in the IETF are user
interfaces and APIs (except for the rare case of APIs to a
protocol).

Ohta got up and said some more confusing things.  Harald suggested
that it seemed Ohta was not even listening to him and then
politely said that Harald therefore would not be listening to Ohta
anymore.

Dave Crocker: There seem to be two areas of work: (1) packaging and
labeling, where we wrap data in the external information that allows
each end to understand the data that is being shipped, and (2) having
the protocol world actually interpret these strings in the semantic
context in which they are intended.  It seems we have already been
through the exercise in (1) with MIME and that we have a pretty good
handle on it.  How necessary is 2?

Harald: Important.  Consider the important function of being able to
tell whether two strings are the same even if they have come from
different locales.  Getting sorting right is also very important, as
are canonical forms for security and not corrupting data during
truncation by chopping in the middle of a multibyte character.
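
The truncation point can be sketched in Python as follows, assuming
the data is valid UTF-8 and the limit is a non-negative byte count.
The function name truncate_utf8 is hypothetical.  The sketch only
avoids splitting a single encoded character; it does not keep a base
character together with any combining marks that follow it.

```python
def truncate_utf8(data: bytes, max_bytes: int) -> bytes:
    """Cut valid UTF-8 data to at most max_bytes without splitting a character."""
    if len(data) <= max_bytes:
        return data
    cut = max_bytes
    # UTF-8 continuation bytes look like 10xxxxxx.  If the first byte
    # that would be dropped is a continuation byte, the cut falls inside
    # a character, so back up to the start of that character.
    while cut > 0 and (data[cut] & 0xC0) == 0x80:
        cut -= 1
    return data[:cut]

# Each Hebrew letter below takes two bytes in UTF-8 (8 bytes in total).
word = "\u05e9\u05dc\u05d5\u05dd".encode("utf-8")

# A naive slice at 3 bytes would leave half a character behind;
# the helper backs up to 2 bytes, so the result still decodes cleanly.
print(len(truncate_utf8(word, 3)))
print(truncate_utf8(word, 3).decode("utf-8") == "\u05e9")
```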

...

Harald: The three things I wanted answered after this BOF:
        1) Do we have people willing to work on this?
                A score of people raised hands.
        2) What is it that we need to do work on?
        3) Does it make sense to make an official IETF WG?
Now is the time to make comments about what the next step is.

?: My observation is that this discussion is focused very much on
issues that are currently facing us in Europe, so it is very good for
us to pursue this.  But it looks like we still have problems with
Eastern peoples [refers to Ohta] and we really need to involve them.

Marc Blanchet: Do we need a working group?  I don't care, but I really
do think we need some sort of open advisory group.

Paul Hoffman: I do not think we need a working group, because this is
just unlike what a working group is in the IETF; it has no final goal
at which time it is disbanded, and perpetual working groups have just
failed in the IETF.  We do need a place for people to go, though, and
based on experience with the mailing list unicode@unicode.org I think
it needs to be a moderated mailing list.

Pete: This seems like BCP (Best Current Practices) type of work we are
doing here, which is fine but not something that should take up the
working group time.

Harald: Who wants a working group? (No one.)
        Who wants a mailing list? (Few dozen.)
        Who wants nothing? (Zita. :)
      Ok, so it is a mailing list.
        Who wants moderation? (Around 20 or so.)
        Who wants it totally open? (Around 20 or so.)
      Hmm, very even split.

General discussion pointed out that unicode@unicode.org provided an
open list outlet, and that there are forms of moderation that do not
include very active manual moderation.

Paul Hoffman and James Seng volunteered to be moderators.  Following
the discussion:
        Who wants moderation? (Around 20 or so.)
        Who wants no moderation? (One.)

Close of meeting.