[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[idn] BIG5< -> Unicode roundtrip compatibility



I enclose an excerpt from
http://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/OTHER/BIG5.TXT.

It says that some TC char X cannot be converted into UCS chars without losing
roundtrip compatibility. ie, In a roundtrip conversion   X --> UCS(X) --> X',  
It may arise that X != X'.

Any legacy-encoded IRI(IDN)s including X, may fail to be compared
successfully if they had undergone conversions into/from unicode.

I will appreaciate if anyone present the history of BIG5 versions and its 
round-trip compatilibity problems in more detail.

Soobok Lee
--------------------------------------------------------------------------------


#	Name:             BIG5 to Unicode table (complete)
#	Unicode version:  1.1
#	Table version:    0.0d3
#	Table format:     Format A
#	Date:             11 February 1994
#
#	Copyright (c) 1991-1994 Unicode, Inc.  All Rights reserved.

(snip)
 
# If you have carefully considered the fact that the mappings in
# this table are only one possible set of mappings between BIG5 and
# Unicode and have no normative status, but still feel that you
# have located an error in the table that requires fixing, you may
# report any such error to errata@unicode.org.
#
#	WARNING!  It is currently impossible to provide round-trip compatibility
#		between BIG5 and Unicode.  
#
#	A number of characters are not currently mapped because
#		of conflicts with other mappings.  They are as follows:
#
#       BIG5        Description                    Comments
#
#       0xA15A      SPACING UNDERSCORE             duplicates A1C4
#       0xA1C3      SPACING HEAVY OVERSCORE        not in Unicode
#       0xA1C5      SPACING HEAVY UNDERSCORE       not in Unicode
#       0xA1FE      LT DIAG UP RIGHT TO LOW LEFT   duplicates A2AC
#       0xA240      LT DIAG UP LEFT TO LOW RIGHT   duplicates A2AD
#       0xA2CC      HANGZHOU NUMERAL TEN           conflicts with A451 mapping
#       0xA2CE      HANGZHOU NUMERAL THIRTY        conflicts with A4CA mapping
#
#	We currently map all of these characters to U+FFFD REPLACEMENT CHARACTER.
#		It is also possible to map these characters to their duplicates, or to
#		the user zone.  
#	
#	Notes:
#
#	1. In addition to the above, there is some uncertainty about the
#       mappings in the range C6A1 - C8FE, and F9DD - F9FE.  The ETEN
#	version of BIG5 organizes the former range differently, and adds
#	additional characters in the latter range.  The correct mappings
#	these ranges need to be determined.
#
#	2.  There is an uncertainty in the mapping of the Big Five character
#	0xA3BC.  This character occurs within the Big Five block of tone marks
#	for bopomofo and is intended to be the tone mark for the first tone in
#	Mandarin Chinese.  We have selected the mapping U+02C9 MODIFIER LETTER
#	MACRON (Mandarin Chinese first tone) to reflect this semantic.  
#	However, because bopomofo uses the absense of a tone mark to indicate
#	the first Mandarin tone, most implementations of Big Five represent
#	this character with a blank space, and so a mapping such as U+2003 EM
#	SPACE might be preferred.  
#
#	Format:  Three tab-separated columns
#		 Column #1 is the BIG5 code (in hex as 0xXXXX)
#		 Column #2 is the Unicode (in hex as 0xXXXX)
#		 Column #3  is the Unicode name (follows a comment sign, '#')
#			The official names for Unicode characters U+4E00
#			to U+9FA5, inclusive, is "CJK UNIFIED IDEOGRAPH-XXXX",
#			where XXXX is the code point.  Including all these
#			names in this file increases its size substantially
#			and needlessly.  The token "" is used for the
#			name of these characters.  If necessary, it can be
#			expanded algorithmically by a parser or editor.
#
#	The entries are in BIG5 order
#
#