m17nCharset(3m17n) | The m17n Library | m17nCharset(3m17n) |
m17nCharset - Charset objects and API for them.
#define MCHAR_INVALID_CODE
Invalid code-point.
MSymbol mchar_define_charset (const char *name, MPlist *plist)
MSymbol mchar_resolve_charset (MSymbol symbol)
Resolve charset name.
int mchar_list_charset (MSymbol **symbols)
List symbols representing charsets.
int mchar_decode (MSymbol charset_name, unsigned code)
Decode a code-point.
unsigned mchar_encode (MSymbol charset_name, int c)
Encode a character code.
int mchar_map_charset (MSymbol charset_name, void(*func)(int from, int to, void *arg), void *func_arg)
Call a function for all the characters in a specified charset.
MSymbol Mcharset
Each of the following symbols represents a predefined charset.
MSymbol Mcharset_ascii
Symbol representing the charset ASCII.
MSymbol Mcharset_iso_8859_1
Symbol representing the charset ISO/IEC 8859/1.
MSymbol Mcharset_unicode
Symbol representing the charset Unicode.
MSymbol Mcharset_m17n
Symbol representing the largest charset.
MSymbol Mcharset_binary
Symbol representing the charset for ill-decoded characters.
These are the predefined symbols to use as parameter keys for the function mchar_define_charset() (which see).
MSymbol Mmethod
MSymbol Mdimension
MSymbol Mmin_range
MSymbol Mmax_range
MSymbol Mmin_code
MSymbol Mmax_code
MSymbol Mascii_compatible
MSymbol Mfinal_byte
MSymbol Mrevision
MSymbol Mmin_char
MSymbol Mmapfile
MSymbol Mparents
MSymbol Msubset_offset
MSymbol Mdefine_coding
MSymbol Maliases
These are the predefined symbols that can be a value of the Mmethod parameter of a charset used in an argument to the mchar_define_charset() function.
A method specifies how code-points and character codes are converted. See the documentation of the mchar_define_charset() function for the details.
MSymbol Moffset
Symbol for the offset type method of charset.
MSymbol Mmap
Symbol for the map type method of charset.
MSymbol Munify
Symbol for the unify type method of charset.
MSymbol Msubset
Symbol for the subset type method of charset.
MSymbol Msuperset
Symbol for the superset type method of charset.
Charset objects and API for them.
The symbol Mcharset.
The m17n library uses charset objects to represent coded character sets (CCSs). The m17n library supports many predefined coded character sets. Moreover, application programs can add other charsets. A character can belong to multiple charsets.
The m17n library distinguishes the following three concepts:
Each charset object defines how characters are converted between
code-points and character codes. To decode means converting
code-points to character codes and to encode means converting
character codes to code-points, matching the argument and return
types of mchar_decode() and mchar_encode().
Any decoded M-text has a text property whose key is the predefined symbol Mcharset. The name of Mcharset is 'charset'.
Invalid code-point. The macro MCHAR_INVALID_CODE gives the invalid code-point.
Symbol representing the charset ASCII. The symbol Mcharset_ascii has name 'ascii' and represents the charset ISO 646, USA Version X3.4-1968 (ISO-IR-6).
Symbol representing the charset ISO/IEC 8859/1. The symbol Mcharset_iso_8859_1 has name 'iso-8859-1' and represents the charset ISO/IEC 8859-1:1998.
Symbol representing the charset Unicode. The symbol Mcharset_unicode has name 'unicode' and represents the charset Unicode.
Symbol representing the largest charset. The symbol Mcharset_m17n has name 'm17n' and represents the charset that contains all characters supported by the m17n library.
Symbol representing the charset for ill-decoded characters. The
symbol Mcharset_binary has name 'binary' and represents the fake
charset that the decoding functions attach to an M-text as a text
property when they encounter an invalid byte (sequence).
See Code Conversion for more details.
Symbol for the offset type method of charset. The symbol Moffset has the name 'offset' and, when used as a value of the Mmethod parameter of a charset, it means that the conversion of code-points and character codes of the charset is done by this calculation:
CHARACTER-CODE = CODE-POINT - MIN-CODE + MIN-CHAR
where MIN-CODE is the value of the Mmin_code parameter of the charset, and MIN-CHAR is the value of the Mmin_char parameter.
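The offset calculation above can be sketched in plain C. This is only an illustration of the arithmetic, not the library's implementation: the struct OffsetCharset, the functions offset_decode() and offset_encode(), and the constant SKETCH_INVALID_CODE are hypothetical names, and the sketch assumes that an out-of-range input yields -1 when decoding and an invalid code-point when encoding.

```c
#include <assert.h>

/* Hypothetical stand-in for MCHAR_INVALID_CODE; the value is an assumption. */
#define SKETCH_INVALID_CODE 0xFFFFFFFFu

/* Hypothetical bundle of the Mmin_code, Mmax_code and Mmin_char values. */
typedef struct {
    unsigned min_code;
    unsigned max_code;
    int min_char;
} OffsetCharset;

/* Decode: CHARACTER-CODE = CODE-POINT - MIN-CODE + MIN-CHAR.
   Returns -1 (an assumed error value) for a code-point outside the range. */
static int offset_decode(const OffsetCharset *cs, unsigned code)
{
    assert(cs->min_code <= cs->max_code);
    if (code < cs->min_code || code > cs->max_code)
        return -1;
    return (int)(code - cs->min_code) + cs->min_char;
}

/* Encode: the inverse calculation, CODE-POINT = CHARACTER-CODE - MIN-CHAR + MIN-CODE. */
static unsigned offset_encode(const OffsetCharset *cs, int c)
{
    int max_char = cs->min_char + (int)(cs->max_code - cs->min_code);

    assert(cs->min_code <= cs->max_code);
    if (c < cs->min_char || c > max_char)
        return SKETCH_INVALID_CODE;
    return (unsigned)(c - cs->min_char) + cs->min_code;
}
```

With MIN-CODE 0x80, MAX-CODE 0xFF and MIN-CHAR 0x100, code-point 0x85 decodes to character code 0x105, and encoding 0x105 gives back 0x85.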
Symbol for the map type method of charset. The symbol Mmap has the name 'map' and, when used as a value of the Mmethod parameter of a charset, it means that the conversion of code-points and character codes of the charset is done by looking up a map. The map must be given by the Mmapfile parameter.
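As a rough illustration of map-based conversion, the following plain C sketch uses a small in-memory table in place of a map file. All names here (MapEntry, sample_map, map_decode(), map_encode(), MAP_SKETCH_INVALID_CODE) are hypothetical, the three entries are sample data only, and the error values are assumptions; the sketch does not show the format of a real map file.

```c
#include <stddef.h>

/* Hypothetical stand-in for MCHAR_INVALID_CODE; the value is an assumption. */
#define MAP_SKETCH_INVALID_CODE 0xFFFFFFFFu

/* One map entry: a code-point and the character code it maps to. */
typedef struct {
    unsigned code;
    int c;
} MapEntry;

/* Sample entries for illustration only. */
static const MapEntry sample_map[] = {
    { 0xA1, 0x0104 },
    { 0xA2, 0x02D8 },
    { 0xA3, 0x0141 },
};
static const size_t sample_map_len = sizeof sample_map / sizeof sample_map[0];

/* Decode by forward lookup; -1 (assumed) when the code-point is not mapped. */
static int map_decode(unsigned code)
{
    for (size_t i = 0; i < sample_map_len; i++)
        if (sample_map[i].code == code)
            return sample_map[i].c;
    return -1;
}

/* Encode by reverse lookup; an invalid code-point when the character is not mapped. */
static unsigned map_encode(int c)
{
    for (size_t i = 0; i < sample_map_len; i++)
        if (sample_map[i].c == c)
            return sample_map[i].code;
    return MAP_SKETCH_INVALID_CODE;
}
```

A linear search keeps the sketch short; a real table of any size would more likely be sorted or indexed.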
Symbol for the unify type method of charset. The symbol Munify has the name 'unify' and, when used as a value of the Mmethod parameter of a charset, it means that the conversion of code-points and character codes of the charset is done by map lookup and offsetting. The map must be given by the Mmapfile parameter. For this kind of charset, a unique continuous character code space for all characters is assigned.
If the map has an entry for a code-point, the conversion is done by looking up the map. Otherwise, the conversion is done by this calculation:
CHARACTER-CODE = CODE-POINT - MIN-CODE + LOWEST-CHAR-CODE
where MIN-CODE is the value of the Mmin_code parameter of the charset, and LOWEST-CHAR-CODE is the lowest character code of the assigned code space.
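The two-step rule for the unify method (map entry first, offset calculation otherwise) might look like this in plain C. The types UnifyEntry and UnifyCharset and the function unify_decode() are hypothetical, and the range check against a maximum code-point and the -1 error value are assumptions of the sketch.

```c
#include <stddef.h>

/* One map entry: a code-point and the character code it maps to. */
typedef struct {
    unsigned code;
    int c;
} UnifyEntry;

/* Hypothetical bundle of the values a unify-type charset would need. */
typedef struct {
    const UnifyEntry *map;   /* entries that would come from the map file */
    size_t map_len;
    unsigned min_code;       /* value of Mmin_code */
    unsigned max_code;       /* value of Mmax_code (range check is an assumption) */
    int lowest_char_code;    /* lowest character code of the assigned code space */
} UnifyCharset;

/* Decode: use the map when it has an entry for the code-point, otherwise fall
   back to CHARACTER-CODE = CODE-POINT - MIN-CODE + LOWEST-CHAR-CODE. */
static int unify_decode(const UnifyCharset *cs, unsigned code)
{
    if (code < cs->min_code || code > cs->max_code)
        return -1;
    for (size_t i = 0; i < cs->map_len; i++)
        if (cs->map[i].code == code)
            return cs->map[i].c;
    return (int)(code - cs->min_code) + cs->lowest_char_code;
}
```

For example, with a single map entry for code-point 0x21, MIN-CODE 0x21 and LOWEST-CHAR-CODE 0x110000, code-point 0x21 is taken from the map while 0x22 falls back to 0x110001.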
Symbol for the subset type method of charset. The symbol Msubset has the name 'subset' and, when used as a value of the Mmethod parameter of a charset, it means that the charset is a subset of a parent charset. The parent charset must be given by the Mparents parameter. The conversion of code-points and character codes of the charset is done conceptually by this calculation:
CHARACTER-CODE = PARENT-CODE (CODE-POINT) + SUBSET-OFFSET
where PARENT-CODE is a pseudo function that returns a character code of CODE-POINT in the parent charset, and SUBSET-OFFSET is the value given by the Msubset_offset parameter.
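The conceptual subset calculation can be sketched with a function pointer standing in for PARENT-CODE. The names SubsetParentDecode, SubsetCharset, subset_decode() and ascii_like_parent() are hypothetical, and restricting the subset to a code-point range and returning -1 on failure are assumptions made for the sketch.

```c
/* Stand-in for the PARENT-CODE pseudo function: code-point -> character code,
   or -1 (assumed) when the parent has no such code-point. */
typedef int (*SubsetParentDecode)(unsigned code);

/* Hypothetical bundle of the values a subset-type charset would need. */
typedef struct {
    SubsetParentDecode parent_code;
    unsigned min_code;      /* lowest code-point kept from the parent (assumption) */
    unsigned max_code;      /* highest code-point kept from the parent (assumption) */
    int subset_offset;      /* value of Msubset_offset */
} SubsetCharset;

/* Decode: CHARACTER-CODE = PARENT-CODE (CODE-POINT) + SUBSET-OFFSET. */
static int subset_decode(const SubsetCharset *cs, unsigned code)
{
    int c;

    if (code < cs->min_code || code > cs->max_code)
        return -1;
    c = cs->parent_code(code);
    if (c < 0)
        return -1;
    return c + cs->subset_offset;
}

/* Sample parent for illustration: a 7-bit charset whose character codes
   equal its code-points. */
static int ascii_like_parent(unsigned code)
{
    return code <= 0x7F ? (int)code : -1;
}
```

A subset covering 0x41 to 0x5A of this sample parent with SUBSET-OFFSET 0 decodes 0x41 to 0x41; with SUBSET-OFFSET 0x20 the same code-point decodes to 0x61.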
Symbol for the superset type method of charset. The symbol Msuperset has the name 'superset' and, when used as a value of the Mmethod parameter of a charset, it means that the charset is a superset of parent charsets. The parent charsets must be given by the Mparents parameter.
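One way to picture a superset charset is as a list of parent decoders tried in turn. The text above does not state how the library orders or combines parents, so the first-match rule below is an assumption, as are the names SupersetParentDecode, superset_decode(), low_parent() and high_parent() and the -1 error value.

```c
#include <stddef.h>

/* Stand-in for one parent charset's decoder: code-point -> character code,
   or -1 (assumed) when that parent does not cover the code-point. */
typedef int (*SupersetParentDecode)(unsigned code);

/* Decode by asking each parent in order and returning the first success.
   The first-match order is an assumption of this sketch. */
static int superset_decode(const SupersetParentDecode *parents, size_t n,
                           unsigned code)
{
    for (size_t i = 0; i < n; i++) {
        int c = parents[i](code);
        if (c >= 0)
            return c;
    }
    return -1;
}

/* Sample parents for illustration: one covers 0x00..0x7F, the other 0xA0..0xFF,
   each using its code-points directly as character codes. */
static int low_parent(unsigned code)
{
    return code <= 0x7F ? (int)code : -1;
}

static int high_parent(unsigned code)
{
    return (code >= 0xA0 && code <= 0xFF) ? (int)code : -1;
}
```

With both sample parents listed, code-points 0x41 and 0xE9 decode successfully, while 0x90 is covered by neither parent and yields -1.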
Generated automatically by Doxygen for The m17n Library from the source code.
Copyright (C) 2001 Information-technology Promotion Agency (IPA)
Copyright (C) 2001-2011 National Institute of Advanced Industrial Science and
Technology (AIST)
Permission is granted to copy, distribute and/or modify this document under
the terms of the GNU Free Documentation License
<http://www.gnu.org/licenses/fdl.html>.
Mon Sep 25 2023 | Version 1.8.4 |