JNTA converter functions and classes
- class jntajis.ConversionMode
Specifies the encoding conversion mode.
- SISO
Instructs it to encode the given string into JIS X 0213 with the ISO 2022 shift characters SO (\x0e) and SI (\x0f) for the extended plane selection.
- MEN1
Instructs it to encode the given string into JIS X 0213 characters designated in the primary plane, which would theoretically contain all the JIS X 0208 level 1 and 2 characters in addition to some level 3 and 4 characters. Characters belonging to the extended plane will result in conversion failure.
- JISX0208
Instructs it to encode the given string into JIS X 0208 level 1 and 2 characters. Non-0208 characters will result in conversion failure.
- JISX0208_TRANSLIT
Instructs it to encode the given string into JIS X 0208 level 1 and 2 characters. Transliteration will be attempted for non-0208 characters.
- jntajis.jnta_encode(encoding, in_, conv_mode)
Encode a given Unicode string into JIS X 0208:1997 / JIS X 0213:2012.
- Parameters:
encoding (str) – The encoding name that is to appear in UnicodeEncodeError.
in_ (str) – The string to encode.
conv_mode (int) – The conversion mode. For the possible values, refer to ConversionMode.
- Returns:
The encoded JIS character sequence.
- jntajis.jnta_decode(encoding, in_)
Decode a given JIS character sequence into a Unicode string.
- Parameters:
encoding (str) – The encoding name that is to appear in UnicodeDecodeError.
in_ (bytes) – The encoded JIS characters.
- Returns:
The decoded Unicode string.
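As a rough usage sketch, the two functions can be combined into a round trip. The helper name roundtrip and the encoding label "jntajis-example" are made up for illustration, and the import is guarded so the snippet also runs where jntajis is not installed. It assumes that conversion failures surface as UnicodeError subclasses, as the parameter descriptions suggest.

```python
import importlib.util


def roundtrip(text, mode_name="JISX0208"):
    """Encode `text` with jnta_encode and decode it back with jnta_decode.

    Returns None when jntajis is not installed or the conversion fails.
    (Hypothetical helper, not part of jntajis.)
    """
    if importlib.util.find_spec("jntajis") is None:
        return None
    import jntajis

    # Look up the conversion mode by its documented member name.
    mode = getattr(jntajis.ConversionMode, mode_name)
    try:
        encoded = jntajis.jnta_encode("jntajis-example", text, mode)
        return jntajis.jnta_decode("jntajis-example", encoded)
    except UnicodeError:
        # Assumption: both directions report failures as UnicodeError subclasses.
        return None


print(roundtrip("日本語"))
```

The first argument only labels the error that is raised on failure, so any descriptive name should do.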
- class jntajis.IncrementalEncoder(encoding, conv_mode)
An IncrementalEncoder implementation. For the description of each method, please see Python’s codec documentation.
- Parameters:
encoding (str) – The encoding name that is to appear in UnicodeEncodeError.
conv_mode (int) – The conversion mode. For the possible values, refer to ConversionMode.
- encode(in_, final)
- reset()
- getstate()
- setstate(state)
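The incremental interface is the one described in Python's codecs documentation: text is fed in pieces and the last piece is flagged as final so that any pending state is flushed. The sketch below uses the standard library's iso2022_jp encoder as a stand-in, since it is also stateful; the helper encode_in_chunks is hypothetical, and it is only an assumption that a jntajis.IncrementalEncoder instance can be passed in its place.

```python
import codecs


def encode_in_chunks(encoder, chunks):
    """Feed `chunks` to an incremental encoder, flagging the last one as final."""
    chunks = list(chunks)
    out = []
    for i, chunk in enumerate(chunks):
        is_last = i == len(chunks) - 1
        # The second argument corresponds to the `final` parameter of encode().
        out.append(encoder.encode(chunk, is_last))
    return b"".join(out)


# Stand-in encoder from the standard library; a jntajis.IncrementalEncoder
# instance is assumed to be usable in the same position.
encoder = codecs.getincrementalencoder("iso2022_jp")()
data = encode_in_chunks(encoder, ["日本", "語"])

# The chunked result should match a one-shot encode of the whole string.
print(data == "日本語".encode("iso2022_jp"))
```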
Transliteration functions
Transliteration based on the MJ character table and MJ shrink conversion map
The MJ character table (MJ文字一覧表), initially developed by the Information-technology Promotion Agency, defines a vast set of kanji (漢字) characters used in the information processing of Japanese text.
The MJ shrink conversion map (MJ縮退マップ) was developed alongside it for the sake of interoperability between MJ-aware systems and systems based on Unicode. It is used to transliterate complex, less frequently used character variants into more commonly used ones. It defines four different transliteration schemes, and you can specify any combination of them with the flags defined in MJShrinkSchemeCombo.
- class jntajis.MJShrinkSchemeCombo
Stores constants that specify the transliteration scheme.
- JIS_INCORPORATION_UCS_UNIFICATION_RULE
Instructs it to transliterate the given characters according to JIS incorporation and UCS unification rule (a.k.a. JIS包摂規準・UCS統合規則) if applicable.
- INFERENCE_BY_READING_AND_GLYPH
Instructs it to transliterate the given characters according to the CITPC-defined rule based on analogy from readings and glyphs of characters (読み・字形による類推).
- MOJ_NOTICE_582
Instructs it to transliterate the given characters according to the appendix table proposed in the Japanese Ministry of Justice (MOJ) notice no. 582 (法務省告示582号別表第四).
- MOJ_FAMILY_REGISTER_ACT_RELATED_NOTICE
Instructs it to transliterate the given characters according to the Family Register Act (戸籍法) and related MOJ notices (法務省戸籍法関連通達・通知).
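Since the schemes are meant to be combined, the constants presumably behave like bit flags joined with the | operator. The sketch below mirrors the member names in a stand-in IntFlag class to show the idea; the class ToySchemeCombo, the helper enabled_schemes, and the bit values are all placeholders, not the library's actual definitions.

```python
from enum import IntFlag


class ToySchemeCombo(IntFlag):
    """Stand-in for MJShrinkSchemeCombo; bit values are placeholders only."""

    JIS_INCORPORATION_UCS_UNIFICATION_RULE = 1
    INFERENCE_BY_READING_AND_GLYPH = 2
    MOJ_NOTICE_582 = 4
    MOJ_FAMILY_REGISTER_ACT_RELATED_NOTICE = 8


def enabled_schemes(combo):
    """List the names of the schemes switched on in `combo`."""
    return [member.name for member in ToySchemeCombo if member & combo]


# Combine two schemes with bitwise OR.
combo = (
    ToySchemeCombo.JIS_INCORPORATION_UCS_UNIFICATION_RULE
    | ToySchemeCombo.MOJ_NOTICE_582
)
print(enabled_schemes(combo))
```

With the real class, the same | expression over its members would be passed as the combo argument.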
- jntajis.mj_shrink_candidates(in_, combo, limit=100)
- Parameters:
in_ (str) – The string to transliterate.
combo (int) – The transliteration scheme to use. Specify any combination of the members in MJShrinkSchemeCombo.
limit (int) – Maximum number of candidates to return. Specifying a negative number allows it to calculate all possible combinations, which may end up with memory exhaustion.
- Returns:
The list of possible transliteration forms built from the cartesian product of candidates for each character.
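To make the cartesian-product behaviour and the role of limit concrete, here is a small model of the idea in plain Python. It is not the library's implementation: the table TOY_CANDIDATES is invented for illustration rather than taken from the MJ shrink map, the function toy_candidates is hypothetical, and passing unmapped characters through unchanged is an assumption.

```python
from itertools import islice, product

# Invented per-character candidates, NOT the real MJ shrink conversion map.
TOY_CANDIDATES = {
    "髙": ["高"],
    "邉": ["辺", "邊"],
}


def toy_candidates(text, table, limit=100):
    """Build candidate strings from the product of per-character candidates."""
    # Assumption: characters missing from the table stand for themselves.
    per_char = [table.get(ch, [ch]) for ch in text]
    combos = ("".join(parts) for parts in product(*per_char))
    if limit < 0:
        # A negative limit means "everything", which can grow very large.
        return list(combos)
    return list(islice(combos, limit))


print(toy_candidates("渡邉", TOY_CANDIDATES))
print(toy_candidates("渡邉", TOY_CANDIDATES, limit=1))
```

The number of combinations is the product of the per-character candidate counts, which is why an unbounded limit can exhaust memory on long inputs.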
Transliteration based on NTA shrink mappings
- jntajis.jnta_shrink_translit(in_, replacement='\ufffe', passthrough=False)
Transliterate a Unicode string according to the NTA shrink mappings.
- Parameters:
in_ (str) – The string to transliterate.
replacement (str) – The characters that will be placed when the transliteration is not feasible.
passthrough (bool) – Instructs the transliterator to emit the input character as is when it does not exist in the mappings, instead of placing the replacement characters.
- Returns:
The transliterated characters.
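The interplay between replacement and passthrough can be modelled with a few lines of plain Python. This is only a sketch of the documented semantics: the mapping TOY_MAPPING is invented and is not the NTA shrink table, and toy_shrink_translit is a hypothetical stand-in for the real function.

```python
# Invented mapping for illustration, NOT the real NTA shrink mappings.
TOY_MAPPING = {"髙": "高", "島": "島", "屋": "屋"}


def toy_shrink_translit(text, mapping, replacement="\ufffe", passthrough=False):
    """Transliterate `text` character by character using `mapping`."""
    out = []
    for ch in text:
        if ch in mapping:
            out.append(mapping[ch])
        elif passthrough:
            # Keep the unmapped input character as is.
            out.append(ch)
        else:
            # Mark the spot where transliteration was not feasible.
            out.append(replacement)
    return "".join(out)


print(toy_shrink_translit("髙島屋", TOY_MAPPING))
print(toy_shrink_translit("髙島屋X", TOY_MAPPING, replacement="?"))
print(toy_shrink_translit("髙島屋X", TOY_MAPPING, passthrough=True))
```

A visible replacement such as "?" makes failures easy to spot, while passthrough=True suits cases where unmapped characters should survive untouched.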