JNTA converter functions and classes

class jntajis.ConversionMode

Specifies the encoding conversion mode.

SISO: Instructs it to encode the given string into JIS X 0213 with the ISO 2022 escape sequences SI (\x0e) and SO (\x0f) for the extended plane selection.

MEN1: Instructs it to encode the given string into JIS X 0213 characters designated in the primary plane, which would theoretically contain all the JIS X 0208 level 1 and 2 characters in addition to some level 3 and 4 characters. Characters belonging to the extended plane will result in conversion failure.

JISX0208: Instructs it to encode the given string into JIS X 0208 level 1 and 2 characters. Non-0208 characters will result in conversion failure.

JISX0208_TRANSLIT: Instructs it to encode the given string into JIS X 0208 level 1 and 2 characters. Non-0208 characters will be tried the transliteration against.

jntajis.jnta_encode(encoding, in_, conv_mode)

Encode a given Unicode string into JIS X 0208:1997 / JIS X 0213:2012.

Parameters:

encoding (str) – The encoding name that is to appear in UnicodeEncodeError.
in_ (str) – The string to encode.
conv_mode (int) – The conversion mode. For the possible values, refer to ConversionMode.

Returns:

The encoded JIS character sequence.
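
A minimal usage sketch, assuming jntajis is installed, that the result is a bytes object, and that a failed conversion raises UnicodeEncodeError (as the description of the encoding parameter implies). The helper name encode_strict_then_translit and the label "jis-x-0208" are illustrative and not part of the library.

```python
def encode_strict_then_translit(text: str, encoding: str = "jis-x-0208") -> bytes:
    """Try plain JIS X 0208 first; fall back to transliteration on failure."""
    # Imported here so that jntajis is only required when the helper is called.
    import jntajis

    try:
        return jntajis.jnta_encode(encoding, text, jntajis.ConversionMode.JISX0208)
    except UnicodeEncodeError:
        # Some character is outside JIS X 0208 levels 1 and 2:
        # retry in the mode that attempts transliteration.
        return jntajis.jnta_encode(
            encoding, text, jntajis.ConversionMode.JISX0208_TRANSLIT
        )
```

The second call can still raise UnicodeEncodeError if no transliteration is available, so callers may want to handle that case as well.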

jntajis.jnta_decode(encoding, in_)

Decode a given JIS character sequence into a Unicode string.

Parameters:

encoding (str) – The encoding name that is to appear in UnicodeDecodeError.
in_ (bytes) – The encoded JIS characters.

Returns:

The decoded Unicode string.
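
As a rough illustration of how the encoder and decoder fit together, the sketch below encodes a string in SISO mode and decodes the result again. It assumes jntajis is installed; the helper name round_trips and the label "jis-x-0213" are made up for this example, and whether a given string survives the round trip is something to check rather than assume.

```python
def round_trips(text: str, encoding: str = "jis-x-0213") -> bool:
    """Return True if encoding and then decoding gives back the same string."""
    # Imported here so that jntajis is only required when the helper is called.
    import jntajis

    # Encode to a JIS byte sequence, then decode that sequence again.
    encoded = jntajis.jnta_encode(encoding, text, jntajis.ConversionMode.SISO)
    decoded = jntajis.jnta_decode(encoding, encoded)
    return decoded == text
```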

class jntajis.IncrementalEncoder(encoding, conv_mode)

An IncrementalEncoder implementation.

For the description of each method, see Python’s codecs documentation.

Parameters:

encoding (str) – The encoding name that is to appear in UnicodeEncodeError.
conv_mode (int) – The conversion mode. For the possible values, refer to ConversionMode.

encode(in_, final)

reset()

getstate()

setstate(state)
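
A sketch of chunked encoding with this class, assuming jntajis is installed and that encode() follows the usual codecs.IncrementalEncoder contract (pass final=True on the last call to flush any pending state). The helper name encode_chunks and the label "jis-x-0213" are illustrative.

```python
from typing import Iterable


def encode_chunks(chunks: Iterable[str], encoding: str = "jis-x-0213") -> bytes:
    """Encode a stream of string chunks and return the concatenated bytes."""
    # Imported here so that jntajis is only required when the helper is called.
    import jntajis

    encoder = jntajis.IncrementalEncoder(encoding, jntajis.ConversionMode.SISO)
    # Feed every chunk with final=False so the encoder can keep its state.
    parts = [encoder.encode(chunk, False) for chunk in chunks]
    # Signal the end of input with an empty string and final=True.
    parts.append(encoder.encode("", True))
    return b"".join(parts)
```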

Transliteration functions

Transliteration based on the MJ character table and MJ shrink conversion map

The MJ character table (MJ文字一覧表), initially developed by the Information-technology Promotion Agency, defines a vast set of Kanji (漢字) characters used in the information processing of Japanese text.

The MJ shrink conversion map (MJ縮退マップ) was developed alongside it for interoperability between MJ-aware systems and Unicode-based systems. It is used to transliterate complex, less frequently used character variants into more commonly used ones. It defines four different transliteration schemes, and you can specify any combination of them with the flags defined in MJShrinkSchemeCombo.

class jntajis.MJShrinkSchemeCombo

Stores constants that specify the transliteration scheme.

JIS_INCORPORATION_UCS_UNIFICATION_RULE: Instructs it to transliterate the given characters according to JIS incorporation and UCS unification rule (a.k.a. JIS包摂規準・UCS統合規則) if applicable.

INFERENCE_BY_READING_AND_GLYPH: Instructs it to transliterate the given characters according to the CITPC-defined rule based on analogy from readings and glyphs of characters (読み・字形による類推.)

MOJ_NOTICE_582: Instructs it to transliterate the given characters according to the appendix table proposed in Japan Ministry of Justice (MOJ) notice no. 582 (法務省告示582号別表第四.)

MOJ_FAMILY_REGISTER_ACT_RELATED_NOTICE: Instructs it to transliterate the given characters according to the Family Register Act (戸籍法) and related MOJ notices (法務省戸籍法関連通達・通知.)

jntajis.mj_shrink_candidates(in_, combo, limit=100)

Parameters:

in_ (str) – The string to transliterate.
combo (int) – The transliteration scheme to use. Specify any combination of the members in MJShrinkSchemeCombo.
limit (int) – Maximum number of candidates to return. A negative number makes it compute all possible combinations, which may exhaust memory.

Returns:

The list of possible transliteration forms built from the cartesian product of candidates for each character.
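
To make the cartesian product and the role of limit concrete, here is a toy model in plain Python. It does not call jntajis and is not the library's implementation; the table TOY_CANDIDATES and the function toy_candidates are hypothetical, and the entries are not taken from the MJ shrink conversion map.

```python
from itertools import islice, product
from typing import Dict, List

# Hypothetical per-character candidates; NOT real MJ shrink map data.
TOY_CANDIDATES: Dict[str, List[str]] = {
    "A": ["a", "α"],
    "B": ["b", "β"],
}


def toy_candidates(text: str, limit: int = 100) -> List[str]:
    """Build whole-string candidates from per-character candidate lists."""
    # In this toy model, characters without an entry are kept unchanged.
    per_char = [TOY_CANDIDATES.get(ch, [ch]) for ch in text]
    # One output string per element of the cartesian product.
    combos = product(*per_char)
    if limit >= 0:
        # Stop early; a negative limit means "enumerate everything".
        combos = islice(combos, limit)
    return ["".join(combo) for combo in combos]
```

In this model toy_candidates("ABx") produces four strings (2 × 2 × 1). The count is the product of the per-character list sizes, which is why an unbounded enumeration over a long input can grow very large.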

Transliteration based on NTA shrink mappings

jntajis.jnta_shrink_translit(in_, replacement='\ufffe', passthrough=False)

Transliterate a Unicode string according to the NTA shrink mappings.

Parameters:

in_ (str) – The string to transliterate.
replacement (str) – The string placed in the output where transliteration is not feasible.
passthrough (bool) – If true, a character that does not exist in the mappings is copied to the output as is, instead of being replaced with the replacement string.

Returns:

The transliterated characters.
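
A usage sketch showing the two ways of handling characters that have no mapping, assuming jntajis is installed and that the input itself does not contain U+FFFE. The helper name shrink and its strict parameter are illustrative and not part of the library.

```python
MARKER = "\ufffe"  # Same value as the documented default replacement.


def shrink(text: str, strict: bool = False) -> str:
    """Transliterate text; in strict mode, reject unmapped characters."""
    # Imported here so that jntajis is only required when the helper is called.
    import jntajis

    if not strict:
        # passthrough=True: characters missing from the mappings are kept as is.
        return jntajis.jnta_shrink_translit(text, MARKER, True)

    # passthrough=False: unmapped characters show up as the marker string.
    result = jntajis.jnta_shrink_translit(text, MARKER, False)
    if MARKER in result:
        raise ValueError("input contains characters with no NTA shrink mapping")
    return result
```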