JNTA converter functions and classes

class jntajis.ConversionMode

Specifies the encoding conversion mode.

SISO

Instructs it to encode the given string into JIS X 0213 with the ISO 2022 shift characters SO (\x0e) and SI (\x0f) for the extended plane selection.

MEN1

Instructs it to encode the given string into JIS X 0213 characters designated in the primary plane, which would theoretically contain all the JIS X 0208 level 1 and 2 characters in addition to some level 3 and 4 characters. Characters belonging to the extended plane will result in conversion failure.

JISX0208

Instructs it to encode the given string into JIS X 0208 level 1 and 2 characters. Non-0208 characters will result in conversion failure.

JISX0208_TRANSLIT

Instructs it to encode the given string into JIS X 0208 level 1 and 2 characters. Transliteration will be attempted for non-0208 characters.

jntajis.jnta_encode(encoding, in_, conv_mode)

Encode a given Unicode string into JIS X 0208:1997 / JIS X 0213:2012.

Parameters:
  • encoding (str) – The encoding name that is to appear in UnicodeEncodeError.

  • in_ (str) – The string to encode.

  • conv_mode (int) – The conversion mode. For the possible values, refer to ConversionMode.

Returns:

The encoded JIS character sequence.
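
As a hedged illustration of how the conversion modes might be used together, the sketch below tries several modes in order and returns the first one that succeeds. It assumes that jnta_encode raises UnicodeEncodeError on conversion failure, as the description of the encoding parameter suggests. The helper name encode_first_fit, its encode parameter (a hook for substituting a stand-in function) and the "jis" label are hypothetical and not part of jntajis.

```python
def encode_first_fit(text, modes, encode=None):
    """Try each conversion mode in order; return (mode, encoded) or None."""
    if encode is None:
        # Imported lazily so the helper can be defined without jntajis installed.
        from jntajis import jnta_encode as encode
    for mode in modes:
        try:
            # "jis" is only a label that would appear in the error message.
            return mode, encode("jis", text, mode)
        except UnicodeEncodeError:
            continue  # this mode cannot represent the text; try the next one
    return None
```

With jntajis installed, a call might look like encode_first_fit(text, [ConversionMode.JISX0208, ConversionMode.MEN1, ConversionMode.SISO]), which prefers the narrowest repertoire and falls back to the broader ones.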

jntajis.jnta_decode(encoding, in_)

Decode a given JIS character sequence into a Unicode string.

Parameters:
  • encoding (str) – The encoding name that is to appear in UnicodeDecodeError.

  • in_ (bytes) – The encoded JIS characters.

Returns:

The decoded Unicode string.
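
A minimal sketch of a round-trip check built on jnta_encode and jnta_decode follows. The helper name survives_round_trip and its encode and decode parameters (hooks for stand-in functions) are hypothetical, and the sketch assumes that jnta_decode accepts whatever jnta_encode produced for the chosen mode, which the text above does not confirm.

```python
def survives_round_trip(text, conv_mode, encode=None, decode=None):
    """Return True if encoding and then decoding gives back the same text."""
    if encode is None or decode is None:
        # Imported lazily so the helper can be defined without jntajis installed.
        from jntajis import jnta_decode, jnta_encode
        encode = encode or jnta_encode
        decode = decode or jnta_decode
    # "jis" is only a label that would appear in any codec error raised.
    encoded = encode("jis", text, conv_mode)
    return decode("jis", encoded) == text
```

With a transliterating mode such as JISX0208_TRANSLIT, a False result would be expected whenever a character was replaced by a substitute during encoding.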

class jntajis.IncrementalEncoder(encoding, conv_mode)

An IncrementalEncoder implementation.

For a description of each method, see Python's codecs documentation.

Parameters:
  • encoding (str) – The encoding name that is to appear in UnicodeEncodeError.

  • conv_mode (int) – The conversion mode. For the possible values, refer to ConversionMode.

encode(in_, final)
reset()
getstate()
setstate(state)
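
Since the class follows Python's incremental encoder protocol, text can be fed to it in chunks. The sketch below shows one way to do that. The helper name encode_chunks is hypothetical, and the demonstration uses the standard library's UTF-8 incremental encoder as a stand-in so that it runs without jntajis; an instance created with jntajis.IncrementalEncoder(encoding, conv_mode) would presumably be passed in the same way.

```python
import codecs


def encode_chunks(encoder, chunks):
    """Feed text chunks to an incremental encoder and join the output."""
    chunks = list(chunks)
    out = []
    for i, chunk in enumerate(chunks):
        # final=True on the last chunk lets the encoder flush pending state.
        final = i == len(chunks) - 1
        out.append(encoder.encode(chunk, final))
    encoder.reset()  # leave the encoder ready for the next stream
    return b"".join(out)


# Stand-in encoder from the standard library, used here only for demonstration.
demo = encode_chunks(codecs.getincrementalencoder("utf-8")(), ["日本", "語"])
```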

Transliteration functions

Transliteration based on the MJ character table and MJ shrink conversion map

The MJ character table (MJ文字一覧表), initially developed by the Information-technology Promotion Agency, defines a vast set of kanji (漢字) characters used in the information processing of Japanese text.

The MJ shrink conversion map (MJ縮退マップ) was developed alongside it for interoperability between MJ-aware systems and Unicode-based systems. It is used to transliterate complex, less frequently used character variants to more commonly used ones. It defines four different transliteration schemes, and you can specify any combination of them with the flags defined in MJShrinkSchemeCombo.

class jntajis.MJShrinkSchemeCombo

Stores constants that specify the transliteration scheme.

JIS_INCORPORATION_UCS_UNIFICATION_RULE

Instructs it to transliterate the given characters according to JIS incorporation and UCS unification rule (a.k.a. JIS包摂規準・UCS統合規則) if applicable.

INFERENCE_BY_READING_AND_GLYPH

Instructs it to transliterate the given characters according to the CITPC-defined rule based on analogy from the readings and glyphs of characters (読み・字形による類推).

MOJ_NOTICE_582

Instructs it to transliterate the given characters according to the appendix table given in Japan's Ministry of Justice (MOJ) notice no. 582 (法務省告示582号別表第四).

Instructs it to transliterate the given characters according to the Family Register Act (戸籍法) and related MOJ notices (法務省戸籍法関連通達・通知).

jntajis.mj_shrink_candidates(in_, combo, limit=100)
Parameters:
  • in_ (str) – The string to transliterate.

  • combo (int) – The transliteration scheme to use. Specify any combination of the members in MJShrinkSchemeCombo.

  • limit (int) – Maximum number of candidates to return. A negative number removes the limit so that all possible combinations are computed, which may exhaust memory.

Returns:

The list of possible transliteration forms built from the Cartesian product of the candidates for each character.
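
To illustrate what a Cartesian product of per-character candidates means and why limit matters, here is a standard-library sketch. It is not the library's implementation: the function candidate_forms, the table argument and the placeholder letters in it are hypothetical, and keeping unmapped characters as is is an assumption of this sketch.

```python
from itertools import islice, product


def candidate_forms(text, table, limit=100):
    """Combine per-character candidates into whole-string forms."""
    # Characters missing from the table are kept as is (an assumption here).
    per_char = [table.get(ch, [ch]) for ch in text]
    forms = ("".join(parts) for parts in product(*per_char))
    # A negative limit means "no limit", mirroring the parameter described above.
    return list(forms) if limit < 0 else list(islice(forms, limit))


# Hypothetical table with placeholder letters, not real MJ shrink data.
table = {"a": ["A", "a"], "b": ["B", "b"]}
forms = candidate_forms("abc", table)
```

The number of forms is the product of the candidate counts of all characters, so it grows exponentially with the length of the input. This is why an unlimited request can exhaust memory.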

Transliteration based on NTA shrink mappings

jntajis.jnta_shrink_translit(in_, replacement='\ufffe', passthrough=False)

Transliterate a Unicode string according to the NTA shrink mappings.

Parameters:
  • in_ (str) – The string to transliterate.

  • replacement (str) – The characters that will be placed when the transliteration is not feasible.

  • passthrough (bool) – Instructs the transliterator to leave an input character as is when it does not exist in the mappings, instead of substituting the replacement characters.

Returns:

The transliterated characters.
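
The replacement parameter can also serve as a failure marker. The sketch below wraps jnta_shrink_translit so that any character without a mapping raises an error instead of passing through silently. The helper name shrink_strict and its translit parameter (a hook for a stand-in function) are hypothetical, and the sketch assumes that the function accepts replacement and passthrough as keyword arguments and that the input does not itself contain the marker.

```python
def shrink_strict(text, translit=None, marker="\ufffe"):
    """Transliterate text, raising ValueError if any character had no mapping."""
    if translit is None:
        # Imported lazily so the helper can be defined without jntajis installed.
        from jntajis import jnta_shrink_translit as translit
    # passthrough=False makes unmapped characters show up as the marker.
    result = translit(text, replacement=marker, passthrough=False)
    if marker in result:
        raise ValueError("some characters could not be transliterated")
    return result
```

Passing passthrough=True instead would be the lenient choice, keeping unmapped characters in the output unchanged.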