Phoneme-to-Grapheme

Phoneme-to-grapheme modeling classes

class pororo.tasks.grapheme_conversion.PororoP2gFactory(task: str, lang: str, model: Optional[str])

Bases: pororo.tasks.utils.base.PororoFactoryBase

Conduct phoneme-to-grapheme conversion

Japanese (p2g.ja)

  • dataset: jawiki-20180420 + romkan

  • metric: TBU

Chinese (p2g.zh)

  • dataset: zhwiki-20180420 + g2pM

  • metric: TBU

Examples

>>> p2g_zh = Pororo(task="p2g", lang="zh")
>>> p2g_zh(['ran2', 'er2', ',', 'ta1', 'hong2', 'le5', '20', 'nian2', 'yi3', 'hou4', ',', 'ta1', 'jing4', 'tui4', 'chu1', 'le5', 'da4', 'jia1', 'de5', 'shi4', 'xian4', '。'])
['然', '而', ',', '他', '红', '了', '20', '年', '乙', '后', ',', '他', '敬', '退', '出', '了', '大', '家', '的', '市', '县', '。']
>>> p2g_ja = Pororo(task="p2g", lang="ja")
>>> p2g_ja("python ga daisuki desu。")
pythonが大好きです。
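In the zh example, the input is a list of pinyin syllables written with a trailing tone number, where 5 marks the neutral tone ('le5', 'de5'). Punctuation and digits such as ',' and '20' are separate tokens and come back unchanged. Below is a minimal sketch for preparing such input, assuming space-separated pinyin text and tone digits 1-5. `split_tone` and `tokenize_pinyin` are hypothetical helpers, not part of Pororo.

```python
import re
from typing import List, Optional, Tuple

# Assumed token format: lowercase pinyin letters followed by one tone digit 1-5
# (5 = neutral tone, as in 'le5' and 'de5' in the example above).
PINYIN_RE = re.compile(r"^([a-zü]+)([1-5])$")


def split_tone(token: str) -> Optional[Tuple[str, int]]:
    """Return (syllable, tone) for a pinyin token, or None for tokens such as
    punctuation or digits that are not pinyin syllables."""
    match = PINYIN_RE.match(token)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def tokenize_pinyin(sentence: str) -> List[str]:
    """Split a space-separated pinyin string into the token list format
    used by the zh example (hypothetical helper)."""
    return sentence.split()
```

For instance, `split_tone("le5")` returns `("le", 5)`, while `split_tone(",")` and `split_tone("20")` return `None`.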

static get_available_langs()

static get_available_models()

load(device: str)

Load user-selected task-specific model

Parameters

device (str) – device on which to load the model

Returns

User-selected task-specific model

Return type

object
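The factory pattern behind `get_available_langs`, `get_available_models`, and `load` can be sketched as follows. This is a minimal, hypothetical stand-in, not Pororo's implementation. `P2GFactorySketch`, `P2GZhStub`, and `P2GJaStub` are invented names, the constructor omits the `task` argument for brevity, the language and model lists are taken from the model names documented above, and falling back to the first listed model when `model` is omitted is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


# Hypothetical stand-ins for the task wrappers; real wrappers hold a trained model.
@dataclass
class P2GZhStub:
    device: str


@dataclass
class P2GJaStub:
    device: str


class P2GFactorySketch:
    """Illustrative factory: validates the language, picks a model name,
    and builds a language-specific wrapper on load()."""

    @staticmethod
    def get_available_langs() -> List[str]:
        return ["ja", "zh"]

    @staticmethod
    def get_available_models() -> Dict[str, List[str]]:
        # Model names as listed in the documentation (p2g.ja, p2g.zh).
        return {"ja": ["p2g.ja"], "zh": ["p2g.zh"]}

    def __init__(self, lang: str, model: Optional[str] = None):
        if lang not in self.get_available_langs():
            raise ValueError(f"Unsupported language: {lang}")
        # Assumption: default to the first available model for the language.
        self.model = model or self.get_available_models()[lang][0]
        if self.model not in self.get_available_models()[lang]:
            raise ValueError(f"Unsupported model for {lang}: {self.model}")
        self.lang = lang

    def load(self, device: str):
        # Dispatch on language, mirroring the PororoP2GZh / PororoP2GJa split.
        if self.lang == "zh":
            return P2GZhStub(device)
        return P2GJaStub(device)
```

Keeping model lookup in static methods lets callers query supported languages and models without loading any weights.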

class pororo.tasks.grapheme_conversion.PororoP2GZh(model, config)

Bases: pororo.tasks.utils.base.PororoSimpleBase

predict(sent: str, **kwargs) → str

Conduct phoneme-to-grapheme conversion

Parameters

sent (List[str]) – list of phonemes (pinyin syllables with tone numbers)

Returns

list of converted graphemes

Return type

List[str]
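The zh example above shows that the converter returns exactly one output token per input token, even though the signature is annotated with `str`. The toy sketch below illustrates only that aligned list-in/list-out contract. It uses a hypothetical hand-written lexicon rather than the documented model (trained on zhwiki-20180420 + g2pM data). A plain lookup like this cannot choose between homophones, since many characters share the same syllable and tone.

```python
from typing import Dict, List

# Hypothetical toy lexicon mapping pinyin tokens to characters.
TOY_LEXICON: Dict[str, str] = {
    "ta1": "他",
    "hong2": "红",
    "le5": "了",
    "nian2": "年",
    "da4": "大",
    "jia1": "家",
    "de5": "的",
}


def toy_p2g(phonemes: List[str]) -> List[str]:
    """Map each phoneme token to one grapheme, keeping the output aligned
    1:1 with the input. Tokens missing from the lexicon (punctuation,
    digits, unknown syllables) are passed through unchanged."""
    return [TOY_LEXICON.get(token, token) for token in phonemes]
```

For instance, `toy_p2g(['ta1', 'hong2', 'le5', '20', 'nian2', '。'])` returns `['他', '红', '了', '20', '年', '。']`.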

class pororo.tasks.grapheme_conversion.PororoP2GJa(model, config)

Bases: pororo.tasks.utils.base.PororoGenerationBase

predict(text: str, beam: int = 5, temperature: float = 1.0, top_k: int = -1, top_p: float = -1, no_repeat_ngram_size: int = 4, len_penalty: float = 1.0, **kwargs) → str

Conduct phoneme-to-grapheme conversion using a Transformer Seq2Seq model

Parameters

  • text (str) – romanized input sentence

  • beam (int) – beam search size

  • temperature (float) – temperature scale

  • top_k (int) – top-K sampling vocabulary size

  • top_p (float) – top-p sampling ratio

  • no_repeat_ngram_size (int) – size of n-grams that may not be repeated in the output

  • len_penalty (float) – length penalty ratio
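To make the sampling parameters concrete, here is a minimal sketch of temperature scaling followed by top-k and top-p (nucleus) filtering over a toy next-token distribution. It assumes the common convention that -1 disables a filter, which matches the defaults in the signature. `apply_temperature` and `filter_probs` are hypothetical names, and this is not Pororo's decoding code.

```python
import math
from typing import Dict


def apply_temperature(logits: Dict[str, float], temperature: float = 1.0) -> Dict[str, float]:
    """Softmax over logits / temperature; values below 1.0 sharpen the distribution."""
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def filter_probs(probs: Dict[str, float], top_k: int = -1, top_p: float = -1.0) -> Dict[str, float]:
    """Apply top-k, then top-p (nucleus) filtering, and renormalize.
    A value of -1 (the default) disables the corresponding filter."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    if top_k > 0:
        ranked = ranked[:top_k]  # keep only the k most likely tokens
    if top_p > 0:
        kept, cumulative = [], 0.0
        for tok, p in ranked:
            kept.append((tok, p))
            cumulative += p
            if cumulative >= top_p:  # smallest prefix whose mass reaches top_p
                break
        ranked = kept
    total = sum(p for _, p in ranked)
    return {tok: p / total for tok, p in ranked}
```

With both filters disabled, every candidate survives. With `top_k=2` or `top_p=0.7` on a distribution such as `{"が": 0.5, "は": 0.3, "を": 0.15, "に": 0.05}`, only the two most likely tokens remain.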

Returns

converted grapheme sentence

Return type

str
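The `no_repeat_ngram_size` and `len_penalty` parameters act on candidate sequences during beam search. The sketch below shows the two usual mechanisms under stated assumptions. N-gram blocking bans any next token that would recreate an n-gram already in the output. Length normalization divides a hypothesis's summed log-probability by `length ** len_penalty`, as in fairseq-style generators. Both function names are hypothetical, and this is not Pororo's decoding code.

```python
from typing import List, Sequence, Set


def banned_next_tokens(tokens: Sequence[str], n: int) -> Set[str]:
    """Return tokens that would complete an n-gram already present in `tokens`.
    n <= 0 disables blocking."""
    if n <= 0 or len(tokens) < n - 1:
        return set()
    # The last n-1 tokens form the prefix of the n-gram about to be generated.
    prefix = tuple(tokens[len(tokens) - n + 1:])
    banned: Set[str] = set()
    for start in range(len(tokens) - n + 1):
        if tuple(tokens[start:start + n - 1]) == prefix:
            banned.add(tokens[start + n - 1])
    return banned


def normalized_score(log_probs: List[float], len_penalty: float = 1.0) -> float:
    """Summed token log-probability divided by length ** len_penalty."""
    return sum(log_probs) / (len(log_probs) ** len_penalty)
```

Because log-probabilities are negative, a larger `len_penalty` shrinks the penalty on long hypotheses, so values above 1.0 tend to favor longer outputs.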