Machine Translation

Machine translation related modeling classes

class pororo.tasks.machine_translation.PororoTranslationFactory(task: str, lang: str, model: Optional[str], tgt: Optional[str] = None)

Bases: pororo.tasks.utils.base.PororoFactoryBase

Machine translation using Transformer models

Multi (transformer.large.multi.mtpg)

  • dataset: Train (Internal data) / Test (Multilingual TED Talk)

  • metric: BLEU score

    Source Language   Target Language   BLEU score
    ---------------   ---------------   ----------
    Average           X                 10.00
    English           Korean            15
    English           Japanese          8
    English           Chinese           8
    Korean            English           15
    Korean            Japanese          10
    Korean            Chinese           4
    Japanese          English           11
    Japanese          Korean            13
    Japanese          Chinese           4
    Chinese           English           16
    Chinese           Korean            10
    Chinese           Japanese          6

  • ref: http://www.cs.jhu.edu/~kevinduh/a/multitarget-tedtalks/

  • note: These results reflect an out-of-domain setting; TED Talk data was not used during model training.
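
The Average row appears to be the unweighted mean over the twelve translation directions. The short check below uses the scores listed above for transformer.large.multi.mtpg; the dictionary name `bleu` is only illustrative, and the averaging method itself is an assumption inferred from the numbers.

```python
# BLEU scores copied from the table above (transformer.large.multi.mtpg).
bleu = {
    ("English", "Korean"): 15,
    ("English", "Japanese"): 8,
    ("English", "Chinese"): 8,
    ("Korean", "English"): 15,
    ("Korean", "Japanese"): 10,
    ("Korean", "Chinese"): 4,
    ("Japanese", "English"): 11,
    ("Japanese", "Korean"): 13,
    ("Japanese", "Chinese"): 4,
    ("Chinese", "English"): 16,
    ("Chinese", "Korean"): 10,
    ("Chinese", "Japanese"): 6,
}

# Unweighted mean over all twelve language pairs.
average = sum(bleu.values()) / len(bleu)
print(f"{average:.2f}")  # prints 10.00
```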

Multi (transformer.large.multi.fast.mtpg)

  • dataset: Train (Internal data) / Test (Multilingual TED Talk)

  • metric: BLEU score

    Source Language   Target Language   BLEU score
    ---------------   ---------------   ----------
    Average           X                 8.75
    English           Korean            13
    English           Japanese          6
    English           Chinese           7
    Korean            English           15
    Korean            Japanese          11
    Korean            Chinese           10
    Japanese          English           3
    Japanese          Korean            13
    Japanese          Chinese           4
    Chinese           English           15
    Chinese           Korean            8
    Chinese           Japanese          4

  • ref: http://www.cs.jhu.edu/~kevinduh/a/multitarget-tedtalks/

  • note: These results reflect an out-of-domain setting; TED Talk data was not used during model training.

Parameters
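  • text (str) – input text to be translated

  • src (str) – source language code (e.g. "ko")

  • tgt (str) – target language code (e.g. "en")

  • beam (int) – beam search size

  • temperature (float) – temperature used to scale the output distribution

  • top_k (int) – number of highest-probability tokens kept for top-k sampling

  • top_p (float) – cumulative probability threshold for top-p sampling

  • no_repeat_ngram_size (int) – size of n-grams that may not repeat in the output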

  • len_penalty (float) – length penalty ratio

Returns

machine translated sentence

Return type

str
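
To make the temperature, top_k and top_p parameters more concrete, here is a minimal pure-Python sketch of how such filters are commonly applied to a next-token distribution. It is not Pororo's implementation; the function name `filter_distribution` is hypothetical, and treating non-positive values as "disabled" is an assumption suggested by the -1 defaults in the predict signature.

```python
import math


def filter_distribution(logits, temperature=1.0, top_k=-1, top_p=-1.0):
    """Return {token_id: probability} after temperature, top-k and top-p filtering."""
    # Temperature scaling followed by a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Token ids ordered from most to least probable.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # Top-k: keep only the k most probable tokens (assumed disabled when k <= 0).
    if top_k > 0:
        ranked = ranked[:top_k]

    # Top-p: keep the smallest prefix whose cumulative probability reaches p.
    if 0.0 < top_p < 1.0:
        kept, cumulative = [], 0.0
        for i in ranked:
            kept.append(i)
            cumulative += probs[i]
            if cumulative >= top_p:
                break
        ranked = kept

    # Renormalize over the surviving tokens.
    mass = sum(probs[i] for i in ranked)
    return {i: probs[i] / mass for i in ranked}


logits = [2.0, 1.0, 0.5, -1.0]
print(sorted(filter_distribution(logits, top_k=2)))  # prints [0, 1]
print(filter_distribution(logits, top_p=0.5))        # prints {0: 1.0}
```

A lower temperature sharpens the distribution before filtering, so fewer tokens are needed to reach the same top_p mass.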

Examples

>>> mt = Pororo(task="translation", lang="multi")
>>> mt("케빈은 아직도 일을 하고 있다.", src="ko", tgt="en")
'Kevin is still working.'
>>> mt("死神は りんごしか食べない。", src="ja", tgt="ko")
'사신은 사과밖에 먹지 않는다.'
>>> mt("人生的伟大目标,不是知识而是行动。", src="zh", tgt="ko")
'인생의 위대한 목표는 지식이 아니라 행동이다.'
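
The calls above share one pattern: the text first, then `src` and `tgt` as keyword arguments, optionally followed by decoding options. The sketch below wraps that pattern for a list of sentences. The helper `translate_all` and the stand-in `fake_translator` are hypothetical names introduced here so the snippet runs without downloading a model.

```python
from typing import Callable, Iterable, List


def translate_all(
    translator: Callable[..., str],
    sentences: Iterable[str],
    src: str,
    tgt: str,
    **decoding,
) -> List[str]:
    """Translate each sentence, forwarding decoding options such as beam."""
    return [translator(s, src=src, tgt=tgt, **decoding) for s in sentences]


# Stand-in with the same call shape as the translator in the examples above.
def fake_translator(text, src, tgt, **kwargs):
    return f"[{src}->{tgt}] {text}"


print(translate_all(fake_translator, ["a", "b"], src="ko", tgt="en", beam=5))
# prints ['[ko->en] a', '[ko->en] b']
```

With Pororo installed, the object returned by `Pororo(task="translation", lang="multi")` could be passed as `translator` in place of the stand-in.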

static get_available_langs()

static get_available_models()

load(device: str)

Load user-selected task-specific model

Parameters

device (str) – device on which to load the model (e.g. "cuda" or "cpu")

Returns

User-selected task-specific model

Return type

object

class pororo.tasks.machine_translation.PororoTransformerTransMulti(model, config, tokenizer, sent_tokenizer, langtok_style)

Bases: pororo.tasks.utils.base.PororoGenerationBase

predict(text: str, src: str, tgt: str, beam: int = 5, temperature: float = 1.0, top_k: int = -1, top_p: float = -1, no_repeat_ngram_size: int = 4, len_penalty: float = 1.0, **kwargs) → str

Conduct machine translation

Parameters
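  • text (str) – input text to be translated

  • src (str) – source language code (e.g. "ko")

  • tgt (str) – target language code (e.g. "en")

  • beam (int) – beam search size

  • temperature (float) – temperature used to scale the output distribution

  • top_k (int) – number of highest-probability tokens kept for top-k sampling

  • top_p (float) – cumulative probability threshold for top-p sampling

  • no_repeat_ngram_size (int) – size of n-grams that may not repeat in the output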

  • len_penalty (float) – length penalty ratio
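
The no_repeat_ngram_size option is usually implemented by banning, at each decoding step, any token that would complete an n-gram already present in the output. The following is a minimal sketch of that idea under this assumption; `banned_next_tokens` is a hypothetical helper, not part of Pororo.

```python
def banned_next_tokens(generated, n):
    """Return tokens that would complete an n-gram already present in `generated`."""
    # No complete n-gram exists yet, so nothing can be repeated.
    if n <= 0 or len(generated) < n:
        return set()

    # The last n-1 tokens form the prefix of the n-gram being built.
    prefix = tuple(generated[len(generated) - (n - 1):])

    banned = set()
    for start in range(len(generated) - n + 1):
        ngram = tuple(generated[start:start + n])
        # If an earlier n-gram starts with the same prefix, its last token is banned.
        if ngram[:-1] == prefix:
            banned.add(ngram[-1])
    return banned


tokens = ["the", "cat", "sat", "on", "the", "cat"]
print(banned_next_tokens(tokens, 3))  # prints {'sat'}
```

A decoder would typically set the scores of the banned tokens to negative infinity before choosing the next token.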

Returns

machine translated sentence

Return type

str
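
As a rough illustration of what len_penalty controls, one common convention in beam search divides a hypothesis's summed log-probability by its length raised to the penalty. Whether Pororo applies exactly this formula is an assumption, and `length_normalized_score` is a hypothetical name used only for this sketch.

```python
def length_normalized_score(token_logprobs, len_penalty=1.0):
    """Score a hypothesis as sum(log p) / length ** len_penalty."""
    if not token_logprobs:
        raise ValueError("hypothesis must contain at least one token")
    return sum(token_logprobs) / (len(token_logprobs) ** len_penalty)


short = [-0.5, -0.5]             # 2 tokens, total log-probability -1.0
long = [-0.4, -0.4, -0.4, -0.4]  # 4 tokens, total log-probability -1.6

# With no length normalization the shorter hypothesis scores higher.
print(length_normalized_score(short, 0.0) > length_normalized_score(long, 0.0))  # prints True

# With len_penalty=1.0 the score becomes a per-token average and the longer one wins.
print(length_normalized_score(long, 1.0) > length_normalized_score(short, 1.0))  # prints True
```

Under this convention, larger values favor longer translations and smaller values favor shorter ones.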