Grammatical Error Correction

Modeling classes for grammatical error correction

class pororo.tasks.grammatical_error_correction.PororoGecFactory(task: str, lang: str, model: Optional[str])

Bases: pororo.tasks.utils.base.PororoFactoryBase

Grammatical error correction

English (transformer.base.en.gec)

  • dataset: FCE, W&I+LOCNESS

  • metric: TBU

English (transformer.base.en.char_gec)

Korean (charbert.base.ko.spacing)

  • dataset: Internal data (based on Wikipedia)

  • metric: F1 (89.51)

Parameters
  • text (str) – input sentence whose grammatical errors should be corrected

  • beam (int) – size of beam search

  • temperature (float) – temperature for sampling

  • top_k (int) – vocabulary size for top-k sampling

  • top_p (float) – cumulative probability threshold for top-p sampling

  • no_repeat_ngram_size (int) – size of n-grams that may not be repeated in the output

  • len_penalty (float) – length penalty ratio
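
The two sampling parameters above can be pictured with a small, library-independent sketch. This is not Pororo's decoding code: the helper names `top_k_filter` and `top_p_filter` are hypothetical, the toy distribution is made up for illustration, and treating a non-positive value as "filter disabled" is an assumption based on the -1 defaults shown in the predict() signatures.

```python
from typing import Dict


def top_k_filter(probs: Dict[str, float], k: int) -> Dict[str, float]:
    """Keep the k most probable tokens and renormalise.

    Assumption: k <= 0 means the filter is disabled.
    """
    if k <= 0:
        return dict(probs)
    kept = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in kept)
    return {tok: p / total for tok, p in kept}


def top_p_filter(probs: Dict[str, float], p: float) -> Dict[str, float]:
    """Keep the smallest set of most probable tokens whose cumulative
    probability reaches p, then renormalise.

    Assumption: p <= 0 means the filter is disabled.
    """
    if p <= 0:
        return dict(probs)
    kept = []
    cumulative = 0.0
    for tok, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept.append((tok, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    return {tok: prob / total for tok, prob in kept}
```

With a toy distribution such as `{"is": 0.5, "are": 0.3, "was": 0.15, "be": 0.05}`, both `top_k_filter(..., 2)` and `top_p_filter(..., 0.7)` keep only "is" and "are"; sampling then draws from the renormalised remainder.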

Examples

>>> gec = Pororo(task="gec", lang="en")
>>> gec("This apple are so sweet.")
"This apple is so sweet."
>>> gec("'I've love you, before I meet her!'")
"'I've loved you, before I met her!"
>>> # Passing `correct_spell=True` runs two modules in succession, which tends to give better results
>>> # at the cost of more computation and time.
>>> gec("Travel by bus is exspensive , bored and annoying .") # bad result
'Travel by bus is exspensive, boring and annoying.'
>>> gec("Travel by bus is exspensive , bored and annoying .", correct_spell=True) # better result
'Travelling by bus is expensive, boring, and annoying.'
>>> spacing = Pororo(task="gec", lang="ko")
>>> spacing("카 카오브 레인에서는 무슨 일을 하 나 요?")
'카카오브레인에서는 무슨 일을 하나요?'
>>> spacing("아버지가방에들어간다.")
'아버지가 방에 들어간다.'

Notes

Korean error correction is in beta. It currently supports only spacing correction.

static get_available_langs()
static get_available_models()
load(device: str)

Load user-selected task-specific model

Parameters

device (str) – device information

Returns

User-selected task-specific model

Return type

object

class pororo.tasks.grammatical_error_correction.PororoTransformerGecChar(model, config)

Bases: pororo.tasks.utils.base.PororoGenerationBase

predict(text: str, beam: int = 5, temperature: float = 1.0, top_k: int = -1, top_p: float = -1, no_repeat_ngram_size: int = 4, len_penalty: float = 1.0)

Perform grammatical error correction

Parameters
  • text (str) – input sentence

  • beam (int) – beam search size

  • temperature (float) – temperature scale

  • top_k (int) – top-K sampling vocabulary size

  • top_p (float) – top-p sampling ratio

  • no_repeat_ngram_size (int) – size of n-grams that may not be repeated in the output

  • len_penalty (float) – length penalty ratio

Returns

grammatically corrected sentence

Return type

str
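
To make the `no_repeat_ngram_size` constraint concrete, the following is a minimal sketch of how a decoder could decide which next tokens are forbidden. It illustrates the general technique only; the function name `banned_next_tokens` is hypothetical and the code is not taken from Pororo.

```python
from typing import List, Set


def banned_next_tokens(tokens: List[str], n: int) -> Set[str]:
    """Return the tokens that would complete an n-gram already present in `tokens`.

    A decoder enforcing "no repeated n-grams" would exclude these
    candidates at the next step.
    """
    # No constraint if n is not positive or the history is too short
    # to contain the (n - 1)-token prefix of an n-gram.
    if n <= 0 or len(tokens) < n - 1:
        return set()
    # The last (n - 1) tokens form the prefix of the n-gram about to be completed.
    prefix = tuple(tokens[len(tokens) - (n - 1):])
    banned = set()
    for i in range(len(tokens) - n + 1):
        # If an earlier n-gram starts with the same prefix, its last token is banned.
        if tuple(tokens[i:i + n - 1]) == prefix:
            banned.add(tokens[i + n - 1])
    return banned
```

For the history `["a", "b", "c", "a", "b"]` and `n = 3`, the trigram "a b c" already exists, so "c" is banned as the next token; with `n = 4` nothing is banned yet.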

class pororo.tasks.grammatical_error_correction.PororoTransformerGec(model, tokenizer, device, config)

Bases: pororo.tasks.utils.base.PororoGenerationBase

predict(text: str, beam: int = 5, temperature: float = 1.0, top_k: int = -1, top_p: float = -1, no_repeat_ngram_size: int = 4, len_penalty: float = 1.0, **kwargs)

Perform grammatical error correction

Parameters
  • text (str) – input sentence

  • beam (int) – beam search size

  • temperature (float) – temperature scale

  • top_k (int) – top-K sampling vocabulary size

  • top_p (float) – top-p sampling ratio

  • no_repeat_ngram_size (int) – size of n-grams that may not be repeated in the output

  • len_penalty (float) – length penalty ratio

Returns

grammatically corrected sentence

Return type

str
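
The effect of `temperature` and `len_penalty` can be sketched in a few lines of plain Python. Both helpers are hypothetical illustrations: the length normalisation shown (dividing the summed log-probability by `length ** len_penalty`) is one common convention in beam search, and whether Pororo applies exactly this formula is an assumption.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Turn logits into probabilities after dividing them by `temperature`.

    Values below 1.0 sharpen the distribution; values above 1.0 flatten it.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def length_normalised_score(sum_logprob: float, length: int, len_penalty: float = 1.0) -> float:
    """Score a hypothesis as sum_logprob / length ** len_penalty (assumed convention).

    Because log-probabilities are negative, a larger len_penalty makes
    longer hypotheses comparatively more attractive.
    """
    return sum_logprob / (length ** len_penalty)
```

Under this convention, `len_penalty=0` compares hypotheses by their raw summed log-probability, while `len_penalty=1.0` compares them by average log-probability per token.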

Examples

>>> gec = Pororo(task="gec", model="transformer.base.en.gec", lang="en")
>>> gec("This apple are so sweet.")
"This apple is so sweet."
>>> gec("'I've love you, before I meet her!'")
"'I've loved you, before I met her!"

class pororo.tasks.grammatical_error_correction.PororoBertSpacing(model, config)

Bases: pororo.tasks.utils.base.PororoSimpleBase

predict(text: str, **kwargs) → Union[List[str], str]

Perform spacing correction

Parameters

text (str) – sentence whose spacing errors should be corrected

Returns

sentence with spacing errors corrected

Return type

str
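
Because the Korean model is documented as correcting spacing only, a caller may want to check that an output differs from its input in whitespace alone. The helper below is a hypothetical sanity check, not part of Pororo; it uses only the standard library.

```python
def only_spacing_changed(original: str, corrected: str) -> bool:
    """Return True when both strings are identical once all whitespace is removed."""
    def strip_spaces(s: str) -> str:
        # str.split() with no argument splits on any run of whitespace.
        return "".join(s.split())

    return strip_spaces(original) == strip_spaces(corrected)
```

Applied to the documented example, `only_spacing_changed("아버지가방에들어간다.", "아버지가 방에 들어간다.")` holds, whereas a pair such as "This apple are" and "This apple is" fails the check because a word itself changed.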