Grammatical Error Correction

Modeling classes for grammatical error correction

class pororo.tasks.grammatical_error_correction.PororoGecFactory(task: str, lang: str, model: Optional[str])

Bases: pororo.tasks.utils.base.PororoFactoryBase

Grammatical error correction

English (transformer.base.en.gec)

dataset: FCE, W&I+LOCNESS

metric: TBU

English (transformer.base.en.char_gec)

dataset: xfspell

metric: TBU

ref: http://www.realworldnlpbook.com/blog/unreasonable-effectiveness-of-transformer-spell-checker.html

Korean (charbert.base.ko.spacing)

dataset: Internal data (based on Wikipedia)

metric: F1 (89.51)

Parameters

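text (str) – input sentence whose grammatical errors should be corrected
beam (int) – beam width used during beam search
temperature (float) – softmax temperature applied when sampling
top_k (int) – number of highest-probability tokens to sample from (top-k sampling)
top_p (float) – cumulative probability threshold for top-p sampling
no_repeat_ngram_size (int) – size of n-grams that must not repeat in the output
len_penalty (float) – length penalty applied when scoring hypotheses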
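
The sampling parameters above can be read as filters on the model's next-token distribution. The following is a minimal, library-independent sketch of that idea, not Pororo's implementation: `filter_candidates` and `apply_temperature` are hypothetical helpers, and treating non-positive `top_k` / `top_p` values as "disabled" is an assumption based on the -1 defaults shown in the `predict` signatures.

```python
import math
from typing import Dict


def apply_temperature(logits: Dict[str, float], temperature: float = 1.0) -> Dict[str, float]:
    """Softmax over logits scaled by 1/temperature (lower values sharpen the distribution)."""
    scaled = {token: value / temperature for token, value in logits.items()}
    peak = max(scaled.values())  # subtract the maximum for numerical stability
    exps = {token: math.exp(value - peak) for token, value in scaled.items()}
    norm = sum(exps.values())
    return {token: e / norm for token, e in exps.items()}


def filter_candidates(probs: Dict[str, float], top_k: int = -1, top_p: float = -1.0) -> Dict[str, float]:
    """Restrict a next-token distribution with top-k and/or top-p, then renormalise."""
    # Rank tokens from most to least probable.
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)

    # Top-k: keep only the k most probable tokens (assumed disabled when top_k <= 0).
    if top_k > 0:
        ranked = ranked[:top_k]

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p
    # (assumed disabled when top_p <= 0).
    if top_p > 0:
        kept, cumulative = [], 0.0
        for token, p in ranked:
            kept.append((token, p))
            cumulative += p
            if cumulative >= top_p:
                break
        ranked = kept

    total = sum(p for _, p in ranked)
    return {token: p / total for token, p in ranked}
```

A sampler would then draw the next token from the returned dictionary; with both filters disabled the distribution is passed through unchanged.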

Examples

>>> gec = Pororo(task="gec", lang="en")
>>> gec("This apple are so sweet.")
"This apple is so sweet."
>>> gec("'I've love you, before I meet her!'")
"'I've loved you, before I met her!"
>>> # Chaining the spell-correction module via the `correct_spell` option gives better results,
>>> # at the cost of more computation and time.
>>> gec("Travel by bus is exspensive , bored and annoying .") # bad result
'Travel by bus is exspensive, boring and annoying.'
>>> gec("Travel by bus is exspensive , bored and annoying .", correct_spell=True) # better result
'Travelling by bus is expensive, boring, and annoying.'
>>> spacing = Pororo(task="gec", lang="ko")
>>> spacing("카 카오브 레인에서는 무슨 일을 하 나 요?")
'카카오브레인에서는 무슨 일을 하나요?'
>>> spacing("아버지가방에들어간다.")
'아버지가 방에 들어간다.'
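
To inspect what a correction actually changed, the input and output sentences can be compared word by word. The helper below is a hypothetical convenience built on the standard-library `difflib` module; it is not part of Pororo and works on any pair of strings.

```python
import difflib
from typing import List, Tuple


def word_edits(source: str, corrected: str) -> List[Tuple[str, str]]:
    """Return (original, replacement) word spans that differ between two sentences."""
    src, tgt = source.split(), corrected.split()
    matcher = difflib.SequenceMatcher(a=src, b=tgt, autojunk=False)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # "equal" spans are unchanged; everything else is an insertion,
        # deletion, or replacement made by the corrector.
        if tag != "equal":
            edits.append((" ".join(src[i1:i2]), " ".join(tgt[j1:j2])))
    return edits
```

For the first English example above, `word_edits("This apple are so sweet.", "This apple is so sweet.")` returns `[("are", "is")]`.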

Notes

Korean error correction is in beta and currently supports only spacing correction.

static get_available_langs()

static get_available_models()

load(device: str)

Load the task-specific model selected by the user

Parameters: device (str) – device on which to load the model
Returns: the task-specific model selected by the user
Return type: object
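
The documentation above does not list the accepted `device` values. As a sketch only, assuming the string follows the usual PyTorch convention of `"cpu"` or `"cuda"`, a caller could choose it as follows; `pick_device` is a hypothetical helper, not a Pororo function.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch  # optional dependency in this sketch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"
```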

class pororo.tasks.grammatical_error_correction.PororoTransformerGecChar(model, config)

Bases: pororo.tasks.utils.base.PororoGenerationBase

predict(text: str, beam: int = 5, temperature: float = 1.0, top_k: int = -1, top_p: float = -1, no_repeat_ngram_size: int = 4, len_penalty: float = 1.0)

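Perform grammatical error correction

Parameters

text (str) – input sentence
beam (int) – beam width used during beam search
temperature (float) – softmax temperature applied when sampling
top_k (int) – number of highest-probability tokens to sample from (top-k sampling)
top_p (float) – cumulative probability threshold for top-p sampling
no_repeat_ngram_size (int) – size of n-grams that must not repeat in the output
len_penalty (float) – length penalty applied when scoring hypotheses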

Returns

grammatically corrected sentence

Return type

str
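
The effect of `len_penalty` on beam search output can be illustrated with a small ranking function. This is a sketch under a stated assumption: it uses the fairseq-style normalisation, where a hypothesis' summed log-probability is divided by its length raised to `len_penalty`. Whether this class scores hypotheses in exactly this way is not confirmed by the documentation above, and `rank_hypotheses` is a hypothetical name.

```python
from typing import List, Tuple


def rank_hypotheses(hypotheses: List[Tuple[str, float]], len_penalty: float = 1.0) -> List[Tuple[str, float]]:
    """Sort (text, summed log-probability) pairs by length-normalised score, best first."""
    scored = []
    for text, sum_logprob in hypotheses:
        # Whitespace-separated words stand in for model tokens in this sketch.
        length = max(len(text.split()), 1)
        scored.append((text, sum_logprob / (length ** len_penalty)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Under this scheme, a `len_penalty` of 0 ranks by raw log-probability, which favours short outputs, while larger values make longer hypotheses more competitive.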

class pororo.tasks.grammatical_error_correction.PororoTransformerGec(model, tokenizer, device, config)

Bases: pororo.tasks.utils.base.PororoGenerationBase

predict(text: str, beam: int = 5, temperature: float = 1.0, top_k: int = -1, top_p: float = -1, no_repeat_ngram_size: int = 4, len_penalty: float = 1.0, **kwargs)

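Perform grammatical error correction

Parameters

text (str) – input sentence
beam (int) – beam width used during beam search
temperature (float) – softmax temperature applied when sampling
top_k (int) – number of highest-probability tokens to sample from (top-k sampling)
top_p (float) – cumulative probability threshold for top-p sampling
no_repeat_ngram_size (int) – size of n-grams that must not repeat in the output
len_penalty (float) – length penalty applied when scoring hypotheses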

Returns

grammatically corrected sentence

Return type

str

Examples

>>> gec = Pororo(task="gec", model="transformer.base.en.gec", lang="en")
>>> gec("This apple are so sweet.")
"This apple is so sweet."
>>> gec("'I've love you, before I meet her!'")
"'I've loved you, before I met her!"

class pororo.tasks.grammatical_error_correction.PororoBertSpacing(model, config)

Bases: pororo.tasks.utils.base.PororoSimpleBase

predict(text: str, **kwargs) → Union[List[str], str]

Perform spacing correction

Parameters: text (str) – sentence whose spacing should be corrected
Returns: sentence with corrected spacing
Return type: str
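
Spacing correction is often framed as character-level tagging: the original spaces are discarded and the model decides, for each remaining character, whether a space should follow it. The sketch below shows only the reconstruction step under that assumption. The per-character boolean flags and the `apply_spacing` helper are hypothetical and do not describe the confirmed internals of `PororoBertSpacing`.

```python
from typing import List


def apply_spacing(text: str, space_after: List[bool]) -> str:
    """Rebuild a sentence from per-character 'insert a space after this character' flags."""
    chars = [ch for ch in text if not ch.isspace()]  # discard the original spacing
    if len(chars) != len(space_after):
        raise ValueError("expected one flag per non-space character")
    pieces = []
    for ch, flag in zip(chars, space_after):
        pieces.append(ch)
        if flag:
            pieces.append(" ")
    return "".join(pieces).strip()
```

With flags set after 가 and 에, the input "아버지가방에들어간다." is rebuilt as "아버지가 방에 들어간다.", matching the example shown for the Korean model.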