Grammatical Error Correction
Modeling classes related to grammatical error correction
class pororo.tasks.grammatical_error_correction.PororoGecFactory(task: str, lang: str, model: Optional[str])
Bases: pororo.tasks.utils.base.PororoFactoryBase
Grammatical error correction
English (transformer.base.en.gec)
dataset: FCE, W&I+LOCNESS
metric: TBU
English (transformer.base.en.char_gec)
dataset: xfspell
metric: TBU
ref: http://www.realworldnlpbook.com/blog/unreasonable-effectiveness-of-transformer-spell-checker.html
Korean (charbert.base.ko.spacing)
dataset: Internal data (based on Wikipedia)
metric: F1 (89.51)
Parameters
text (str) – input sentence whose grammatical errors should be corrected
beam (int) – beam size for beam search
temperature (float) – sampling temperature
top_k (int) – number of highest-probability tokens to sample from (top-k sampling)
top_p (float) – cumulative probability threshold for top-p sampling
no_repeat_ngram_size (int) – size of n-grams that must not be repeated in the output
len_penalty (float) – length penalty ratio
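The sampling parameters above (temperature, top_k, top_p) can be illustrated on a toy next-token distribution. The sketch below is a generic, pure-Python illustration of these techniques and is not Pororo's own decoding code; the function names `filtered_distribution` and `sample`, the toy logits, and the use of `None` to mean "filter disabled" are all assumptions made for the example.

```python
import math
import random
from typing import Dict, Optional


def filtered_distribution(
    logits: Dict[str, float],
    temperature: float = 1.0,
    top_k: Optional[int] = None,
    top_p: Optional[float] = None,
) -> Dict[str, float]:
    """Return the renormalized next-token distribution after filtering."""
    # Temperature rescales logits before the softmax: values below 1.0
    # sharpen the distribution, values above 1.0 flatten it.
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())
    exps = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    total = sum(exps.values())
    ranked = sorted(
        ((tok, e / total) for tok, e in exps.items()),
        key=lambda item: item[1],
        reverse=True,
    )

    # Top-k: keep only the k most probable tokens.
    if top_k is not None:
        ranked = ranked[:top_k]

    # Top-p: keep the shortest prefix whose cumulative probability
    # (measured on the unfiltered probabilities) reaches p.
    if top_p is not None:
        kept, cumulative = [], 0.0
        for tok, prob in ranked:
            kept.append((tok, prob))
            cumulative += prob
            if cumulative >= top_p:
                break
        ranked = kept

    norm = sum(prob for _, prob in ranked)
    return {tok: prob / norm for tok, prob in ranked}


def sample(dist: Dict[str, float], seed: int = 0) -> str:
    """Draw one token from a filtered distribution (seeded for repeatability)."""
    rng = random.Random(seed)
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]


# Hypothetical logits for the word following "This apple ..."
logits = {"is": 2.0, "are": 1.0, "was": 0.5, "be": -1.0}
top2 = filtered_distribution(logits, top_k=2)       # only "is" and "are" survive
nucleus = filtered_distribution(logits, top_p=0.6)  # "is" alone already covers 0.6
```

In this sketch the filters compose in the order temperature, top-k, top-p; real decoders may order or normalize these steps differently.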
Examples
>>> gec = Pororo(task="gec", lang="en")
>>> gec("This apple are so sweet.")
"This apple is so sweet."
>>> gec("'I've love you, before I meet her!'")
"'I've loved you, before I met her!"
>>> # It works better if I use two modules in succession with `correct_spell` option
>>> # Of course, it requires more computation and time.
>>> gec("Travel by bus is exspensive , bored and annoying .")  # bad result
'Travel by bus is exspensive, boring and annoying.'
>>> gec("Travel by bus is exspensive , bored and annoying .", correct_spell=True)  # better result
'Travelling by bus is expensive, boring, and annoying.'
>>> spacing = Pororo(task="gec", lang="ko")
>>> spacing("카 카오브 레인에서는 무슨 일을 하 나 요?")
'카카오브레인에서는 무슨 일을 하나요?'
>>> spacing("아버지가방에들어간다.")
'아버지가 방에 들어간다.'
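The idea of running two correction modules in succession can be sketched as simple function composition. This is only an illustration under stated assumptions: `chain`, `toy_spell`, and `toy_grammar` are hypothetical stand-ins built on string replacement, not Pororo models, and the spelling-then-grammar order is assumed rather than taken from the library.

```python
from typing import Callable

# A corrector is any callable that maps a sentence to a corrected sentence.
Corrector = Callable[[str], str]


def chain(*stages: Corrector) -> Corrector:
    """Compose correctors so each stage receives the previous stage's output."""

    def run(text: str) -> str:
        for stage in stages:
            text = stage(text)
        return text

    return run


def toy_spell(text: str) -> str:
    # Hypothetical spelling stage: fixes one known misspelling.
    return text.replace("exspensive", "expensive")


def toy_grammar(text: str) -> str:
    # Hypothetical grammar stage: fixes one word form and stray spaces
    # before punctuation.
    return text.replace(" , bored", ", boring").replace(" .", ".")


pipeline = chain(toy_spell, toy_grammar)
result = pipeline("Travel by bus is exspensive , bored and annoying .")
print(result)  # Travel by bus is expensive, boring and annoying.
```

Each extra stage adds a full pass over the input, which is consistent with the note in the examples that the two-module mode costs more computation and time.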
Notes
Korean error correction is in beta. It currently supports only spacing correction.
class pororo.tasks.grammatical_error_correction.PororoTransformerGecChar(model, config)
Bases: pororo.tasks.utils.base.PororoGenerationBase
class pororo.tasks.grammatical_error_correction.PororoTransformerGec(model, tokenizer, device, config)
Bases: pororo.tasks.utils.base.PororoGenerationBase
predict(text: str, beam: int = 5, temperature: float = 1.0, top_k: int = -1, top_p: float = -1, no_repeat_ngram_size: int = 4, len_penalty: float = 1.0, **kwargs)
Conduct grammatical error correction
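Parameters
text (str) – input sentence whose grammatical errors should be corrected
beam (int) – beam size for beam search
temperature (float) – sampling temperature
top_k (int) – number of highest-probability tokens to sample from (top-k sampling)
top_p (float) – cumulative probability threshold for top-p sampling
no_repeat_ngram_size (int) – size of n-grams that must not be repeated in the output
len_penalty (float) – length penalty ratio
Returns
grammatically corrected sentence
Return type
str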
Examples
>>> gec = Pororo(task="gec", model="transformer.base.en.gec", lang="en")
>>> gec("This apple are so sweet.")
"This apple is so sweet."
>>> gec("'I've love you, before I meet her!'")
"'I've loved you, before I met her!"
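The `no_repeat_ngram_size` argument of `predict` names a common decoding constraint: a candidate token is rejected if appending it would repeat an n-gram already present in the output. A minimal sketch of that check follows. It is a generic illustration, not the code Pororo or its generation backend actually runs, and `banned_next_tokens` is a hypothetical helper name.

```python
from typing import Sequence, Set


def banned_next_tokens(generated: Sequence[str], ngram_size: int) -> Set[str]:
    """Tokens that would complete an n-gram already present in `generated`."""
    if ngram_size <= 0 or len(generated) < ngram_size - 1:
        return set()

    # The last (n - 1) tokens form the prefix of the n-gram being built.
    if ngram_size > 1:
        prefix = tuple(generated[len(generated) - (ngram_size - 1):])
    else:
        prefix = ()

    banned: Set[str] = set()
    # Scan every complete n-gram seen so far; if it starts with the current
    # prefix, its final token would create a repeat and is banned.
    for start in range(len(generated) - ngram_size + 1):
        ngram = tuple(generated[start:start + ngram_size])
        if ngram[:-1] == prefix:
            banned.add(ngram[-1])
    return banned


tokens = "the cat sat on the cat sat".split()
blocked = banned_next_tokens(tokens, ngram_size=4)
print(blocked)  # {'on'}
```

Here the last three tokens, "the cat sat", were previously followed by "on", so with a size of 4 only "on" is blocked at the next step.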