Named Entity Recognition

Named Entity Recognition-related modeling classes

class pororo.tasks.named_entity_recognition.PororoNerFactory(task: str, lang: str, model: Optional[str])

Bases: pororo.tasks.utils.base.PororoFactoryBase

Conduct named entity recognition

English (roberta.base.en.ner)

  • dataset: OntoNotes 5.0

  • metric: F1 (91.63)

Korean (charbert.base.ko.ner)

Japanese (jaberta.base.ja.ner)

Chinese (zhberta.base.zh.ner)

  • dataset: OntoNotes 5.0

  • metric: F1 (79.06)
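The listing does not say which F1 variant produced these scores. A common convention for NER on OntoNotes 5.0 is exact-match, entity-level F1, where a predicted entity counts as correct only if its start, end and label all match a gold entity. The sketch below assumes that convention. It is not taken from Pororo's evaluation code, and the `(start, end, label)` spans are illustrative.

```python
from typing import List, Set, Tuple

# An entity is (start, end, label) with `end` exclusive; the spans below are illustrative.
Span = Tuple[int, int, str]


def entity_f1(gold: List[Span], pred: List[Span]) -> float:
    """Exact-match entity-level F1: start, end and label must all agree."""
    gold_set: Set[Span] = set(gold)
    pred_set: Set[Span] = set(pred)
    true_pos = len(gold_set & pred_set)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred_set)
    recall = true_pos / len(gold_set)
    return 2 * precision * recall / (precision + recall)


gold = [(0, 1, "ORG"), (5, 7, "PERSON"), (8, 10, "PERSON")]
pred = [(0, 1, "ORG"), (5, 7, "PERSON"), (8, 9, "PERSON")]  # last span is too short
print(round(entity_f1(gold, pred), 4))  # 0.6667
```

A partially overlapping span scores nothing under this convention, which is why span-boundary errors weigh heavily in reported NER F1.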

Parameters

sent – (str) sentence to be sequence-labeled

Returns

list of (token, predicted tag) tuples

Return type

List[Tuple[str, str]]
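Because every language returns the same List[Tuple[str, str]] shape, downstream code can be written once. As a minimal sketch, the helper below (`entities_by_label` is a hypothetical name, not part of Pororo) drops tokens tagged 'O' and collects the rest under their tag. The sample data is a slice of the English output shown under Examples.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def entities_by_label(pairs: List[Tuple[str, str]], outside: str = "O") -> Dict[str, List[str]]:
    """Drop tokens tagged `outside` and collect the remaining tokens under their tag."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for token, tag in pairs:
        if tag != outside:
            grouped[tag].append(token)
    return dict(grouped)


# A slice of the English example output.
result = [("where", "O"), ("Arsenal", "ORG"), ("took", "O"),
          ("Thomas Partey", "PERSON"), ("and", "O"), ("Mohamed Elneny", "PERSON")]
print(entities_by_label(result))
# {'ORG': ['Arsenal'], 'PERSON': ['Thomas Partey', 'Mohamed Elneny']}
```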

Examples

>>> from pororo import Pororo
>>> ner = Pororo(task="ner", lang="en")
>>> ner("It was in midfield where Arsenal took control of the game, and that was mainly down to Thomas Partey and Mohamed Elneny.")
[('It', 'O'), ('was', 'O'), ('in', 'O'), ('midfield', 'O'), ('where', 'O'), ('Arsenal', 'ORG'), ('took', 'O'), ('control', 'O'), ('of', 'O'), ('the', 'O'), ('game', 'O'), (',', 'O'), ('and', 'O'), ('that', 'O'), ('was', 'O'), ('mainly', 'O'), ('down', 'O'), ('to', 'O'), ('Thomas Partey', 'PERSON'), ('and', 'O'), ('Mohamed Elneny', 'PERSON'), ('.', 'O')]
>>> ner = Pororo(task="ner", lang="ko")
>>> ner("손흥민은 28세의 183 센티미터, 77 킬로그램이며, 현재 주급은 약 3억 원이다.")
[('손흥민', 'PERSON'), ('은', 'O'), (' ', 'O'), ('28세', 'QUANTITY'), ('의', 'O'), (' ', 'O'), ('183 센티미터', 'QUANTITY'), (',', 'O'), (' ', 'O'), ('77 킬로그램', 'QUANTITY'), ('이며,', 'O'), (' ', 'O'), ('현재', 'O'), (' ', 'O'), ('주급은', 'O'), (' ', 'O'), ('약 3억 원', 'QUANTITY'), ('이다.', 'O')]
>>> # `apply_wsd`: for Korean, set this flag to use the Word Sense Disambiguation module and get more specific tags
>>> ner("손흥민은 28세의 183 센티미터, 77 킬로그램이며, 현재 주급은 약 3억 원이다.", apply_wsd=True)
[('손흥민', 'PERSON'), ('은', 'O'), (' ', 'O'), ('28세', 'AGE'), ('의', 'O'), (' ', 'O'), ('183 센티미터', 'LENGTH/DISTANCE'), (',', 'O'), (' ', 'O'), ('77 킬로그램', 'WEIGHT'), ('이며,', 'O'), (' ', 'O'), ('현재', 'O'), (' ', 'O'), ('주급은', 'O'), (' ', 'O'), ('약 3억 원', 'MONEY'), ('이다.', 'O')]
>>> ner = Pororo(task="ner", lang="zh")
>>> ner("毛泽东(1893年12月26日-1976年9月9日),字润之,湖南湘潭人。中华民国大陆时期、中国共产党和中华人民共和国的重要政治家、经济家、军事家、战略家、外交家和诗人。")
[('毛泽东', 'PERSON'), ('(', 'O'), ('1893年12月26日-1976年9月9日', 'DATE'), (')', 'O'), (',', 'O'), ('字润之', 'O'), (',', 'O'), ('湖南', 'GPE'), ('湘潭', 'GPE'), ('人', 'O'), ('。', 'O'), ('中华民国大陆时期', 'GPE'), ('、', 'O'), ('中国共产党', 'ORG'), ('和', 'O'), ('中华人民共和国', 'GPE'), ('的', 'O'), ('重', 'O'), ('要', 'O'), ('政', 'O'), ('治', 'O'), ('家', 'O'), ('、', 'O'), ('经', 'O'), ('济', 'O'), ('家', 'O'), ('、', 'O'), ('军', 'O'), ('事', 'O'), ('家', 'O'), ('、', 'O'), ('战', 'O'), ('略', 'O'), ('家', 'O'), ('、', 'O'), ('外', 'O'), ('交', 'O'), ('家', 'O'), ('和', 'O'), ('诗', 'O'), ('人', 'O'), ('。', 'O')]
>>> ner = Pororo(task="ner", lang="ja")
>>> ner("豊臣 秀吉、または羽柴 秀吉は、戦国時代から安土桃山時代にかけての武将、大名。天下人、武家関白、太閤。三英傑の一人。")
[('豊臣秀吉', 'PERSON'), ('、', 'O'), ('または', 'O'), ('羽柴秀吉', 'PERSON'), ('は', 'O'), ('、', 'O'), ('戦国時代', 'DATE'), ('から', 'O'), ('安土桃山時代', 'DATE'), ('にかけて', 'O'), ('の', 'O'), ('武将', 'O'), ('、', 'O'), ('大名', 'O'), ('。', 'O'), ('天下', 'O'), ('人', 'O'), ('、', 'O'), ('武家', 'O'), ('関白', 'O'), ('、太閤', 'O'), ('。', 'O'), ('三', 'O'), ('英', 'O'), ('傑', 'O'), ('の', 'O'), ('一', 'O'), ('人', 'O'), ('。', 'O')]
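In the two Korean calls above, the tokens are identical and only the QUANTITY tags change when `apply_wsd=True` is passed. The sketch below diffs two such outputs to list which tags were refined. `refined_tags` is a hypothetical helper, and it assumes both results tokenize the sentence identically, as they do in that example.

```python
from typing import List, Tuple


def refined_tags(coarse: List[Tuple[str, str]],
                 fine: List[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Return (token, coarse_tag, fine_tag) for every token whose tag changed."""
    # The comparison is only meaningful when both outputs share one tokenization.
    if [tok for tok, _ in coarse] != [tok for tok, _ in fine]:
        raise ValueError("outputs are not aligned token-by-token")
    return [(tok, c, f) for (tok, c), (_, f) in zip(coarse, fine) if c != f]


# Entity tokens from the Korean example, without and with apply_wsd=True.
coarse = [("손흥민", "PERSON"), ("28세", "QUANTITY"), ("183 센티미터", "QUANTITY"),
          ("77 킬로그램", "QUANTITY"), ("약 3억 원", "QUANTITY")]
fine = [("손흥민", "PERSON"), ("28세", "AGE"), ("183 센티미터", "LENGTH/DISTANCE"),
        ("77 킬로그램", "WEIGHT"), ("약 3억 원", "MONEY")]
for row in refined_tags(coarse, fine):
    print(row)
```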
static get_available_langs()
static get_available_models()
load(device)

Load the user-selected, task-specific model

Parameters

device (str) – device on which to load the model

Returns

User-selected task-specific model

Return type

object
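The factory pattern can be pictured as validating `(lang, model)` up front and deferring model construction to `load(device)`. The sketch below is hypothetical: `NerFactorySketch` and `AVAILABLE_MODELS` are illustrative names, not Pororo's source. It mirrors only the language-to-model table documented above, assumes the first listed model is the default (the documented signature gives `model` no default), and returns a string in place of a real predictor.

```python
from typing import Dict, List, Optional

# Hypothetical mirror of the documented language-to-model table.
AVAILABLE_MODELS: Dict[str, List[str]] = {
    "en": ["roberta.base.en.ner"],
    "ko": ["charbert.base.ko.ner"],
    "ja": ["jaberta.base.ja.ner"],
    "zh": ["zhberta.base.zh.ner"],
}


class NerFactorySketch:
    """Hypothetical factory: validates (lang, model), then defers loading to load(device)."""

    def __init__(self, task: str, lang: str, model: Optional[str] = None):
        if lang not in AVAILABLE_MODELS:
            raise ValueError(f"unsupported language: {lang!r}")
        # Assumption: fall back to the first listed model when none is given.
        self.model = model or AVAILABLE_MODELS[lang][0]
        if self.model not in AVAILABLE_MODELS[lang]:
            raise ValueError(f"{self.model!r} is not available for {lang!r}")
        self.task, self.lang = task, lang

    @staticmethod
    def get_available_langs() -> List[str]:
        return list(AVAILABLE_MODELS)

    @staticmethod
    def get_available_models() -> Dict[str, List[str]]:
        return AVAILABLE_MODELS

    def load(self, device: str) -> str:
        # A real factory would build the language-specific predictor here;
        # this sketch only reports what it would load and where.
        return f"{self.model} on {device}"


factory = NerFactorySketch(task="ner", lang="ko")
print(factory.load("cpu"))  # charbert.base.ko.ner on cpu
```

Separating validation from loading lets an unsupported language fail fast, before any model weights are touched.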

class pororo.tasks.named_entity_recognition.PororoBertNerEn(model, config)

Bases: pororo.tasks.utils.base.PororoSimpleBase

predict(sent: str, **kwargs)

Conduct named entity recognition with English RoBERTa

Parameters

sent – (str) sentence to be sequence-labeled

Returns

list of (token, predicted tag) tuples

Return type

List[Tuple[str, str]]
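The English predictor returns multi-word entities such as "Thomas Partey" as a single pair. This reference does not document how Pororo builds those spans. One common technique is to merge word-level BIO tags (B-X opens an entity, I-X continues it, O is outside). The sketch below implements that technique under the assumption of a BIO scheme. `merge_bio` is a hypothetical name, and the words and tags are illustrative.

```python
from typing import List, Optional, Tuple


def merge_bio(words: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge word-level BIO tags (B-X, I-X, O) into (span, label) pairs."""
    merged: List[Tuple[str, str]] = []
    span: List[str] = []
    label: Optional[str] = None
    for word, tag in zip(words, tags):
        continues = tag.startswith("I-") and tag[2:] == label
        if span and not continues:
            # Close the open entity span before handling this word.
            merged.append((" ".join(span), label))
            span, label = [], None
        if tag == "O":
            merged.append((word, "O"))
        elif continues:
            span.append(word)
        else:
            # A B- tag, or an I- tag that does not continue the open span, opens a new span.
            span, label = [word], tag[2:]
    if span:
        merged.append((" ".join(span), label))
    return merged


words = ["to", "Thomas", "Partey", "and", "Mohamed", "Elneny", "."]
tags = ["O", "B-PERSON", "I-PERSON", "O", "B-PERSON", "I-PERSON", "O"]
print(merge_bio(words, tags))
# [('to', 'O'), ('Thomas Partey', 'PERSON'), ('and', 'O'), ('Mohamed Elneny', 'PERSON'), ('.', 'O')]
```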

class pororo.tasks.named_entity_recognition.PororoBertCharNer(model, sent_tokenizer, wsd_dict, device, config)

Bases: pororo.tasks.utils.base.PororoSimpleBase

apply_dict(tags: List[Tuple[str, str]])

Apply a predefined dictionary to obtain more detailed tag information

Parameters

tags (List[Tuple[str, str]]) – word-tag pairs produced by inference

Returns

word-tag pairs after the dictionary has been applied

Return type

List[Tuple[str, str]]
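The format of `wsd_dict` is not documented here, and the real refinement uses word sense disambiguation. To illustrate only the dictionary step, the sketch below substitutes a much simpler technique: a plain suffix lookup that maps a coarse QUANTITY tag to a finer one based on the unit a word ends with. `REFINE_BY_SUFFIX` and `apply_dict_sketch` are hypothetical, and the unit-to-tag pairs are taken from the Korean example output.

```python
from typing import Dict, List, Tuple

# Hypothetical refinement table keyed by coarse tag, then by surface-form suffix.
# This is a suffix lookup, not word sense disambiguation.
REFINE_BY_SUFFIX: Dict[str, Dict[str, str]] = {
    "QUANTITY": {"세": "AGE", "센티미터": "LENGTH/DISTANCE", "킬로그램": "WEIGHT", "원": "MONEY"},
}


def apply_dict_sketch(tags: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Replace a coarse tag with a finer one when the word ends with a known unit."""
    refined = []
    for word, tag in tags:
        for suffix, fine in REFINE_BY_SUFFIX.get(tag, {}).items():
            if word.endswith(suffix):
                tag = fine
                break
        refined.append((word, tag))
    return refined


print(apply_dict_sketch([("28세", "QUANTITY"), ("77 킬로그램", "QUANTITY"), ("현재", "O")]))
# [('28세', 'AGE'), ('77 킬로그램', 'WEIGHT'), ('현재', 'O')]
```

A suffix table cannot resolve genuinely ambiguous units, which is presumably why the library pairs its dictionary with a WSD model.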

predict(text: str, **kwargs)

Conduct named entity recognition with character BERT

Parameters
  • text – (str) sentence to be sequence-labeled

  • apply_wsd – (bool) whether to apply word sense disambiguation (WSD) to get more specific label information

  • ignore_labels – (list) labels to be ignored

Returns

list of (token, predicted tag) tuples

Return type

List[Tuple[str, str]]
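A character-level model assigns one label per character, so the characters have to be regrouped into tokens. The sketch below groups consecutive characters that share a label with `itertools.groupby`, then drops pairs whose label appears in `ignore_labels`. That filtering behaviour is an assumption about what the parameter does. The sketch also does not reproduce the separate ' ' tokens seen in the Korean output. `group_char_labels` is a hypothetical name, and the per-character labels are illustrative.

```python
from itertools import groupby
from typing import Iterable, List, Sequence, Tuple


def group_char_labels(chars: Sequence[str], labels: Sequence[str],
                      ignore_labels: Iterable[str] = ()) -> List[Tuple[str, str]]:
    """Join consecutive characters sharing a label, then drop pairs whose label is ignored."""
    ignored = set(ignore_labels)
    pairs = []
    for label, run in groupby(zip(chars, labels), key=lambda cl: cl[1]):
        text = "".join(ch for ch, _ in run)
        if label not in ignored:
            pairs.append((text, label))
    return pairs


text = "손흥민은"
labels = ["PERSON", "PERSON", "PERSON", "O"]
print(group_char_labels(text, labels))                       # [('손흥민', 'PERSON'), ('은', 'O')]
print(group_char_labels(text, labels, ignore_labels=["O"]))  # [('손흥민', 'PERSON')]
```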

class pororo.tasks.named_entity_recognition.PororoBertNerZh(model, config)

Bases: pororo.tasks.utils.base.PororoSimpleBase

predict(sent: str, **kwargs)

Conduct named entity recognition with Chinese RoBERTa

Parameters

sent – (str) sentence to be sequence-labeled

Returns

list of (token, predicted tag) tuples

Return type

List[Tuple[str, str]]
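In the Chinese example, non-entity text comes back as single-character 'O' tokens ('重', '要', '政', ...). For display, adjacent 'O' tokens can be concatenated while entity tokens pass through unchanged. The helper below is a hypothetical post-processing convenience, not part of the library. Its empty default separator suits Chinese and Japanese, which do not separate words with spaces.

```python
from typing import List, Tuple


def merge_outside(pairs: List[Tuple[str, str]], outside: str = "O",
                  sep: str = "") -> List[Tuple[str, str]]:
    """Concatenate consecutive `outside` tokens; entity tokens pass through unchanged."""
    merged: List[Tuple[str, str]] = []
    for token, tag in pairs:
        if tag == outside and merged and merged[-1][1] == outside:
            merged[-1] = (merged[-1][0] + sep + token, outside)
        else:
            merged.append((token, tag))
    return merged


pairs = [("中华人民共和国", "GPE"), ("的", "O"), ("重", "O"), ("要", "O"),
         ("政", "O"), ("治", "O"), ("家", "O")]
print(merge_outside(pairs))  # [('中华人民共和国', 'GPE'), ('的重要政治家', 'O')]
```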

class pororo.tasks.named_entity_recognition.PororoBertNerJa(model, config)

Bases: pororo.tasks.utils.base.PororoSimpleBase

predict(sent: str, **kwargs)

Conduct named entity recognition with Japanese RoBERTa

Parameters

sent – (str) sentence to be sequence-labeled

Returns

list of (token, predicted tag) tuples

Return type

List[Tuple[str, str]]