Named Entity Recognition

Modeling classes for named entity recognition

class pororo.tasks.named_entity_recognition.PororoNerFactory(task: str, lang: str, model: Optional[str])

Bases: pororo.tasks.utils.base.PororoFactoryBase

Conduct named entity recognition

English (roberta.base.en.ner)

dataset: OntoNotes 5.0

metric: F1 (91.63)

Korean (charbert.base.ko.ner)

dataset: https://corpus.korean.go.kr/ Named Entity Analysis Corpus (개체명 분석 말뭉치)

metric: F1 (89.63)

Japanese (jaberta.base.ja.ner)

dataset: Kyoto University Web Document Leads Corpus

metric: F1 (76.74)

ref: https://github.com/ku-nlp/KWDLC

Chinese (zhberta.base.zh.ner)

dataset: OntoNotes 5.0

metric: F1 (79.06)

Parameters: sent (str) – sentence to be labeled
Returns: list of (token, predicted tag) tuples
Return type: List[Tuple[str, str]]

Examples

>>> ner = Pororo(task="ner", lang="en")
>>> ner("It was in midfield where Arsenal took control of the game, and that was mainly down to Thomas Partey and Mohamed Elneny.")
[('It', 'O'), ('was', 'O'), ('in', 'O'), ('midfield', 'O'), ('where', 'O'), ('Arsenal', 'ORG'), ('took', 'O'), ('control', 'O'), ('of', 'O'), ('the', 'O'), ('game', 'O'), (',', 'O'), ('and', 'O'), ('that', 'O'), ('was', 'O'), ('mainly', 'O'), ('down', 'O'), ('to', 'O'), ('Thomas Partey', 'PERSON'), ('and', 'O'), ('Mohamed Elneny', 'PERSON'), ('.', 'O')]
>>> ner = Pororo(task="ner", lang="ko")
>>> ner("손흥민은 28세의 183 센티미터, 77 킬로그램이며, 현재 주급은 약 3억 원이다.")
[('손흥민', 'PERSON'), ('은', 'O'), (' ', 'O'), ('28세', 'QUANTITY'), ('의', 'O'), (' ', 'O'), ('183 센티미터', 'QUANTITY'), (',', 'O'), (' ', 'O'), ('77 킬로그램', 'QUANTITY'), ('이며,', 'O'), (' ', 'O'), ('현재', 'O'), (' ', 'O'), ('주급은', 'O'), (' ', 'O'), ('약 3억 원', 'QUANTITY'), ('이다.', 'O')]
>>> # `apply_wsd`: for Korean, you can apply the Word Sense Disambiguation module to get more specific tags
>>> ner("손흥민은 28세의 183 센티미터, 77 킬로그램이며, 현재 주급은 약 3억 원이다.", apply_wsd=True)
[('손흥민', 'PERSON'), ('은', 'O'), (' ', 'O'), ('28세', 'AGE'), ('의', 'O'), (' ', 'O'), ('183 센티미터', 'LENGTH/DISTANCE'), (',', 'O'), (' ', 'O'), ('77 킬로그램', 'WEIGHT'), ('이며,', 'O'), (' ', 'O'), ('현재', 'O'), (' ', 'O'), ('주급은', 'O'), (' ', 'O'), ('약 3억 원', 'MONEY'), ('이다.', 'O')]
>>> ner = Pororo(task="ner", lang="zh")
>>> ner("毛泽东（1893年12月26日－1976年9月9日），字润之，湖南湘潭人。中华民国大陆时期、中国共产党和中华人民共和国的重要政治家、经济家、军事家、战略家、外交家和诗人。")
[('毛泽东', 'PERSON'), ('（', 'O'), ('1893年12月26日－1976年9月9日', 'DATE'), ('）', 'O'), ('，', 'O'), ('字润之', 'O'), ('，', 'O'), ('湖南', 'GPE'), ('湘潭', 'GPE'), ('人', 'O'), ('。', 'O'), ('中华民国大陆时期', 'GPE'), ('、', 'O'), ('中国共产党', 'ORG'), ('和', 'O'), ('中华人民共和国', 'GPE'), ('的', 'O'), ('重', 'O'), ('要', 'O'), ('政', 'O'), ('治', 'O'), ('家', 'O'), ('、', 'O'), ('经', 'O'), ('济', 'O'), ('家', 'O'), ('、', 'O'), ('军', 'O'), ('事', 'O'), ('家', 'O'), ('、', 'O'), ('战', 'O'), ('略', 'O'), ('家', 'O'), ('、', 'O'), ('外', 'O'), ('交', 'O'), ('家', 'O'), ('和', 'O'), ('诗', 'O'), ('人', 'O'), ('。', 'O')]
>>> ner = Pororo(task="ner", lang="ja")
>>> ner("豊臣 秀吉、または羽柴 秀吉は、戦国時代から安土桃山時代にかけての武将、大名。天下人、武家関白、太閤。三英傑の一人。")
[('豊臣秀吉', 'PERSON'), ('、', 'O'), ('または', 'O'), ('羽柴秀吉', 'PERSON'), ('は', 'O'), ('、', 'O'), ('戦国時代', 'DATE'), ('から', 'O'), ('安土桃山時代', 'DATE'), ('にかけて', 'O'), ('の', 'O'), ('武将', 'O'), ('、', 'O'), ('大名', 'O'), ('。', 'O'), ('天下', 'O'), ('人', 'O'), ('、', 'O'), ('武家', 'O'), ('関白', 'O'), ('、太閤', 'O'), ('。', 'O'), ('三', 'O'), ('英', 'O'), ('傑', 'O'), ('の', 'O'), ('一', 'O'), ('人', 'O'), ('。', 'O')]
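Every example above returns a flat list of (token, tag) tuples in which non-entity tokens carry the tag 'O'. A minimal sketch of collecting only the entities from such a list follows. The helper name `group_entities` is hypothetical and not part of Pororo, and the sample is a shortened copy of the English output shown above, so the block runs without loading a model.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_entities(pairs: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group entity surface forms by tag, skipping 'O' (non-entity) tokens.

    Hypothetical helper for post-processing; not a Pororo function.
    """
    grouped: Dict[str, List[str]] = defaultdict(list)
    for token, tag in pairs:
        if tag != "O":
            grouped[tag].append(token)
    return dict(grouped)


# Shortened copy of the English example output shown above.
sample = [
    ("Arsenal", "ORG"),
    ("took", "O"),
    ("control", "O"),
    ("Thomas Partey", "PERSON"),
    ("and", "O"),
    ("Mohamed Elneny", "PERSON"),
    (".", "O"),
]

print(group_entities(sample))
# {'ORG': ['Arsenal'], 'PERSON': ['Thomas Partey', 'Mohamed Elneny']}
```

The same helper should work on the Korean, Chinese, and Japanese outputs, since they share the List[Tuple[str, str]] return type.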

static get_available_langs()

static get_available_models()

load(device)

Load user-selected task-specific model

Parameters: device (str) – device information
Returns: User-selected task-specific model
Return type: object

class pororo.tasks.named_entity_recognition.PororoBertNerEn(model, config)

Bases: pororo.tasks.utils.base.PororoSimpleBase

predict(sent: str, **kwargs)

Conduct named entity recognition with English RoBERTa

Parameters: sent (str) – sentence to be labeled
Returns: list of (token, predicted tag) tuples
Return type: List[Tuple[str, str]]

class pororo.tasks.named_entity_recognition.PororoBertCharNer(model, sent_tokenizer, wsd_dict, device, config)

Bases: pororo.tasks.utils.base.PororoSimpleBase

apply_dict(tags: List[Tuple[str, str]])

Apply the predefined dictionary to get more detailed tag information

Parameters: tags (List[Tuple[str, str]]) – word-tag pairs produced by inference
Returns: word-tag pairs after the dictionary has been applied
Return type: List[Tuple[str, str]]
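To illustrate the idea of refining coarse tags with a dictionary, here is a simplified sketch. It is not Pororo's implementation: the structure of `wsd_dict` is not documented here, so the `UNIT_TO_TAG` table, the suffix-matching rule, and the function name `refine_quantity_tags` are all assumptions made for illustration. The detailed tag names are taken from the `apply_wsd=True` example output in this document.

```python
from typing import Dict, List, Tuple

# Hypothetical unit-to-tag table (an assumption, not Pororo's wsd_dict).
UNIT_TO_TAG: Dict[str, str] = {
    "세": "AGE",
    "센티미터": "LENGTH/DISTANCE",
    "킬로그램": "WEIGHT",
    "원": "MONEY",
}


def refine_quantity_tags(
    tags: List[Tuple[str, str]],
    unit_to_tag: Dict[str, str] = UNIT_TO_TAG,
) -> List[Tuple[str, str]]:
    """Replace 'QUANTITY' with a more specific tag when the word ends in a known unit."""
    refined = []
    for word, tag in tags:
        if tag == "QUANTITY":
            for unit, detailed in unit_to_tag.items():
                if word.endswith(unit):
                    tag = detailed
                    break
        # Pairs with other tags, or with no matching unit, pass through unchanged.
        refined.append((word, tag))
    return refined


coarse = [("손흥민", "PERSON"), ("28세", "QUANTITY"), ("77 킬로그램", "QUANTITY")]
print(refine_quantity_tags(coarse))
# [('손흥민', 'PERSON'), ('28세', 'AGE'), ('77 킬로그램', 'WEIGHT')]
```

A plain suffix lookup cannot resolve units that are ambiguous in context, which is presumably why the library applies word sense disambiguation before consulting its dictionary.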

predict(text: str, **kwargs)

Conduct named entity recognition with character BERT

Parameters:
text (str) – sentence to be labeled
apply_wsd (bool) – whether to apply word sense disambiguation (WSD) to get more specific labels
ignore_labels (list) – labels to be ignored
Returns: list of (token, predicted tag) tuples
Return type: List[Tuple[str, str]]
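The documentation does not say what happens to a token whose label appears in `ignore_labels`. The sketch below assumes one plausible behavior, relabeling such tokens as 'O', and reproduces it as a post-processing step on the returned list. The function name `mask_labels` is hypothetical, and the library's actual behavior may differ.

```python
from typing import Iterable, List, Tuple


def mask_labels(
    pairs: List[Tuple[str, str]],
    ignore_labels: Iterable[str],
) -> List[Tuple[str, str]]:
    """Relabel tokens whose tag is in ignore_labels as 'O'.

    Assumption: ignored labels become 'O' rather than being dropped.
    """
    ignored = set(ignore_labels)
    return [(token, "O" if tag in ignored else tag) for token, tag in pairs]


# Shortened copy of the Korean example output from this document.
result = [("손흥민", "PERSON"), ("은", "O"), ("28세", "QUANTITY")]
print(mask_labels(result, ["QUANTITY"]))
# [('손흥민', 'PERSON'), ('은', 'O'), ('28세', 'O')]
```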

class pororo.tasks.named_entity_recognition.PororoBertNerZh(model, config)

Bases: pororo.tasks.utils.base.PororoSimpleBase

predict(sent: str, **kwargs)

Conduct named entity recognition with Chinese RoBERTa

Parameters: sent (str) – sentence to be labeled
Returns: list of (token, predicted tag) tuples
Return type: List[Tuple[str, str]]

class pororo.tasks.named_entity_recognition.PororoBertNerJa(model, config)

Bases: pororo.tasks.utils.base.PororoSimpleBase

predict(sent: str, **kwargs)

Conduct named entity recognition with Japanese RoBERTa

Parameters: sent (str) – sentence to be labeled
Returns: list of (token, predicted tag) tuples
Return type: List[Tuple[str, str]]