Collocation

Collocation-related modeling classes

class pororo.tasks.collocation.PororoCollocationFactory(task: str, lang: str, model: Optional[str])

Bases: pororo.tasks.utils.base.PororoFactoryBase

Conducts a collocation search using an index file.

English (collocate.en)

  • dataset: enwiki-20180420

  • metric: N/A

Korean (kollocate)

  • dataset: kowiki-20200720

  • metric: N/A

Chinese (collocate.zh)

  • dataset: zhwiki-20180420

  • metric: N/A

Japanese (collocate.ja)

  • dataset: jawiki-20180420

  • metric: N/A

Parameters

text (str) – input text for the collocation search

Returns

collocations found for the input text, grouped by part of speech

Return type

dict

Examples

>>> from pororo import Pororo
>>> col = Pororo(task="col", lang="ko")
>>> col("먹")
먹 as verb
noun 것(39), 수(29), 음식(23), 등(16), 고기(14), ..
verb 하(33), 않(21), 살(17), 즐기(11), 굽(9), ..
adverb 많이(10), 주로(7), 다(5), 같이(4), 잘(4), ...
determiner 다른(5), 그(2), 여러(1), 세(1), 몇몇(1), 새(1)
adjective 싶(5), 어리(1), 편하(1), 작(1), 좋(1), 손쉽(1), 못하(1)
먹 as noun
noun 붓(3), 종이(2), 묘선(1), 청자(1), 은장도(1), 제조(1), ..
verb 의하(1), 그리(1), 찍(1), 차(1), 늘어놓(1)
adverb 하지만(1)
>>> col = Pororo(task="collocation", lang="ja")
>>> col("東京")
{'noun': {'noun': [('都', 137), ('家', 21), ('年', 18), ('府', 17), ('市', 12), ('式', 12), ('デザイナー', 10), ('日', 10), ('都立', 9), ('県', 9), ('出身', 8), ('証券', 8), ('後', 6)]}}
>>> col = Pororo(task="col", lang="en")
>>> col("george")
{'noun': {'noun': [('washington', 13), ('gen.', 7)]}}
>>> col = Pororo(task="col", lang="zh")
>>> col("世界杯")
{'noun': {'noun': [('2002年', 72), ('足球赛', 71), ('冠军', 53), ('2006年', 39), ('決賽', 35), ('决赛', 30), ('1998年', 26), ('外圍賽', 25), ('2010年', 23), ('2018年', 22), ('冠軍', 21), ...]}}
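The nested result can be awkward to read when a word has many collocates. The following is a minimal sketch of one way to flatten it, assuming the dict shape shown in the Japanese, English, and Chinese examples above (part of speech of the query word, then part of speech of the collocate, then a list of (collocate, count) pairs). The helper name `top_collocates` is hypothetical and is not part of Pororo.

```python
from typing import Dict, List, Tuple

# Shape assumed from the examples above:
# {word_pos: {collocate_pos: [(collocate, count), ...]}}
Collocations = Dict[str, Dict[str, List[Tuple[str, int]]]]


def top_collocates(result: Collocations, k: int = 3) -> List[Tuple[str, str, str, int]]:
    """Flatten a collocation result and return the k most frequent entries."""
    rows = [
        (word_pos, col_pos, collocate, count)
        for word_pos, by_pos in result.items()
        for col_pos, pairs in by_pos.items()
        for collocate, count in pairs
    ]
    # Sort by descending count; sorted() is stable, so ties keep input order.
    return sorted(rows, key=lambda row: row[3], reverse=True)[:k]


# Output copied from the English example above.
sample = {"noun": {"noun": [("washington", 13), ("gen.", 7)]}}
print(top_collocates(sample, k=1))  # [('noun', 'noun', 'washington', 13)]
```

Each row keeps both part-of-speech labels, so the flattened list can still be filtered by either one.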
static get_available_langs()

static get_available_models()

load(device: str)

Loads the user-selected, task-specific model.

Parameters

device (str) – device on which to load the model

Returns

User-selected task-specific model

Return type

object
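The two static methods can be used to check what the factory supports before constructing a task. The sketch below assumes that `get_available_langs()` returns a list of language codes and that `get_available_models()` returns a dict mapping each language code to a list of model names; neither return shape is stated on this page. `models_for` and `StubFactory` are hypothetical names, and the stub only mirrors the language and model names listed above so that the example runs without Pororo installed.

```python
from typing import List


def models_for(factory, lang: str) -> List[str]:
    """Return the model names a factory advertises for one language."""
    langs = factory.get_available_langs()
    if lang not in langs:
        raise ValueError(f"unsupported language {lang!r}; choose from {langs}")
    # Assumption: get_available_models() returns {lang: [model_name, ...]}.
    return list(factory.get_available_models().get(lang, []))


class StubFactory:
    """Hypothetical stand-in so the sketch runs without Pororo installed."""

    @staticmethod
    def get_available_langs():
        return ["en", "ko", "ja", "zh"]

    @staticmethod
    def get_available_models():
        return {
            "en": ["collocate.en"],
            "ko": ["kollocate"],
            "ja": ["collocate.ja"],
            "zh": ["collocate.zh"],
        }


print(models_for(StubFactory, "ko"))  # ['kollocate']
```

If the assumption about return shapes holds, passing `PororoCollocationFactory` in place of the stub should work the same way.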

class pororo.tasks.collocation.PororoCollocate(model, config)

Bases: pororo.tasks.utils.base.PororoSimpleBase

predict(text: str, **kwargs) → str

Conducts a collocation search using an index file.

Parameters

text (str) – input text for the collocation search

Returns

collocations found for the input text, grouped by part of speech

Return type

dict
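To make the idea of an index-backed collocation search concrete, here is a toy sketch that builds a small in-memory index from part-of-speech-tagged sentences and queries it. It is an illustration only, not Pororo's implementation: the index layout, the sentence-level co-occurrence window, the functions `build_index` and `search`, and the two-sentence corpus are all assumptions made for this example.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

Token = Tuple[str, str]  # (word, part of speech)


def build_index(sentences: Iterable[List[Token]]):
    """Count, for each (word, POS), the words that co-occur in the same sentence."""
    index = defaultdict(lambda: defaultdict(Counter))
    for sentence in sentences:
        for i, (word, pos) in enumerate(sentence):
            for j, (other, other_pos) in enumerate(sentence):
                if i != j:
                    index[(word, pos)][other_pos][other] += 1
    return index


def search(index, text: str) -> Dict[str, Dict[str, List[Tuple[str, int]]]]:
    """Look up `text` under every POS it was indexed with."""
    result = {}
    for (word, pos), by_pos in index.items():
        if word == text:
            # most_common() yields (collocate, count) pairs, highest count first.
            result[pos] = {p: counts.most_common() for p, counts in by_pos.items()}
    return result


# Toy corpus, invented for this sketch.
corpus = [
    [("george", "noun"), ("washington", "noun")],
    [("george", "noun"), ("washington", "noun"), ("lead", "verb")],
]
index = build_index(corpus)
print(search(index, "george"))
# {'noun': {'noun': [('washington', 2)], 'verb': [('lead', 1)]}}
```

The returned structure mirrors the nested dict shown in the examples for the factory class, which is why the lookup groups counts first by the query word's part of speech and then by the collocate's.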