Contextualized Embedding

Modeling classes for contextualized embedding

class pororo.tasks.contextualized_embedding.PororoContextualFactory(task: str, lang: str, model: Optional[str])

Bases: pororo.tasks.utils.base.PororoFactoryBase

Computes contextualized embeddings for an input sentence

English (roberta.base.en)

dataset: N/A

metric: N/A

Korean (brainbert.base.ko)

dataset: N/A

metric: N/A

Japanese (jaberta.base.ja)

dataset: N/A

metric: N/A

Chinese (zhberta.base.zh)

dataset: N/A

metric: N/A

Parameters: sent (str) – input sentence to embed
Returns: contextualized embeddings, one vector per subword, with shape (len(subwords), hidden_dim)
Return type: np.ndarray

Examples

>>> from pororo import Pororo
>>> cse = Pororo(task="cse", lang="ko")
>>> cse("하늘을 나는 새")
array([[92.53, 20.24, 32.32, ...],
       ...,
       [63.24, 53.19, 45.78, ...]], dtype=float32)  # (len(subwords), hidden_dim)
>>> cse = Pororo(task="cse", lang="zh")
>>> cse("一群人抬头看着建筑物屋顶边缘的3人。")
array([[ 0.61136365,  0.24613665,  0.6259908 , ...,  0.32798234,
         0.10512973, -0.06808531],
       ...,
       [-0.00931012, -0.04459633,  1.0253953 , ...,  0.30732906,
         0.22213839,  0.25226325]], dtype=float32)
>>> cse = Pororo(task="cse", lang="ja")
>>> cse("おはようございます")
array([[-0.26724914, -0.23364174, -0.07206455, ...,  0.30293447,
        -0.36008322,  0.24684878],
       ...,
       [-0.7470922 , -0.30342472, -0.64015895, ..., -0.17556943,
         0.10660946, -0.17191087]], dtype=float32)
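
The examples above return one row per subword, so the first dimension changes with the input length. When a single fixed-size vector per sentence is needed, one common option is to average over the subword axis. The sketch below is not part of Pororo: mean_pool is a hypothetical helper, and fake_output is a made-up stand-in for the array that cse(...) would return, used so the snippet runs without downloading a model.

```python
import numpy as np


def mean_pool(subword_embeddings: np.ndarray) -> np.ndarray:
    """Average over the subword axis to get one vector per sentence."""
    if subword_embeddings.ndim != 2:
        raise ValueError("expected shape (len(subwords), hidden_dim)")
    return subword_embeddings.mean(axis=0)


# Made-up stand-in for the output of cse("..."): 4 subwords, hidden_dim of 3.
fake_output = np.array(
    [[1.0, 2.0, 3.0],
     [3.0, 2.0, 1.0],
     [0.0, 0.0, 0.0],
     [4.0, 4.0, 4.0]],
    dtype=np.float32,
)

sentence_vector = mean_pool(fake_output)
print(sentence_vector.shape)  # (3,)
```

Mean pooling is only one choice. Taking the first row or a max over the subword axis are alternatives, and which works best depends on the downstream task.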

static get_available_langs()

static get_available_models()

load(device: str)

Loads the task-specific model selected by the user

Parameters: device (str) – device on which to load the model (for example, "cpu" or "cuda")
Returns: User-selected task-specific model
Return type: object

class pororo.tasks.contextualized_embedding.PororoBertContextualized(model, config, device)

Bases: pororo.tasks.utils.base.PororoSimpleBase

predict(sent: str, **kwargs)

Computes contextualized embeddings for an input sentence

Parameters: sent (str) – input sentence to embed
Returns: contextualized embeddings, one vector per subword, with shape (len(subwords), hidden_dim)
Return type: np.ndarray
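
Because predict returns one row per subword, two sentences usually yield arrays with different first dimensions and cannot be compared row by row. A minimal sketch of one way to compare them, assuming mean pooling followed by cosine similarity, is shown below. sentence_similarity is a hypothetical helper rather than a Pororo function, and emb_a and emb_b are made-up stand-ins for two predict outputs.

```python
import numpy as np


def sentence_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between the mean-pooled rows of two embedding matrices."""
    va, vb = a.mean(axis=0), b.mean(axis=0)
    denom = np.linalg.norm(va) * np.linalg.norm(vb)
    if denom == 0.0:
        raise ValueError("cannot compare an all-zero embedding")
    return float(np.dot(va, vb) / denom)


# Made-up stand-ins for two predict() outputs with different subword counts.
emb_a = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]], dtype=np.float32)  # 3 subwords
emb_b = np.array([[0.0, 2.0], [0.0, 4.0]], dtype=np.float32)              # 2 subwords

same = sentence_similarity(emb_a, emb_a)   # identical direction
cross = sentence_similarity(emb_a, emb_b)  # orthogonal directions
print(same, cross)
```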