Sentence Embedding

Modeling classes for sentence embedding

class pororo.tasks.sentence_embedding.PororoSentenceFactory(task: str, lang: str, model: Optional[str])[source]

Bases: pororo.tasks.utils.base.PororoFactoryBase

Converts a sentence into an embedding vector

English (stsb-roberta-base, stsb-roberta-large, stsb-bert-base, stsb-bert-large, stsb-distllbert-base)

  • dataset: N/A

  • metric: N/A

Korean (brainsbert.base.ko.kornli.korsts)

  • dataset: N/A

  • metric: N/A

Japanese (jasbert.base.ja.nli.sts)

  • dataset: N/A

  • metric: N/A

Chinese (zhsbert.base.zh.nli.sts)

  • dataset: N/A

  • metric: N/A
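
The per-language model list above can be read as a simple lookup table. The following is a minimal sketch of such a lookup, not Pororo's actual implementation: `AVAILABLE_MODELS` and `resolve_model` are hypothetical names, the table holds only a subset of the models listed above, the `"en"` language code is an assumption, and falling back to the first entry when no model is given is likewise an assumption.

```python
from typing import Dict, List, Optional

# Hypothetical table: a subset of the model names listed above, keyed by
# language code. The "en" key is an assumption; "ko", "ja" and "zh" appear
# in the examples of this page.
AVAILABLE_MODELS: Dict[str, List[str]] = {
    "en": [
        "stsb-roberta-base",
        "stsb-roberta-large",
        "stsb-bert-base",
        "stsb-bert-large",
    ],
    "ko": ["brainsbert.base.ko.kornli.korsts"],
    "ja": ["jasbert.base.ja.nli.sts"],
    "zh": ["zhsbert.base.zh.nli.sts"],
}


def resolve_model(lang: str, model: Optional[str] = None) -> str:
    """Return a model name for `lang`, validating `model` if one is given."""
    if lang not in AVAILABLE_MODELS:
        raise ValueError(f"unsupported language: {lang!r}")
    models = AVAILABLE_MODELS[lang]
    if model is None:
        # Assumption: the first listed model acts as the default.
        return models[0]
    if model not in models:
        raise ValueError(f"model {model!r} is not listed for language {lang!r}")
    return model


default_ko = resolve_model("ko")
chosen_en = resolve_model("en", "stsb-bert-large")
```

Rejecting unknown names up front keeps a misspelled model name from surfacing later as a harder-to-read loading error.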

Examples

>>> se = Pororo(task="sentence_embedding", lang="ko")
>>> se("나는 동물을 좋아하는 사람이야")
[128.78, 200.12, 245.321, ...]  # (1, hidden dim)

static get_available_langs()[source]

static get_available_models()[source]

load(device: str)[source]

Load user-selected task-specific model

Parameters

device (str) – device to load the model on (for example, "cpu" or "cuda")

Returns

User-selected task-specific model

Return type

object

class pororo.tasks.sentence_embedding.PororoSBertSentence(model, config)[source]

Bases: pororo.tasks.utils.base.PororoSimpleBase

find_similar_sentences(query: str, cands: List[str]) → Dict[source]

Rank candidate sentences by their similarity to a query sentence

Parameters
  • query (str) – query sentence used as the anchor

  • cands (List[str]) – candidate sentences to be compared

Returns

dictionary holding the query under 'query' and, under 'ranking', a list of (candidate index, candidate sentence, score) tuples sorted by descending score

Return type

Dict[str, List[Tuple[str, float]]]

Examples

>>> se = Pororo(task="sentence_embedding")
>>> query = "He is the tallest person in the world"
>>> cands = [
...     "I hate this guy.",
...     "You are so lovely!.",
...     "Tom is taller than Jim."
... ]
>>> se.find_similar_sentences(query, cands)
{
    'query': 'He is the tallest person in the world',
    'ranking': [(2, 'Tom is taller than Jim.', 0.49), (1, 'You are so lovely!.', 0.47), (0, 'I hate this guy.', 0.22)]
}
>>> se = Pororo(task="sentence_embedding", lang="ko")
>>> query = "고양이가 창 밖을 바라본다"
>>> cands = [
...     "고양이가 카메라를 켠다",
...     "남자와 여자가 걷고 있다",
...     "고양이가 개를 만지려 하고 있다",
...     "두 마리의 고양이가 창문을 보고 있다",
...     "테이블 위에 앉아 있는 고양이가 창밖을 내다보고 있다",
...     "창밖을 내다보는 고양이"
... ]
>>> se.find_similar_sentences(query, cands)
{
    'query': '고양이가 창 밖을 바라본다',
    'ranking': [(5, '창밖을 내다보는 고양이', 0.93), (4, '테이블 위에 앉아 있는 고양이가 창밖을 내다보고 있다', 0.91), (3, '두 마리의 고양이가 창문을 보고 있다', 0.78), (0, '고양이가 카메라를 켠다', 0.74), (2, '고양이가 개를 만지려 하고 있다', 0.41)]
}
>>> se = Pororo(task="sentence_embedding", lang="ja")
>>> query = "おはようございます"  # Good morning
>>> cands = ["こんにちは", "失礼します", "こんばんは"]  # Hello | Please Excuse Me (for Leaving) | Good evening
>>> se.find_similar_sentences(query, cands)
{
    'query': 'おはようございます',
    'ranking': [(0, 'こんにちは', 0.58), (2, 'こんばんは', 0.48), (1, '失礼します', 0.27)]
}
>>> se = Pororo(task="sentence_embedding", lang="zh")
>>> query = "欢迎光临"  # Welcome
>>> cands = ["你好。", "你会说英语吗?", "洗手间在哪里?"]  # Hello | Do you speak English? | Where is the bathroom?
>>> se.find_similar_sentences(query, cands)
{
    'query': '欢迎光临',
    'ranking': [(0, '你好。', 0.53), (2, '洗手间在哪里?', 0.2), (1, '你会说英语吗?', 0.09)]
}
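
The shape of the returned dictionary in the examples above can be reproduced with a short sketch. It assumes the similarity scores have already been computed by some other means and only shows the ranking step; `rank_candidates` is a hypothetical helper, not part of Pororo, and the scores passed in are the ones printed in the English example.

```python
from typing import Dict, List, Sequence, Tuple, Union

Ranking = List[Tuple[int, str, float]]


def rank_candidates(
    query: str, cands: Sequence[str], scores: Sequence[float]
) -> Dict[str, Union[str, Ranking]]:
    """Pair each candidate with its index and score, highest score first."""
    if len(cands) != len(scores):
        raise ValueError("cands and scores must have the same length")
    ranking = sorted(
        ((idx, cand, score) for idx, (cand, score) in enumerate(zip(cands, scores))),
        key=lambda item: item[2],  # sort on the score
        reverse=True,              # descending: most similar first
    )
    return {"query": query, "ranking": ranking}


query = "He is the tallest person in the world"
cands = ["I hate this guy.", "You are so lovely!.", "Tom is taller than Jim."]
scores = [0.22, 0.47, 0.49]  # values taken from the English example above

result = rank_candidates(query, cands, scores)
```

Keeping the original candidate index in each tuple lets a caller map a ranked entry back to its position in `cands`.
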
predict(sent: str, **kwargs)[source]

Compute the embedding of a sentence

Parameters

sent (str) – input sentence to embed

Returns

embedding vector of the input sentence

Return type

np.ndarray
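
Embedding vectors such as the one returned here are commonly compared with cosine similarity. The sketch below is a plain-Python illustration under that assumption; it does not claim to be the measure Pororo uses internally. `cosine_similarity` is a hypothetical helper, and the three-dimensional vectors are made-up stand-ins for real embeddings, which would first need to be flattened to one dimension.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings (made-up values).
emb_a = [1.0, 2.0, 3.0]
emb_b = [2.0, 4.0, 6.0]   # same direction as emb_a
emb_c = [3.0, 0.0, -1.0]  # orthogonal to emb_a

sim_ab = cosine_similarity(emb_a, emb_b)  # close to 1.0
sim_ac = cosine_similarity(emb_a, emb_c)  # 0.0
```

Because the measure depends only on direction, scaling an embedding by a positive constant leaves its similarity to any other vector unchanged.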