Sentence Embedding
Sentence embedding-related modeling classes.
class pororo.tasks.sentence_embedding.PororoSentenceFactory(task: str, lang: str, model: Optional[str])
Bases: pororo.tasks.utils.base.PororoFactoryBase
Sentence embedding: encodes a sentence into a fixed-size embedding vector.
- English (stsb-roberta-base, stsb-roberta-large, stsb-bert-base, stsb-bert-large, stsb-distllbert-base)
    - dataset: N/A
    - metric: N/A
- Korean (brainsbert.base.ko.kornli.korsts)
    - dataset: N/A
    - metric: N/A
- Japanese (jasbert.base.ja.nli.sts)
    - dataset: N/A
    - metric: N/A
- Chinese (zhsbert.base.zh.nli.sts)
    - dataset: N/A
    - metric: N/A
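A factory that takes task, lang, and model suggests a lookup from a language to the models listed above. The sketch below illustrates that lookup under stated assumptions: `AVAILABLE_MODELS` and `resolve_model` are hypothetical names, not Pororo's API; the `en` key is an assumed language code (the `ko`, `ja`, and `zh` codes match the examples); and falling back to the first listed model when `model` is None is an assumption, not documented behavior.

```python
from typing import Dict, List, Optional

# Hypothetical table: model names as listed above, keyed by language code.
# The "en" code is an assumption; "ko", "ja", "zh" match the examples.
AVAILABLE_MODELS: Dict[str, List[str]] = {
    "en": [
        "stsb-roberta-base",
        "stsb-roberta-large",
        "stsb-bert-base",
        "stsb-bert-large",
        "stsb-distllbert-base",
    ],
    "ko": ["brainsbert.base.ko.kornli.korsts"],
    "ja": ["jasbert.base.ja.nli.sts"],
    "zh": ["zhsbert.base.zh.nli.sts"],
}


def resolve_model(lang: str, model: Optional[str] = None) -> str:
    """Hypothetical helper: choose a model name for a language.

    Assumption: when model is None, the first listed model is the default.
    """
    if lang not in AVAILABLE_MODELS:
        raise ValueError(f"Unsupported language: {lang!r}")
    candidates = AVAILABLE_MODELS[lang]
    if model is None:
        return candidates[0]  # assumed default, not documented behavior
    if model not in candidates:
        raise ValueError(f"{model!r} is not available for {lang!r}: {candidates}")
    return model


print(resolve_model("ko"))
print(resolve_model("en", "stsb-bert-large"))
```

Raising on an unknown language or model keeps a bad request from silently falling back to a model the caller did not ask for.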
Examples
>>> from pororo import Pororo
>>> se = Pororo(task="sentence_embedding", lang="ko")
>>> se("나는 동물을 좋아하는 사람이야")  # I am a person who likes animals
[128.78, 200.12, 245.321, ...]  # (1, hidden dim)
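The example returns one vector of shape (1, hidden dim) per sentence. SBERT-style encoders commonly get that vector by mean-pooling the encoder's token embeddings while skipping padding positions. The sketch below assumes that pooling strategy; the pooling each Pororo model actually uses is not stated here, and `mean_pool` is a hypothetical helper written in plain Python rather than a tensor library.

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors whose mask value is 1 into one sentence vector."""
    hidden_dim = len(token_embeddings[0])
    summed = [0.0] * hidden_dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:  # skip padding positions (mask value 0)
            for i, value in enumerate(vector):
                summed[i] += value
            count += 1
    count = max(count, 1)  # guard against an all-padding input
    return [value / count for value in summed]


# Toy input: three tokens with hidden dim 2; the last token is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
print(mean_pool(tokens, mask))  # [2.0, 3.0]
```

Masking matters because padded batches would otherwise pull every sentence vector toward the padding token's embedding.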
class pororo.tasks.sentence_embedding.PororoSBertSentence(model, config)
Bases: pororo.tasks.utils.base.PororoSimpleBase
find_similar_sentences(query: str, cands: List[str]) → Dict
Find the candidate sentences most similar to the query.
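- Parameters
    - query (str): the query sentence
    - cands (List[str]): candidate sentences to compare against the query
- Returns
    dict holding the query under 'query' and, under 'ranking', a list of (candidate index, candidate sentence, similarity score) tuples sorted by descending score
- Return type
    Dict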
Examples
>>> from pororo import Pororo
>>> se = Pororo(task="sentence_embedding")
>>> query = "He is the tallest person in the world"
>>> cands = [
...     "I hate this guy.",
...     "You are so lovely!.",
...     "Tom is taller than Jim."
... ]
>>> se.find_similar_sentences(query, cands)
{
    'query': 'He is the tallest person in the world',
    'ranking': [(2, 'Tom is taller than Jim.', 0.49), (1, 'You are so lovely!.', 0.47), (0, 'I hate this guy.', 0.22)]
}
>>> se = Pororo(task="sentence_embedding", lang="ko")
>>> query = "고양이가 창 밖을 바라본다"  # A cat looks out the window
>>> cands = [
...     "고양이가 카메라를 켠다",  # A cat turns on a camera
...     "남자와 여자가 걷고 있다",  # A man and a woman are walking
...     "고양이가 개를 만지려 하고 있다",  # A cat is trying to touch a dog
...     "두 마리의 고양이가 창문을 보고 있다",  # Two cats are looking at a window
...     "테이블 위에 앉아 있는 고양이가 창밖을 내다보고 있다",  # A cat sitting on a table is looking out the window
...     "창밖을 내다보는 고양이"  # A cat looking out the window
... ]
>>> se.find_similar_sentences(query, cands)
{
    'query': '고양이가 창 밖을 바라본다',
    'ranking': [
        (5, '창밖을 내다보는 고양이', 0.93),
        (4, '테이블 위에 앉아 있는 고양이가 창밖을 내다보고 있다', 0.91),
        (3, '두 마리의 고양이가 창문을 보고 있다', 0.78),
        (0, '고양이가 카메라를 켠다', 0.74),
        (2, '고양이가 개를 만지려 하고 있다', 0.41)
    ]
}
>>> se = Pororo(task="sentence_embedding", lang="ja")
>>> query = "おはようございます"  # Good morning
>>> cands = ["こんにちは", "失礼します", "こんばんは"]  # Hello | Please excuse me (for leaving) | Good evening
>>> se.find_similar_sentences(query, cands)
{
    'query': 'おはようございます',
    'ranking': [(0, 'こんにちは', 0.58), (2, 'こんばんは', 0.48), (1, '失礼します', 0.27)]
}
>>> se = Pororo(task="sentence_embedding", lang="zh")
>>> query = "欢迎光临"  # Welcome
>>> cands = ["你好。", "你会说英语吗?", "洗手间在哪里?"]  # Hello | Do you speak English? | Where is the bathroom?
>>> se.find_similar_sentences(query, cands)
{
    'query': '欢迎光临',
    'ranking': [(0, '你好。', 0.53), (2, '洗手间在哪里?', 0.2), (1, '你会说英语吗?', 0.09)]
}
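The ranking above can be approximated by embedding the query and each candidate, scoring each pair, and sorting by descending score. This sketch assumes cosine similarity as the score (common for SBERT-style models, but not confirmed for Pororo) and takes a hypothetical `embed` callable in place of a real model; `rank_by_similarity` is an illustrative name, not the library's method. Scores are rounded to two decimals to match the example output.

```python
import math
from typing import Callable, Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_by_similarity(
    query: str, cands: List[str], embed: Callable[[str], Sequence[float]]
) -> Dict:
    """Hypothetical re-creation of the output shape: {'query', 'ranking'}."""
    query_vec = embed(query)
    scored = [(idx, cand, cosine(query_vec, embed(cand))) for idx, cand in enumerate(cands)]
    scored.sort(key=lambda item: item[2], reverse=True)  # sort on raw scores
    ranking = [(idx, cand, round(score, 2)) for idx, cand, score in scored]
    return {"query": query, "ranking": ranking}


# Toy 2-d "embeddings" standing in for a real model (assumption).
toy_vectors = {"query": [1.0, 0.0], "a": [0.0, 1.0], "b": [1.0, 1.0], "c": [2.0, 0.0]}
result = rank_by_similarity("query", ["a", "b", "c"], toy_vectors.__getitem__)
print(result)
```

Sorting on the unrounded scores keeps near-ties in their true order; rounding is applied only for display. The sketch returns every candidate, whereas the Korean example lists five of its six candidates, so the library may truncate the ranking.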