Speech Synthesis

Speech synthesis related modeling classes

class pororo.tasks.speech_synthesis.PororoTtsFactory(task: str, lang: str = 'multi', model: Optional[str] = None)

Bases: pororo.tasks.utils.base.PororoFactoryBase

Synthesizes speech from text using a trained model. The output audio's sample rate is 22050 Hz.

Multi (tacotron)

dataset: TBU

metric: TBU

Parameters

text (str) – text for speech synthesis
lang (str) – language of the text, e.g. "how are you?": en, "안녕하세요." ("Hello."): ko
speaker (str) – speaker to use, such as ko, en, or zh (default: the value of lang)

Returns

waveform of speech signal

Return type

ndarray
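Because the sample rate is fixed at 22050 Hz, the length of the returned waveform determines the clip duration. The following is a minimal sketch of that arithmetic; the helper name `duration_seconds` is hypothetical and not part of pororo, and it assumes the waveform is a one-dimensional (mono) array so that its length equals the number of samples.

```python
SAMPLE_RATE = 22050  # Hz, the output sample rate stated in this section


def duration_seconds(num_samples: int, sample_rate: int = SAMPLE_RATE) -> float:
    """Return the clip length in seconds for a mono waveform.

    Hypothetical helper: assumes one sample per array element (mono audio).
    """
    return num_samples / sample_rate
```

With a synthesized waveform `wave`, this would be called as `duration_seconds(len(wave))`.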

Examples

>>> from IPython.display import Audio, display
>>> from pororo import Pororo
>>> model = Pororo(task="tts", lang="multi")
>>> # Typical TTS
>>> wave = model("how are you?", lang="en")
>>> display(Audio(data=wave, rate=22050))
>>> # Voice Style Transfer: Korean text ("I am American.") with the English speaker
>>> wave = model("저는 미국 사람이에요.", lang="ko", speaker="en")
>>> display(Audio(data=wave, rate=22050))
>>> # Code-Switching
>>> wave = model("저는 미국 사람이에요.", lang="ko", speaker="en-15,ko")
>>> display(Audio(data=wave, rate=22050))
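Outside a notebook, the waveform can be written to disk instead of played inline. The sketch below uses only the Python standard library; the function name `save_wav` is hypothetical and not part of pororo. It assumes the waveform is a mono sequence of floats roughly in the range [-1.0, 1.0], which this page does not confirm, so check the actual value range before relying on it.

```python
import array
import sys
import wave

SAMPLE_RATE = 22050  # Hz, the output sample rate stated in this section


def save_wav(samples, path, sample_rate=SAMPLE_RATE):
    """Write a mono sequence of floats to a 16-bit PCM WAV file.

    Assumption: samples lie roughly in [-1.0, 1.0]; values outside
    that range are clipped before conversion.
    """
    # Clip each sample, then scale it to the signed 16-bit integer range.
    pcm = array.array(
        "h",
        (int(max(-1.0, min(1.0, float(s))) * 32767) for s in samples),
    )
    # WAV data is little-endian; swap bytes on big-endian machines.
    if sys.byteorder == "big":
        pcm.byteswap()
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)        # mono
        wav_file.setsampwidth(2)        # 2 bytes = 16-bit samples
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm.tobytes())
```

Given a synthesized array `waveform`, `save_wav(waveform, "output.wav")` would produce a file playable at the correct speed. Iterating sample by sample keeps the sketch dependency-free; a vectorized NumPy conversion would be faster for long clips.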

Notes

Currently, 11 languages are supported: English, Korean, Japanese, Chinese, Jejueo, Dutch, German, Spanish, French, Russian, and Finnish. A speaker such as ko, en, or zh can be designated for this task.

static get_available_langs()

static get_available_models()

load(device: str)

Load user-selected task-specific model

Parameters: device (str) – device information
Returns: User-selected task-specific model
Return type: object

class pororo.tasks.speech_synthesis.PororoTTS(synthesizer, device, romanize, jejueo_romanize, convert_from_numerical_pinyin, config)

Bases: pororo.tasks.utils.base.PororoSimpleBase

predict(text: str, speaker: str) → numpy.ndarray

Conduct speech synthesis on the given text

Parameters

text (str) – text to synthesize
speaker (str) – speaker to use

Returns

waveform of speech signal

Return type

ndarray