Speech Synthesis

Speech synthesis related modeling classes

class pororo.tasks.speech_synthesis.PororoTtsFactory(task: str, lang: str = 'multi', model: Optional[str] = None)[source]

Bases: pororo.tasks.utils.base.PororoFactoryBase

Synthesizes speech from text using a trained model. The output audio's sample rate is 22050 Hz.

Multi (tacotron)

  • dataset: TBU

  • metric: TBU

Parameters
  • text (str) – text for speech synthesis

  • lang (str) – language of the input text, e.g. en for "how are you?", ko for "안녕하세요."

  • speaker (str) – speaker to use, given as a language code such as ko, en, or zh (default: the value of lang)

Returns

waveform of speech signal

Return type

ndarray

Examples

>>> from IPython.display import Audio, display
>>> from pororo import Pororo
>>> model = Pororo(task="tts", lang="multi")
>>> # Typical TTS
>>> wave = model("how are you?", lang="en")
>>> display(Audio(data=wave, rate=22050))
>>> # Voice Style Transfer: Korean text spoken with an English speaker
>>> wave = model("저는 미국 사람이에요.", lang="ko", speaker="en")
>>> display(Audio(data=wave, rate=22050))
>>> # Code-Switching
>>> wave = model("저는 미국 사람이에요.", lang="ko", speaker="en-15,ko")
>>> display(Audio(data=wave, rate=22050))
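
Outside a notebook, the returned waveform can be written to disk instead of played inline. The following is a minimal sketch using only the Python standard library, not part of Pororo itself. The helper name save_wav is hypothetical, and it assumes the waveform is a 1-D sequence of floats in the range [-1.0, 1.0]; this reference does not state the value range, so check it before relying on the scaling.

```python
import array
import sys
import wave

SAMPLE_RATE = 22050  # sample rate stated for the TTS output


def save_wav(samples, path, sample_rate=SAMPLE_RATE):
    """Write a mono waveform to a 16-bit PCM WAV file (hypothetical helper).

    Assumption: `samples` is a 1-D sequence of floats in [-1.0, 1.0].
    Values outside that range are clipped.
    Returns the duration of the written audio in seconds.
    """
    pcm = array.array("h")  # signed 16-bit integers
    for s in samples:
        clipped = max(-1.0, min(1.0, float(s)))
        pcm.append(int(round(clipped * 32767)))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV data must be little-endian
    with wave.open(path, "wb") as f:
        f.setnchannels(1)           # mono
        f.setsampwidth(2)           # 2 bytes per sample
        f.setframerate(sample_rate)
        f.writeframes(pcm.tobytes())
    return len(pcm) / sample_rate
```

With the model from the examples above, this could be called as save_wav(model("how are you?", lang="en"), "out.wav").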

Notes

Currently 11 languages are supported: English, Korean, Japanese, Chinese, Jejueo, Dutch, German, Spanish, French, Russian, and Finnish. A speaker can be designated with a language code such as ko, en, or zh.

static get_available_langs()[source]
static get_available_models()[source]
load(device: str)[source]

Load user-selected task-specific model

Parameters

device (str) – device information

Returns

User-selected task-specific model

Return type

object

class pororo.tasks.speech_synthesis.PororoTTS(synthesizer, device, romanize, jejueo_romanize, convert_from_numerical_pinyin, config)[source]

Bases: pororo.tasks.utils.base.PororoSimpleBase

predict(text: str, speaker: str) → numpy.ndarray[source]

Conduct speech synthesis on given text

Parameters
  • text (str) – text to synthesize

  • speaker (str) – speaker to use for synthesis

Returns

waveform of speech signal

Return type

ndarray