
Speech synthesis / Text to Speech (TTS)

Voximplant provides a list of different voices you can use when creating a TTSPlayer (see the TTSPlayerOptions) or calling the call.say method (see the sayOptions).

All the available languages along with distinct voice options are listed in VoiceList enums grouped by text-to-speech engine: VoiceList.Amazon, VoiceList.Google, VoiceList.Tinkoff, VoiceList.Yandex. VoiceList.Amazon.en_US_Joanna is the default language-voice pair.

This is how you can create a TTSPlayer instance with one of Google’s voices:

createTTSPlayer
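A minimal sketch of that step, assuming the player is created with `VoxEngine.createTTSPlayer(text, options)` and attached to a call with `sendMediaTo`. The helper name `playGreeting`, the greeting text, and the choice of `VoiceList.Google.en_US_Wavenet_A` are illustrative assumptions, not part of the platform API. The engine and voice are passed in as parameters so that the function can also be run outside a scenario.

```javascript
// Sketch: create a TTS player with a given voice and play it into a call.
// `engine` is expected to be the VoxEngine global and `voice` a VoiceList member
// (for example VoiceList.Google.en_US_Wavenet_A, assumed here for illustration).
function playGreeting(engine, call, voice) {
  const player = engine.createTTSPlayer('Hello! Thanks for calling.', {
    language: voice,
  });
  // Route the synthesized audio to the call.
  player.sendMediaTo(call);
  return player;
}

// Inside a scenario this could be wired up roughly as follows (not executed here):
// VoxEngine.addEventListener(AppEvents.CallAlerting, (e) => {
//   e.call.addEventListener(CallEvents.Connected, () => {
//     playGreeting(VoxEngine, e.call, VoiceList.Google.en_US_Wavenet_A);
//   });
//   e.call.answer();
// });
```

If no voice is given in the options, the default pair mentioned above (VoiceList.Amazon.en_US_Joanna) applies.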


SSML

SSML stands for Speech Synthesis Markup Language and is used for fine-tuning of speech synthesis: pronunciation, volume, pitch, rate, etc. These aspects are specified in the ttsOptions of the Call.say and createTTSPlayer methods.

Currently, three attributes are supported:

  • Pitch
  • Rate
  • Volume

If you want to apply any or all of them to the whole text, specify them in the ttsOptions argument:

ttsOptions
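A hedged sketch of how such an options object might be assembled. It assumes that `ttsOptions` is a field of the options object passed to `call.say` or `createTTSPlayer`, next to `language`. The helper `buildSpeechOptions`, the placeholder voice string, and the attribute values (`low`, `slow`, `loud`, taken from the standard SSML prosody vocabulary) are illustrative assumptions.

```javascript
// The three attributes the text above lists as supported.
const SUPPORTED_ATTRIBUTES = ['pitch', 'rate', 'volume'];

// Hypothetical helper: combines a voice with whole-text prosody settings
// and rejects attribute names outside the supported set.
function buildSpeechOptions(voice, ttsOptions) {
  for (const key of Object.keys(ttsOptions)) {
    if (!SUPPORTED_ATTRIBUTES.includes(key)) {
      throw new Error(`Unsupported ttsOptions attribute: ${key}`);
    }
  }
  return { language: voice, ttsOptions: { ...ttsOptions } };
}

// In a scenario, pass a VoiceList member instead of the placeholder string.
const options = buildSpeechOptions('voice-placeholder', {
  pitch: 'low',
  rate: 'slow',
  volume: 'loud',
});

// Assumed usage inside a scenario (not executed here):
// call.say('Hello! Thanks for calling.', options);
```

Checking the attribute names up front is a design choice of this sketch; it surfaces typos before the request reaches the speech engine.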


However, there could be two additional cases:

  • You need to apply one of these SSML attributes to a small part of the text only.
  • You want to use other SSML tags.

To address both cases, embed the SSML markup directly in the text passed to say or createTTSPlayer.

Supported tags

Note that the lists of supported tags and attributes depend on the speech synthesis provider; you can find them on the providers’ official websites. For unsupported combinations, the PlaybackFinished event is triggered with error 400.

For example, Amazon requires the <prosody> tag to control volume, rate, or pitch:

<prosody> tag
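A small sketch of building such text, assuming the markup is wrapped in a `<speak>` root element as in standard SSML. The helpers `escapeXml` and `prosody` and the sample sentence are hypothetical; only the `<prosody>` tag and its `pitch`, `rate`, and `volume` attributes come from the text above.

```javascript
// Escape characters that would otherwise break the XML markup.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Wrap a fragment of text in a <prosody> tag with the given attributes.
function prosody(text, attrs) {
  const attrString = Object.entries(attrs)
    .map(([name, value]) => ` ${name}="${escapeXml(value)}"`)
    .join('');
  return `<prosody${attrString}>${escapeXml(text)}</prosody>`;
}

// Only the digits are slowed down and made louder; the rest is spoken normally.
const ssml = `<speak>Your code is ${prosody('four two seven', {
  rate: 'slow',
  volume: 'loud',
})}.</speak>`;

// Assumed usage inside a scenario (not executed here):
// call.say(ssml, { language: VoiceList.Amazon.en_US_Joanna });
```

Whether a given attribute value is accepted still depends on the provider, so check the resulting markup against the provider's list of supported tags.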
