Speech synthesis

You can synthesize speech and play it to the other party before a call is answered or during an active call or conference.

Speech synthesis usage

To synthesize speech during an incoming call, first answer the call. To play speech before the call is answered, for example a greeting or a voicemail prompt, use the startEarlyMedia method.
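A minimal sketch of a greeting played before the call is answered. It assumes the incoming call arrives through AppEvents.CallAlerting and that Call.say accepts a language option. The greeting text, the helper name greetBeforeAnswer, and the decision to answer once playback finishes are illustrative choices.

```javascript
// Plays a greeting as early media, then answers the call.
function greetBeforeAnswer(call) {
  // Answer only after the greeting has been played (illustrative choice).
  call.addEventListener(CallEvents.PlaybackFinished, () => call.answer());
  // Open the media path without answering the call.
  call.startEarlyMedia();
  call.say("Please hold, we are connecting you.", {
    language: VoiceList.Amazon.en_US_Joanna,
  });
}

// Guarded so the file can also be loaded outside the VoxEngine runtime.
if (typeof VoxEngine !== "undefined") {
  VoxEngine.addEventListener(AppEvents.CallAlerting, (e) => greetBeforeAnswer(e.call));
}
```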

To synthesize speech, use the VoxEngine.createTTSPlayer method in your scenario. Pass the text string to synthesize as the first parameter and the options as the second parameter. See the code example below to understand how it works:

Synthesize speech
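A minimal sketch of such a scenario, assuming the incoming call arrives through AppEvents.CallAlerting and that the options object carries the voice in a language field. The phrase and the hang-up after playback are illustrative, not required by the platform.

```javascript
// Handles an incoming call: answers it, then plays synthesized speech.
function onCallAlerting(e) {
  const call = e.call;
  call.addEventListener(CallEvents.Connected, () => {
    // First parameter: the text to synthesize; second parameter: the options.
    const player = VoxEngine.createTTSPlayer("Hello! Thank you for calling.", {
      language: VoiceList.Amazon.en_US_Joanna,
    });
    player.sendMediaTo(call);
    // Hang up once the phrase has been played (illustrative choice).
    player.addEventListener(PlayerEvents.PlaybackFinished, () => call.hangup());
  });
  // Speech during an incoming call requires the call to be answered first.
  call.answer();
}

// Guarded so the file can also be loaded outside the VoxEngine runtime.
if (typeof VoxEngine !== "undefined") {
  VoxEngine.addEventListener(AppEvents.CallAlerting, onCallAlerting);
}
```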

Configuring the voice

Choose the voice for speech synthesis from one of the lists in VoiceList. The default voice is VoiceList.Amazon.en_US_Joanna.

If you have a custom voice with one of the providers from the VoiceList, contact support to activate this feature for your account and receive the appropriate setup instructions. For example, for VoiceList.Yandex you need to specify the voice folder ID using the yandexCustomModelName property.

You can also configure other speech synthesis options, such as pitch, rate, and volume. To specify them, use the CallSayParameters.ttsOptions parameter for the Call.say method or the TTSPlayerParameters.ttsOptions parameter for the VoxEngine.createTTSPlayer method. See the code example below.

Configure the voice options
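A minimal sketch of passing pitch, rate, and volume through ttsOptions, assuming the options object also carries the voice in a language field. The helper names buildVoiceOptions and speakWithOptions and the chosen values are illustrative.

```javascript
// Builds the options object; the ttsOptions values apply to the whole text.
function buildVoiceOptions() {
  return {
    language: VoiceList.Amazon.en_US_Joanna,
    ttsOptions: {
      pitch: "low",
      rate: "slow",
      volume: "loud",
    },
  };
}

// Call this once the call is connected.
function speakWithOptions(call) {
  const player = VoxEngine.createTTSPlayer("Your order has been shipped.", buildVoiceOptions());
  player.sendMediaTo(call);
  return player;
}
```

The same object can be passed as the second argument of Call.say, since CallSayParameters also has a ttsOptions parameter.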

The options have the following values:

pitch (voice pitch) with the following acceptable ranges: numbers followed by "Hz" from 0.5Hz to 2Hz, or the following values: x-low, low, medium, high, x-high, default
rate (speech speed) with the following possible values: x-slow, slow, medium, fast, x-fast, default
volume (speech volume) with the possible values: silent, x-soft, soft, medium, loud, x-loud, default
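The value lists above can be checked in the scenario before the options are sent. The following is a sketch of such a check and is not part of the platform API; the names TTS_KEYWORDS, isValidTtsOption, and invalidTtsOptions are hypothetical helpers.

```javascript
// Keyword values taken from the lists above.
const TTS_KEYWORDS = {
  pitch: ["x-low", "low", "medium", "high", "x-high", "default"],
  rate: ["x-slow", "slow", "medium", "fast", "x-fast", "default"],
  volume: ["silent", "x-soft", "soft", "medium", "loud", "x-loud", "default"],
};

// Returns true when `value` is acceptable for the given ttsOptions key.
function isValidTtsOption(key, value) {
  if (!Object.prototype.hasOwnProperty.call(TTS_KEYWORDS, key)) return false;
  if (TTS_KEYWORDS[key].includes(value)) return true;
  if (key === "pitch") {
    // Pitch also accepts a number followed by "Hz", from 0.5Hz to 2Hz.
    const match = /^(\d+(?:\.\d+)?)Hz$/.exec(value);
    if (match) {
      const hz = Number(match[1]);
      return hz >= 0.5 && hz <= 2;
    }
  }
  return false;
}

// Returns the names of the options that fail validation.
function invalidTtsOptions(ttsOptions) {
  return Object.keys(ttsOptions).filter((key) => !isValidTtsOption(key, ttsOptions[key]));
}
```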

To apply these options to the whole text, you do not have to use the speak tag. To apply specific attributes to only a part of the text, wrap the text in the speak tag manually and mark up that part.

The list of supported tags depends on the speech synthesis provider. You can find these lists on the providers' official websites. If you use an unsupported tag, the PlaybackFinished event is triggered with a 400 error.

For example, if we choose VoiceList.Amazon, we have to use the prosody tag to control volume, rate, or pitch of the selected text fragment. Here is how we make this fragment sound higher:

Amazon's prosody tag
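A minimal sketch of building such markup. The helper names escapeSsml and higherPitch are hypothetical; the pitch value "high" is one of the keywords listed earlier in this article.

```javascript
// Escapes characters that have special meaning in SSML text.
function escapeSsml(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Wraps a fragment in a prosody tag with the given pitch.
function higherPitch(fragment, pitch = "high") {
  return `<prosody pitch="${pitch}">${escapeSsml(fragment)}</prosody>`;
}

// Only the wrapped fragment is affected; the rest keeps the default pitch.
const prosodySsml = `<speak>The next words ${higherPitch("sound higher")} than the rest.</speak>`;

// The string is then passed as the text to synthesize, for example:
// VoxEngine.createTTSPlayer(prosodySsml, { language: VoiceList.Amazon.en_US_Joanna });
```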

Here is another example that uses the Amazon-specific say-as tag:

Amazon's say-as tag
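A minimal sketch, assuming an Amazon voice is selected. The helper name sayAs and the sample text are illustrative; the "digits" value of the interpret-as attribute asks Amazon Polly to read each digit separately.

```javascript
// Builds a say-as tag around the given content.
function sayAs(interpretAs, content) {
  return `<say-as interpret-as="${interpretAs}">${content}</say-as>`;
}

const sayAsSsml = `<speak>Your confirmation code is ${sayAs("digits", "4821")}.</speak>`;

// The string is then passed as the text to synthesize, for example:
// VoxEngine.createTTSPlayer(sayAsSsml, { language: VoiceList.Amazon.en_US_Joanna });
```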

Passing parameters directly to the provider

There are two ways to pass speech synthesis parameters to your provider. You can fill in the ttsOptions parameters on the Voximplant side, as explained in this article; the platform then converts them to the provider's format and sends them to the provider. Alternatively, you can pass the parameters directly to the provider via the CallSayParameters.request parameter for the Call.say method or the TTSPlayerParameters.request parameter for the VoxEngine.createTTSPlayer method.

You need to specify the parameters in the specific format that your provider accepts. Different providers use different formats. Refer to your provider's API reference to learn about the formats.

Here is a full scenario example that uses the request parameter with VoiceList.Google:

Request parameter for Google TTS
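A hedged sketch of such a scenario. The field names inside request (audioConfig, speakingRate, pitch) follow the Google Cloud Text-to-Speech AudioConfig reference; that they are forwarded in exactly this shape is an assumption, as is the voice member VoiceList.Google.en_US_Standard_A. Check both against the current references before use.

```javascript
// Provider-specific settings in Google's own format (assumed shape).
function buildGoogleRequest() {
  return {
    audioConfig: {
      speakingRate: 0.9, // slightly slower than the normal speed of 1.0
      pitch: 2.0, // raise the pitch by two semitones
    },
  };
}

// Plays a phrase with the given Google voice and the provider-specific request.
function speakWithGoogle(call, voice) {
  const player = VoxEngine.createTTSPlayer("Your appointment is confirmed.", {
    language: voice,
    request: buildGoogleRequest(),
  });
  player.sendMediaTo(call);
  return player;
}

// Guarded so the file can also be loaded outside the VoxEngine runtime.
if (typeof VoxEngine !== "undefined") {
  VoxEngine.addEventListener(AppEvents.CallAlerting, (e) => {
    e.call.addEventListener(CallEvents.Connected, () => {
      // Hypothetical member name; pick any voice from VoiceList.Google.
      speakWithGoogle(e.call, VoiceList.Google.en_US_Standard_A);
    });
    e.call.answer();
  });
}
```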

Here are examples of the request parameter for the most common providers:

Request parameter examples

Providers differ in which parameters they mark as required. In some cases, omitting a required parameter causes an error; in other cases, a default value is used. For example, ElevenLabs marks the model_id parameter, which selects the model, as required; if you do not pass it, the default value eleven_multilingual_v2 is used. Read your chosen provider's documentation carefully for details.

3rd-party speech synthesis providers' documentation links

In addition to the examples provided, you can refer to the documentation of third-party speech synthesis providers to learn how to construct the request parameter for your scenario.

Alternatively, you can use the Media player to integrate 3rd-party voice providers, such as OpenAI TTS.