You can synthesize speech and play it to your interlocutor before and during an active call or conference.
To synthesize speech, use the call.say() method or the createTTSPlayer() method in your scenario. Pass the text to synthesize as the first parameter and the options as the second parameter. See the code example below:
Here is an example of the createTTSPlayer() method:
If you have a custom Yandex engine voice, contact support to activate this feature for your account, then specify the voice folder ID in the yandexCustomModelName property.
You can also configure other speech synthesis options, such as pitch, rate, and volume. To set them, list them in the ttsOptions parameter of the call.say() method or the createTTSPlayer() method. See the code example below:
The options accept the following values:
pitch (voice pitch): either a number followed by "Hz", from 0.5Hz to 2Hz, or one of x-low, low, medium, high, x-high, default
rate (speech speed): one of x-slow, slow, medium, fast, x-fast, default
volume (speech volume): one of silent, x-soft, soft, medium, loud, x-loud, default
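A mistyped value is easy to miss, so you may want to check a ttsOptions object before passing it on. The following is a minimal sketch that assumes plain string values. The helper names validateTtsOptions and isValidPitch are hypothetical and are not part of the Voximplant API. The accepted values are copied from the list above.

```javascript
// Hypothetical helpers (not Voximplant APIs): check ttsOptions values against
// the lists in this article before passing them to call.say() or createTTSPlayer().
const PITCH_KEYWORDS = ["x-low", "low", "medium", "high", "x-high", "default"];
const RATE_VALUES = ["x-slow", "slow", "medium", "fast", "x-fast", "default"];
const VOLUME_VALUES = ["silent", "x-soft", "soft", "medium", "loud", "x-loud", "default"];

// Accepts a pitch keyword, or a number followed by "Hz" between 0.5 and 2 inclusive.
function isValidPitch(value) {
  if (PITCH_KEYWORDS.includes(value)) return true;
  const match = /^(\d+(?:\.\d+)?)Hz$/.exec(value);
  if (!match) return false;
  const hz = Number(match[1]);
  return hz >= 0.5 && hz <= 2;
}

// Returns a list of error messages; an empty list means every given option is valid.
function validateTtsOptions(ttsOptions) {
  const errors = [];
  if ("pitch" in ttsOptions && !isValidPitch(ttsOptions.pitch)) {
    errors.push(`invalid pitch: ${ttsOptions.pitch}`);
  }
  if ("rate" in ttsOptions && !RATE_VALUES.includes(ttsOptions.rate)) {
    errors.push(`invalid rate: ${ttsOptions.rate}`);
  }
  if ("volume" in ttsOptions && !VOLUME_VALUES.includes(ttsOptions.volume)) {
    errors.push(`invalid volume: ${ttsOptions.volume}`);
  }
  return errors;
}

console.log(validateTtsOptions({ pitch: "1.5Hz", rate: "slow", volume: "loud" })); // → []
console.log(validateTtsOptions({ pitch: "3Hz", rate: "very-fast" }));
```

The second call reports both the out-of-range pitch and the unknown rate. Omitted options are not checked.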
If you want to apply these options to the whole text, you do not have to use the speak tag. If you want to apply specific attributes to only part of the text, write the markup yourself and add the speak tag manually.
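When you write the speak markup yourself, the text becomes SSML, which is an XML-based format. Characters such as & and < in the plain parts of the text therefore need to be escaped. The following is a minimal sketch of that step. escapeXml and wrapInSpeak are hypothetical helper names, not Voximplant functions.

```javascript
// Hypothetical helper (not a Voximplant API): escapes characters reserved in XML
// so that plain text can be placed inside SSML markup.
function escapeXml(text) {
  return text
    .replace(/&/g, "&amp;") // must run first so the entities added below are not escaped again
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Hypothetical helper: wraps an already-built SSML body in the root speak tag.
function wrapInSpeak(ssmlBody) {
  return `<speak>${ssmlBody}</speak>`;
}

console.log(wrapInSpeak(escapeXml("Tom & Jerry <3")));
// → <speak>Tom &amp; Jerry &lt;3</speak>
```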
The list of supported tags depends on the voice provider. You can find these lists on the providers' official websites. If you use an unsupported tag, the PlaybackFinished event is triggered with error code 400.
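To catch an unsupported tag before playback fails, you could compare the tags in your markup against the provider's list. The sketch below assumes you copy that list from the provider's documentation yourself. findUnsupportedTags is a hypothetical helper, and the sample allowlist is illustrative only, not any provider's real list. The helper uses a simple regular expression rather than a full XML parser.

```javascript
// Hypothetical pre-check (not a Voximplant API): returns the tag names used in an
// SSML string that are missing from an allowlist taken from the provider's docs.
function findUnsupportedTags(ssml, allowedTags) {
  const allowed = new Set(allowedTags);
  const found = new Set();
  // Captures the name in opening, closing, and self-closing tags, e.g. <speak>, </speak>, <break/>.
  const tagName = /<\/?\s*([A-Za-z][\w:-]*)/g;
  for (const match of ssml.matchAll(tagName)) {
    if (!allowed.has(match[1])) found.add(match[1]);
  }
  return [...found];
}

// Illustrative allowlist only; use the real list from your provider's documentation.
const exampleAllowlist = ["speak", "prosody", "say-as"];
console.log(findUnsupportedTags("<speak>Hi <emphasis>there</emphasis></speak>", exampleAllowlist));
// returns ["emphasis"]
```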
For example, if we choose Amazon, we have to use the prosody tag to control the volume, rate, or pitch of a selected text fragment. Here is how we make a fragment sound higher:
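The following sketch builds this kind of markup with a hypothetical prosody() helper, which is not a Voximplant or Amazon function. It assumes the fragment contains no reserved XML characters. The resulting string is the text you would pass as the first parameter of call.say() or createTTSPlayer().

```javascript
// Hypothetical helper: wraps one text fragment in an SSML prosody tag.
// Attribute values such as "high" must be ones the provider (here Amazon) accepts.
// Assumes the fragment contains no reserved XML characters (&, <, >, ", ').
function prosody(fragment, attrs) {
  const attrText = Object.entries(attrs)
    .map(([name, value]) => ` ${name}="${value}"`)
    .join("");
  return `<prosody${attrText}>${fragment}</prosody>`;
}

// Only the middle fragment is raised; the rest uses the voice's normal pitch.
const ssml = `<speak>Hello, ${prosody("this part sounds higher", { pitch: "high" })}, and this part is normal.</speak>`;
console.log(ssml);
// → <speak>Hello, <prosody pitch="high">this part sounds higher</prosody>, and this part is normal.</speak>
```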
Here is another example, using Amazon's say-as tag:
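The following sketch builds say-as markup with a hypothetical sayAs() helper. Amazon Polly documents the interpret-as values used here: digits reads a number digit by digit, and characters spells out letters. See Amazon's SSML reference for the full list of values and for extra attributes such as format.

```javascript
// Hypothetical helper (not a Voximplant or Amazon API): wraps a value in an SSML say-as tag.
// Assumes the value contains no reserved XML characters.
function sayAs(value, interpretAs) {
  return `<say-as interpret-as="${interpretAs}">${value}</say-as>`;
}

// "digits" reads 1234 as individual digits; "characters" spells out A, B, C.
const ssml = `<speak>Order ${sayAs("1234", "digits")}, reference ${sayAs("ABC", "characters")}.</speak>`;
console.log(ssml);
// → <speak>Order <say-as interpret-as="digits">1234</say-as>, reference <say-as interpret-as="characters">ABC</say-as>.</speak>
```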