Text-to-speech

Use this block to convert the specified text to speech in your scenario. Voximplant Kit supports speech synthesis from the following vendors:

Yandex (Russian, English (USA), Turkish)
Google
Amazon
Tinkoff (Russian only)
Microsoft
IBM

When you use Microsoft voices, you must add the SSML speak tag to the text.
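As a minimal sketch of what adding the speak tag can look like, the Python helper below wraps plain text in a bare speak element and escapes XML special characters. The function name wrap_in_speak is hypothetical, and the bare tag without attributes is an assumption: check which attributes, if any, Voximplant Kit expects for Microsoft voices.

```python
from xml.sax.saxutils import escape


def wrap_in_speak(text: str) -> str:
    """Wrap plain text in an SSML <speak> root (hypothetical helper).

    Text that already starts with <speak is returned unchanged, so the
    tag is never added twice.
    """
    stripped = text.strip()
    if stripped.startswith("<speak"):
        return stripped
    # Escape &, < and > so the result stays well-formed XML.
    return f"<speak>{escape(stripped)}</speak>"


print(wrap_in_speak("Your order is ready & waiting."))
# <speak>Your order is ready &amp; waiting.</speak>
```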

Neural voices are available in several languages. They are built on Neural Text-to-Speech (NTTS) technology and sound more realistic and natural than standard voices.

To configure the text-to-speech service, do the following:

On the canvas, connect the block using the Out port.
Double-click the Text-to-speech block.
Select the language in the Synth language field.
Select one of the voices in the Voice field.

For some TTS providers, you can also configure advanced settings:

Voice pitch - Configure the synthesized voice pitch (Google). Available options: x-low, low, medium, high, x-high, default.
Speech volume - Set the speech volume (Google). Available options: silent, x-soft, soft, medium, loud, x-loud, default.
Speech rate - Set the synthesized speech speed (Google, Yandex). Available options: x-slow, slow, medium, fast, x-fast, default.
Emotions - Set the emotional tone of the synthesized voice (available for some Yandex voices). Available options: neutral, good, evil.
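The advanced settings above can be summarized as a small lookup table. The sketch below, in Python, checks whether a provider and value combination matches that list. The setting keys (voice_pitch, speech_volume, and so on) and the check_setting function are hypothetical names for illustration, not Voximplant Kit identifiers. The Emotions entry is simplified, since that setting applies only to some Yandex voices.

```python
from typing import Optional

# Supporting providers and allowed values, copied from the list above.
# The dictionary keys are hypothetical names, not Voximplant Kit identifiers.
ADVANCED_SETTINGS = {
    "voice_pitch": ({"Google"},
                    ["x-low", "low", "medium", "high", "x-high", "default"]),
    "speech_volume": ({"Google"},
                      ["silent", "x-soft", "soft", "medium", "loud", "x-loud", "default"]),
    "speech_rate": ({"Google", "Yandex"},
                    ["x-slow", "slow", "medium", "fast", "x-fast", "default"]),
    "emotions": ({"Yandex"}, ["neutral", "good", "evil"]),
}


def check_setting(provider: str, setting: str, value: str) -> Optional[str]:
    """Return a problem description, or None if the combination is listed."""
    if setting not in ADVANCED_SETTINGS:
        return f"unknown setting: {setting}"
    providers, options = ADVANCED_SETTINGS[setting]
    if provider not in providers:
        return f"{setting} is not listed for {provider}"
    if value not in options:
        return f"{value!r} is not an option for {setting}"
    return None


print(check_setting("Google", "speech_rate", "fast"))  # None
print(check_setting("Yandex", "voice_pitch", "high"))  # voice_pitch is not listed for Yandex
```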

If you select ElevenLabs as the TTS provider, additional speech synthesis settings are available:

Language model - The model detects the language of the text and generates audio accordingly.
Stability - Adjusts how emotional the voice sounds. The higher the stability, the more restrained and calm the voice becomes. Lowering the setting introduces a broader emotional range.
Similarity boost - Adjusts the clarity of the voice and its similarity to the original speaker. If similarity is set too high, the AI may reproduce artifacts from low-quality audio.
Style exaggeration - Amplifies the style of the original speaker.
Speaker boost - When enabled, makes the synthesized speech sound more human-like. Enabling the setting may slightly increase the speech synthesis time.
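For orientation, the ElevenLabs settings above can be modeled as a small Python data class. This is a sketch under assumptions. The field names follow the voice settings in the public ElevenLabs API (stability, similarity_boost, style, use_speaker_boost). The 0.0 to 1.0 scale and the default values are illustrative and may not match the controls shown in Voximplant Kit.

```python
from dataclasses import dataclass, asdict


@dataclass
class ElevenLabsVoiceSettings:
    """Hypothetical container for the ElevenLabs settings described above."""
    stability: float = 0.5          # higher = calmer, more restrained voice
    similarity_boost: float = 0.75  # too high may reproduce audio artifacts
    style: float = 0.0              # style exaggeration
    use_speaker_boost: bool = True  # may slightly increase synthesis time

    def __post_init__(self) -> None:
        # Assumed scale: each numeric setting lies between 0.0 and 1.0.
        for name in ("stability", "similarity_boost", "style"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")


# A lower stability value asks for a broader emotional range.
expressive = ElevenLabsVoiceSettings(stability=0.3)
print(asdict(expressive))
```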

Enter the text to be synthesized. You can use SSML tags to make the voice sound more realistic.
Click Save.
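To illustrate the kind of SSML text you might enter in the block, the Python sketch below builds a short snippet with a pause and a slower passage. The break and prosody elements are standard SSML, but whether a given vendor and voice honor them is an assumption to verify. The build_ssml function and its parameters are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_ssml(greeting: str, message: str,
               pause_ms: int = 500, rate: str = "slow") -> str:
    """Build an SSML string: greeting, a pause, then the message at `rate`."""
    speak = ET.Element("speak")
    speak.text = greeting
    # <break> inserts a pause of the given length.
    ET.SubElement(speak, "break", {"time": f"{pause_ms}ms"})
    # <prosody rate="..."> changes the speed of the enclosed text.
    prosody = ET.SubElement(speak, "prosody", {"rate": rate})
    prosody.text = message
    # ElementTree escapes special characters in the text for us.
    return ET.tostring(speak, encoding="unicode")


print(build_ssml("Hello.", "Your confirmation code is 1 2 3."))
```

You can paste the printed string into the text field of the block and listen to how the selected voice renders it.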