Text-to-speech

Use this block to convert the specified text to speech in your scenario. Voximplant Kit supports speech synthesis from the following vendors:

  • Yandex (Russian, English (USA), Turkish)

  • Google

  • Amazon

  • Tinkoff (Russian only)

  • Microsoft

  • IBM

Note

When you use Microsoft voices, you must add the speak tag to the text.
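As an illustration of the note above, here is a minimal Python sketch that wraps plain text in a speak element before you paste it into the block. The helper name wrap_in_speak is hypothetical, and the sketch assumes a bare speak tag is enough. This page does not say whether attributes such as a version or language are also required.

```python
from xml.sax.saxutils import escape


def wrap_in_speak(text: str) -> str:
    """Wrap plain text in a <speak> element (hypothetical helper)."""
    # escape() replaces &, < and > so the plain text cannot break the markup.
    return f"<speak>{escape(text)}</speak>"


print(wrap_in_speak("Your balance is < 5 dollars & your plan is active."))
# <speak>Your balance is &lt; 5 dollars &amp; your plan is active.</speak>
```

Escaping matters only if the text contains characters that are special in XML. Text that already contains SSML tags should not be passed through this helper, because the tags themselves would be escaped.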

Neural voices are available in several languages. They are built on Neural Text-to-Speech (NTTS) technology and sound more realistic and natural than standard voices.

To configure the text-to-speech service, do the following:

  1. On the canvas, connect the block by using the Out port.
  2. Double-click the Text-to-speech block.
  3. Select the language from the Synth language field.
  4. Select one of the voices in the Voice field.

For some TTS providers, you can also configure advanced settings:

  • Voice pitch - Configure the synthesized voice pitch (Google). Available options: x-low, low, medium, high, x-high, default.

  • Speech volume - Set the speech volume (Google). Available options: silent, x-soft, soft, medium, loud, x-loud, default.

  • Speech rate - Set the synthesized speech speed (Google, Yandex). Available options: x-slow, slow, medium, fast, x-fast, default.

  • Emotions - Configure the synthesized voice sentiment (applicable for specific Yandex voices). Available options: neutral, good, evil.
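The provider restrictions and allowed values listed above can be summarized as a small lookup table. The following Python sketch is only an illustration: the setting keys and the check_settings function are hypothetical names, not part of Voximplant Kit, and the table simply transcribes the options from the list.

```python
# Allowed values and supporting providers, transcribed from the list above.
# The dictionary keys are illustrative names, not Voximplant Kit identifiers.
ADVANCED_SETTINGS = {
    "voice_pitch": {
        "providers": {"Google"},
        "options": {"x-low", "low", "medium", "high", "x-high", "default"},
    },
    "speech_volume": {
        "providers": {"Google"},
        "options": {"silent", "x-soft", "soft", "medium", "loud", "x-loud", "default"},
    },
    "speech_rate": {
        "providers": {"Google", "Yandex"},
        "options": {"x-slow", "slow", "medium", "fast", "x-fast", "default"},
    },
    # Emotions apply only to specific Yandex voices; that per-voice
    # restriction is not modeled here.
    "emotions": {
        "providers": {"Yandex"},
        "options": {"neutral", "good", "evil"},
    },
}


def check_settings(provider: str, settings: dict) -> list:
    """Return a list of problems; an empty list means the combination is valid."""
    problems = []
    for name, value in settings.items():
        spec = ADVANCED_SETTINGS.get(name)
        if spec is None:
            problems.append(f"unknown setting: {name}")
        elif provider not in spec["providers"]:
            problems.append(f"{name} is not available for {provider}")
        elif value not in spec["options"]:
            problems.append(f"{name} does not accept {value!r}")
    return problems


print(check_settings("Yandex", {"speech_rate": "fast", "voice_pitch": "high"}))
# ['voice_pitch is not available for Yandex']
```

A check like this can be useful when scenario settings are prepared outside the editor, for example in a spreadsheet, and you want to catch a pitch value assigned to a Yandex voice before configuring the block.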

If you select the ElevenLabs TTS provider, additional speech synthesis settings are available:

  • Language model - The model understands which language you use and generates audio accordingly.

  • Stability - Adjusts how emotional the voice sounds. The higher the stability, the more restrained and calm the voice becomes. Lowering the setting introduces a broader emotional range.

  • Similarity boost - Adjusts the clarity and similarity of the voice. If similarity is set too high, the AI may reproduce artifacts from low-quality audio.

  • Style exaggeration - Amplifies the style of the original speaker.

  • Speaker boost - When enabled, makes the synthesized speech sound more human-like. Enabling the setting may slightly increase the speech synthesis time.
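To make the trade-offs between these settings easier to compare, the following Python sketch models them as a small data class. It is an assumption-heavy illustration: the field names follow the voice settings of the public ElevenLabs API, the 0.0 to 1.0 range and the default values are assumed rather than taken from this page, and the ElevenLabsSettings class itself is hypothetical.

```python
from dataclasses import dataclass, asdict


@dataclass
class ElevenLabsSettings:
    # Assumed ranges: 0.0 to 1.0 for the three numeric settings.
    # The defaults below are illustrative, not documented Kit defaults.
    stability: float = 0.5
    similarity_boost: float = 0.75
    style: float = 0.0            # Style exaggeration
    use_speaker_boost: bool = True

    def __post_init__(self) -> None:
        # Reject values outside the assumed range early.
        for name in ("stability", "similarity_boost", "style"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")


# A restrained, calm voice: high stability, no style exaggeration.
calm = ElevenLabsSettings(stability=0.9, style=0.0)
# A more expressive voice: lower stability, some style exaggeration.
expressive = ElevenLabsSettings(stability=0.3, style=0.6)
print(asdict(calm))
print(asdict(expressive))
```

Keeping two or three named presets like this makes it easier to reproduce a voice configuration across several Text-to-speech blocks.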

  5. Enter the text to be synthesized. You can use SSML tags to make the voice sound more realistic.
  6. Click Save.
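Because a mistyped SSML tag can be hard to spot in a text field, it may help to check the markup before pasting it into the block. The Python sketch below tests only that the text is well-formed XML. The is_well_formed helper is hypothetical, and the break and emphasis tags are standard SSML elements used as examples. Which tags each vendor supports is not covered on this page.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the text parses as XML once placed under a single root."""
    try:
        # A temporary root lets fragments such as "Hi <break/> there" parse.
        ET.fromstring(f"<root>{fragment}</root>")
        return True
    except ET.ParseError:
        return False


print(is_well_formed('Please hold. <break time="500ms"/> Connecting you now.'))  # True
print(is_well_formed("Please <emphasis>hold."))  # False: emphasis is never closed
```

The check also fails on an unescaped ampersand, which is a common cause of broken markup in free-form text.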