Realtime speech synthesis

Traditional text-to-speech synthesis takes a fixed text and a voice selected from a predefined list and synthesizes speech from them. In addition to this, some providers offer a feature called realtime speech synthesis.

In realtime synthesis, speech is generated in a streaming manner while the source text is still being updated. This is particularly useful when the text arrives in chunks, for example from a Large Language Model (LLM) such as ChatGPT.
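To make the chunked flow concrete, here is a minimal sketch of one way to turn streamed text chunks into complete sentences before handing them on for synthesis. The createSentenceBuffer helper and the sample chunks are hypothetical and are not part of any provider API; a realtime provider may also accept raw chunks directly, in which case no buffering is needed.

```javascript
// Hypothetical helper (not part of VoxEngine or ElevenLabs): buffers incoming
// text chunks and emits each complete sentence as soon as it is available.
function createSentenceBuffer(onSentence) {
  let pending = '';
  return {
    // Add a chunk of text as it arrives from the source (for example, an LLM stream).
    push(chunk) {
      pending += chunk;
      // A sentence ends at ., ! or ? followed by at least one whitespace character.
      const boundary = /^([\s\S]*?[.!?])\s+/;
      let match;
      while ((match = boundary.exec(pending)) !== null) {
        onSentence(match[1].trim());
        pending = pending.slice(match[0].length);
      }
    },
    // Emit whatever is left once the source has finished.
    flush() {
      const rest = pending.trim();
      pending = '';
      if (rest) onSentence(rest);
    },
  };
}

// Usage: collect the sentences that would be passed to a realtime synthesizer.
const spoken = [];
const buffer = createSentenceBuffer((sentence) => spoken.push(sentence));
for (const chunk of ['Hel', 'lo the', 're. How can I ', 'help you', ' today?']) {
  buffer.push(chunk);
}
buffer.flush();
```

With this approach the first sentence becomes available for synthesis while the rest of the text is still arriving, which is the latency benefit that realtime synthesis is meant to provide.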

In this article, we provide an example of how to use realtime speech synthesis with ElevenLabs, a provider that offers a range of realtime voices.

Usage

To use ElevenLabs realtime speech synthesis, require the Modules.ElevenLabs module in your VoxEngine scenario.

Use the ElevenLabs.createRealtimeTTSPlayer method to create a realtime TTS player, passing the desired parameters. Use the *.sendMedia or VoxEngine.sendMediaBetween methods to send media between the Call and the ElevenLabs.RealtimeTTSPlayer.
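The order of these steps can be sketched as follows. This is only an illustration under stated assumptions: the attachRealtimeTTS function is hypothetical, the platform objects are passed in as arguments so that the wiring can be read on its own, and the arguments of createRealtimeTTSPlayer are forwarded untouched because their exact shape is not described here. Consult the ElevenLabs module reference for the actual parameters.

```javascript
// In a real VoxEngine scenario the module is loaded first with
//   require(Modules.ElevenLabs);
// Here the platform objects are passed in as arguments (an assumption made
// for illustration) instead of being taken from the scenario environment.
function attachRealtimeTTS(ElevenLabs, call, ...playerArgs) {
  // playerArgs is a placeholder: the real parameters of
  // createRealtimeTTSPlayer are defined by the ElevenLabs module reference.
  const player = ElevenLabs.createRealtimeTTSPlayer(...playerArgs);
  // Route the synthesized audio from the player to the call.
  player.sendMedia(call);
  return player;
}
```

Inside a scenario, such a function would typically be called from a call event handler once the call object is available, and the returned player is the object whose events you then listen to.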

Listen to the ElevenLabs.RealtimeTTSPlayer events (see the ElevenLabs.RealtimeTTSPlayer event list) and implement your application's business logic.

Here is the complete scenario for your reference.

ElevenLabs Realtime TTS