Realtime speech synthesis
In traditional text-to-speech synthesis, you input a specific text and select a voice from a predefined list to synthesize speech. In addition to this, some providers offer a feature called realtime speech synthesis.
This means that synthesis occurs in a streaming manner while the source text is still being updated. Realtime speech synthesis is particularly advantageous when the text comes from a source that delivers it in chunks, such as a Large Language Model (LLM) like ChatGPT.
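The difference from one-shot synthesis can be illustrated with a minimal sketch. It is not tied to any provider: `produceChunks`, `createRecordingSynthesizer`, `streamToSynthesizer`, and the `append` method are hypothetical names used only for illustration, and a synchronous generator stands in for what would normally be an asynchronous LLM stream.

```javascript
// Hypothetical stand-in for an LLM that produces its answer in chunks.
// A real LLM stream would be asynchronous; a generator keeps the sketch short.
function* produceChunks(chunks, log) {
  for (const chunk of chunks) {
    log.push(`produced: ${chunk}`);
    yield chunk;
  }
}

// Hypothetical stand-in for a realtime synthesizer that accepts text piece by piece.
function createRecordingSynthesizer(log) {
  return {
    append(text) {
      log.push(`synthesizing: ${text}`);
    },
  };
}

// Forward each chunk as soon as it is available instead of waiting for the full text.
function streamToSynthesizer(chunkSource, synthesizer) {
  let forwarded = 0;
  for (const chunk of chunkSource) {
    synthesizer.append(chunk);
    forwarded += 1;
  }
  return forwarded;
}

const log = [];
const forwarded = streamToSynthesizer(
  produceChunks(["Hello, ", "how can ", "I help you?"], log),
  createRecordingSynthesizer(log)
);
console.log(forwarded);
console.log(log.join("\n"));
```

In the resulting log, each "produced" entry is immediately followed by the matching "synthesizing" entry, which shows that synthesis of the first chunk starts before the last chunk exists.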
In this article, we provide an example of how to use realtime speech synthesis from the ElevenLabs provider, which offers a range of realtime voice options.
Usage
To use the ElevenLabs realtime speech synthesis, require the Modules.ElevenLabs module from VoxEngine in your scenario.
Use the ElevenLabs.createRealtimeTTSPlayer method to create a realtime TTS player and provide the desired parameters. Use the *.sendMedia or VoxEngine.sendMediaBetween methods to send media between the Call and the ElevenLabs.RealtimeTTSPlayer.
Listen to the ElevenLabs.RealtimeTTSPlayer events (see the ElevenLabs.RealtimeTTSPlayer event list) and implement the desired application business logic.
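The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the reference scenario: `startRealtimeTts` is a hypothetical helper, the platform objects are passed in as arguments so that the sketch can be exercised outside VoxEngine, the player parameters and event names are left to the caller because they are not specified here, and the `addEventListener(eventName, handler)` signature on the player is an assumption that should be checked against the API reference.

```javascript
// Sketch of the steps above with all platform objects injected.
// Inside a scenario you would pass the real `VoxEngine` and `ElevenLabs`
// objects after calling `require(Modules.ElevenLabs)`.
function startRealtimeTts({ engine, elevenLabs, call, playerParameters, eventNames, onPlayerEvent }) {
  // Step 1: create the realtime TTS player with the desired parameters.
  const player = elevenLabs.createRealtimeTTSPlayer(playerParameters);

  // Step 2: connect media between the call and the player.
  engine.sendMediaBetween(call, player);

  // Step 3: subscribe to the player events the application cares about.
  // Assumption: the player exposes addEventListener(eventName, handler).
  for (const eventName of eventNames) {
    player.addEventListener(eventName, (event) => onPlayerEvent(eventName, event));
  }

  return player;
}
```

Passing the dependencies in explicitly keeps the wiring logic testable with simple stand-in objects, while the business logic stays in the `onPlayerEvent` callback.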
Here is the complete scenario for your reference.