Live API client
Gemini, a large language model (LLM) developed by Google, integrates seamlessly into Voximplant applications through the Gemini module.
The module provides a client interface for real-time communication with Gemini models over WebSockets. The models can generate both audio responses and text transcriptions.
For more information, refer to the Gemini Live API.
Usage
To use Gemini, require the module as shown in the code example. Create a geminiLiveAPIClient instance (via the Gemini.Experimental.createLiveAPIClient method) and provide your Multimodal Live API authentication key.
To send media between the Call and the geminiLiveAPIClient, use the VoxEngine.sendMediaBetween method. Listen to the geminiLiveAPIClient events (see the Gemini.Experimental.LiveAPIEvents event list) and implement your application's business logic.
The following code example shows how to connect incoming calls to the Google Gemini Live API:
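Below is a minimal sketch of such a scenario. The overall shape (require the module, create the client, bridge media with VoxEngine.sendMediaBetween, subscribe to events) follows the steps above, but the option names passed to createLiveAPIClient, the model name, and the specific event names are assumptions, not the module's documented API:

```javascript
// Sketch of a VoxEngine scenario that bridges an incoming call to Gemini.
// NOTE: option and event names below are illustrative assumptions.
require(Modules.Gemini);

VoxEngine.addEventListener(AppEvents.CallAlerting, (e) => {
  const call = e.call;
  call.answer();

  call.addEventListener(CallEvents.Connected, async () => {
    // Create the Live API client; the key and model name are placeholders
    const geminiLiveAPIClient = await Gemini.Experimental.createLiveAPIClient({
      apiKey: 'YOUR_MULTIMODAL_LIVE_API_KEY', // placeholder
      model: 'gemini-2.0-flash-exp',          // example model name
    });

    // Exchange audio between the caller and the Gemini client
    VoxEngine.sendMediaBetween(call, geminiLiveAPIClient);

    // React to client events (see Gemini.Experimental.LiveAPIEvents);
    // the Error event name here is an assumption
    geminiLiveAPIClient.addEventListener(Gemini.Experimental.LiveAPIEvents.Error, (ev) => {
      Logger.write(`Gemini client error: ${JSON.stringify(ev)}`);
      call.hangup();
    });
  });

  call.addEventListener(CallEvents.Disconnected, () => VoxEngine.terminate());
});
```

Refer to the module's reference for the exact parameter and event names before using this in production.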
To connect Gemini Live API models to a 3rd-party application, Google provides a feature called Function calling. Instead of only generating text, the model recognizes when to call a function and supplies the parameters needed to execute it, acting as a bridge between natural language and real-world actions and data.
This example shows how to process function calling in a VoxEngine scenario:
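The sketch below outlines one possible shape for this. The tool declaration format mirrors the Live API setup message, but the ToolCall event name, the sendToolResponse method, and the get_order_status function are all hypothetical placeholders, not the module's documented API:

```javascript
// Sketch: handling a Gemini function call inside a VoxEngine scenario.
// NOTE: event/method names (ToolCall, sendToolResponse) are assumptions.
require(Modules.Gemini);

VoxEngine.addEventListener(AppEvents.CallAlerting, (e) => {
  const call = e.call;
  call.answer();

  call.addEventListener(CallEvents.Connected, async () => {
    const geminiLiveAPIClient = await Gemini.Experimental.createLiveAPIClient({
      apiKey: 'YOUR_MULTIMODAL_LIVE_API_KEY', // placeholder
      model: 'gemini-2.0-flash-exp',
      // Declare a function the model is allowed to call (hypothetical example)
      tools: [{
        functionDeclarations: [{
          name: 'get_order_status',
          description: 'Look up the status of a customer order',
          parameters: {
            type: 'OBJECT',
            properties: { orderId: { type: 'STRING' } },
            required: ['orderId'],
          },
        }],
      }],
    });

    VoxEngine.sendMediaBetween(call, geminiLiveAPIClient);

    // When the model decides to call the function, execute it and
    // return the result so the model can continue the conversation
    geminiLiveAPIClient.addEventListener(Gemini.Experimental.LiveAPIEvents.ToolCall, async (ev) => {
      for (const fc of ev.functionCalls) {
        if (fc.name === 'get_order_status') {
          const status = await lookUpOrder(fc.args.orderId); // your backend call
          geminiLiveAPIClient.sendToolResponse({
            functionResponses: [{ id: fc.id, name: fc.name, response: { status } }],
          });
        }
      }
    });
  });
});

// Placeholder for a real backend request, e.g. via Net.httpRequestAsync
async function lookUpOrder(orderId) {
  return 'shipped';
}
```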
In addition to the described functionality, you can use 3rd-party speech synthesis providers with the Gemini Live API. You can find more information in the Speech synthesis and Realtime speech synthesis articles in our documentation.
This scenario example shows how to use realtime TTS from the ElevenLabs provider in combination with the Gemini Live API. It synthesizes speech in a streaming manner as the source text is continuously updated.
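A rough sketch of that combination is shown below. It assumes the Gemini client can be configured for text-only responses, that it emits a text-chunk event as output arrives, and that the TTS player accepts appended text for streaming synthesis; the responseModalities option, the TextChunk event, the appendText method, and the voice name are all assumptions to illustrate the flow:

```javascript
// Sketch: Gemini produces streaming text, ElevenLabs realtime TTS speaks it.
// NOTE: option, event, and method names below are illustrative assumptions.
require(Modules.Gemini);
require(Modules.Player);

VoxEngine.addEventListener(AppEvents.CallAlerting, (e) => {
  const call = e.call;
  call.answer();

  call.addEventListener(CallEvents.Connected, async () => {
    const geminiLiveAPIClient = await Gemini.Experimental.createLiveAPIClient({
      apiKey: 'YOUR_MULTIMODAL_LIVE_API_KEY',  // placeholder
      model: 'gemini-2.0-flash-exp',
      responseModalities: ['TEXT'],            // assumed option: text instead of audio
    });

    // Caller audio still goes to Gemini for recognition
    call.sendMediaTo(geminiLiveAPIClient);

    // A realtime TTS player with an ElevenLabs voice (voice name is an example)
    const ttsPlayer = VoxEngine.createTTSPlayer('', {
      language: VoiceList.ElevenLabs.Rachel,   // example voice
    });
    ttsPlayer.sendMediaTo(call);

    // Stream Gemini's text output into the TTS player as it arrives
    geminiLiveAPIClient.addEventListener(Gemini.Experimental.LiveAPIEvents.TextChunk, (ev) => {
      ttsPlayer.appendText(ev.text);           // hypothetical streaming method
    });
  });
});
```

Check the Realtime speech synthesis article for the actual streaming TTS interface and supported ElevenLabs voices.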