
Live API client

Gemini, a large language model (LLM) developed by Google, can seamlessly integrate into Voximplant applications through the Gemini module.

This module offers a client interface that enables real-time communication with Gemini models over WebSockets. The client can produce both audio responses and text transcriptions.

For more information, refer to the Gemini Live API.

Usage


To use Gemini, require the module as shown in the code example. Create a geminiLiveAPIClient instance (via the Gemini.Experimental.createLiveAPIClient method) and provide your Multimodal Live API key for authentication.

To send media between the call and the geminiLiveAPIClient, use the VoxEngine.sendMediaBetween method. Listen to the geminiLiveAPIClient events (see the Gemini.Experimental.LiveAPIEvents event list) and implement your application's business logic.

The following code example shows how to connect incoming calls to the Google Gemini Live API:

Connecting to Gemini live API
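In outline, such a scenario could look like the sketch below. It uses the createLiveAPIClient and sendMediaBetween calls described above; the shape of the options object (beyond the API key) is an assumption, so check the Gemini module reference for the exact parameter names:

```javascript
require(Modules.Gemini);

// Handle the incoming call
VoxEngine.addEventListener(AppEvents.CallAlerting, (e) => {
  const call = e.call;

  call.addEventListener(CallEvents.Connected, async () => {
    // Create the Live API client. Only the API key is confirmed by the
    // documentation; the option name is an assumption
    const geminiLiveAPIClient = await Gemini.Experimental.createLiveAPIClient({
      apiKey: "YOUR_MULTIMODAL_LIVE_API_KEY"
    });

    // Bridge audio between the caller and the Gemini model
    VoxEngine.sendMediaBetween(call, geminiLiveAPIClient);
  });

  // End the scenario when the caller hangs up
  call.addEventListener(CallEvents.Disconnected, () => VoxEngine.terminate());

  call.answer();
});
```

The scenario answers the call first and only creates the client once media can flow, so the model never speaks into an unanswered call.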

To connect Gemini live API models to a third-party application, Google provides a feature called Function calling. Instead of only generating text, the model can determine when a function should be called and supply the parameters needed to execute it, acting as a bridge between natural language and real-world actions and data.

This example shows how to process function calling in a VoxEngine scenario:

Function calling
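In outline, handling a function call could look like the sketch below. The function declaration format follows the Gemini API's tool schema; the get_weather function, the ToolCall event name, and the sendToolResponse method are assumptions for illustration, so verify them against the Gemini.Experimental.LiveAPIEvents list and the module reference:

```javascript
require(Modules.Gemini);

// An illustrative "tool" the model can call: look up the weather.
// The endpoint URL is a placeholder
async function getWeather(city) {
  const response = await Net.httpRequestAsync(
    "https://example.com/weather?city=" + encodeURIComponent(city)
  );
  return response.text;
}

VoxEngine.addEventListener(AppEvents.CallAlerting, (e) => {
  const call = e.call;

  call.addEventListener(CallEvents.Connected, async () => {
    const geminiLiveAPIClient = await Gemini.Experimental.createLiveAPIClient({
      apiKey: "YOUR_MULTIMODAL_LIVE_API_KEY",
      // Declare the tool so the model knows it can call it
      tools: [{
        functionDeclarations: [{
          name: "get_weather",
          description: "Get the current weather for a city",
          parameters: {
            type: "OBJECT",
            properties: { city: { type: "STRING" } },
            required: ["city"]
          }
        }]
      }]
    });

    // React when the model decides to call the function; the event name
    // and payload shape are assumptions
    geminiLiveAPIClient.addEventListener(
      Gemini.Experimental.LiveAPIEvents.ToolCall,
      async (ev) => {
        const weather = await getWeather(ev.functionCalls[0].args.city);
        // Return the result so the model can voice it to the caller;
        // the method name is an assumption
        geminiLiveAPIClient.sendToolResponse({
          id: ev.functionCalls[0].id,
          response: { result: weather }
        });
      }
    );

    VoxEngine.sendMediaBetween(call, geminiLiveAPIClient);
  });

  call.answer();
});
```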

In addition to the described functionality, you can use third-party speech synthesis providers with the Gemini live API. You can find more information in the Speech synthesis and Realtime speech synthesis articles in our documentation.

In this scenario example, we show how to use realtime TTS from the ElevenLabs provider in combination with the Gemini live API. It can synthesize speech in a streaming manner, as the source text is continuously updated.

Custom TTS (ElevenLabs provider)
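The idea, in outline, is to have Gemini produce text only and stream each text chunk into an ElevenLabs realtime TTS player as it arrives. In the sketch below, the responseModalities option, the realtime player creation call, the voice name, the TextResponse event, and the appendText method are all assumptions; refer to the Realtime speech synthesis article for the actual API:

```javascript
require(Modules.Gemini);

VoxEngine.addEventListener(AppEvents.CallAlerting, (e) => {
  const call = e.call;

  call.addEventListener(CallEvents.Connected, async () => {
    // Ask Gemini for text responses only; the option name is an assumption
    const geminiLiveAPIClient = await Gemini.Experimental.createLiveAPIClient({
      apiKey: "YOUR_MULTIMODAL_LIVE_API_KEY",
      responseModalities: ["TEXT"]
    });

    // Hypothetical realtime TTS player with an ElevenLabs voice;
    // the function and voice names are placeholders
    const ttsPlayer = VoxEngine.createRealtimeTTSPlayer({
      voice: "ELEVENLABS_VOICE"
    });

    // Caller audio goes to Gemini; synthesized audio goes to the caller
    call.sendMediaTo(geminiLiveAPIClient);
    ttsPlayer.sendMediaTo(call);

    // Feed each text chunk into the player as the model produces it;
    // the event name and method are assumptions
    geminiLiveAPIClient.addEventListener(
      Gemini.Experimental.LiveAPIEvents.TextResponse,
      (ev) => ttsPlayer.appendText(ev.text)
    );
  });

  call.answer();
});
```

Note that media is wired in two one-way legs here (instead of sendMediaBetween), because the audio the caller hears comes from the TTS player, not from Gemini.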