OpenAI recently launched a beta version of its Realtime API for the OpenAI Platform, marking a significant advancement in the conversational AI market. The API lets large language models (LLMs) operate in a true speech-to-speech mode, which offers several benefits over the traditional way of building LLM-powered voice assistants. Notably, the new approach removes the separate transcription and synthesis steps and so cuts end-to-end latency sharply, supports more human-like interactions (including the ability to interrupt the agent), and produces highly realistic voices.

Previously, developing voicebots required the integration of automatic speech recognition (ASR), LLMs, and text-to-speech (TTS) services—a complex process that platforms like Voximplant support. Now, with OpenAI’s Realtime API, these components are combined within a single voice-enabled model capable of processing and generating speech in real time. As advocates of voice-enabled agents for various tasks and automation scenarios, we at Voximplant are excited to introduce our best-in-class integration with this new API. Our Realtime API Client seamlessly connects any Voximplant-powered call to OpenAI’s sophisticated model in just a few minutes.

What makes our integration stand out? The answer lies in Voximplant’s serverless architecture, which allows developers to connect calls within a VoxEngine scenario using just a few lines of JavaScript code. We’ve streamlined audio codec negotiation, enabling the platform to automatically configure sessions with OpenAI’s API and select the optimal codec based on the connected endpoint. Our solution also manages audio playback according to OpenAI’s Voice Activity Detection (VAD) data to handle agent interruptions. Developers can control the conversation flow using a simple JavaScript API in VoxEngine and react to real-time events from OpenAI. Additionally, they have the flexibility to use any TTS vendor already integrated with Voximplant if OpenAI’s speech synthesis doesn’t meet their preferences.
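To make the codec selection idea concrete, here is a minimal sketch of how a call's codec could be mapped to an audio format for the OpenAI session. It is an illustration under stated assumptions, not the platform's actual negotiation logic: the helper names `pickOpenAIAudioFormat` and `buildAudioSessionUpdate` are hypothetical, the codec names are assumed to be SDP-style strings such as "PCMU", and the format values (`g711_ulaw`, `g711_alaw`, `pcm16`) and the `input_audio_format` / `output_audio_format` fields follow OpenAI's beta session settings.

```javascript
// Hypothetical mapping from a call leg's codec (SDP-style name, assumed)
// to an audio format accepted by the OpenAI Realtime API session
const OPENAI_AUDIO_FORMATS = new Map([
    ["PCMU", "g711_ulaw"], // G.711 mu-law
    ["PCMA", "g711_alaw"], // G.711 A-law
]);

// Returns the matching G.711 format when the call already uses it,
// otherwise falls back to 16-bit PCM
function pickOpenAIAudioFormat(callCodec) {
    const key = String(callCodec || "").toUpperCase();
    return OPENAI_AUDIO_FORMATS.get(key) || "pcm16";
}

// Builds the audio part of a session update so that input and output
// use the same format as the connected endpoint
function buildAudioSessionUpdate(callCodec) {
    const format = pickOpenAIAudioFormat(callCodec);
    return { input_audio_format: format, output_audio_format: format };
}

console.log(buildAudioSessionUpdate("PCMU"));
```

Matching the session format to the codec the call already uses avoids an extra transcoding step on a PSTN leg. In the integration itself this selection happens automatically, so a scenario does not need code like this.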

Our team at Voximplant has invested considerable effort in optimizing the user experience for voice agents. For example, we’ve implemented a special audio buffer mode that is cleared correctly when the agent is interrupted by a person speaking. Because the OpenAI API generates audio faster than it can be played back in real time, unplayed audio accumulates in the buffer, and without this clearing step the agent would keep talking over the caller. Looking ahead, we plan to introduce a feature that supports continued conversations, so that if a call is disconnected, the conversation can resume seamlessly on a follow-up call with the same participant.
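The buffering problem described above can be illustrated with a small standalone sketch. It is not Voximplant's implementation: the class `InterruptiblePlaybackBuffer` and its methods are hypothetical, and the 20 ms chunk size is an assumption chosen only to make the arithmetic easy to follow.

```javascript
// Hypothetical playback queue: audio arrives faster than it can be played,
// so unplayed chunks pile up and must be dropped when the caller interrupts
class InterruptiblePlaybackBuffer {
    constructor() {
        this.chunks = [];  // queued audio chunks that have not been played yet
        this.queuedMs = 0; // total duration of the queued audio
    }

    // Called for each audio chunk received from the model
    enqueue(chunk, durationMs) {
        this.chunks.push({ chunk, durationMs });
        this.queuedMs += durationMs;
    }

    // Called by the playback loop; returns the next chunk, or null when empty
    dequeue() {
        const next = this.chunks.shift();
        if (!next) return null;
        this.queuedMs -= next.durationMs;
        return next.chunk;
    }

    // Called when the caller starts speaking: drops everything unplayed
    // and reports how much audio was discarded
    clear() {
        const droppedMs = this.queuedMs;
        this.chunks = [];
        this.queuedMs = 0;
        return droppedMs;
    }
}

// Usage: the model delivers 3 seconds of audio almost at once, and the
// caller interrupts after the first 20 ms chunk has been played
const buffer = new InterruptiblePlaybackBuffer();
for (let i = 0; i < 150; i++) buffer.enqueue(`chunk-${i}`, 20);
buffer.dequeue();                 // first chunk goes to the call
const droppedMs = buffer.clear(); // caller starts speaking
console.log(droppedMs);           // 2980
```

Without the `clear()` step, the remaining 2,980 ms of queued speech would still be played after the caller had already started talking.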

Take a look at this simple example of how a call can be connected to an OpenAI agent using the Realtime API Client:

require(Modules.OpenAI);

VoxEngine.addEventListener(AppEvents.CallAlerting, async ({ call }) => {
    let realtimeAPIClient = undefined;
    let greetingPlayed = false;

    call.answer();
    const callBaseHandler = () => {
        if (realtimeAPIClient) realtimeAPIClient.close();
        VoxEngine.terminate();
    };
    call.addEventListener(CallEvents.Disconnected, callBaseHandler);
    call.addEventListener(CallEvents.Failed, callBaseHandler);

    const OPENAI_API_KEY = 'PUT_YOUR_OPENAI_API_KEY_HERE';
    const MODEL = "gpt-4o-realtime-preview";

    const onConnectionClose = (event) => {
        // Connection to OpenAI has been closed
        VoxEngine.terminate();
    };

    try {
        // Create realtime client instance
        realtimeAPIClient = await OpenAI.Beta.createRealtimeAPIClient({ apiKey: OPENAI_API_KEY, model: MODEL, onWebSocketClose: onConnectionClose });
        // Start sending media from the OpenAI agent to the call
        realtimeAPIClient.sendMediaTo(call);

        // Choose one of the OpenAI voices and enable input audio transcription
        const session_update = {
            //"instructions": "You are a helpful assistant.",
            "voice": "sage",
            "input_audio_transcription": {
                "model": "whisper-1"
            }
        };
        realtimeAPIClient.sessionUpdate(session_update);

        // Force the agent to start the conversation
        const response = {
            'instructions': 'Hello!'
        };
        realtimeAPIClient.responseCreate(response);

        // Once the greeting has been generated, start two-way media between the call and the OpenAI agent
        realtimeAPIClient.addEventListener(OpenAI.Beta.RealtimeAPIEvents.ResponseAudioTranscriptDone, (event) => {
            if (!greetingPlayed) {
                greetingPlayed = true;
                VoxEngine.sendMediaBetween(call, realtimeAPIClient);
            }
        });

    } catch (error) {
        // Something went wrong
        Logger.write(error);
        VoxEngine.terminate();
    }
});

To experience the OpenAI Realtime API in action, call 1-888-852-0965 and try our demo.