Realtime voice agents have become more intelligent and responsive, but many still struggle with the part callers notice immediately: how the agent actually sounds. A phone agent can answer correctly but still feel flat, scripted, or disconnected from the caller’s tone.
Voximplant now supports Inworld's Realtime API, making it possible to bring Inworld's expressive, conversation-aware agents into real phone calls, SIP networks, and WhatsApp without custom media infrastructure.
What is the Inworld Realtime API
Inworld's Realtime API is a speech-to-speech interface that runs STT, LLM routing, and TTS on a single persistent connection. Audio goes in, audio comes out, and the pipeline handles the layers between without requiring you to wire them together or manage separate providers for each.
Inworld is especially strong where voice quality, delivery, and persona matter. Its Realtime TTS-2 model is designed for conversational speech, with support for expressive voice direction, non-verbal cues, and consistent voice identity across 200+ languages. That makes it useful for agents that need to sound more natural, branded, and aware of the flow of the conversation.
See it in action
In the demo, Voxi, a Voximplant and Inworld support agent, takes a call over SIP using a standard softphone.
The exchange goes like this:
User: Demonstrate your expressive capabilities. Respond to "I like Hawaiian pizza."
Voxi: [laugh] Okay, controversial opinion, but hey, you do you!
User: Now give me a negative reaction.
Voxi: [sigh] Pineapple on pizza… I just - I can’t get behind that.
The laughter and the sigh are rendered by TTS-2 from inline delivery tags during the live call, not added in post-processing. The level of expressiveness can be adjusted through the API.
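To make the idea of inline delivery tags concrete, here is a minimal sketch that separates bracketed tags from the spoken text in a transcript line like the ones above. It could be used for logging or for showing captions without the tags. The helper name splitDeliveryTags and the TaggedLine type are hypothetical and are not part of the Inworld or Voximplant APIs. The pattern assumes tags are bracketed words such as [laugh] and [sigh], as they appear in the demo; check the Inworld documentation for the tag syntax TTS-2 actually accepts.

```typescript
// Hypothetical helper for illustration only; not an Inworld or Voximplant API.
interface TaggedLine {
  tags: string[]; // delivery tags in order of appearance, e.g. ["laugh"]
  spoken: string; // the line with tags removed and whitespace normalized
}

function splitDeliveryTags(line: string): TaggedLine {
  // Assumption: a tag is a bracketed run of letters, spaces, or underscores.
  const tagPattern = /\[([a-z_ ]+)\]/gi;
  const tags: string[] = [];
  const spoken = line
    .replace(tagPattern, (_match: string, name: string) => {
      tags.push(name.toLowerCase());
      return "";
    })
    .replace(/\s+/g, " ")
    .trim();
  return { tags, spoken };
}

const example = splitDeliveryTags(
  "[laugh] Okay, controversial opinion, but hey, you do you!"
);
console.log(example.tags, example.spoken);
```

In a sketch like this the tags stay in the text sent for synthesis; the helper only produces a tag-free copy for display or analytics.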
What Voximplant adds
Inworld's Realtime API is built for browser and app environments. Voximplant extends it to the channels where most voice interactions actually happen: phone calls, SIP networks, and WhatsApp Business Calling, without requiring a custom media gateway or separate streaming infrastructure.
VoxEngine handles telephony, media conversion, and call routing, so the Inworld integration sits inside the same environment where you manage the rest of your call logic, with session management, escalation, transfers, and fallbacks all in one place.
Getting started
To get started, you need an Inworld API key and a Voximplant account with a basic application and routing rule set up. The integration runs on TTS-2, Inworld's most recent voice model, released earlier this year.
The full code walkthrough and setup guide in the documentation cover the process step by step.
Resources
Inworld Realtime API product page




