Voximplant now includes a native Grok module that connects any Voximplant call to xAI’s Grok Voice Agent API for real-time, speech-to-speech conversations. With a single VoxEngine scenario, you can route audio from phone numbers, SIP trunks and infrastructure, WhatsApp Business, or WebRTC into Grok, all without building custom media gateways or WebSocket streaming infrastructure.


This integration is delivered as a Voice AI Connector inside VoxEngine. Developers configure the Grok voice, turn-taking, system instructions, tools, and knowledge access using a single sessionUpdate message. Voximplant handles telephony, media conversion, and WebSocket streaming so you can focus on agent behavior instead of infrastructure.


The Grok connector follows the same design principles as Voximplant’s other Voice AI Connectors: simple configuration, low latency, and full control over call logic inside VoxEngine.

Highlights

  • Grok Voice Agent on any call path — Bridge PSTN, SIP, WhatsApp, or WebRTC calls from Voximplant directly into Grok Voice Agent using a single VoxEngine scenario—no custom media gateway required.
  • Low-latency, interruption-friendly conversations — Keep conversations telephony-natural with turn-taking and barge-in patterns: stream audio continuously, detect caller speech, and stop TTS playback immediately when the user interrupts.
  • Grounded answers with xAI Collections (RAG) — Use semantic retrieval over your internal documentation by enabling the file_search tool and pointing it at your xAI collection ID (via vector_store_ids).
  • Web + X search as first-class tools — Enable web_search and/or x_search so Grok can pull fresh context when appropriate, while you still control call flow and how results are used in the conversation.
  • Function calling for escalation and call control — Let Grok decide when to invoke functions like forward_to_agent or hangup_call, and handle those tool calls inside your VoxEngine scenario for clean escalation and deterministic outcomes.
  • Consistent Voice AI Connector model — If you’ve built with Voximplant’s other Voice AI connectors, the Grok integration uses familiar patterns: a single scenario, a session setup message, and a clear realtime event stream you can act on.

Developer notes

  • Native VoxEngine module — Load the integration with require(Modules.Grok); and create a client via Grok.createVoiceAgentAPIClient(...).
  • Session setup: a single `sessionUpdate` object — Configure voice, turn detection, system instructions, tools, and knowledge access using voiceAgentAPIClient.sessionUpdate({ session: { ... } }).
  • Collections Search (RAG) — Add { type: "file_search", vector_store_ids: ["collection_…"], max_num_results: N } to your tools array to ground answers in your own documentation.
  • Web and X search — Add { type: "web_search" } and/or { type: "x_search" } to enable search. Search invocations will appear in the realtime event stream and can be logged alongside the rest of the call.
  • Barge-in and media control — Listen for Grok.VoiceAgentAPIEvents.InputAudioBufferSpeechStarted and call voiceAgentAPIClient.clearMediaBuffer() to cancel current TTS audio when the user interrupts, keeping the dialog natural and responsive.
  • Function calling — Define functions as tools with JSON schema (for example forward_to_agent, hangup_call). When Grok invokes them, handle Grok.VoiceAgentAPIEvents.ResponseFunctionCallArgumentsDone and respond by sending a function_call_output item via voiceAgentAPIClient.conversationItemCreate({ item: { ... } }), then continue with voiceAgentAPIClient.responseCreate({}).
  • Event stream visibility — Subscribe to realtime events (transcription, response streaming, tool calls, and errors) to debug behavior and capture analytics on turn timing and interruptions. Keep “log-only” handlers consolidated so the scenario stays readable.
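
As a rough illustration of the function-calling note above, the mapping from a tool name to a function_call_output item can be kept in a small lookup table. This is only a sketch in plain JavaScript: TOOL_HANDLERS, resolveToolCall, the action field, and the "call_123" id are hypothetical names introduced here, not part of the Grok module. The item shape (type, call_id, output) follows the fields described in the note.

```javascript
// Hypothetical lookup table: tool name -> spoken result plus the follow-up
// action the scenario should take once Grok finishes speaking.
const TOOL_HANDLERS = {
  forward_to_agent: {
    action: "forward",
    result: "Forwarding your call to a live agent. Please hold on.",
  },
  hangup_call: {
    action: "hangup",
    result: "Have a great day, goodbye!",
  },
};

// Returns null for tools the scenario does not own (for example server-side
// search tools); otherwise returns the follow-up action and the item to send.
function resolveToolCall(name, callId) {
  if (!Object.prototype.hasOwnProperty.call(TOOL_HANDLERS, name)) return null;
  const handler = TOOL_HANDLERS[name];
  return {
    action: handler.action,
    item: {
      type: "function_call_output",
      call_id: callId,
      output: JSON.stringify({ result: handler.result }),
    },
  };
}

// Example: the item an event handler could pass to conversationItemCreate({ item }).
const resolved = resolveToolCall("hangup_call", "call_123");
console.log(resolved.action, resolved.item.output);
```

Inside a ResponseFunctionCallArgumentsDone handler, a non-null result would be sent with conversationItemCreate and followed by responseCreate({}), while a null result would simply be logged. Keeping the table separate from the handler makes it easier to add tools without growing an if/else chain.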

Demo video
See the video below for a demonstration of a live phone call connected to Grok Voice Agent through Voximplant.

Pricing and availability
The Grok Voice Agent connector costs $0.001 per 15 seconds, which includes both inbound and outbound audio. Voximplant charges for real-time audio streaming, while usage of the Grok Voice Agent API is billed directly by xAI under their standard Voice Agent pricing. This is the same pricing model as Voximplant’s other Voice AI connectors.
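
To make the connector rate concrete, here is a small estimate of the Voximplant-side streaming charge for a given call length. It is a sketch only: estimateConnectorCostUSD is a hypothetical helper, and rounding partial increments up is an assumption that the pricing text above does not state. xAI usage and telephony charges are not included.

```javascript
// Rate from the pricing paragraph: $0.001 per 15 seconds of streamed audio.
const INCREMENT_SECONDS = 15;
const INCREMENTS_PER_DOLLAR = 1000; // $0.001 per increment

// ASSUMPTION: partial increments are rounded up. Check the official pricing
// page or your invoice for the actual rounding rule.
function estimateConnectorCostUSD(callSeconds) {
  if (!Number.isFinite(callSeconds) || callSeconds < 0) {
    throw new RangeError("callSeconds must be a non-negative number");
  }
  const increments = Math.ceil(callSeconds / INCREMENT_SECONDS);
  return increments / INCREMENTS_PER_DOLLAR;
}

// A 10-minute call: 600 s / 15 s = 40 increments.
const tenMinuteCall = estimateConnectorCostUSD(10 * 60);
console.log(tenMinuteCall.toFixed(3));
```

Under these assumptions a 10-minute call comes to about $0.04 in connector charges, and a full hour to about $0.24.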


Quick start: Grok Voice Agent VoxEngine scenario
The following VoxEngine scenario places an outbound PSTN call, connects it to a Grok Voice Agent, enables X search and xAI Collections search, supports barge-in, and demonstrates simple function calling.

/**
* Voximplant + Grok Voice Agent connector demo (OUTBOUND)
* Scenario: place an outbound PSTN call and bridge it to Grok Voice Agent.
*
* Expect script_custom_data (VoxEngine.customData) as JSON string, e.g.:
*   { "destination": "+15551234567","callerId": "+15557654321","callerName": "Voximplant"}
*/


require(Modules.Grok);
require(Modules.ApplicationStorage);


const COLLECTION_ID = "collection_4c5a63ab-f739-4c13-93d2-05b74095c34a";


const SYSTEM_PROMPT = `
Your name is Voxi. You are a helpful English-speaking voice assistant on an outbound phone call designed to
update users on the X/Twitter posts from Voximplant and Alexey Aylarov, Voximplant's CEO (handle: aylarov)
and answer questions about the company.
Keep your turns short and telephony-friendly (usually 1–2 sentences).


Always call 'forward_to_agent' if the user asks for a live agent/operator.
Always call 'hangup_call' if the user says goodbye or asks you to hang up.


Use x_search when asked about posts. Use the knowledge base tool for company questions.`;


VoxEngine.addEventListener(AppEvents.Started, async () => {
 let call;
 let voiceAgentAPIClient;
 let hangupCall = false;
 let forwardToLiveAgent = false;


 try {
   // Read outbound call params from scenario custom data
   const { destination, callerId } = JSON.parse(VoxEngine.customData());
   Logger.write(`===OUTBOUND_PARAMS===>${JSON.stringify({ destination, callerId })}`);


   // Place outbound PSTN call
   call = VoxEngine.callPSTN(destination, callerId);


   call.addEventListener(CallEvents.Failed, ()=>VoxEngine.terminate());
   call.addEventListener(CallEvents.Disconnected, ()=>VoxEngine.terminate());


   // When callee answers, connect Grok and bridge audio
   call.addEventListener(CallEvents.Connected, async () => {
     call.record({ hd_audio: true, stereo: true }); // optional record
  
     // Create client and wire media after the callee answers
     voiceAgentAPIClient = await Grok.createVoiceAgentAPIClient({
       xAIApiKey: (await ApplicationStorage.get("XAI_API_KEY")).value,
       onWebSocketClose: (e) => {
         Logger.write(`===ON_WEB_SOCKET_CLOSE===>${JSON.stringify(e)}`);
         VoxEngine.terminate();
       },
     });


     // -------------------- Core flow --------------------


     voiceAgentAPIClient.addEventListener(Grok.VoiceAgentAPIEvents.ConversationCreated, (event) => {
       Logger.write(`===${event.name}===>${JSON.stringify(event)}`);


       voiceAgentAPIClient.sessionUpdate({
         session: {
           voice: "Ara",
           turn_detection: { type: "server_vad" },
           instructions: SYSTEM_PROMPT,
           tools: [
             // { type: "web_search" },
             { type: "x_search", allowed_x_handles: ["voximplant", "aylarov"] },
             { type: "file_search", vector_store_ids: [COLLECTION_ID], max_num_results: 5 },
             { type: "function",
               name: "forward_to_agent",
               description: "Forward the user to a live agent",
               parameters: { type: "object", properties: {}, required: [] },
             },
             { type: "function",
               name: "hangup_call",
               description: "Hang up the call",
               parameters: { type: "object", properties: {}, required: [] },
             },
           ],
         },
       });
     });


     voiceAgentAPIClient.addEventListener(Grok.VoiceAgentAPIEvents.SessionUpdated, (event) => {
       Logger.write(`===${event.name}===>${JSON.stringify(event)}`);


       VoxEngine.sendMediaBetween(call, voiceAgentAPIClient);


       // Outbound opening line (requested)
       voiceAgentAPIClient.responseCreate({
         instructions: "Hi, I’m Voxi. I’m calling to tell you we have a new post."
       });
     });


     // ---------- Barge-in: user starts speaking → cancel any buffered TTS audio ----------
     voiceAgentAPIClient.addEventListener(Grok.VoiceAgentAPIEvents.InputAudioBufferSpeechStarted, (event) => {
       Logger.write(`===${event.name}===>${JSON.stringify(event)}`);
       voiceAgentAPIClient.clearMediaBuffer();
     });


     // -------------------- Function calling --------------------
     voiceAgentAPIClient.addEventListener(Grok.VoiceAgentAPIEvents.ResponseFunctionCallArgumentsDone, (event) => {
       Logger.write(`===${event.name}===>${JSON.stringify(event)}`);


       const { name, call_id } = event?.data?.payload || {};
       let output;


       // Ignore server-side tools like collections_search / web_search / x_search
       if (name !== "forward_to_agent" && name !== "hangup_call") {
         Logger.write(`===Unhandled function call: ${name}===>${JSON.stringify(event)}`);
         return;
       }


       if (name === "forward_to_agent") {
         forwardToLiveAgent = true;
         output = { result: "Forwarding your call to a live agent. Please hold on." };
       } else if (name === "hangup_call") {
         hangupCall = true;
         output = { result: "Have a great day, goodbye!" };
       }


       // Create a conversationItem and send it
       voiceAgentAPIClient.conversationItemCreate({
         item: {
           type: "function_call_output",
           call_id,
           output: JSON.stringify(output),
         },
       });
       voiceAgentAPIClient.responseCreate({});
     });


     // WebSocket media ended: good place to end the scenario after goodbye, etc.
     voiceAgentAPIClient.addEventListener(Grok.Events.WebSocketMediaEnded, (event) => {
       Logger.write(`===${event.name}===>${JSON.stringify(event)}`);


       if (hangupCall) {
         VoxEngine.terminate();
       } else if (forwardToLiveAgent) {
         voiceAgentAPIClient.close();
         // TODO: bridge call to your agent queue / SIP / PSTN number (not shown: optionally with blind transfer)
         // VoxEngine.forwardCallToPSTN(call, "+1XXXXXXXXXX");
         // VoxEngine.callWhatsappUser(whatsAppParams);
         // VoxEngine.callSIP("user@sipaddress.org", sipParams);
         // VoxEngine.callUser("supervisor");


         call.say("Here is where I could forward the call", { voice: VoiceList.ElevenLabs.Jessica });
       }
     });


     // -------------------- Log Other Events --------------------
     [
       CallEvents.FirstAudioPacketReceived,
       Grok.Events.WebSocketMediaStarted,
       Grok.VoiceAgentAPIEvents.ConnectorInformation,
       Grok.VoiceAgentAPIEvents.ResponseCreated,
       Grok.VoiceAgentAPIEvents.ResponseOutputItemAdded,
       Grok.VoiceAgentAPIEvents.ResponseOutputItemDone,
       Grok.VoiceAgentAPIEvents.ResponseOutputAudioTranscriptDelta,
       Grok.VoiceAgentAPIEvents.ResponseOutputAudioTranscriptDone,
       Grok.VoiceAgentAPIEvents.ResponseOutputAudioDone,
       Grok.VoiceAgentAPIEvents.ResponseDone,
       Grok.VoiceAgentAPIEvents.InputAudioBufferSpeechStopped,
       Grok.VoiceAgentAPIEvents.InputAudioBufferCommitted,
       Grok.VoiceAgentAPIEvents.ConversationItemAdded,
       Grok.VoiceAgentAPIEvents.ConversationItemInputAudioTranscriptionCompleted,
       Grok.VoiceAgentAPIEvents.WebSocketError,
       Grok.VoiceAgentAPIEvents.Unknown,
     ].forEach((evtName) => {
       voiceAgentAPIClient.addEventListener(evtName, (e) => {
         Logger.write(`===${e.name}===>${JSON.stringify(e)}`);
       });
     });
   });
 } catch (e) {
   Logger.write("===UNHANDLED_ERROR===");
   Logger.write(e);
   VoxEngine.terminate();
 }
});


Note: this example is intentionally streamlined to focus on the core concepts and is not intended for direct production use.
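
One hardening step a production scenario would likely need is validating the custom data before placing the call. The sketch below is plain JavaScript and makes no platform calls. parseOutboundParams and the E164 pattern are hypothetical, and requiring E.164-style numbers ("+" followed by 8 to 15 digits) is an assumption rather than a documented Voximplant requirement.

```javascript
// Hypothetical validator for the JSON string passed as script_custom_data.
// ASSUMPTION: numbers are expected in E.164 form ("+" followed by 8 to 15 digits).
const E164 = /^\+[1-9]\d{7,14}$/;

function parseOutboundParams(raw) {
  let data;
  try {
    data = JSON.parse(raw);
  } catch (err) {
    return { ok: false, error: "custom data is not valid JSON" };
  }
  if (data === null || typeof data !== "object") {
    return { ok: false, error: "custom data must be a JSON object" };
  }
  // Both fields are required by the scenario's callPSTN call.
  for (const field of ["destination", "callerId"]) {
    if (typeof data[field] !== "string" || !E164.test(data[field])) {
      return { ok: false, error: `${field} must be an E.164 number` };
    }
  }
  return { ok: true, destination: data.destination, callerId: data.callerId };
}

const good = parseOutboundParams('{"destination":"+15551234567","callerId":"+15557654321"}');
const bad = parseOutboundParams('{"destination":"5551234567"}');
console.log(good.ok, bad.error);
```

In the scenario, the result could be checked before VoxEngine.callPSTN is invoked, logging the error and terminating when ok is false instead of dialing with missing or malformed values.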

 

References
Voximplant Grok Voice Agent API Client Guide

Voximplant Grok Voice Agent API Client Reference

xAI Grok Voice Agent API announcement

xAI Grok Voice Agent API documentation