Calls and sessions

People usually think of a phone call as a single connection between two endpoints (a caller and a callee). In the telecom world, however, a call consists of two connections called "call legs": one from a phone to the telecom operator’s hardware/software, and one from the telecom operator’s hardware/software to another phone.

Call scheme

In Voximplant, each call leg is simply called "a call", and each call can be controlled independently by the JavaScript code deployed in the Voximplant cloud. Our cloud plays the role of the telecom operator in the scheme above, except that a call can arrive not only from the phone network but also from the IP network (a SIP device or application, or the Voximplant Web or Mobile SDK). You tell the Voximplant cloud what to do with each call using JavaScript: connect it to another call, join it to a conference, record voice or video, and so on. Similarly, a call can be sent from the Voximplant cloud to different types of endpoints.

Call scheme 2
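
As a rough sketch of the two-leg model described above, the scenario below reacts to an incoming call leg, creates an outgoing leg, and connects the two. The helper name `bridgeIncomingCall` is hypothetical, and its `engine` parameter exists only so that the helper can be tried with a stand-in object outside the Voximplant cloud. The sketch assumes the `VoxEngine.callPSTN` and `VoxEngine.easyProcess` functions and passes the incoming caller ID through unchanged, which a real account may not allow.

```javascript
// Hypothetical helper: bridges an incoming call leg to a new outgoing leg.
// `engine` is expected to expose VoxEngine-style callPSTN and easyProcess.
function bridgeIncomingCall(engine, incomingCall, destination, callerId) {
  // Second leg: from the cloud to the destination phone number.
  const outgoingCall = engine.callPSTN(destination, callerId);
  // Let the engine connect the two legs and handle their basic events.
  engine.easyProcess(incomingCall, outgoingCall);
  return outgoingCall;
}

// Runs only inside VoxEngine, where these globals are defined.
if (typeof VoxEngine !== "undefined") {
  VoxEngine.addEventListener(AppEvents.CallAlerting, (e) => {
    // e.call is the first leg: from the caller to the cloud.
    bridgeIncomingCall(VoxEngine, e.call, e.destination, e.callerid);
  });
}
```

Each leg remains a separate call object, so the scenario could just as well play a prompt to one leg or record it before connecting the two.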

Voximplant serverless is built on a JavaScript runtime environment called VoxEngine. Developers write JavaScript code that manages calls and then upload it to the cloud. That code is executed on the Voximplant media servers where the calls are processed. This approach enables both real-time call control and real-time debugging. The JavaScript implementation used in VoxEngine is ES2017/2018 compliant.

VoxEngine provides a variety of classes and functions for implementing the required call control logic and for managing Voximplant built-in features such as call recording, speech recognition, speech synthesis, and much more. Since VoxEngine provides full-featured JavaScript support and built-in functions for data exchange with external web services (making and receiving HTTP requests or handling WebSocket connections), some business logic can be implemented along with the call control logic.
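
To illustrate how business logic can sit next to call control, the sketch below asks an external web service where a caller should be routed. It assumes the `Net.httpRequestAsync` function, whose result carries `code` and `text` fields. The helper name `lookupDestination`, the `https://example.com/route` URL, and the `{ "destination": ... }` response shape are hypothetical placeholders.

```javascript
// Hypothetical helper: asks an external service where a caller should be routed.
// `net` is expected to expose a Net-style httpRequestAsync(url) method.
async function lookupDestination(net, callerId) {
  // Placeholder URL; the { "destination": "..." } response shape is an assumption.
  const url = "https://example.com/route?caller=" + encodeURIComponent(callerId);
  const response = await net.httpRequestAsync(url);
  if (response.code !== 200) return null; // treat any non-200 answer as "no route"
  return JSON.parse(response.text).destination;
}

// Runs only inside VoxEngine, where these globals are defined.
if (typeof VoxEngine !== "undefined") {
  VoxEngine.addEventListener(AppEvents.CallAlerting, async (e) => {
    const destination = await lookupDestination(Net, e.callerid);
    Logger.write("Suggested destination: " + destination);
  });
}
```

The request does not block the session: the handler simply resumes when the response arrives.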

Compared to typical serverless functions, a Voximplant session has a rather long execution lifetime: the session usually lasts longer than the calls it controls. The session context is available throughout its lifetime.


VoxEngine is fully asynchronous: there are no waiting or blocking operations. Every method call starts the corresponding operation and returns immediately. If a developer places two say methods one after another, the second playback immediately replaces the first one. This is similar to the event-driven architecture used by other JavaScript platforms such as Node.js. To maintain real-time execution, there are some limits for each JavaScript session.
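
Because say returns immediately, phrases have to be chained through events if they should play one after another. The sketch below is one possible way to do that, assuming the `CallEvents.PlaybackFinished` event and a default voice for `say`. The helper name `sayInSequence` is hypothetical, and the event is passed as a parameter only so that the helper can be tried with a stand-in call object.

```javascript
// Hypothetical helper: plays phrases one after another on a call-like object.
function sayInSequence(call, phrases, playbackFinishedEvent) {
  const queue = phrases.slice(); // copy so the caller's array is left untouched
  const playNext = () => {
    if (queue.length > 0) call.say(queue.shift());
  };
  // Start the next phrase only after the previous playback has finished.
  call.addEventListener(playbackFinishedEvent, playNext);
  playNext();
}

// Runs only inside VoxEngine, where these globals are defined.
if (typeof VoxEngine !== "undefined") {
  VoxEngine.addEventListener(AppEvents.CallAlerting, (e) => {
    e.call.addEventListener(CallEvents.Connected, () => {
      sayInSequence(e.call, ["Hello.", "Please hold."], CallEvents.PlaybackFinished);
    });
    e.call.answer();
  });
}
```

Calling `say` twice in a row instead would start the second phrase right away and cut off the first.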

Please note

Voximplant considers each call as a separate session and handles it separately from other calls.

MediaUnit concept

VoxEngine treats any object with an audio/video stream as a separate media unit. Thus, a media unit can be a call or a conference, as well as an instance of ASR, Player, or Recorder.

A call can send multiple media (voice and video) streams to other calls, but receive only one stream at a time. A new stream sent to a connected call replaces a previous one. If you want to mix two or more media streams, use a conference instead. A conference can both receive and send multiple streams at once.
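
The receiving rules above can be pictured with a small toy model. The `ToyCall` and `ToyConference` classes below are hypothetical and are not VoxEngine classes. They only mimic the bookkeeping: a call keeps a single incoming stream, while a conference keeps all of them.

```javascript
// Toy model only: these classes are not part of VoxEngine.
class ToyCall {
  constructor(name) {
    this.name = name;
    this.incoming = null; // a call holds at most one incoming stream
  }
  sendMediaTo(target) {
    target.receiveFrom(this); // sending places no limit on the sender
  }
  receiveFrom(source) {
    this.incoming = source; // a new stream replaces the previous one
  }
}

class ToyConference {
  constructor() {
    this.incoming = new Set(); // a conference keeps every incoming stream
  }
  receiveFrom(source) {
    this.incoming.add(source);
  }
}

const alice = new ToyCall("alice");
const bob = new ToyCall("bob");
const carol = new ToyCall("carol");
const room = new ToyConference();

alice.sendMediaTo(carol);
bob.sendMediaTo(carol); // replaces alice's stream at carol
alice.sendMediaTo(room);
bob.sendMediaTo(room); // the conference receives both streams

console.log(carol.incoming.name); // prints "bob"
console.log(room.incoming.size); // prints 2
```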

Use the sendMediaTo method to send media from a call to the media unit specified in targetMediaUnit.
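
Since sendMediaTo works in one direction only, a two-way conversation needs one call in each direction. The sketch below shows this with a hypothetical `linkBothWays` helper. It assumes the `VoxEngine.callPSTN` function and the `CallEvents.Connected` event, and it leaves out failure and disconnect handling.

```javascript
// Hypothetical helper: sends media in both directions between two media units.
function linkBothWays(unitA, unitB) {
  unitA.sendMediaTo(unitB); // unitB now receives unitA's stream
  unitB.sendMediaTo(unitA); // unitA now receives unitB's stream
}

// Runs only inside VoxEngine, where these globals are defined.
if (typeof VoxEngine !== "undefined") {
  VoxEngine.addEventListener(AppEvents.CallAlerting, (e) => {
    const incoming = e.call;
    const outgoing = VoxEngine.callPSTN(e.destination, e.callerid);
    // Answer the incoming leg once the outgoing leg is connected...
    outgoing.addEventListener(CallEvents.Connected, () => incoming.answer());
    // ...and exchange media once the incoming leg is connected too.
    incoming.addEventListener(CallEvents.Connected, () => linkBothWays(incoming, outgoing));
  });
}
```

VoxEngine also offers `VoxEngine.sendMediaBetween`, which covers the same two-way case in a single call.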