Launched in November 1996, Adobe Flash was for a long time the go-to technology for video and audio communications between two browsers. It had a problem, though: the data had to pass through a separate server over RTMP, which added network latency, and users had to install Adobe Flash Player.
With the implementation of the WebRTC standard, however, it became possible to enable direct browser-to-browser communications without the need for additional plug-ins, licensing, and downloads.
Today, WebRTC has replaced Flash for real-time communications. It is supported in Firefox, Opera, Chrome, Safari, and Edge, as well as on iOS and Android. Let’s figure out how it works.
What is WebRTC
WebRTC stands for Web Real-Time Communication. It is an open-source technology that enables P2P audio, video, and data transfers between browsers and apps. Thanks to this technology, data is transferred directly between users and processed by the endpoints themselves.
The Voximplant platform uses WebRTC to provide calls from any endpoint to the cloud and from the cloud to any endpoint.
The MediaStream API accesses a user’s camera and microphone to capture and transfer audio and video. It also lets you set constraints on how the data is streamed. For instance, you can manage the frame rate or size of the video.
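As a rough illustration of such constraints, the sketch below builds the object that a page would pass to getUserMedia(). The helper names (buildCaptureConstraints, startCapture) and the default numbers are assumptions made for this example; the field names follow the standard media track constraints.

```typescript
// Hypothetical helper: builds a constraints object for getUserMedia().
// Field names (width, height, frameRate, ideal, max) follow the standard
// constraint dictionaries; the default numbers are illustrative only.
interface CaptureConstraints {
  audio: boolean;
  video: {
    width: { ideal: number };
    height: { ideal: number };
    frameRate: { ideal: number; max: number };
  };
}

function buildCaptureConstraints(width = 1280, height = 720, fps = 30): CaptureConstraints {
  return {
    audio: true,
    video: {
      width: { ideal: width },
      height: { ideal: height },
      frameRate: { ideal: fps, max: fps },
    },
  };
}

// In a browser the object is handed to getUserMedia(), which resolves to a
// MediaStream. Outside a browser (for example in Node.js) the call is skipped.
function startCapture(): unknown {
  const nav = (globalThis as any).navigator;
  if (!nav || !nav.mediaDevices) {
    return null; // no media devices API in this environment
  }
  return nav.mediaDevices.getUserMedia(buildCaptureConstraints(640, 360, 15));
}
```

Using "ideal" values rather than exact ones lets the browser fall back to the closest mode the camera supports instead of rejecting the request.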
The PeerConnection API provides a way to send captured data to any endpoint across the Internet, avoiding the need for an intermediary server. It allows users to maintain, monitor, and close the connection.
RTCDataChannel enables the bi-directional transfer of arbitrary data through an established connection. This includes text chats and file sharing as well as other non-audiovisual forms of data.
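A minimal sketch of a text chat over a data channel might look like the following. PeerLike, ChannelLike, and openTextChat are hypothetical names introduced here; they describe only the few members used (createDataChannel, onopen, onmessage, send), which are intended to mirror the browser's RTCPeerConnection and RTCDataChannel objects.

```typescript
// Minimal structural types covering only the members used below.
// The names are assumptions for this sketch, not part of WebRTC itself.
interface ChannelLike {
  onopen: ((ev: any) => void) | null;
  onmessage: ((ev: any) => void) | null;
  send(data: string): void;
}

interface PeerLike {
  createDataChannel(label: string, options?: { ordered?: boolean }): ChannelLike;
}

// Opens a "chat" channel on an existing peer connection and returns a
// function for sending text. Messages sent before the channel is open
// are queued and flushed once it opens.
function openTextChat(peer: PeerLike, onText: (text: string) => void): (text: string) => void {
  const channel = peer.createDataChannel("chat", { ordered: true });
  let isOpen = false;
  const pending: string[] = [];

  channel.onopen = () => {
    isOpen = true;
    for (const text of pending.splice(0)) channel.send(text);
  };
  channel.onmessage = (ev) => {
    // Binary payloads are ignored in this text-only sketch.
    if (typeof ev.data === "string") onText(ev.data);
  };

  return (text: string) => {
    if (isOpen) channel.send(text);
    else pending.push(text);
  };
}
```

In a browser, the peer argument would be a connected RTCPeerConnection; the remote side would receive the channel through its own datachannel event.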
Before starting a communication session, you need to establish a connection between two clients through signaling.
Signaling refers to the process of exchanging metadata between two endpoints. To begin talking, endpoints have to share connection details such as IP addresses, session keys, and available bandwidth. This exchange goes through a signaling server.
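To make the role of the signaling server concrete, here is a small in-memory stand-in, assuming just three message kinds (offer, answer, candidate). SignalingRelay and its methods are hypothetical; a real deployment would carry the same messages over a network transport of its choice.

```typescript
// Kinds of metadata typically relayed during signaling.
type SignalMessage =
  | { kind: "offer"; sdp: string }
  | { kind: "answer"; sdp: string }
  | { kind: "candidate"; candidate: string };

type SignalHandler = (from: string, message: SignalMessage) => void;

// Hypothetical in-memory stand-in for a signaling server: it only forwards
// messages between registered endpoints and never touches the media.
class SignalingRelay {
  private handlers = new Map<string, SignalHandler>();

  register(id: string, handler: SignalHandler): void {
    this.handlers.set(id, handler);
  }

  // Returns false when the recipient is not registered.
  send(from: string, to: string, message: SignalMessage): boolean {
    const handler = this.handlers.get(to);
    if (!handler) return false;
    handler(from, message);
    return true;
  }
}

// Usage: Alice sends an offer, Bob replies with an answer.
const relay = new SignalingRelay();
const signalLog: string[] = [];
relay.register("alice", (from, msg) => {
  signalLog.push(`alice got ${msg.kind} from ${from}`);
});
relay.register("bob", (from, msg) => {
  signalLog.push(`bob got ${msg.kind} from ${from}`);
  if (msg.kind === "offer") {
    relay.send("bob", from, { kind: "answer", sdp: "(answer SDP)" });
  }
});
relay.send("alice", "bob", { kind: "offer", sdp: "(offer SDP)" });
```

After the last call, signalLog holds two entries: Bob receiving the offer, then Alice receiving the answer.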
The two endpoints describe the session to each other using the Session Description Protocol (SDP). In practice, an SDP description is a block of plain text that contains the information needed to establish a connection.
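Because SDP is plain text made of type=value lines, it is easy to inspect. The sketch below, with a hypothetical listMedia helper and a shortened hand-written sample, pulls out the media sections (m= lines) and the codec names announced in a=rtpmap lines. Real session descriptions contain many more lines.

```typescript
interface MediaSection {
  kind: string;     // "audio", "video" or "application"
  codecs: string[]; // codec names taken from a=rtpmap lines
}

// Hypothetical helper: lists media sections and their codecs.
function listMedia(sdp: string): MediaSection[] {
  const sections: MediaSection[] = [];
  let current: MediaSection | null = null;
  for (const raw of sdp.split(/\r?\n/)) {
    const line = raw.trim();
    if (line.indexOf("m=") === 0) {
      // m=<media> <port> <proto> <fmt> ...
      current = { kind: line.slice(2).split(" ")[0] || "", codecs: [] };
      sections.push(current);
    } else if (line.indexOf("a=rtpmap:") === 0 && current !== null) {
      // a=rtpmap:<payload type> <encoding name>/<clock rate>[/<channels>]
      const encoding = line.split(" ")[1] || "";
      current.codecs.push(encoding.split("/")[0] || "");
    }
  }
  return sections;
}

// Shortened, hand-written sample for illustration.
const sampleSdp = [
  "v=0",
  "o=- 4611731400430051336 2 IN IP4 127.0.0.1",
  "s=-",
  "t=0 0",
  "m=audio 9 UDP/TLS/RTP/SAVPF 111",
  "a=rtpmap:111 opus/48000/2",
  "m=video 9 UDP/TLS/RTP/SAVPF 96",
  "a=rtpmap:96 VP8/90000",
].join("\r\n");

const mediaSections = listMedia(sampleSdp);
// mediaSections: audio with codec "opus", video with codec "VP8"
```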
NAT Traversal - ICE, TURN, and STUN
Most computers are ‘hidden’ behind routers on private networks. This way, only the router’s external IP address is exposed. The router uses NAT (Network Address Translation) to replace the private address with a public one, so for two such computers to reach each other directly, WebRTC relies on NAT traversal. There are three key mechanisms involved:
1. Interactive Connectivity Establishment (ICE) – selects the most efficient way to connect computers.
2. Session Traversal Utilities for NAT (STUN) – enables computers to find their public IP address by querying a STUN server.
3. Traversal Using Relays around NAT (TURN) – relays the real-time media itself (not the signaling data) between computers through a TURN server when a direct connection cannot be established.
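One way to see how ICE prefers some routes over others is the candidate priority formula from RFC 8445, sketched below with the type preferences that RFC recommends. The function names are made up for this example, and the sketch is a simplification: real ICE ranks pairs of local and remote candidates and then confirms them with connectivity checks.

```typescript
type CandidateType = "host" | "prflx" | "srflx" | "relay";

// Type preferences recommended by RFC 8445 (higher is preferred).
const TYPE_PREFERENCE: Record<CandidateType, number> = {
  host: 126,  // address on a local interface
  prflx: 110, // peer reflexive, discovered during connectivity checks
  srflx: 100, // server reflexive, the public address learned via STUN
  relay: 0,   // address allocated on a TURN server
};

// priority = 2^24 * type preference + 2^8 * local preference + (256 - component ID)
function candidatePriority(type: CandidateType, localPreference = 65535, componentId = 1): number {
  return 16777216 * TYPE_PREFERENCE[type] + 256 * localPreference + (256 - componentId);
}

// Sorts candidate types from most to least preferred.
function orderByPriority(types: CandidateType[]): CandidateType[] {
  return types.slice().sort((a, b) => candidatePriority(b) - candidatePriority(a));
}

const preferredOrder = orderByPriority(["relay", "srflx", "host"]);
// preferredOrder: ["host", "srflx", "relay"]
```

The ordering reflects cost: a direct local address is tried first, a STUN-discovered public address next, and a TURN relay last because it adds a server hop.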
Raw audio and video are too large to be transferred over the Internet in real time. A media codec compresses the media before sending. When the second endpoint receives this data, the codec decompresses it. The standard includes several audio and video codecs.
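A quick calculation shows why compression is needed. The sketch assumes 12 bits per pixel (8-bit YUV 4:2:0, a common raw format) and an illustrative 2 Mbit/s encoded target; both numbers and the function names are assumptions for this example.

```typescript
// Bits per second of uncompressed video. 12 bits per pixel corresponds to
// 8-bit YUV 4:2:0; other raw formats use more.
function rawVideoBitrate(width: number, height: number, fps: number, bitsPerPixel = 12): number {
  return width * height * bitsPerPixel * fps;
}

// How many times smaller the encoded stream is than the raw one.
function compressionRatio(rawBps: number, encodedBps: number): number {
  return rawBps / encodedBps;
}

// 1280 x 720 at 30 frames per second.
const raw720p = rawVideoBitrate(1280, 720, 30); // 331776000 bits/s, about 332 Mbit/s
// Assumed encoded target of 2 Mbit/s.
const ratio720p = compressionRatio(raw720p, 2000000); // about 166
```

At these assumed numbers, the codec has to shrink the stream by a factor of roughly 166 before it fits a typical connection.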
The WebRTC standard initially covered only the peer-to-peer (or mesh) topology, but more complex scenarios arise when a call has multiple peers. In such cases the mesh topology becomes undesirable, and server-based topologies help to deal with conference calls.
A peer-to-peer or mesh topology uses a serverless approach. Participants communicate with each other directly. Thus, it’s cheap and easy to implement.
The drawback is that as the number of participants increases, each participant needs more bandwidth and CPU processing. The mesh topology suits calls with 2-3 participants. It offers low latency but no server-side recording ability.
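The growth in load can be estimated with simple arithmetic. In the sketch below, meshLoad is a hypothetical helper and the 1500 kbit/s per-stream bitrate is an assumed figure.

```typescript
interface MeshLoad {
  connections: number;        // total peer-to-peer links in the call
  streamsSentPerPeer: number; // copies of its stream each peer encodes and uploads
  uplinkKbps: number;         // upload bandwidth needed by each peer
}

// In a full mesh every participant connects to every other participant.
function meshLoad(participants: number, streamKbps: number): MeshLoad {
  const others = Math.max(participants - 1, 0);
  return {
    connections: (participants * others) / 2,
    streamsSentPerPeer: others,
    uplinkKbps: others * streamKbps,
  };
}

const smallCall = meshLoad(3, 1500);  // 3 links, 3000 kbit/s uplink per peer
const largeCall = meshLoad(10, 1500); // 45 links, 13500 kbit/s uplink per peer
```

The number of links grows quadratically while the per-peer uplink grows linearly, which is why the approach stops being practical beyond a few participants.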
In a Selective Forwarding Unit (SFU) topology, each session participant uploads a single encoded video stream to a separate server. This server then distributes the streams to the other connected participants. Because all media passes through the server, recording and other server-side processing become possible.
The main disadvantage of this topology is that the network entirely depends on the server. If the server goes down, the entire network will go down. SFU is the best way to connect 4-10 participants.
MCU stands for Multipoint Control Unit, and it is quite similar to a Selective Forwarding topology. Each participant sends its stream to a server. The MCU decodes each stream and mixes them into a single unified stream. The server then encodes this unified stream and sends it to all participants.
Because each participant sends and receives only one stream, MCUs are a reliable solution for large numbers of participants or poor network conditions.
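The difference between the three topologies can be summarized as the number of streams each participant sends and receives. The sketch below uses hypothetical names and assumes one stream per participant, ignoring refinements such as sending several quality layers.

```typescript
type Topology = "mesh" | "sfu" | "mcu";

interface StreamCount {
  sent: number;     // streams each participant uploads
  received: number; // streams each participant downloads
}

function streamsPerParticipant(topology: Topology, participants: number): StreamCount {
  const others = Math.max(participants - 1, 0);
  if (topology === "mesh") {
    return { sent: others, received: others }; // one copy per remote peer
  }
  if (topology === "sfu") {
    return { sent: 1, received: others }; // server forwards every other stream
  }
  return { sent: 1, received: 1 }; // MCU: server mixes everything into one stream
}

const meshOfSix = streamsPerParticipant("mesh", 6); // 5 sent, 5 received
const sfuOfSix = streamsPerParticipant("sfu", 6);   // 1 sent, 5 received
const mcuOfSix = streamsPerParticipant("mcu", 6);   // 1 sent, 1 received
```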
Building a hybrid architecture allows you to mix all of the aforementioned topologies to save costs. If there are only two call participants, you can use a mesh topology. If someone joins them, you can ‘upgrade’ the call to communicate using SFU.
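A hybrid setup comes down to a selection rule that is re-evaluated whenever someone joins or leaves. The sketch below is one possible rule, with hypothetical function names; the default thresholds (mesh up to 3, SFU up to 10) come from the participant ranges mentioned in this article, and passing a mesh limit of 2 reproduces the two-person example above.

```typescript
type CallTopology = "mesh" | "sfu" | "mcu";

// Picks a topology from the participant count. Thresholds are parameters
// because the right cut-off points depend on the deployment.
function pickTopology(participants: number, meshMax = 3, sfuMax = 10): CallTopology {
  if (participants <= meshMax) return "mesh";
  if (participants <= sfuMax) return "sfu";
  return "mcu";
}

// Reports a switch only when a join or leave crosses a threshold.
function onParticipantsChanged(before: number, after: number, meshMax = 3): string {
  const from = pickTopology(before, meshMax);
  const to = pickTopology(after, meshMax);
  return from === to ? `stay on ${to}` : `switch from ${from} to ${to}`;
}

const thirdJoins = onParticipantsChanged(2, 3, 2);
// thirdJoins: "switch from mesh to sfu"
```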
WebRTC as a technology is secure. Several measures protect your data.
Since WebRTC doesn’t require any plug-ins, there is no risk of installing malware or other undesirable software along with one. Also, because it is implemented within browsers, security fixes arrive through automatic browser updates.
The browser asks for your permission before a site can use your camera and microphone, so a page cannot access these devices without your consent.
You might think that security risks arise during the transfer of data between browsers; however, encryption is mandatory both for establishing and for maintaining a connection. There are two standardized encryption protocols:
1. Datagram Transport Layer Security (DTLS) – built-in browser protocol used to securely exchange key data.
2. Secure Real-Time Transport Protocol (SRTP) – used to encrypt and decrypt media streams.
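Both protocols leave visible traces in the session description exchanged during signaling: an a=fingerprint line carries the hash of the certificate used for the DTLS handshake, and the transport profile on each m= line names a DTLS-based profile. The sketch below extracts both; the helper names are hypothetical, and the sample uses a shortened dummy fingerprint rather than a real one.

```typescript
interface DtlsFingerprint {
  hash: string;  // hash function name, for example "sha-256"
  value: string; // colon-separated certificate hash
}

// Collects a=fingerprint:<hash function> <fingerprint> lines.
function findFingerprints(sdp: string): DtlsFingerprint[] {
  const prefix = "a=fingerprint:";
  const found: DtlsFingerprint[] = [];
  for (const raw of sdp.split(/\r?\n/)) {
    const line = raw.trim();
    if (line.indexOf(prefix) !== 0) continue;
    const parts = line.slice(prefix.length).split(" ");
    found.push({ hash: parts[0] || "", value: parts[1] || "" });
  }
  return found;
}

// Returns the transport profile (third field) of every m= line.
function mediaProtocols(sdp: string): string[] {
  const protocols: string[] = [];
  for (const raw of sdp.split(/\r?\n/)) {
    const line = raw.trim();
    if (line.indexOf("m=") === 0) protocols.push(line.split(" ")[2] || "");
  }
  return protocols;
}

// Hand-written sample; the fingerprint value is a shortened placeholder.
const secureSample = [
  "v=0",
  "a=fingerprint:sha-256 12:34:56:78",
  "m=audio 9 UDP/TLS/RTP/SAVPF 111",
  "m=application 9 UDP/DTLS/SCTP webrtc-datachannel",
].join("\r\n");

const fingerprints = findFingerprints(secureSample);
const protocols = mediaProtocols(secureSample);
// protocols: ["UDP/TLS/RTP/SAVPF", "UDP/DTLS/SCTP"]
```

The SAVPF profile on the audio line indicates SRTP media whose keys come from the DTLS handshake, while the data channel line runs directly over DTLS.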
No matter which industry your company operates in, WebRTC is here to help you enable live video, voice, and messaging communications in your services. Moreover, its security and flexibility allow for app customization without fear of data leakage. Try the technology on the Voximplant platform and set up real-time communications with your customers and colleagues.