Launched in November 1996, Adobe Flash was for many years the go-to technology for video and audio communication between two browsers. It had two drawbacks, however: transferred data had to pass through a separate server over RTMP, which added network latency, and users had to install Adobe Flash Player.

With the implementation of the WebRTC standard, however, it became possible to enable direct browser-to-browser communications without the need for additional plug-ins, licensing, and downloads.

Today, WebRTC has replaced Flash for real-time communications. The technology is supported in Firefox, Opera, Chrome, Safari, and Edge, as well as on iOS and Android. Let’s figure out how it works.

What is WebRTC

WebRTC (Web Real-Time Communication) is an open-source technology that enables peer-to-peer (P2P) audio, video, and data transfer between browsers and apps. With this technology, data travels directly between users and is processed at the endpoints.

The Voximplant platform uses WebRTC to provide calls from any endpoint to the cloud and from the cloud to any endpoint.

WebRTC APIs

There are three main JavaScript APIs that enable data streaming.

MediaStream

The MediaStream API accesses a user’s camera and microphone to capture audio and video. It also lets you set constraints on how the media is captured. For instance, you can control the frame rate or resolution of the video.
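
As a minimal sketch of such constraints, the snippet below builds a constraints object and passes it to `navigator.mediaDevices.getUserMedia`. The helper names `buildConstraints` and `openCamera` and the default values are illustrative assumptions, not part of the API.

```javascript
// Build a MediaStreamConstraints object. `ideal` values are hints:
// the browser picks the closest settings the device supports.
function buildConstraints({ width = 1280, height = 720, frameRate = 30 } = {}) {
  return {
    audio: true,
    video: {
      width: { ideal: width },
      height: { ideal: height },
      frameRate: { ideal: frameRate },
    },
  };
}

// Browser-only: asks the user for permission and resolves to a MediaStream.
async function openCamera(options) {
  return navigator.mediaDevices.getUserMedia(buildConstraints(options));
}
```

In a page, `openCamera({ frameRate: 15 })` would request a lower frame rate, and the resulting stream could be assigned to the `srcObject` of a `<video>` element.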

RTCPeerConnection

The RTCPeerConnection API sends captured data directly to another endpoint across the Internet, without the need for an intermediary server. It also lets you maintain, monitor, and close the connection.
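
The sketch below shows one way these three tasks might look in code: tracks are added to the connection, state changes are reported, and the connection is closed on failure. The function name `attachAndMonitor` and the `onStateChange` callback are assumptions made for this example.

```javascript
// Attach local tracks to a peer connection and report state changes.
// `pc` is expected to be an RTCPeerConnection, `stream` a MediaStream.
function attachAndMonitor(pc, stream, onStateChange) {
  for (const track of stream.getTracks()) {
    pc.addTrack(track, stream); // send this track to the remote peer
  }
  pc.addEventListener("connectionstatechange", () => {
    // One of: "new", "connecting", "connected", "disconnected", "failed", "closed"
    onStateChange(pc.connectionState);
    if (pc.connectionState === "failed") {
      pc.close(); // release network and media resources
    }
  });
  return pc;
}

// Browser-only usage:
//   const pc = attachAndMonitor(new RTCPeerConnection(), localStream, console.log);
```

Closing on `"failed"` is a simplification. A production app might try an ICE restart first.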

RTCDataChannel

RTCDataChannel enables the bi-directional transfer of arbitrary data through an established connection. This includes text chats and file sharing as well as other non-audiovisual forms of data.
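
For file sharing, a payload is usually split into smaller messages before it is sent. The following is a rough sketch of that idea. The name `sendInChunks`, the 16 KiB chunk size, and the final `"done"` message are assumptions for illustration, and flow control (checking `bufferedAmount`) is left out for brevity.

```javascript
// Split a binary payload into chunks and send them over a data channel.
// `channel` is expected to be an open RTCDataChannel (anything with send()).
function sendInChunks(channel, buffer, chunkSize = 16 * 1024) {
  const bytes = new Uint8Array(buffer);
  let sent = 0;
  for (let offset = 0; offset < bytes.length; offset += chunkSize) {
    channel.send(bytes.slice(offset, offset + chunkSize));
    sent += 1;
  }
  // Tell the receiver that the transfer is complete.
  channel.send(JSON.stringify({ type: "done", chunks: sent }));
  return sent;
}

// Browser-only usage:
//   const channel = pc.createDataChannel("files");
//   channel.onopen = () => sendInChunks(channel, fileBuffer);
```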

Establishing Connections

Before starting a communication session, you need to establish a connection between two clients through signaling. 

Signaling

Signaling is the process of exchanging metadata between two endpoints. Before they can start talking, the endpoints have to share connection details such as IP addresses, security parameters, and bandwidth limits. This exchange is relayed through a signaling server.
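
A rough sketch of that exchange is shown below. WebRTC leaves the signaling transport to the application, so the `signaling` object (anything with a `send` method, for example a WebSocket wrapper), the message shape, and the function names are all assumptions. The `pc` arguments are expected to be `RTCPeerConnection` instances.

```javascript
// Caller: create an offer and pass it to the other side via signaling.
async function sendOffer(pc, signaling) {
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  signaling.send({ kind: "offer", description: pc.localDescription });
}

// Callee: apply the received offer and reply with an answer.
async function answerOffer(pc, signaling, message) {
  await pc.setRemoteDescription(message.description);
  const answer = await pc.createAnswer();
  await pc.setLocalDescription(answer);
  signaling.send({ kind: "answer", description: pc.localDescription });
}

// Caller: apply the answer. Both sides now know each other's parameters.
async function acceptAnswer(pc, message) {
  await pc.setRemoteDescription(message.description);
}
```

The signaling server only passes these messages along. It does not need to understand their contents.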

The two endpoints describe their sessions using the Session Description Protocol (SDP). In practice, a session description is a block of plain text that contains the information needed to establish a connection.
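
Because a description is plain text, it can be inspected with ordinary string handling. The sketch below lists the media sections of a description. The function name `listMediaSections` and the shortened sample description are made up for this example.

```javascript
// List the media sections ("m=" lines) of an SDP session description.
// Each m-line has the form: m=<media> <port> <proto> <fmt> ...
function listMediaSections(sdp) {
  return sdp
    .split(/\r?\n/)
    .filter((line) => line.startsWith("m="))
    .map((line) => {
      const [media, port, protocol, ...formats] = line.slice(2).trim().split(/\s+/);
      return { media, port: Number(port), protocol, formats };
    });
}

// A shortened, made-up description with one audio and one video section.
const sampleSdp = [
  "v=0",
  "o=- 4611731400430051336 2 IN IP4 127.0.0.1",
  "s=-",
  "t=0 0",
  "m=audio 9 UDP/TLS/RTP/SAVPF 111",
  "a=rtpmap:111 opus/48000/2",
  "m=video 9 UDP/TLS/RTP/SAVPF 96",
  "a=rtpmap:96 VP8/90000",
].join("\r\n");

console.log(listMediaSections(sampleSdp).map((s) => s.media).join(", ")); // audio, video
```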

NAT Traversal - ICE, TURN, and STUN

Most computers are ‘hidden’ behind routers on private networks, so only the router’s external IP address is exposed. The router uses NAT (Network Address Translation) to replace private addresses with its public one, which means an outside peer has no direct way to reach a computer behind it. NAT traversal techniques solve this problem. Three protocols are involved:

1. Interactive Connectivity Establishment (ICE) – gathers the possible network paths between two computers and selects the most efficient one that works.
2. Session Traversal Utilities for NAT (STUN) – enables a computer to discover its public IP address by querying a STUN server.
3. Traversal Using Relays around NAT (TURN) – relays real-time media (rather than signaling data) between computers through a server when a direct connection cannot be established.
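
In the browser, STUN and TURN servers are passed to the connection as configuration, and ICE reports which kind of address each candidate represents. The sketch below illustrates both. The helper names, server URLs, and credentials are placeholders, not real services.

```javascript
// Build the configuration passed to `new RTCPeerConnection(config)`.
// The server URLs and credentials used below are placeholders.
function buildIceConfig({ stunUrl, turnUrl, username, credential }) {
  const iceServers = [{ urls: stunUrl }];
  if (turnUrl) {
    iceServers.push({ urls: turnUrl, username, credential });
  }
  return { iceServers };
}

// Read the candidate type from an ICE candidate string:
// "host"  = local address,
// "srflx" = public address learned via STUN,
// "relay" = address allocated on a TURN server.
function candidateType(candidate) {
  const match = /\styp\s(\w+)/.exec(candidate);
  return match ? match[1] : null;
}

const config = buildIceConfig({
  stunUrl: "stun:stun.example.org:3478",
  turnUrl: "turn:turn.example.org:3478",
  username: "demo-user",
  credential: "demo-pass",
});
console.log(config.iceServers.length); // 2
```

ICE typically prefers direct candidates and falls back to a relay candidate only when nothing else connects.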

Codecs

Raw audio and video are too large to be transferred over the Internet in real time. A media codec compresses the media before sending, and the codec on the receiving endpoint decompresses it. The standard specifies several audio and video codecs, including Opus for audio and VP8 and H.264 for video.
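
A quick calculation shows the scale of the problem. The sketch below estimates the bitrate of uncompressed 720p video. The 12 bits per pixel figure corresponds to YUV 4:2:0, and the 2.5 Mbit/s encoded bitrate is an assumed target used only for comparison.

```javascript
// Bitrate of uncompressed video in bits per second.
// bitsPerPixel = 12 corresponds to YUV 4:2:0, a common raw video format.
function rawVideoBitrate(width, height, frameRate, bitsPerPixel = 12) {
  return width * height * bitsPerPixel * frameRate;
}

const rawBps = rawVideoBitrate(1280, 720, 30);
const encodedBps = 2500000; // assumed encoder target, for illustration only

console.log((rawBps / 1e6).toFixed(1) + " Mbit/s raw"); // 331.8 Mbit/s raw
console.log("about " + Math.round(rawBps / encodedBps) + "x smaller after encoding"); // about 133x smaller after encoding
```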

WebRTC Topologies

The WebRTC standard initially covered only the peer-to-peer (or mesh) topology. In more complex scenarios, such as calls with many participants, the mesh topology becomes impractical, and server-based topologies are used to handle conference calls instead.

Peer-to-peer

A peer-to-peer or mesh topology requires no media server. Participants send their streams to each other directly, so it’s cheap, easy to implement, and has low latency.

The drawback is that every participant has to upload a separate copy of its stream to each of the others, so bandwidth and CPU requirements grow quickly as participants join. There is also no server-side recording. The mesh topology suits calls with 2-3 participants.
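
The growth is easy to quantify. The sketch below counts connections and per-client streams in a full mesh, assuming each participant sends one stream to every other participant. The function name `meshLoad` is made up for this example.

```javascript
// In a full mesh every participant connects to every other participant.
function meshLoad(participants) {
  return {
    totalConnections: (participants * (participants - 1)) / 2,
    uploadsPerClient: participants - 1,   // one outgoing stream per remote peer
    downloadsPerClient: participants - 1, // one incoming stream per remote peer
  };
}

for (const n of [2, 3, 6]) {
  const load = meshLoad(n);
  console.log(`n=${n}: connections=${load.totalConnections}, uploads per client=${load.uploadsPerClient}`);
}
// n=2: connections=1, uploads per client=1
// n=3: connections=3, uploads per client=2
// n=6: connections=15, uploads per client=5
```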

Selective Forwarding

In a Selective Forwarding Unit (SFU) topology, each session participant uploads a single encoded video stream to a media server. The server then forwards the streams to the other connected participants. Because all media passes through the server, features such as recording and transcoding can be implemented on the server side.

The main disadvantage of this topology is that the call depends entirely on the server: if the server goes down, the call ends for everyone. SFU works best for calls with 4-10 participants.

Multipoint Control

MCU stands for Multipoint Control Unit, and it is quite similar to a Selective Forwarding topology. Each participant sends its stream to a server. The MCU decodes each stream and mixes them into a single composite stream. Then, it encodes this composite stream and sends it to all participants.

Since each participant receives only one stream regardless of the call size, MCUs are a reliable solution for large numbers of participants or poor network conditions.
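
To see the difference from the client’s point of view, the sketch below compares how many streams each participant sends and receives in the three topologies. It assumes every participant publishes exactly one stream. The function name `clientStreams` is hypothetical.

```javascript
// Streams each client sends ("up") and receives ("down")
// in a call with n participants.
function clientStreams(topology, n) {
  switch (topology) {
    case "mesh": return { up: n - 1, down: n - 1 };
    case "sfu":  return { up: 1, down: n - 1 };
    case "mcu":  return { up: 1, down: 1 };
    default: throw new Error("unknown topology: " + topology);
  }
}

for (const topology of ["mesh", "sfu", "mcu"]) {
  const { up, down } = clientStreams(topology, 10);
  console.log(`${topology}: ${up} up, ${down} down`);
}
// mesh: 9 up, 9 down
// sfu: 1 up, 9 down
// mcu: 1 up, 1 down
```

The MCU keeps client load constant, but the decoding and mixing work moves to the server.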

Hybrid Topology

A hybrid architecture lets you combine the aforementioned topologies to save costs. If there are only two call participants, you can use a mesh topology. If someone joins them, you can ‘upgrade’ the call to an SFU.
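
The selection logic can be as simple as a threshold check. The sketch below is one possible version. The function name `chooseTopology` is made up, the default limits follow the rough participant counts mentioned in this article, and falling back to an MCU above the SFU limit is an assumption.

```javascript
// Pick a topology from the current number of participants.
// meshMax and sfuMax are tunable limits, not fixed rules.
function chooseTopology(participants, { meshMax = 2, sfuMax = 10 } = {}) {
  if (participants < 2) {
    throw new RangeError("a call needs at least two participants");
  }
  if (participants <= meshMax) return "mesh";
  if (participants <= sfuMax) return "sfu";
  return "mcu";
}

console.log(chooseTopology(2));  // mesh
console.log(chooseTopology(3));  // sfu
console.log(chooseTopology(25)); // mcu
```

Calling `chooseTopology(3, { meshMax: 3 })` would keep a three-person call on mesh instead.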

WebRTC Security

WebRTC is secure by design. Several measures protect your data.


Browser Protection

Since WebRTC doesn’t require any plug-ins, there is no extra software to install that could carry malware. Also, because it is implemented within browsers, security vulnerabilities are patched through automatic browser updates.


Media Access

A web page has to ask for your permission before it can use your camera and microphone, so it cannot gain access to a device without your consent.
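
In code, a refused permission shows up as a rejected `getUserMedia` promise. The sketch below turns the most common rejection reasons into readable messages. The function names and the message texts are assumptions for this example.

```javascript
// Translate a getUserMedia() rejection into a user-facing message.
function describeMediaError(error) {
  switch (error.name) {
    case "NotAllowedError":
      return "Permission to use the camera or microphone was denied.";
    case "NotFoundError":
      return "No camera or microphone was found.";
    case "NotReadableError":
      return "The device is already in use or cannot be read.";
    default:
      return "Could not access media devices: " + error.name;
  }
}

// Browser-only: the permission prompt appears when getUserMedia() is called.
async function requestMedia() {
  try {
    return await navigator.mediaDevices.getUserMedia({ audio: true, video: true });
  } catch (error) {
    console.warn(describeMediaError(error));
    return null;
  }
}
```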

Encryption

You might think that security risks arise during the transfer of data between browsers. However, encryption is mandatory for all media and data sent over a WebRTC connection. Two standardized protocols are used:

1. Datagram Transport Layer Security (DTLS) – built into the browser and used to securely exchange encryption keys and to protect data channels.
2. Secure Real-Time Transport Protocol (SRTP) – used to encrypt and decrypt media streams.
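
Both protocols leave visible traces in a session description: a certificate fingerprint used to verify the DTLS handshake, and a transport profile that names the secure protocols. The sketch below reads them from the text. The function names and the shortened sample description are made up for illustration.

```javascript
// Collect the DTLS certificate fingerprints advertised in an SDP description.
// Lines look like: a=fingerprint:sha-256 AB:CD:...
function dtlsFingerprints(sdp) {
  const prefix = "a=fingerprint:";
  return sdp
    .split(/\r?\n/)
    .filter((line) => line.startsWith(prefix))
    .map((line) => {
      const [algorithm, value] = line.slice(prefix.length).trim().split(/\s+/);
      return { algorithm, value };
    });
}

// True when every media section uses a TLS- or DTLS-based transport profile,
// such as UDP/TLS/RTP/SAVPF (media) or UDP/DTLS/SCTP (data channels).
function usesSecureTransport(sdp) {
  const mLines = sdp.split(/\r?\n/).filter((line) => line.startsWith("m="));
  return mLines.length > 0 && mLines.every((line) => {
    const parts = (line.split(/\s+/)[2] || "").split("/");
    return parts.includes("TLS") || parts.includes("DTLS");
  });
}

// A shortened, made-up description. Real fingerprints are longer.
const secureSdp = [
  "v=0",
  "a=fingerprint:sha-256 12:34:56:78:9A:BC:DE:F0",
  "m=audio 9 UDP/TLS/RTP/SAVPF 111",
  "m=application 9 UDP/DTLS/SCTP webrtc-datachannel",
].join("\r\n");

console.log(dtlsFingerprints(secureSdp)[0].algorithm); // sha-256
console.log(usesSecureTransport(secureSdp));           // true
```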

Conclusion

No matter which industry your company operates in, WebRTC is here to help you enable live video, voice, and messaging communications in your services. Moreover, its security and flexibility allow for app customization without fear of data leakage. Try the technology on the Voximplant platform and set up real-time communications with your customers and colleagues.