Demand for live streaming video content is exploding. Live stream content consumption increased 99% between April 2019 and April 2020, according to StreamElements. The trend is driven by gaming and other forms of live entertainment. As a result, software developers are increasingly being asked to add live streaming capabilities to their organization’s applications.

Communications Platforms as a Service (CPaaS) can help developers answer this call. They provide tools and services developers can use to add real-time voice and video communications to their applications. By combining a CPaaS with a content delivery network (CDN), developers can easily create a wide range of high-quality live streaming services. Examples include consumer gaming applications, company meeting broadcast services, and webinar hosting platforms.

In this blog, we’ll provide a live streaming technology overview focused on its intersection with real-time communications. We’ll describe the infrastructure and protocols that enable live streaming; list popular use cases in the business and consumer markets; explain how CPaaS platforms simplify streaming application development; and outline the live streaming features available in the Voximplant platform.

What is live streaming?

Simply put, live streaming is the transmission of voice, video and/or screen content from one endpoint device to many others within moments of the live event. The “live” characteristics can vary, from latency of less than a second to more than 30 seconds, depending upon the quality of the infrastructure and specific protocols used.

Content delivery networks have made it easy to live stream various types of content to specific communities. For example, Facebook Live enables businesses and consumers to live stream video using a mobile device and the Facebook app, or a computer and browser. Businesses can reach and engage their followers and consumers can stream to friends. YouTube offers similar capabilities for live streaming to a user’s channel.

We’ll focus on applications that integrate a content delivery network with a separate communications application. In this scenario, the communications application is the content source and the CDN handles distribution to the viewer community. The application captures voice, video or screen data and sends it to the CDN via an open interface. 

How does live streaming work?

Live streaming applications require specific infrastructure and protocols developed to encode, transmit and play real time media. HTML5 embeds much of the player technology into browsers, making it seamless for users to access live streams. Encoding and transmission technology have also advanced over the past few years.

Live streaming infrastructure

The fundamental live streaming infrastructure is composed of a media source, an encoder, streaming servers and media players.

The media source captures raw camera and microphone feeds, and the encoder converts them into a compressed bit stream. Complete codecs that both encode and decode media are embedded in web browsers and SIP communications endpoints, including IP phones. Popular codecs are:

Video: H.264, VP8
Audio: AAC, AAC-LC, MP3, Opus, G.711

In a communications application, media has already been encoded by the endpoints participating in a communications session. For example, a CPaaS platform commonly receives and processes WebRTC video in H.264 format and audio in Opus format. Similarly, SIP endpoints commonly transmit audio in G.711 format.

Streaming servers are specialized infrastructure designed to distribute the live stream to large scale audiences. CDNs manage a global server network optimized to deliver content with low latency, high quality and high reliability. They implement adaptive bit rate protocols that handle the difficult challenge of optimizing each user’s live stream for the bandwidth available between the server and endpoint. In the case of video content, the protocol also adapts to the endpoint’s screen size, sending only the amount of data that can be displayed by the screen. 
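
To make the adaptive bit rate idea concrete, here is a minimal sketch of how a player or server might choose a rendition. The ladder values, the `selectRendition` helper and the 0.8 safety margin are illustrative assumptions, not the behavior of any particular CDN.

```javascript
// Hypothetical rendition ladder; real CDNs define their own.
const LADDER = [
  { name: "1080p", height: 1080, bitrateKbps: 5000 },
  { name: "720p", height: 720, bitrateKbps: 2800 },
  { name: "480p", height: 480, bitrateKbps: 1400 },
  { name: "240p", height: 240, bitrateKbps: 400 },
];

// Pick the highest-bitrate rendition that fits both the measured
// bandwidth (minus a safety margin) and the viewer's screen height.
function selectRendition(ladder, bandwidthKbps, screenHeight, safety = 0.8) {
  const budget = bandwidthKbps * safety;
  const sorted = [...ladder].sort((a, b) => b.bitrateKbps - a.bitrateKbps);
  const fit = sorted.find(
    (r) => r.bitrateKbps <= budget && r.height <= screenHeight
  );
  // Fall back to the lowest rendition when nothing fits.
  return fit ?? sorted[sorted.length - 1];
}

console.log(selectRendition(LADDER, 4000, 1080).name); // 720p
console.log(selectRendition(LADDER, 10000, 480).name); // 480p
```

In the second call the viewer has plenty of bandwidth, but the small screen caps the choice at 480p, so no data is sent that the screen cannot display.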

Today, media players are embedded in web browsers and mobile devices. Only a few years ago, the Adobe Flash player was the de facto standard for accessing a live stream with a browser. The plugin was clumsy and the underlying protocol could be blocked by firewalls, making user experiences less than seamless. With HTML5, which was finalized as a W3C Recommendation in 2014, browsers offer native media support, and streams are delivered using adaptive bit rate protocols over the standard web ports, 80 and 443. This eliminated friction and improved media quality.

There are lots of resources available to help mobile application developers add a media player to their app. CDNs offer SDKs that make it easy to embed a player. Alternatively, open source HTML5 compatible players are available.

Live streaming protocols

Specialized streaming protocols connect the elements in the infrastructure over an IP network. Because the requirements for CDN ingest are different from those for distribution, most CDNs support multiple protocols. They often use the Real-Time Messaging Protocol (RTMP) to ingest live streams and a specialized web protocol to distribute them to users.

Caption: Typical live streaming architecture

RTMP protocol

Developed by Macromedia (acquired by Adobe) to support the Flash Player, RTMP is a fast, low latency protocol that runs over TCP. It is ideal for connecting a CDN to an encoder or CPaaS platform because it is compatible with a wide range of video and audio codecs and it provides reliable media delivery over networks of varying quality.

However, RTMP isn’t a good fit for last mile distribution for many reasons:

  • CDNs can’t easily scale their infrastructure because RTMP requires specialized media servers
  • Variations in device capabilities and available network bandwidth can impair user experiences
  • Use of non-standard ports can make it susceptible to blocking by firewalls

This is why most CDNs use one of multiple web protocols designed for last mile distribution at scale.
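
The port issue can be illustrated with a small sketch that inspects an ingest URL. Plain RTMP defaults to TCP port 1935; the `describeIngest` helper and the `ingest.example.com` host are hypothetical names used only for illustration.

```javascript
const RTMP_DEFAULT_PORT = 1935; // default TCP port for plain RTMP

// Split an RTMP ingest URL into host, port and application path.
function describeIngest(ingestUrl) {
  const u = new URL(ingestUrl);
  if (u.protocol !== "rtmp:") {
    throw new Error(`Expected an rtmp:// URL, got ${u.protocol}`);
  }
  const port = u.port ? Number(u.port) : RTMP_DEFAULT_PORT;
  return {
    host: u.hostname,
    port,
    app: u.pathname.replace(/^\//, ""),
    // Ports other than 80/443 are the ones a strict firewall may block.
    usesWebPort: port === 80 || port === 443,
  };
}

console.log(describeIngest("rtmp://ingest.example.com/live"));
```

A URL without an explicit port resolves to 1935 here, which is why a viewer behind a firewall that only allows web traffic may never reach an RTMP server.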

Web protocols

Several protocols have been developed to stream media using web servers. This helps CDNs scale their networks by leveraging a common platform across multiple services. In addition, these HTTP-based protocols deliver the best user experience possible regardless of the quality of the last mile connection, or device capabilities; and they run over standard web ports for easy access across firewalls. The HTTP-based protocols most commonly used by CDNs are MPEG-DASH and Apple’s HLS.

HTTP Live Streaming (HLS) uses adaptive bitrate streaming techniques to optimize viewer experience. Notwithstanding its Apple heritage, HLS is supported by a wide range of platforms, from Android and Linux devices to Microsoft and Google Chrome browsers. Of course, iOS and macOS devices support HLS, too. It features:

  • Compatible with H.264 and H.265 video codecs and many audio codecs, including AAC-LC and Apple Lossless
  • Latency up to 30 seconds
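
For readers who have not seen HLS on the wire, the sketch below generates a minimal master playlist, the text file that lists the available renditions so a player can switch between them. The `#EXTM3U` and `#EXT-X-STREAM-INF` tags are part of the HLS format; the `buildMasterPlaylist` helper, the bitrates and the file paths are illustrative assumptions.

```javascript
// Build a minimal HLS master playlist from a list of renditions.
function buildMasterPlaylist(renditions) {
  const lines = ["#EXTM3U"];
  for (const r of renditions) {
    // Each variant is announced with its peak bandwidth and resolution,
    // followed by the URI of that variant's media playlist.
    lines.push(
      `#EXT-X-STREAM-INF:BANDWIDTH=${r.bitsPerSecond},RESOLUTION=${r.width}x${r.height}`
    );
    lines.push(r.uri);
  }
  return lines.join("\n") + "\n";
}

const playlist = buildMasterPlaylist([
  { bitsPerSecond: 2800000, width: 1280, height: 720, uri: "720p/index.m3u8" },
  { bitsPerSecond: 1400000, width: 854, height: 480, uri: "480p/index.m3u8" },
]);
console.log(playlist);
```

Because the playlist and the media segments it points to are ordinary files served over HTTP, a standard web server or CDN cache can deliver them.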

The Moving Picture Experts Group (MPEG) developed Dynamic Adaptive Streaming over HTTP (DASH) as an open standard alternative to HLS. It is widely supported by browsers and devices, except on Apple platforms, where support is lagging. It features:

  • Audio and video codec agnostic
  • Latency up to 30 seconds
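
A rough way to see where segment-based latency comes from is to multiply the segment duration by the number of segments a player buffers before it starts playback. The sketch below is a simplified model under that assumption; the `estimateLatencySeconds` helper and the sample values are illustrative, and real deployments add further delays that are not modeled here.

```javascript
// Simplified latency model for segmented HTTP streaming:
// buffered segments times segment length, plus any fixed pipeline delay.
function estimateLatencySeconds({
  segmentSeconds,
  bufferedSegments,
  pipelineSeconds = 0,
}) {
  return segmentSeconds * bufferedSegments + pipelineSeconds;
}

// Long segments with a three-segment buffer.
console.log(estimateLatencySeconds({ segmentSeconds: 10, bufferedSegments: 3 })); // 30

// Shorter segments reduce latency, even with 1 second of pipeline delay.
console.log(
  estimateLatencySeconds({ segmentSeconds: 2, bufferedSegments: 3, pipelineSeconds: 1 })
); // 7
```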

CDNs adapt source media for distribution

You’re probably wondering how you can send media to a CDN with one protocol (RTMP) and have it distributed to users via a different protocol (HLS). This is possible because CDNs provide a gateway that converts content in real time. The source content is removed from the RTMP wrapper and repackaged in an adaptive bit rate protocol for distribution.

In addition, CDN media servers apply transcoding, transrating and transizing techniques to optimize the stream for each user. Adaptive bit rate protocols adjust the frame size and resolution to deliver the optimal experience over the available bandwidth.
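
The three conversion terms can be told apart with a small sketch that compares a source stream with a target rendition. The definitions are deliberately simplified, and the `requiredConversions` helper and sample values are hypothetical.

```javascript
// Decide which conversions are needed to turn a source stream
// into a target rendition (simplified definitions).
function requiredConversions(source, target) {
  const ops = [];
  if (source.codec !== target.codec) ops.push("transcode"); // change of codec
  if (source.height !== target.height) ops.push("transize"); // change of frame size
  if (source.bitrateKbps !== target.bitrateKbps) ops.push("transrate"); // change of bit rate
  return ops;
}

const source = { codec: "H.264", height: 1080, bitrateKbps: 5000 };
const mobile = { codec: "H.264", height: 480, bitrateKbps: 1400 };

console.log(requiredConversions(source, mobile)); // [ 'transize', 'transrate' ]
console.log(requiredConversions(source, source)); // []
```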

Live streaming use cases

Combining live streaming capabilities with a full-featured CPaaS platform opens the door to a wide range of new applications. The CPaaS provides truly real-time interactive communication among a small group of participants, with latency measured in milliseconds, while a live streaming CDN broadcasts content to a large audience with latency of a few seconds. Here are some examples:

Gaming Competitions: The CPaaS connects the competitors while the CDN broadcasts the event to an audience.

Entertainment: The CPaaS connects entertainers with fans over live video calls, while a gated CDN lets you monetize that content by broadcasting it to a large audience.

Large Event Broadcasts: The CPaaS can connect a small group of interactive speakers in remote locations who wish to broadcast their content to a larger audience. For example, corporate executives communicating to employees, or virtual conference speakers sharing their ideas with attendees.

Voximplant simplifies live streaming app development

Voximplant offers solutions for live streaming over a range of compatible CDNs, with specific guides for YouTube, Twitch, or a generic RTMP CDN. Using our platform, you can capture real-time video from a mobile device, web client or SIP endpoint and stream it to one of our CDN partners.

VoxEngine media servers can fork any H.264 video session to an RTMP interface. The streaming interface is controlled by your JavaScript application code that runs in the VoxEngine serverless cloud. Simply start a StreamingAgent and specify stream parameters such as the name, URL, backup URL, and key frame settings.

const streaming = VoxImplant.createStreamingAgent({
  protocol: "RTMP",
  url: "rtmp://",
  streamName: "live_********************",
  keyframeInterval: 4
});
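
Before handing parameters like these to the platform, it can help to validate them in plain JavaScript. The sketch below is not part of the Voximplant API: `buildStreamingParams` is a hypothetical helper, `backupUrl` is an assumed field name for the backup URL mentioned above, and `ingest.example.com` is a placeholder host.

```javascript
// Hypothetical helper: validate and assemble stream parameters.
// Field names other than protocol, url, streamName and keyframeInterval
// are assumptions made for illustration.
function buildStreamingParams({ url, streamName, keyframeInterval = 4, backupUrl }) {
  if (!url) throw new Error("url is required");
  if (!streamName) throw new Error("streamName is required");
  for (const candidate of [url, backupUrl]) {
    if (candidate !== undefined && !/^rtmps?:\/\//.test(candidate)) {
      throw new Error(`Not an RTMP URL: ${candidate}`);
    }
  }
  if (!Number.isInteger(keyframeInterval) || keyframeInterval <= 0) {
    throw new Error("keyframeInterval must be a positive integer");
  }
  const params = { protocol: "RTMP", url, streamName, keyframeInterval };
  if (backupUrl) params.backupUrl = backupUrl; // assumed field name
  return params;
}

console.log(
  buildStreamingParams({
    url: "rtmp://ingest.example.com/live",
    streamName: "my-stream-key",
  })
);
```

Failing fast on a malformed URL or a missing stream name makes configuration mistakes easier to spot than a stream that silently never starts.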

See our real time streaming guides for more information.

Caption: Voximplant architecture for Live streaming

CPaaS and CDN: A powerful live streaming combination

CDNs transmit content at massive scale while a CPaaS can create interactive content in real time. Together, the two cloud services enable application developers to build many innovative new services.

The Voximplant CPaaS platform offers an RTMP interface that connects to some of the leading CDNs in the industry. Developers can leverage the complementary services to engage users with video content created in real time on VoxEngine.