A couple of weeks ago, Amazon Web Services (AWS) experienced an outage in its US-EAST-1 region. As so many services rely on AWS, this outage had a broader impact, causing outages and various issues with Amazon’s own Ring services, online retailers, and even the New York City MTA. In addition, a couple of major Communications Platform as a Service (CPaaS) providers also reported issues (Voximplant was not impacted), potentially impacting the communications of many of their customers.
With this in mind, now is a good time to look at how CPaaS offerings leverage public cloud infrastructure and review the factors involved in providing reliable, high quality communications services. In this post we will review the public cloud infrastructure used by several major CPaaS vendors and discuss the implications of their choices.
How Cloud Communications Platforms Build their Networks
All cloud-based Software as a Service (SaaS) providers need to host their code on a server somewhere. That code needs to be connected to various Internet exchange points so those services can reach customers and partner service providers. These providers generally have two choices for where to host their infrastructure:
- Public cloud providers - the big cloud computing companies that run various APIs and cloud infrastructure services. The most popular ones include Amazon Web Services, Microsoft Azure, Google Cloud Platform, and IBM Cloud, as well as many smaller ones.
- Data center providers - offer lower-level access to servers and Internet Service Provider connectivity without all the added software layers on top. Examples include companies like Equinix, INAP, and servers.com. Many of these companies have started to add services that make them look more like the Infrastructure as a Service (IaaS) offerings of the big public cloud providers.
The big public cloud providers have grown to be massive with giant ecosystems of supporting products. These tend to be the most popular choice for use by Cloud Communications Platforms. However, independent data center providers do still find some use for niche needs not fulfilled by the big players.
How a Cloud provider’s infrastructure impacts CPaaS services
The choices a CPaaS provider makes for its cloud infrastructure have many impacts on users of those services. The real-time, personal, and often mission-critical nature of communications makes it particularly challenging to implement compared to other applications. Since even a momentary outage is noticeable during a call, communications application providers must offer a highly reliable service while maintaining high quality. In addition, communicating using a microphone or camera brings unique privacy implications. These are major factors in how a CPaaS chooses to deploy its network.
Reliability through Redundancy
There are many facets to providing reliable services over the Internet. Outside of code development best practices and rigorous Quality Assurance processes, the best defense against major issues like an outage is redundancy. Redundancy is ultimately about avoiding single points of failure that could take down a service. The bigger and more complex the network, the more potential points of failure there are. A well-designed network will have duplicate software processes running on redundant hardware with multiple, independent network routes. In practice this means setting up:
- Redundant servers / processes - ensuring that should one server or software process have critical issues, another will take its place.
- Redundant data centers - in the relatively rare case that a physical data center facility has an issue - such as in a flood or construction accident - there should be redundant physical facilities.
- Redundant cloud providers - the last resort is leveraging multiple cloud providers.
Redundancy at the server or process level is often achieved via a load balancer element that distributes traffic among multiple servers. Should one server fail, the load balancer can send traffic to other processes - so long as the application is able to handle it. Of course the load balancers themselves should be set up in a redundant fashion too.
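To make the failover behavior concrete, here is a minimal sketch of a round-robin load balancer that skips servers marked as unhealthy. It is an illustration only, not how any particular CPaaS implements load balancing; the class name, method names, and server names are all hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Toy load balancer that rotates through servers and skips unhealthy ones."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._ring = cycle(self.servers)

    def mark_down(self, server):
        # In a real system a health check would call this automatically.
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def next_server(self):
        # Try each server at most once per call so we never loop forever.
        for _ in range(len(self.servers)):
            candidate = next(self._ring)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy servers available")


# Hypothetical server names, for illustration only.
balancer = RoundRobinBalancer(["media-1", "media-2", "media-3"])
first_round = [balancer.next_server() for _ in range(3)]

balancer.mark_down("media-2")  # simulate a failed server
after_failure = [balancer.next_server() for _ in range(4)]
print(first_round, after_failure)
```

Once `media-2` is marked down, traffic keeps flowing to the remaining servers. The balancer in this sketch is itself a single point of failure, which is why real deployments run load balancers redundantly as well.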
Redundancy at the data center level is often built into modern cloud platform offerings. For example, major cloud providers often host several physical data centers in physical areas that provide similar latency characteristics. Should one of the physical data centers have issues there are procedures for automatically routing traffic to alternative data centers. These providers may also provide tools for helping to share data and traffic across regions.
The last level of redundancy is cloud provider redundancy. Although it is rare, human error and bad luck can take a cloud provider offline. Bugs in the use of a specific cloud vendor’s SDK can potentially cause cloud-provider-specific issues too. A multicloud architecture that depends on multiple cloud providers is the best defense in these cases. Multicloud architectures may utilize multiple public cloud providers as well as independent data centers for the ultimate redundancy.
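One way to reason about the three levels together is to audit a deployment inventory for single points of failure. The sketch below assumes a hypothetical inventory of (service, provider, data center, server) tuples; the provider and data center names are made up for illustration.

```python
from collections import defaultdict

# Hypothetical deployment inventory: (service, provider, data_center, server).
DEPLOYMENT = [
    ("media", "cloud-a", "us-east-dc1", "m1"),
    ("media", "cloud-a", "us-east-dc2", "m2"),
    ("media", "cloud-b", "eu-west-dc1", "m3"),
    ("signaling", "cloud-a", "us-east-dc1", "s1"),
    ("signaling", "cloud-a", "us-east-dc1", "s2"),
]


def single_points_of_failure(deployment):
    """Return, per service, the redundancy levels that have only one option."""
    seen = defaultdict(
        lambda: {"server": set(), "data center": set(), "cloud provider": set()}
    )
    for service, provider, data_center, server in deployment:
        seen[service]["server"].add(server)
        # A data center is identified by its provider plus its own name.
        seen[service]["data center"].add((provider, data_center))
        seen[service]["cloud provider"].add(provider)
    return {
        service: [level for level, values in levels.items() if len(values) < 2]
        for service, levels in seen.items()
    }


report = single_points_of_failure(DEPLOYMENT)
for service, gaps in report.items():
    print(service, "->", gaps or "redundant at every level")
```

In this made-up inventory the media service is redundant at every level, while the signaling service has redundant servers but depends on a single data center and a single cloud provider.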
Latency: closer is usually better
Real Time Communications are very sensitive to latency - the time it takes voice and video data to travel between callers. Low latency is usually one of the best indicators of good media quality. Generally the closer a user is to the entity they are communicating with, the better. The path taken by a given VoIP packet may be subject to twists and turns along the way - but just like a highway, a short, direct route usually gets you there faster.
For this reason, CPaaS providers should strive to host their services in data centers that are close to users. This is particularly important for elements such as media servers, gateways, and Session Border Controllers (SBCs) that handle media and sit directly in the communication flow.
Of course there are practical limits on what makes sense as well as real costs to running more data centers than are truly needed. Just like highways usually have limited on-ramps and off-ramps, the same is true of the Internet connections between data centers. It is not practical to have a data center in every city, but network designers should strive to have a data center in the general region.
As an example, if you have a caller in New York City, you don’t want their traffic travelling to San Francisco so that they can talk to someone roughly 100 miles (150 km) away in Philadelphia. A good target round-trip latency is 150 ms. The time it takes light to travel through fiber in a straight line from New York to San Francisco and back is roughly 40 ms. A more realistic path, with delays added by many routers along the way, will multiply that latency several times over, bringing it well above 150 ms. You really want that traffic to traverse a data center somewhere closer. A New York City based data center would be ideal, but somewhere on the East Coast will generally give very acceptable performance.
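The fiber figure above can be checked with a short calculation. This sketch assumes approximate city coordinates, a spherical Earth, and a typical fiber refractive index of about 1.47. It gives only the theoretical minimum for a straight-line path, before any routing, queuing, or processing delay.

```python
import math

SPEED_OF_LIGHT_KM_S = 299_792.458
FIBER_REFRACTIVE_INDEX = 1.47  # assumed typical value for optical fiber


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance between two points on a spherical Earth."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * radius_km * math.asin(math.sqrt(a))


def fiber_round_trip_ms(distance_km):
    """Minimum round-trip time over a straight fiber run, in milliseconds."""
    speed_in_fiber = SPEED_OF_LIGHT_KM_S / FIBER_REFRACTIVE_INDEX
    return 2 * distance_km / speed_in_fiber * 1000


# Approximate (latitude, longitude) pairs, for illustration only.
NEW_YORK = (40.71, -74.01)
SAN_FRANCISCO = (37.77, -122.42)
PHILADELPHIA = (39.95, -75.17)

ny_sf_km = great_circle_km(*NEW_YORK, *SAN_FRANCISCO)
ny_sf_rtt_ms = fiber_round_trip_ms(ny_sf_km)
ny_phl_rtt_ms = fiber_round_trip_ms(great_circle_km(*NEW_YORK, *PHILADELPHIA))

print(f"New York to San Francisco: {ny_sf_km:.0f} km, {ny_sf_rtt_ms:.1f} ms round trip")
print(f"New York to Philadelphia: {ny_phl_rtt_ms:.1f} ms round trip")
```

Under these assumptions the coast-to-coast round trip comes out at roughly 40 ms, while the New York to Philadelphia round trip is on the order of a millisecond or two, which is why a nearby data center leaves so much more of the latency budget for everything else.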
Data sovereignty & localization
Many countries have laws regulating the control and storage of cloud-based data. These rules may specify how governments can access user data and the systems this data is hosted on. Beyond requiring that companies comply with their user data and security laws, data localization efforts actually require some initial data handling within national boundaries. In addition, some companies may prefer their data be handled within a certain region to aid in compliance with broader security and privacy regulations. For example, the European Union’s General Data Protection Regulation (GDPR) requires protection of EU citizen data, even outside of EU borders. Many companies have chosen to keep their data within the EU rather than deal with compliance issues raised by hosting data outside of the EU.
The only way to comply with data sovereignty efforts is to host servers within the countries that require it. Global providers may offer data residency options that allow data to be stored in a specific country or data center.
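As a simple illustration of a data residency option, the sketch below filters a catalog of data centers down to those located in the countries a customer's policy allows. The catalog, the data center names, and the `eligible_data_centers` function are hypothetical and do not represent any provider's actual API.

```python
# Hypothetical data center catalog; names and countries are illustrative only.
DATA_CENTERS = [
    {"name": "dc-frankfurt", "country": "DE"},
    {"name": "dc-singapore", "country": "SG"},
    {"name": "dc-virginia", "country": "US"},
]


def eligible_data_centers(data_centers, allowed_countries):
    """Keep only data centers located in countries the policy allows."""
    allowed = {code.upper() for code in allowed_countries}
    return [dc["name"] for dc in data_centers if dc["country"] in allowed]


# A customer whose data must stay in Germany.
germany_only = eligible_data_centers(DATA_CENTERS, ["de"])
print(germany_only)
```

If the filter returns an empty list, the provider has no compliant location for that policy, which is the situation a broader data center footprint is meant to avoid.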
CPaaS Data Center Analysis
A reverse-IP address lookup analysis of several major CPaaS providers yields some interesting results. One can examine how a CPaaS has built its network by analyzing its publicly listed IP addresses. We looked at Twilio, Vonage/Nexmo, and MessageBird to understand their geographic footprint and data center providers.
|CPaaS|Continents|Countries|DC Providers|DC Regions|
The methodology section further below describes how this analysis was conducted.
True geographic coverage requires at least one data center located in each continent. Here is how these providers compare across the globe:
Number of unique countries with at least one data center
No CPaaS covered every continent. Unsurprisingly, there are no public data centers for the minuscule Antarctica population. More notably, none of the CPaaS providers has a data center in Africa. Only Voximplant and Twilio cover all the remaining continents.
As mentioned earlier, coverage in specific countries may be required for regulatory or policy reasons in addition to the reduced latency that closer proximity usually provides. The below maps illustrate where these CPaaS vendors have servers.
Unique countries with at least one data center highlighted for each CPaaS. Note the map is to scale, so some small countries, like Singapore, are difficult to see.
All the CPaaS above had data centers in Singapore. Nexmo is unique in offering a Canadian data center for North America. Voximplant is unique in offering data centers in 9 countries throughout Europe, including:
Datacenter Provider Diversity
This analysis can also reveal who is providing hosting and IaaS behind each CPaaS:
Unique hosting organizations used by each CPaaS
All of the CPaaS examined have some reliance on major public cloud providers. Twilio and MessageBird had no multi-cloud architecture, with IP addresses indicating only one major cloud platform provider. Voximplant is unique in that it uses several public cloud providers. In addition, Voximplant also utilizes several independent data centers, giving it the most diversity.
Voximplant’s Networking Approach
Voximplant started building its network early on with a multi-cloud approach. As its customers required broader coverage and the company went global, it was relatively easy to extend this infrastructure to different cloud providers and independent data centers. As a result, Voximplant is not limited to the locations offered by major cloud providers like Amazon Web Services. This has the added advantage that it can flexibly add new data center locations for unique customer demands or where performance requires it. Multicloud also allows us to provide better redundancy without reliance on a single cloud vendor. Lastly, this diverse set of infrastructure also provides a robust foundation for data sovereignty needs.
Moving forward, Voximplant will continue to offer a diverse, multi-cloud environment. Some data center providers may be consolidated to help keep costs low and streamline operations. At the same time, we expect we will continue to add new data center locations that improve performance and help our customers with compliance.
Make sure to use our getMediaResources API for the latest list of IP addresses by type. Of course, you can always contact Voximplant for more information and guidance too!
Methodology
CPaaS providers must share their IP address information so that firewalls can be programmed to allow traffic from the CPaaS. IP address lookup tools can be used to determine the data center provider and the general geographic location where that IP address is located.
ipapi was used to perform the reverse IP lookup analysis. Where possible, only IP addresses associated with handling media - i.e., RTP - were used as these are the most sensitive to latency and reliability needs. In most cases there were no apparent discrepancies between the data centers used for signaling vs. media.
ipapi returns a country, city, region, and organization for each IP address. Org is used to determine the DC Providers values. The org and region fields were combined to determine unique data center regions (DC Regions) for comparison. In cases where a block of IP addresses was provided, a single IP address was analyzed within this block.
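The aggregation described above can be sketched as follows. The records are hypothetical stand-ins shaped like the fields listed (country, city, region, org), using reserved documentation IP addresses; this is not the actual script or data behind the analysis, and it does not call the lookup service itself.

```python
# Hypothetical lookup results, one record per analyzed IP address.
LOOKUPS = [
    {"ip": "192.0.2.10", "country": "US", "city": "Ashburn",
     "region": "Virginia", "org": "Example Cloud A"},
    {"ip": "192.0.2.11", "country": "US", "city": "Ashburn",
     "region": "Virginia", "org": "Example Cloud A"},
    {"ip": "198.51.100.5", "country": "US", "city": "San Jose",
     "region": "California", "org": "Example Cloud A"},
    {"ip": "203.0.113.7", "country": "SG", "city": "Singapore",
     "region": "Singapore", "org": "Example Hosting B"},
]


def summarize(records):
    """Count unique countries, hosting orgs, and org + region combinations."""
    countries = {r["country"] for r in records}
    providers = {r["org"] for r in records}
    # A "DC Region" is the combination of hosting org and geographic region.
    regions = {(r["org"], r["region"]) for r in records}
    return {
        "Countries": len(countries),
        "DC Providers": len(providers),
        "DC Regions": len(regions),
    }


summary = summarize(LOOKUPS)
print(summary)
```

Using sets means that several IP addresses in the same data center region count only once, which matches the approach of analyzing a single address from each published block.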
The full analysis can be found here. See below for specifics on how the IP addresses were obtained for each CPaaS vendor.
All of Voximplant’s IP addresses are available via its getMediaResources API. As noted above, the focus was on media handling endpoints, so the following URL was used for the final analysis:
http://api.voximplant.com/getMediaResources?with_mediaservers&with_sbcs
Twilio publicly lists its service regions here. The following links were used to aggregate this IP address information:
Vonage lists the IP addresses for Nexmo SIP endpoints here. Vonage uses different IP addresses for Tokbox, its video calling API. Vonage lists the Tokbox locations here, but the IP address list is only available for Tokbox users on the Enterprise plan. As this data was not fully available for verification, and that infrastructure is only used for a more limited set of services, the Tokbox-specific locations were excluded.
MessageBird lists its IP address information here. The SIP RTP servers information was used for analysis.