Before developers build a SIP-based communications application that runs over the public internet, they need to understand the associated security risks and how to mitigate them. Unless developers take steps to protect their users, SIP is vulnerable to a range of attacks that can compromise the privacy, integrity and reliability of communications.

In this post, we’ll describe the risks of transporting voice and video communications over the internet and the standards-based technologies available to mitigate them. The goal is to make developers aware of the security mechanisms that should be built into their service, regardless of whether they use a communications platform as a service (CPaaS) provider or have built the infrastructure themselves. We’ll also provide a reference to the security features available in the Voximplant service.

SIP Security Vulnerabilities

Early SIP-based communications services were deployed in private IP networks, where security was a secondary consideration. Businesses installed SIP-based telephony and unified communications gear made by Cisco, Avaya, Microsoft and others. These systems typically operate in a trusted network where cleartext SIP signaling and unencrypted RTP media packets can be safely transmitted. When a communications session includes an endpoint outside the business, it is converted to legacy digital formats and carried over the Public Switched Telephone Network (PSTN), which is a completely separate network with its own security mechanisms (and vulnerabilities).

More recently, telecommunications service providers have offered SIP trunking services to their business customers. These services are typically carried over a private IP network connection, such as MPLS, between the business and the telco point of presence. Here also, communications are generally in the clear because of the high level of trust in the connection.

Risks dramatically increase when SIP communications are transported over an untrusted network, such as the public internet. Examples of communications services that traverse the internet include consumer telephony and a range of cloud communications services (e.g. IVR in the cloud, SIP trunking). These services may use the public internet only to transport a single hop in a SIP session before connecting to the PSTN, or they may carry the entire end-to-end session over the internet. In any case, SIP is inherently vulnerable to a range of attacks that can be conducted over untrusted networks, including:

  • Eavesdropping - As previously mentioned, SIP and RTP are transmitted in the clear, making it trivial for an attacker with access to a mirrored Ethernet port to copy the signaling and media packets off the network and play them back in real time.
  • Hijacking - By inserting themselves as a man-in-the-middle in the communications path, attackers can redirect SIP sessions to an imposter endpoint, enabling them to steal valuable information from unsuspecting users. Academic studies show attackers can easily replace various portions of the SIP messages exchanged by endpoints and proxy servers without detection, including SIP request URIs and the Session Description Protocol (SDP) contents specifying the IP ports for media exchange.
  • Fraud - Various tools enable attackers to identify the SIP clients and servers attached to a network and compromise them. For example, SIPVicious is an open source tool that hackers use to scan networks and identify SIP servers. Attackers then exploit weak password protection to access the server and use it to execute a range of fraud schemes, from premium number revenue fraud to phishing attacks.
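
To make the eavesdropping and hijacking risks concrete, the sketch below parses a made-up, unencrypted INVITE the way a passive observer on the path could. The message contents, the addresses and the function name summarize_cleartext_invite are illustrative assumptions and do not come from a real capture or library.

```python
# A made-up INVITE as it might appear on the wire without TLS.
# Every name, address and port below is illustrative.
CAPTURED_INVITE = (
    "INVITE sip:bob@example.com SIP/2.0\r\n"
    "From: <sip:alice@example.com>;tag=1928301774\r\n"
    "To: <sip:bob@example.com>\r\n"
    "Call-ID: a84b4c76e66710\r\n"
    "CSeq: 1 INVITE\r\n"
    "Content-Type: application/sdp\r\n"
    "\r\n"
    "v=0\r\n"
    "o=alice 2890844526 2890844526 IN IP4 192.0.2.10\r\n"
    "s=-\r\n"
    "c=IN IP4 192.0.2.10\r\n"
    "t=0 0\r\n"
    "m=audio 49170 RTP/AVP 0\r\n"
)


def summarize_cleartext_invite(raw: str) -> dict:
    """Return what a passive observer can read from an unencrypted INVITE."""
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    request_uri = lines[0].split(" ")[1]

    # Header names are case-insensitive, so normalize them to lowercase.
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    # The SDP body tells the observer where the media will flow.
    media_ip, media_port, profile = None, None, None
    for line in body.splitlines():
        if line.startswith("c="):
            media_ip = line.split()[-1]
        elif line.startswith("m=audio"):
            _, port, profile = line.split()[:3]
            media_port = int(port)

    return {
        "request_uri": request_uri,
        "caller": headers.get("from"),
        "media_ip": media_ip,
        "media_port": media_port,
        "media_profile": profile,  # RTP/AVP means plain, unencrypted RTP
    }


info = summarize_cleartext_invite(CAPTURED_INVITE)
print(info)
```

Everything an attacker needs to locate the media stream, or to rewrite the request URI and media port in transit, is readable in this message.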

This is only a partial list of vulnerabilities. The risks are numerous because client/server authentication methods are weak and packets are not encrypted for privacy. The good news is most of these vulnerabilities can be mitigated by taking extra steps to protect SIP communications sessions.
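
As one illustration of why password strength matters, the sketch below assumes the server uses MD5-based digest authentication without the optional qop parameter, a scheme commonly used with SIP; the text above does not name a specific scheme, so treat that choice as an assumption. The account details and function names are hypothetical.

```python
import hashlib


def _md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri, nonce):
    """Compute a digest response (MD5, no qop) from the shared password."""
    ha1 = _md5_hex(f"{username}:{realm}:{password}")
    ha2 = _md5_hex(f"{method}:{uri}")
    return _md5_hex(f"{ha1}:{nonce}:{ha2}")


def password_matches(candidate, observed_response, **public_fields):
    """Check a candidate password against an observed response.

    Every input except the password travels in the clear when signaling
    is unencrypted, so the password is the only secret being tested.
    """
    return digest_response(password=candidate, **public_fields) == observed_response


# Hypothetical values for illustration only.
fields = dict(
    username="alice",
    realm="example.com",
    method="REGISTER",
    uri="sip:example.com",
    nonce="abc123",
)
observed = digest_response(password="letmein", **fields)
print(password_matches("letmein", observed, **fields))
print(password_matches("a-long-random-passphrase", observed, **fields))
```

Because the check needs nothing beyond values visible on the wire, a short or common password offers little protection unless the signaling itself is encrypted.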

How to Make Communications Secure

It’s a bit confusing, but when we speak of SIP communications, we’re actually referring to two separate types of traffic that compose a voice or video session: SIP signaling messages and RTP media packets. Both signaling and media must be secured against an attack. This means communicating endpoints must be authenticated and both types of packets must be delivered with integrity and privacy for the duration of the session.

VPN Tunneling

Perhaps the simplest way to secure a communications session is to encapsulate the traffic in a VPN tunnel. This is a field-proven method of securing all types of applications besides real-time communication. A variety of authentication methods can be applied, including PKI. Tunneling techniques ensure message integrity, and strong IPsec encryption ensures privacy.

The primary drawback to a VPN is that it doesn’t provide end-to-end protection because it is a network layer security mechanism. The endpoints can’t be confident that packets aren’t corrupted between the VPN boundary and the communications application. This is a limitation common to all VPN applications and shouldn’t disqualify it from use with a communications service. If the local networks at each end of the VPN are trusted, then this can be a satisfactory solution.

The VPN approach also introduces additional systems and administrative overhead. It can add cost and complexity to your application.

Application Layer Security Protocols

For the strongest security at the lowest cost, we recommend an application-layer approach. Standard protocols have been developed and products are available to secure the signaling and media packets exchanged by communicating endpoints and servers. 

SIP signaling can be secured using Transport Layer Security (TLS) in much the same way that HTTP messages are secured. TLS enables mutual authentication between SIP clients and servers using digital certificates, and it ensures integrity and privacy. A public key infrastructure (PKI) provides authentication and enables the client and server to negotiate cryptographic keys. Strong ciphers, such as AES, are used to encrypt packets.
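
As a rough sketch of what the client side of this can look like, the example below uses Python's standard ssl module to build a TLS context that verifies the server certificate and can optionally present a client certificate for mutual authentication. The function names, file paths and host name are placeholders, and port 5061 is assumed as the conventional port for SIP over TLS.

```python
import socket
import ssl


def build_sip_tls_context(ca_file=None, client_cert=None, client_key=None):
    """Create a client-side TLS context for a SIP signaling connection."""
    # Verifies the server certificate chain and host name by default.
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if client_cert:
        # Presenting a client certificate enables mutual authentication.
        ctx.load_cert_chain(certfile=client_cert, keyfile=client_key)
    return ctx


def open_sip_tls_connection(host, port=5061, ctx=None):
    """Open a TCP connection and wrap it in TLS before any SIP is sent."""
    ctx = ctx or build_sip_tls_context()
    raw = socket.create_connection((host, port), timeout=5)
    return ctx.wrap_socket(raw, server_hostname=host)


ctx = build_sip_tls_context()
print(ctx.verify_mode, ctx.check_hostname)

# Example use against a hypothetical server (not executed here):
# conn = open_sip_tls_connection("sip.example.com")
# conn.sendall(b"OPTIONS sips:sip.example.com SIP/2.0\r\n...")
```

Only the context is built when this runs; the connection helper is left uncalled because it needs a reachable SIP server.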

A note of caution: SIP TLS secures only the messages traversing a single hop between client and server or between proxy servers. Most communications services use multiple specialized servers to set up a complete session between endpoints, which can make end-to-end security challenging. However, this is typically not an issue because the servers reside within the provider’s private network. We’re concerned with securing the specific hop that uses the internet for transport, not the entire end-to-end session.

For example, suppose the developer of a telephony endpoint application uses a CPaaS to deliver PSTN services to the application’s users, and the endpoints access the CPaaS via the internet. In this case, the developer needs to use a SIPS URI (e.g. sips:user@host) for the client-server session request in order to secure the first hop between endpoint and CPaaS. The CPaaS takes responsibility for securing additional hops using a private network and/or applying secure methods. This example is further described in this post: Voximplant TLS support.
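
The client-side change can be as small as the URI scheme. The sketch below shows one way to normalize a target address to the sips scheme before building a request line; the helper names are hypothetical, and a real client would rely on its SIP stack rather than string handling.

```python
def to_sips_uri(uri: str) -> str:
    """Return the URI with the secure sips scheme, accepting sip or sips input."""
    scheme, sep, rest = uri.strip().partition(":")
    if not sep or not rest or scheme.lower() not in ("sip", "sips"):
        raise ValueError(f"not a SIP URI: {uri!r}")
    return "sips:" + rest


def build_request_line(method: str, uri: str) -> str:
    """Build the first line of a SIP request addressed to a sips URI."""
    return f"{method} {to_sips_uri(uri)} SIP/2.0"


print(build_request_line("INVITE", "sip:user@host"))
```

Rejecting anything that is not a SIP URI keeps a typo from silently producing a request that goes out without the secure scheme.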

RTP media can be secured using the Secure RTP protocol (SRTP). The authenticity of the media participants is established in advance by the SIP signaling session, which specifies the IP addresses and ports to be used to exchange media packets. Strong AES encryption is typically used for the cipher in conjunction with any of these protocols for key management:

  • SDP Security Descriptions (SDES) - The endpoints negotiate unicast media encryption key material with a simple exchange contained within the Session Description Protocol (SDP) messages sent during session setup. It enables the use of any of the three cryptographic transforms available in SRTP. SDES assumes the signaling session is secured by SIP TLS or other means.
  • Multimedia Internet KEYing (MIKEY) - Similar to SDES, endpoints negotiate encryption key material with parameters sent within SDP messages. MIKEY is more flexible than SDES, with multiple key sharing methods that support unicast and multicast session types. Methods include pre-shared keys, Diffie-Hellman and public key encryption. Unlike SDES parameters, MIKEY parameters are secured.
  • Datagram TLS (DTLS) - The endpoints establish a DTLS channel directly between media ports in order to negotiate keys. In contrast to SDES and MIKEY, there are no intervening signaling servers involved in key negotiation that could possibly compromise security. Note that the endpoint must multiplex and demultiplex DTLS and SRTP packets when this method is used, and a PKI is required.
  • ZRTP - The endpoints negotiate media encryption keys directly using RTP packets between media ports and Diffie-Hellman shared secret methods. Like DTLS, this has the advantage of producing keys independent of the SIP signaling session. In addition, ZRTP eliminates the need for a PKI and for packet multiplexing and demultiplexing.
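
To show why SDES depends on a secured signaling channel, the sketch below builds and parses an SDP a=crypto attribute of the kind SDES uses to carry key material. It assumes the AES_CM_128_HMAC_SHA1_80 suite, with a 16-byte master key followed by a 14-byte master salt, and it ignores optional lifetime and MKI parameters. The function names are hypothetical.

```python
import base64
import os
import re

SUITE = "AES_CM_128_HMAC_SHA1_80"
KEY_LEN, SALT_LEN = 16, 14  # master key and master salt sizes for this suite

# Stops at the first "|" so optional lifetime/MKI parameters are ignored.
_CRYPTO_RE = re.compile(r"^a=crypto:(\d+) (\S+) inline:([A-Za-z0-9+/=]+)")


def build_crypto_attribute(tag: int, key_and_salt: bytes) -> str:
    """Encode key material into an a=crypto line for an SDP offer."""
    if len(key_and_salt) != KEY_LEN + SALT_LEN:
        raise ValueError("expected 30 bytes of key and salt")
    encoded = base64.b64encode(key_and_salt).decode("ascii")
    return f"a=crypto:{tag} {SUITE} inline:{encoded}"


def parse_crypto_attribute(line: str) -> dict:
    """Recover the suite and key material from an a=crypto line."""
    match = _CRYPTO_RE.match(line.strip())
    if not match:
        raise ValueError(f"not an a=crypto attribute: {line!r}")
    tag, suite, encoded = match.groups()
    material = base64.b64decode(encoded)
    if suite == SUITE and len(material) != KEY_LEN + SALT_LEN:
        raise ValueError("unexpected key material length")
    return {
        "tag": int(tag),
        "suite": suite,
        "master_key": material[:KEY_LEN],
        "master_salt": material[KEY_LEN:],
    }


offer_line = build_crypto_attribute(1, os.urandom(KEY_LEN + SALT_LEN))
parsed = parse_crypto_attribute(offer_line)
print(offer_line)
print(parsed["suite"], len(parsed["master_key"]), len(parsed["master_salt"]))
```

The key is only base64-encoded, not encrypted, so anyone who can read the SDP can decrypt the media. That is why SDES is paired with SIP TLS on any hop that crosses an untrusted network.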

It’s worth noting that media sessions typically follow fewer hops than signaling sessions. In fact, media may flow directly between endpoints without traversing any intervening servers. In this case, media sessions can be more secure than signaling sessions.

When developers want to leverage the internet to connect communications clients and servers, they have two strong options available to make the sessions secure. VPN tunneling connects the two network domains and protects the entire session with IPsec encryption, while the application-layer protocols SIP TLS and SRTP use AES to protect signaling and media.

Voximplant Security Features

As a leader in the CPaaS industry, Voximplant offers flexible, secure internet access to all its SIP communications products. We make it easy for developers to create innovative, high quality voice and video services that are widely accessible via internet transport while mitigating security risks. 

Developers can integrate Voximplant products directly into their applications using secure application-layer interfaces. We support SIP TLS for signaling security and SRTP for media security. For compatibility with existing infrastructure and SIP clients, we offer the following key negotiation protocols for SRTP:

  • SDES
  • DTLS

Once your traffic hits our network, we protect it by assigning all incoming and outgoing streams to the same media server. This minimizes exposure of these streams when we need to decrypt media. All signaling is encrypted as it traverses our network.

Protect Your Communications Applications

With a range of flexible and economical security solutions available, there is no reason for developers to hesitate to leverage the internet to deliver SIP communications services. Indeed, public internet-based voice and video communications are growing rapidly because high quality broadband services are nearly ubiquitous and threats are easily mitigated.

Secure client/server communications protocols are available to protect SIP communications with the same strong authentication, integrity and privacy features that protect HTTP web applications. By carefully applying these protocols, developers can enjoy the freedom to innovate and create delightful communications experiences for their end users without concern for potential threats.