Recently, we published a blog post describing why WebSockets are great for real-time services. In this article, we describe how a WebSocket connection is established, maintained and closed.

WebSocket is a protocol that provides low-latency data exchange between a client and a server. It maintains a persistent two-way connection between the parties, so either side can send data to the other at any time during the session. In contrast to plain HTTP, the server does not have to wait for a new request before it sends data.

The WebSocket protocol is great for services that require real-time communications such as natural language processing and chats.

WebSocket Connection

The WebSocket protocol workflow consists of two stages: opening handshake and data transfer. These parts are logically separated from each other. 

Opening Handshake

To establish a new WebSocket connection, the parties use the HTTP request/response approach. The connection starts when the client sends an HTTP upgrade request to the server. This is what it looks like:

GET /chat HTTP/1.1
Host: www.voximplant.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: Iv7io/9s+lYFgYBcXczP8Q==
Origin: https://www.voximplant.com
Sec-WebSocket-Version: 13
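  • Upgrade: websocket – asks the server to switch to the WebSocket protocol
  • Sec-WebSocket-Key: Iv7io/9s+lYFgYBcXczP8Q== – a randomly generated 16-byte value, base64-encoded, which the server uses to prove that it understood the WebSocket handshake
  • Sec-WebSocket-Version: 13 – the protocol version the client wants to use; 13 is the version defined by RFC 6455
  • Origin – the origin of the client page, which the server can check before accepting the connection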
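The request above can be assembled by hand. The following is a minimal sketch, not a complete client: the function name build_handshake_request is hypothetical, and the host, path and origin values are only illustrative. It shows that the key is simply 16 random bytes in base64 and that the request is ordinary HTTP text.

```python
import base64
import os


def build_handshake_request(host: str, path: str, origin: str):
    """Return (request_text, key) for a WebSocket opening handshake.

    Hypothetical helper for illustration only.
    """
    # The key is 16 random bytes, base64-encoded (24 ASCII characters).
    key = base64.b64encode(os.urandom(16)).decode("ascii")
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Key: {key}",
        f"Origin: {origin}",
        "Sec-WebSocket-Version: 13",
    ]
    # HTTP header lines are separated by CRLF and end with an empty line.
    return "\r\n".join(lines) + "\r\n\r\n", key


request, key = build_handshake_request(
    "www.voximplant.com", "/chat", "https://www.voximplant.com"
)
```

A fresh key should be generated for every connection attempt, because the server's response is derived from it.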

The server also responds in an HTTP-like way:

HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: hsBlbuDTkk24srzEOTBUlZAlC2g=
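  • HTTP/1.1 101 Switching Protocols – the status code confirming that the server agrees to switch protocols
  • Sec-WebSocket-Accept: hsBlbuDTkk24srzEOTBUlZAlC2g= – a value the server computes from the client's Sec-WebSocket-Key using the algorithm defined in RFC 6455; the client checks it to make sure the response comes from a server that understands WebSockets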
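The RFC 6455 algorithm is short: the server appends a fixed GUID to the client key, takes the SHA-1 hash and base64-encodes the result. Here is a minimal sketch of that computation; the function name compute_accept is our own, and the sample key is the one used in the RFC itself rather than the one in the request above.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def compute_accept(key: str) -> str:
    """Derive the Sec-WebSocket-Accept value from a Sec-WebSocket-Key."""
    digest = hashlib.sha1((key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


# Sample key taken from RFC 6455, section 1.3.
print(compute_accept("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

A client performs the same computation on the key it sent and rejects the connection if the result does not match the header returned by the server.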

After that, HTTP is no longer used. The TCP connection that carried the handshake stays open and is used for WebSocket data for the rest of the session.

Data Transfer

In WebSockets, data is transferred over the TCP connection in frames. A message can be sent as a single frame or split into several smaller frames, so that a large message does not have to be buffered in full before sending starts.
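Fragmentation can be sketched as follows. In RFC 6455, the first frame of a fragmented message carries the message type, the following frames use the continuation opcode, and only the last frame has the FIN flag set. The function name fragment and the chunk size are assumptions made for this illustration; the example returns the frame fields as tuples and leaves out the byte-level encoding.

```python
from typing import Iterator, Tuple

OP_CONTINUATION = 0x0  # follow-up frame of a fragmented message
OP_TEXT = 0x1          # text message


def fragment(payload: bytes, opcode: int, chunk_size: int) -> Iterator[Tuple[bool, int, bytes]]:
    """Yield (fin, opcode, chunk) for each frame of one message."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # An empty message is still sent as one empty frame.
    chunks = [payload[i:i + chunk_size] for i in range(0, len(payload), chunk_size)] or [b""]
    for index, chunk in enumerate(chunks):
        fin = index == len(chunks) - 1  # FIN is set only on the last frame
        yield fin, (opcode if index == 0 else OP_CONTINUATION), chunk


frames = list(fragment(b"Hello, WebSocket!", OP_TEXT, 6))
# [(False, 1, b'Hello,'), (False, 0, b' WebSo'), (True, 0, b'cket!')]
```

The receiver concatenates the chunks in order until it sees a frame with FIN set, and then delivers the complete message.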

The WebSocket protocol supports three types of frames:

  1. Text frames. Contain UTF-8 encoded text data that parties send to each other
  2. Binary data frames. Contain arbitrary binary data
  3. Control frames (close, ping and pong). Used to maintain and close the connection
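On the wire, the frame type is stored as an opcode in the first byte of the frame header, followed by the payload length and, for frames sent by the client, a 4-byte masking key. The sketch below encodes a single frame according to RFC 6455; encode_frame is a hypothetical helper written for this article, not part of any library.

```python
import struct
from typing import Optional

# Opcodes defined by RFC 6455.
OP_TEXT, OP_BINARY = 0x1, 0x2
OP_CLOSE, OP_PING, OP_PONG = 0x8, 0x9, 0xA


def encode_frame(opcode: int, payload: bytes,
                 mask_key: Optional[bytes] = None, fin: bool = True) -> bytes:
    """Encode one WebSocket frame. Pass a 4-byte mask_key for client frames."""
    first = (0x80 if fin else 0x00) | opcode
    mask_bit = 0x80 if mask_key is not None else 0x00
    length = len(payload)
    if length <= 125:
        header = struct.pack("!BB", first, mask_bit | length)
    elif length <= 0xFFFF:
        # 126 signals that a 16-bit length follows.
        header = struct.pack("!BBH", first, mask_bit | 126, length)
    else:
        # 127 signals that a 64-bit length follows.
        header = struct.pack("!BBQ", first, mask_bit | 127, length)
    if mask_key is None:
        return header + payload
    # Client-to-server payloads are XOR-ed with the repeating 4-byte key.
    masked = bytes(b ^ mask_key[i % 4] for i, b in enumerate(payload))
    return header + mask_key + masked


print(encode_frame(OP_TEXT, b"Hello").hex())  # 810548656c6c6f
```

Frames sent by a client must always be masked, while frames sent by a server are not, which is why the key is optional in this sketch.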

How Connections are Maintained

To avoid connection timeouts and prevent proxies from closing idle connections, the parties can exchange 'ping' and 'pong' control frames. Either side can send a ping; typically, the server periodically sends a 'ping' control frame and the client responds with a 'pong' control frame.

The ping/pong approach resolves two problems:

  1. It makes sure the other party is still functioning
  2. It keeps the TCP connection alive when the client and server are connected through a proxy server that would otherwise close an idle connection
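A ping reply is easy to sketch. According to RFC 6455, a pong frame must carry the same payload as the ping it answers, and control frames are limited to 125 bytes of payload and cannot be fragmented. The helper names below are hypothetical, and the frames are left unmasked, as they would be when sent by a server.

```python
OP_PING = 0x9
OP_PONG = 0xA


def encode_control_frame(opcode: int, payload: bytes = b"") -> bytes:
    """Encode an unmasked control frame (as a server would send it)."""
    if len(payload) > 125:
        raise ValueError("control frame payload is limited to 125 bytes")
    # 0x80 sets the FIN flag: control frames are never fragmented.
    return bytes([0x80 | opcode, len(payload)]) + payload


def reply_to_ping(ping_payload: bytes) -> bytes:
    """Build the pong frame that answers a ping with the given payload."""
    return encode_control_frame(OP_PONG, ping_payload)


ping = encode_control_frame(OP_PING, b"hi")
pong = reply_to_ping(b"hi")
print(ping.hex(), pong.hex())  # 89026869 8a026869
```

If no pong arrives within a timeout that the application chooses, the sender of the ping can treat the connection as dead and close it.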

Connection Close

To close the WebSocket connection, the parties exchange 'close' control frames. Either the server or the client can start the closing handshake.

For example, the client sends a 'close' frame to the server. In turn, the server responds with its own 'close' frame, and then the underlying TCP connection is closed.
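A 'close' frame may carry a payload that explains why the connection is being closed: a 2-byte status code in network byte order, optionally followed by a UTF-8 reason. The sketch below builds and parses that payload; the function names are our own, 1000 is the RFC 6455 code for a normal closure, and 1005 is the code reserved for reporting that no status was received.

```python
import struct
from typing import Tuple


def build_close_payload(code: int = 1000, reason: str = "") -> bytes:
    """Build the payload of a close frame: status code plus optional reason."""
    payload = struct.pack("!H", code) + reason.encode("utf-8")
    if len(payload) > 125:
        raise ValueError("close payload is limited to 125 bytes")
    return payload


def parse_close_payload(payload: bytes) -> Tuple[int, str]:
    """Return (code, reason) from the payload of a received close frame."""
    if not payload:
        return 1005, ""  # 1005: no status code was present
    if len(payload) == 1:
        raise ValueError("close payload cannot be a single byte")
    (code,) = struct.unpack("!H", payload[:2])
    return code, payload[2:].decode("utf-8")


payload = build_close_payload(1000, "bye")
print(payload)                       # b'\x03\xe8bye'
print(parse_close_payload(payload))  # (1000, 'bye')
```

The party that receives a close frame usually echoes the same status code in its own close frame before the connection is torn down.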

Try WebSockets with Voximplant

Voximplant simplifies app development with the WebSocket API that is located in the cloud. When your app calls a third-party service, our cloud opens the WebSockets connection and sends and receives data. You can build and scale your real-time app without the need to provision servers or monitor capacity utilization.

Here are the instructions you can use to create inbound and outbound WebSockets connections. For instance, you can add speech synthesis and recognition to your bots using the WebSocket API.

Create a developer account and try out our platform for free.