
WebSocket Introduction

WebSocket is an advanced standard for full-duplex (two-way) communication between a client and a remote server in real time. It is used to organize continuous data exchange while keeping a single connection open, with no extra HTTP requests needed.

And we’re happy to tell you that all this is now possible in Voximplant thanks to WebSocket, a new VoxEngine module. From now on, you can send text and audio and take full advantage of WebSocket. In other words, you have one more tool to boost your app.

In this guide, you’ll get to know how to make an outgoing WebSocket connection, send an audio stream through it, and transcribe your speech with the help of Google Cloud Speech-to-Text API.

Pay attention

Voximplant has built-in functionality for real-time speech-to-text, provided by the ASR module. The module relies on speech recognition from Google and other major vendors. Please check our Speech-to-text how-to for additional info.

The current article covers the case when you want to use another speech-to-text vendor and/or spend credits from its account instead of your Voximplant one.

To begin with, let's learn what types of WebSocket connections exist and how to establish them.

There are two types available:

  • outgoing
  • incoming

Outgoing

The first thing you should do to make an outgoing connection is to start a VoxEngine scenario. Next, create a WebSocket object using the VoxEngine.createWebSocket method. It accepts two parameters: a URL in the format 'wss://' + domain + path, and optional protocols. Here is how it works:

VoxEngine.addEventListener(AppEvents.CallAlerting, function(e) {
    const webSocket = VoxEngine.createWebSocket( /*url*/ "wss://your_link/");
    // You can handle an outgoing WebSocket connection here
});

If everything is ok, the call.sendMediaTo method will start sending the call’s audio stream to the WebSocket. The WebSocket.send method, in turn, sends the encoded audio stream as JSON messages via the WebSocket. Thus, you’ll be getting messages back from the service handling your requests.

The WebSocket.close method closes the connection. Please note that the connection can be closed from both the client side and the server side.
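For illustration, here is how such a JSON media message can be parsed on the receiving side. This is a minimal sketch: parseMediaMessage is our own helper name (not a VoxEngine or ws API), and the message shape { event: "media", media: { payload: "<base64>" } } matches what the Node.js server code later in this guide expects:

```javascript
// Hypothetical helper: extracts raw audio bytes from a JSON media message
// of the shape { event: "media", media: { payload: "<base64 audio>" } }
function parseMediaMessage(raw) {
    const data = JSON.parse(raw);
    if (data.event === "media") {
        // The payload is base64-encoded audio; decode it into a Buffer
        return Buffer.from(data.media.payload, "base64");
    }
    return null; // not an audio frame (e.g., a service message)
}

// Example: a media frame carrying the bytes of the string "audio-bytes"
const frame = JSON.stringify({
    event: "media",
    media: { payload: Buffer.from("audio-bytes").toString("base64") }
});
const decoded = parseMediaMessage(frame);
```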

Incoming

To accept incoming connections, use the VoxEngine.allowWebSocketConnections method and subscribe to the AppEvents.WebSocket event. Then you can accept an incoming connection and get a WebSocket object from event.WebSocket. Check the code below:

VoxEngine.allowWebSocketConnections();

VoxEngine.addEventListener(AppEvents.WebSocket, function(e) {
    // You can handle an incoming WebSocket connection here
});

The accessSecureURL for a WebSocket can be obtained from the HTTP session initialization request or directly from the AppEvents.Started event. Please note that 'https' should be changed to 'wss' in the URL.

The last steps are just the same as in the outgoing connection scheme.
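As a small illustration of the note above, changing the accessSecureURL scheme from 'https' to 'wss' is a one-line string operation. In this sketch, toWssUrl is our own helper name and the URL is a made-up example:

```javascript
// Hypothetical helper: converts the https accessSecureURL into
// the wss URL a WebSocket client can connect to
function toWssUrl(accessSecureURL) {
    return accessSecureURL.replace(/^https:/, "wss:");
}

const wssUrl = toWssUrl("https://example.voximplant.com/session/path");
// wssUrl is now "wss://example.voximplant.com/session/path"
```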

WHAT YOU NEED

To implement the WebSocket and the speech recognition functionality in your app, you'll need:

  • A Voximplant developer account. If you don’t have one, sign up here.
  • A Voximplant application, JS scenario, rule, and user. These will be created during this tutorial.
  • A simple backend (we’ll run a Node.js server) along with the Cloud client library for the Speech-to-Text API.
  • A web client to make a call (we’ll use our webphone at https://phone.voximplant.com/).

1. VOXIMPLANT APPLICATION SETTINGS

Let’s begin with the Voximplant side. Log into your account at https://manage.voximplant.com/auth. On the left menu, select Applications, click New application and create one. Let’s name it websocket. After that, go to your new application, switch to the Scenarios tab, and create a scenario containing the following code:

require(Modules.WebSocket);

VoxEngine.addEventListener(AppEvents.CallAlerting, function(e) {
    const call = e.call;
    call.answer();

    const webSocket = VoxEngine.createWebSocket( /*url*/ "wss://your_ngrok_link/");

    webSocket.addEventListener(WebSocketEvents.ERROR, function(e) {
        Logger.write("LOG OUTGOING: WebSocketEvents.ERROR");
        call.sendMessage("LOG OUTGOING: WebSocketEvents.ERROR");
    });
    webSocket.addEventListener(WebSocketEvents.CLOSE, function(e) {
        Logger.write("LOG OUTGOING: WebSocketEvents.CLOSE: " + e.reason);
        call.sendMessage("LOG OUTGOING: WebSocketEvents.CLOSE: " + e.reason);
    });
    webSocket.addEventListener(WebSocketEvents.OPEN, function(e) {
        Logger.write("LOG OUTGOING: WebSocketEvents.OPEN");
        Logger.write(JSON.stringify(e))
        call.sendMessage("LOG OUTGOING: WebSocketEvents.OPEN");
    });
    webSocket.addEventListener(WebSocketEvents.MESSAGE, function(e) {
        Logger.write("LOG OUTGOING: WebSocketEvents.MESSAGE: " + e.text);
        call.sendMessage("LOG OUTGOING: WebSocketEvents.MESSAGE: " + e.text);
        if (e.text == "Hi there, I am a WebSocket server") {
            call.sendMediaTo(webSocket, {
                encoding: WebSocketAudioEncoding.ULAW,
                "tag": "MyAudioStream",
                "customParameters": {
                    "param1": "12345"
                }
            });
        }
    });

    call.addEventListener(CallEvents.Disconnected, function(e) {
        Logger.write("LOG OUTGOING: terminating in 1 sec");
        webSocket.close();
        setTimeout(VoxEngine.terminate, 1000);
    });
});

This VoxEngine scenario sends an audio stream to the WebSocket and listens to the WebSocket events (ERROR, CLOSE, OPEN, MESSAGE). We’ll go into the scenario’s details later.

As for now, let’s switch to the Routing tab of your websocket application and click New rule. Give it a name, assign your JS scenario to it, and leave the default call pattern ( .* ).

Last but not least, create a user for the application. Switch to the Users tab, click Create user, set a username (e.g., socketUser) and password, and click Create. We’ll need this login-password pair to authenticate in the web client.

The configuration is ready, but first, let’s dive into how the WebSocket module works in our scenario.

2. SCENARIO DETAILS

The WebSocket module allows developers to open a persistent connection and send data through it. To use the module, we have to require it at the very beginning of the scenario:

require(Modules.WebSocket);

The createWebSocket method accepts a URL and optional protocols. You’ll learn how to obtain the WebSocket URL in the next section.

const webSocket = VoxEngine.createWebSocket( /*url*/ "wss://your_ngrok_link/");

After a WebSocket object has been created, we continue to manage the call inside a handler. In short, we send the call media to the WebSocket object using the call.sendMediaTo method.

Here you can set a preferred encoding format, a tag, and some custom parameters. If you don’t set an encoding, PCM8 will be selected by default.

We call this method when we receive the WebSocketEvents.MESSAGE event confirming that the connection is up. In our scenario, the code looks like this:

call.sendMediaTo(webSocket, {
    encoding: WebSocketAudioEncoding.ULAW,
    "tag": "MyAudioStream",
    "customParameters": {
        "param1": "12345"
    }
});

All the other WebSocket events that you see in the code are for debugging purposes. The appropriate handlers do nothing but write info to a session log. You are free to get rid of them if you want to.

Finally, we should add a proper handler for the stream ending. Here, the Voximplant session terminates on the Disconnected event, 1 second after the established call ends:

call.addEventListener(CallEvents.Disconnected, function(e) {
    Logger.write("LOG OUTGOING: terminating in 1 sec");
    webSocket.close();
    setTimeout(VoxEngine.terminate, 1000);
});

When the scenario logic is clear, we’re ready to move to the next, very important part of our sample.

3. BACKEND

First, make sure that you have Node.js installed on your computer. If you don’t, download it from here. Then, run the following commands one by one in your Terminal to set up the working environment:

npm install express
npm install ws
npm install @google-cloud/speech

And when it’s done, create an empty JS file and put the following code in it (the code nuances are explained below):

const app = require('express')();
const http = require('http').createServer(app);
const WebSocket = require('ws');
const fs = require('fs');

const wss = new WebSocket.Server({
    server: http
});

// Import the Google Cloud client library
const speech = require('@google-cloud/speech');

// Create a client
const client = new speech.SpeechClient();

const config = {
    encoding: 'MULAW',
    sampleRateHertz: 8000,
    languageCode: 'en-US',
};

const request = {
    config,
    interimResults: true,
};

let audioInput = [];
let recognizeStream = null;
// Disable TLS certificate validation; for local debugging only
process.env["NODE_TLS_REJECT_UNAUTHORIZED"] = 0;

app.get('/', function(req, res) {
    res.send('<h1>Hello world</h1>');
});

wss.on('connection', (ws) => {
    // Create a writable stream
    var wstream = fs.createWriteStream('myBinaryFile');
    // Clear the current audioInput
    audioInput = [];
    // Initiate stream recognizing
    recognizeStream = client
        .streamingRecognize(request)
        .on('error', err => {
            ws.close();
        })
        .on('data', data => {
            ws.send(data.results[0].alternatives[0].transcript)
            process.stdout.write(
                data.results[0] && data.results[0].alternatives[0] ?
                `Transcription: ${data.results[0].alternatives[0].transcript}\n` :
                `\n\nError occurred, press Ctrl+C\n`
            )
        });

    ws.on('close', (message) => {
        console.log('The time limit for speech recognition has been reached. Please disconnect and call again.');
        wstream.end();
    })
    // Connection is up, let's add a simple event
    ws.on('message', (message) => {
        // Put base64 audio data to recognizeStream
        try {
            let data = JSON.parse(message)
            if (data.event == "media") {
                let b64data = data.media.payload;
            let buff = Buffer.from(b64data, 'base64');
                recognizeStream.write(buff);
                wstream.write(buff);
            }
        } catch (err) {
            console.log(message)
        }
    });
    // Send a notification  
    ws.send('Hi there, I am a WebSocket server');
});

http.listen(3000, function() {
    console.log('listening on *:3000');
});

Now that the server is all set up, it will handle our speech recognition. Test your solution locally by tunneling localhost:3000 to a public URL with the help of ngrok.

To do that, follow these steps:

  1. Install ngrok following the instructions on its site.
  2. Add your authtoken to ngrok so that your client is tied to your account.
  3. Run node your_file_name.js to start your server locally on localhost:3000.
  4. Go to the ngrok folder on your computer and run ./ngrok http 3000 to tunnel your running local server to a public URL.

Pay attention to the generated public URL; we use it as the WebSocket URL with the 'wss' prefix in our scenario.

4. SPEECH RECOGNITION

You’ve probably noticed that our backend code contains some lines related to Google Cloud.

The library itself is imported this way:

const speech = require('@google-cloud/speech');

Now you need to specify how to process the request. To do that, choose an encoding, sampleRateHertz, and languageCode in the config:

const config = {
    encoding: 'MULAW',
    sampleRateHertz: 8000,
    languageCode: 'en-US',
};

Then, create a new stream to be written into a binary file:

var wstream = fs.createWriteStream('myBinaryFile');

When everything is set up, you should parse the message and feed the base64 audio data into recognizeStream:

let data = JSON.parse(message)
if (data.event == "media") {
    let b64data = data.media.payload;
    let buff = Buffer.from(b64data, 'base64');
    recognizeStream.write(buff);
    wstream.write(buff);
}

Right after this, a recognition request is initiated and handled:

recognizeStream = client
    .streamingRecognize(request)
    .on('data', data => {
        ws.send(data.results[0].alternatives[0].transcript)
    });

Lastly, obtain your service account credentials and provide them to connect the Google library to its backend. To do this, go to the Google Authentication page and complete all the steps listed there. Then, run this export command in the same workspace (the same Terminal tab) where you run node your_file_name.js:

export GOOGLE_APPLICATION_CREDENTIALS="/home/user/Downloads/[FILE_NAME].json"
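If the variable is missing, the Speech-to-Text client will fail at the first request, so it can be handy to check for it at server startup. A minimal sketch, where hasGoogleCredentials is our own helper name (not part of the Google library):

```javascript
// Hypothetical startup guard: checks that the credentials variable
// the Google Cloud client library reads is present and non-empty
function hasGoogleCredentials(env) {
    const path = env.GOOGLE_APPLICATION_CREDENTIALS;
    return typeof path === "string" && path.length > 0;
}

if (!hasGoogleCredentials(process.env)) {
    console.log("GOOGLE_APPLICATION_CREDENTIALS is not set; speech recognition will fail");
}
```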

HOW TO USE IT

Open https://phone.voximplant.com/, fill out the login form, and click Sign in. The username and password are those of the user created in step 1.

After a successful login, click Call and start talking. Google Cloud Speech-to-Text will turn your speech into text in real time.

See the results in your Terminal window? They just rock!

Liked this post? We hope you’ll make great use of our new functionality. Have a great day and stay tuned!

Tags: voximplant, voxengine, websocket

