Voximplant has built-in advanced speech-to-text processing, but the platform also lets you interact with external recognition services via WebSocket. Learn how to add this kind of interaction to your Voximplant scenarios.
Voximplant JS Scenario
We are going to create an outgoing WebSocket connection, send an audio stream through it, and transcribe the speech with the Google Cloud Speech-to-Text API, a third-party service that we use here for demonstration purposes.
Log into your account at https://manage.voximplant.com/auth. On the left menu, select Applications, click New application and create one. Let’s name it websocket. After that, go to your new application, switch to the Scenarios tab, and create a scenario containing the following code. This VoxEngine scenario sends an audio stream to the WebSocket and listens to the WebSocket events (ERROR, CLOSE, OPEN, MESSAGE). Pay attention to line 7, where you have to substitute your backend URL:
Finally, create a Rule and a User for this application.
To handle the audio stream sent from Voximplant, you'll need a backend server. The server has to receive the incoming messages, parse them, and pass the base64-encoded audio data to a Google Cloud client library instance for transcription. We suggest a Node.js server, but you're free to use any other programming language.
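The parsing step can be sketched on its own, with no third-party packages. This is a minimal illustration, not the article's server code: the function name extractAudioChunk is hypothetical, and the message shape (a JSON object with event set to "media" and the base64 audio in media.payload) is an assumption about what arrives over the WebSocket, so check it against the messages your server actually logs.

```javascript
// Sketch: turn one raw WebSocket message into raw audio bytes.
// ASSUMPTION: audio frames look like
//   {"event": "media", "media": {"payload": "<base64 audio>"}}
// and every other message (connection metadata etc.) can be ignored.
function extractAudioChunk(rawMessage) {
  let data;
  try {
    // Messages may arrive as a string or as a Buffer, so normalize first.
    data = JSON.parse(rawMessage.toString());
  } catch (err) {
    return null; // not JSON, nothing to transcribe
  }
  const isMedia =
    data !== null &&
    typeof data === 'object' &&
    data.event === 'media' &&
    data.media &&
    typeof data.media.payload === 'string';
  if (!isMedia) {
    return null; // control message or unexpected shape
  }
  // Decode the base64 payload into the bytes the recognizer expects.
  return Buffer.from(data.media.payload, 'base64');
}

// Usage with a hand-made frame; the payload is simply the bytes of "hello".
const sampleFrame = JSON.stringify({
  event: 'media',
  media: { payload: Buffer.from('hello').toString('base64') },
});
const chunk = extractAudioChunk(sampleFrame);
console.log(chunk.length); // prints 5
```

Returning null for anything that is not an audio frame keeps the caller simple: it writes every non-null result to the recognition stream and skips the rest.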
Before implementing a Node.js server, make sure that you've executed the following command in your terminal:
Then create an empty JS file and put the following code in there:
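A server along these lines might look like the sketch below. Treat it as an illustration under stated assumptions rather than a definitive implementation: it assumes the ws and @google-cloud/speech packages are installed, that the audio is 8 kHz mu-law, that the language is US English, and that port 3000 is free. The forwardAudio and startServer names are hypothetical, and the "media" message shape is an assumption as well. The sketch also chooses to start only when the GOOGLE_APPLICATION_CREDENTIALS environment variable is set, which is a convenience of this example and not a requirement of the Google library.

```javascript
// Sketch of the backend. Third-party packages assumed installed:
// ws and @google-cloud/speech.

// Recognition settings. ASSUMPTION: 8 kHz mu-law audio, US English.
const request = {
  config: { encoding: 'MULAW', sampleRateHertz: 8000, languageCode: 'en-US' },
  interimResults: true, // emit partial transcripts while the caller speaks
};

// Forward the audio carried by each "media" message into a writable sink.
// ASSUMPTION: frames look like {"event":"media","media":{"payload":"<base64>"}}.
function forwardAudio(socket, sink) {
  socket.on('message', (raw) => {
    try {
      const data = JSON.parse(raw.toString());
      if (data.event === 'media') {
        sink.write(Buffer.from(data.media.payload, 'base64'));
      }
    } catch (err) {
      console.error('Skipping unparsable message:', err.message);
    }
  });
  // Close the recognition stream when the WebSocket goes away.
  socket.on('close', () => sink.end());
}

function startServer(port) {
  // Loaded lazily so forwardAudio can be reused without these packages.
  const WebSocket = require('ws');
  const speech = require('@google-cloud/speech');
  const client = new speech.SpeechClient();

  const wss = new WebSocket.Server({ port });
  wss.on('connection', (socket) => {
    // One recognition stream per WebSocket connection.
    const recognizeStream = client
      .streamingRecognize(request)
      .on('error', console.error)
      .on('data', (data) => {
        const result = data.results[0];
        if (result && result.alternatives[0]) {
          console.log(`Transcription: ${result.alternatives[0].transcript}`);
        }
      });
    forwardAudio(socket, recognizeStream);
  });
  console.log(`Listening on port ${port}`);
}

// Sketch-only guard: start when Google credentials are configured.
if (process.env.GOOGLE_APPLICATION_CREDENTIALS) {
  startServer(3000);
} else {
  console.log('Set GOOGLE_APPLICATION_CREDENTIALS to start the server.');
}
```

Keeping forwardAudio separate from the network and Google setup means you can exercise it with a plain EventEmitter and a stub sink before any credentials are in place.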
To connect the Google library to its backend, you have to obtain and provide your service account credentials. Go to the Google Authentication page and complete all the steps listed there. Next, run this export command in the same workspace (the same Terminal tab) before you start the server:
Lastly, you need to run your backend server (node your_file_name.js) and tunnel it to a public URL, e.g., by using ngrok. Pay attention to the generated public URL: this is the value to use in our scenario:
To try it out, go to https://phone.voximplant.com/, fill out the form, and click Sign in. Then click Call and start talking. You'll see the transcription results in your Terminal window in real time.
If you'd like to understand in depth how the code shown in this article works, check the WebSocket section of our documentation.