New article about Voximplant by Deborah Balshem, Mergermarket
Communications are an integral part of our lives: we communicate with each other and with businesses, and we are constantly online, so many technologies are already familiar to us. Today, we’ll talk about things that are not yet commonplace but are already real and successfully used by large companies. My talk is devoted to advanced technologies that are only just emerging and that I think will become trends over the next few years.
Trend #1: Machine Learning
The term "machine learning" is known to everyone and continues to gain popularity. For example, at the Google Cloud Next ’18 conference in San Francisco, Google devoted every release to ML: in one form or another, it was part of every service presented. Domestic companies are not far behind this trend: at the beginning of this year, the Dialogflow Connector, which lets you connect text assistants to telephony, appeared on the market. Also on the topic of machine learning, it is worth noting the development of services for voicemail detection and dialog classification.
The market has been waiting for voicemail detection (VMD) for a long time: when you make automatic calls to your customers and reach their voicemail, the answering machine takes the message, but who checks the recording later on? Trained on several tens of thousands of real recordings, the detector recognizes voicemail correctly in 99% of cases, within two seconds or less. This allows businesses to significantly reduce the cost of notifying customers by paying only for real calls.
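To make the cost argument concrete, here is a rough sketch of the arithmetic. Everything in it is an assumption made for illustration: the function name `campaign_cost`, per-second billing, the idea that a voicemail call is dropped right after the detection window and billed only for that window, and all of the campaign numbers.

```python
def campaign_cost(calls, voicemail_share, message_seconds,
                  price_per_second, detection_seconds=None):
    """Estimate the cost of an automated calling campaign.

    If detection_seconds is None, no detector is used and every call is
    billed for the full message. Otherwise, calls that reach voicemail are
    dropped after the detection window and billed only for that window
    (an assumption about billing, not a documented pricing rule).
    """
    voicemail_calls = calls * voicemail_share
    human_calls = calls - voicemail_calls
    if detection_seconds is None:
        voicemail_seconds = message_seconds
    else:
        voicemail_seconds = detection_seconds
    billed_seconds = (human_calls * message_seconds
                      + voicemail_calls * voicemail_seconds)
    return billed_seconds * price_per_second


# Hypothetical campaign: 1000 calls, 30% reach voicemail,
# a 30-second message, 0.01 currency units per second.
without_vmd = campaign_cost(1000, 0.3, 30, 0.01)
with_vmd = campaign_cost(1000, 0.3, 30, 0.01, detection_seconds=2)
print(f"without detection: {without_vmd:.2f}, with detection: {with_vmd:.2f}")
```

Under these made-up inputs, the saving comes entirely from the voicemail share of the contact list, so the larger that share, the more the detector pays off.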
There are several case studies in the automotive, real estate, and medical fields on the Voximplant platform alone. In the auto business, we managed to single out a number of categories into which calls can be broken down: booking a test drive, credit, trade-ins, car models, first conversion, or suitability of service. Thanks to dialog classification, a business can quickly find the conversations it needs in its own CRM, which shows who called and when, what the query was, and whether it was a relevant call. Businesses also have access to the recording or transcription. Previously, full-text search was used for these purposes, but a trained model picks up many more nuances in the dialog.
How is it done? First, you need to convert the voice into text, either through automatic transcription or manually. Then comes markup: a part of the dialog is highlighted and its topic is determined. Finally, this data is loaded into the model, which is trained on it and begins to determine the topic of a dialog on its own.
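The three steps above can be sketched in a few lines. This is a minimal illustration, not the model used on the platform: it assumes transcription has already produced text, uses a handful of invented labeled fragments as the markup, and swaps in a small Naive Bayes classifier with add-one smoothing as the model. The names `TopicClassifier`, `tokenize`, and `LABELED` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the transcript and split it into punctuation-free words."""
    words = (w.strip(".,!?") for w in text.lower().split())
    return [w for w in words if w]


class TopicClassifier:
    """Multinomial Naive Bayes over word counts with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # topic -> word -> count
        self.topic_counts = Counter()            # topic -> labeled fragments
        self.vocab = set()

    def train(self, labeled_fragments):
        """Step 3: load the marked-up fragments into the model."""
        for text, topic in labeled_fragments:
            tokens = tokenize(text)
            self.word_counts[topic].update(tokens)
            self.topic_counts[topic] += 1
            self.vocab.update(tokens)

    def predict(self, text):
        """Return the most likely topic for a new transcript."""
        tokens = tokenize(text)
        total = sum(self.topic_counts.values())
        best_topic, best_score = None, float("-inf")
        for topic, n in self.topic_counts.items():
            score = math.log(n / total)
            denom = sum(self.word_counts[topic].values()) + len(self.vocab)
            for tok in tokens:
                score += math.log((self.word_counts[topic][tok] + 1) / denom)
            if score > best_score:
                best_topic, best_score = topic, score
        return best_topic


# Step 2 (markup): invented fragments with hand-assigned topics.
LABELED = [
    ("I would like to book a test drive this weekend", "test_drive"),
    ("can I sign up for a test drive of the new model", "test_drive"),
    ("what are the monthly payments on a car loan", "credit"),
    ("do you offer credit with a low interest rate", "credit"),
    ("I want to trade in my old car", "trade_in"),
    ("how much is my car worth as a trade in", "trade_in"),
]

classifier = TopicClassifier()
classifier.train(LABELED)
print(classifier.predict("is a test drive available tomorrow"))
```

A real deployment would train on thousands of marked-up dialogs and most likely use a stronger model, but the flow of transcript, markup, training, and prediction stays the same.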
There is no shortage of voice and text assistant case studies in the news today.
I’ll briefly describe the mechanics: the recognized text is sent to the backend, where an NLU engine determines the intent of the request; the answer is then returned in text form and voiced by speech synthesis. Among modern technologies, I’ll also mention Dialogflow. This process has become significantly faster since Google implemented technologies for fast data transfer.
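The loop described above (recognized text in, intent detection, text answer, speech synthesis) can be sketched as follows. This is only a toy stand-in: the keyword matcher replaces a real NLU backend, `synthesize` returns bytes of text rather than audio, and every name, intent, and canned answer here is hypothetical. It does not call Dialogflow or any other real service.

```python
from dataclasses import dataclass


@dataclass
class Reply:
    intent: str
    text: str


# Hypothetical stand-in for an NLU backend: plain keyword matching.
INTENT_KEYWORDS = {
    "opening_hours": ("open", "hours", "close"),
    "book_test_drive": ("test drive", "book"),
}

# Invented canned answers, one per intent.
ANSWERS = {
    "opening_hours": "We are open from 9 am to 6 pm on weekdays.",
    "book_test_drive": "Sure, which day suits you for the test drive?",
    "fallback": "Sorry, I did not catch that. Could you rephrase?",
}


def detect_intent(recognized_text: str) -> str:
    """Map recognized speech (already text) to an intent name."""
    text = recognized_text.lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return intent
    return "fallback"


def synthesize(text: str) -> bytes:
    """Placeholder for speech synthesis: returns UTF-8 bytes, not audio."""
    return text.encode("utf-8")


def handle_utterance(recognized_text: str):
    """Run one turn: intent detection, text answer, then 'synthesis'."""
    intent = detect_intent(recognized_text)
    reply = Reply(intent=intent, text=ANSWERS[intent])
    return reply, synthesize(reply.text)


reply, audio = handle_utterance("When do you close today?")
print(reply.intent, "->", reply.text)
```

In a real assistant, `detect_intent` would be a request to the NLU service and `synthesize` a call to a text-to-speech engine; the turn structure is the part this sketch is meant to show.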
Trend #2: Video Conferencing and Streaming
This year, we visited the Google Launchpad accelerator in San Francisco where there were many interesting speakers, including Vint Cerf, one of the founders of the Internet. After an excellent lecture, the audience asked, “Vint, NASA is working on the interplanetary Internet, and video conferencing still somehow doesn’t work very well here on Earth. Why is that?”
“The Internet is at the very beginning of its development, so I’m sure that we will solve this problem,” said Vint Cerf.
So what makes video conferencing so complex?
When multiple people participate in a conference, the servers need a certain amount of bandwidth. As video quality rises, resolution included, more bandwidth is required. You won’t believe it, but even now the infrastructure doesn’t keep pace with the rollout of “marketing ideas” like 4K video. The bandwidth required for such video conferencing is huge, and not all operators are ready for such a “revolution” from an equipment standpoint.
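A back-of-the-envelope calculation shows how quickly the numbers grow. The sketch below assumes a forwarding-server topology, in which each participant uploads one stream and the server relays it to everyone else; the article does not say which topology is meant. The per-stream bitrates are rough illustrative figures, not measurements.

```python
def server_bandwidth_mbps(participants: int, bitrate_mbps: float):
    """Return (ingress, egress) in Mbps for a forwarding conference server.

    Assumes every participant sends one stream at bitrate_mbps and
    receives the streams of all other participants unchanged.
    """
    ingress = participants * bitrate_mbps
    egress = participants * (participants - 1) * bitrate_mbps
    return ingress, egress


# Illustrative per-stream bitrates (assumptions, not measured values).
HD_720P_MBPS = 1.5
UHD_4K_MBPS = 15.0

for label, bitrate in (("720p", HD_720P_MBPS), ("4K", UHD_4K_MBPS)):
    ingress, egress = server_bandwidth_mbps(10, bitrate)
    print(f"{label}: ingress {ingress:.0f} Mbps, egress {egress:.0f} Mbps")
```

Outgoing traffic grows with the square of the number of participants, so under these assumptions a tenfold increase in per-stream bitrate turns an already heavy conference into one that needs more than a gigabit per second of egress.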
Different Internet quality among participants
Some participants connect via Wi-Fi, some over 3G, and others over broadband; the server-side technology must adapt quickly to each situation, but for now this remains a problem. It is partly mitigated by new technologies that enable simultaneous streaming, where the same video is sent to one participant in good quality and to another in poorer quality, but this workaround is not a cure-all.
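The simultaneous-streaming idea (often called simulcast) boils down to choosing, per receiver, the best quality layer that fits the receiver's connection. Here is a minimal sketch of that choice; the layer names, bitrates, and the `headroom` safety factor are assumptions made for illustration.

```python
# Quality layers published by the sender, best first: (name, bitrate in kbps).
# The names and bitrates are illustrative assumptions.
LAYERS = [("high", 1500), ("medium", 500), ("low", 150)]


def pick_layer(available_kbps, layers=LAYERS, headroom=0.8):
    """Pick the best layer whose bitrate fits the receiver's bandwidth.

    Only a fraction of the estimated bandwidth (headroom) is used, to
    leave room for audio and for fluctuations. If nothing fits, fall
    back to the lowest layer rather than sending no video at all.
    """
    budget = available_kbps * headroom
    for name, kbps in layers:
        if kbps <= budget:
            return name
    return layers[-1][0]


# Hypothetical receivers on broadband, Wi-Fi, and 3G.
for link, kbps in (("broadband", 5000), ("wifi", 1000), ("3g", 300)):
    print(link, "->", pick_layer(kbps))
```

The limitation the text points at is visible here: the sender still has to encode and upload every layer, and the receiver on the weakest link still gets a poor picture.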
Video Codec Licenses
Video travels over the network in compressed form, and correct playback requires a video codec. For a long time, codecs required royalty payments to the rights holders. Audio was in a similar situation, but then open codecs appeared, including Opus, which covered every use case from telephony to music playback.
What about video? Large companies such as Google, Microsoft, Apple, IBM, and another 8-10 giants came together and formed an alliance whose members own much of the patent pool for compression-related technology. They decided it was time to do for video what had been done for audio: create a free, open codec that would meet our needs in terms of quality. This happened in 2015, but since video codecs are a very complex technology that is difficult to build and even harder to bring to market, their implementation in software, hardware, and real applications will only become widely available in a couple of years.
By 2020, a surge in the development of video technology, video conferencing, and streaming is expected. The active development of the mobile Internet is also contributing to this: US and European operators have already begun to introduce 5G, which offers bandwidth of up to 10 Gbps and almost zero latency.
To conclude, I’ll talk about new technology emerging in the domestic telecom market. In addition to AI and ML, we have launched an SMS API and a Bot API on the Voximplant platform, as well as the outgoing call editor, Smartcalls.
- SMS API - a multichannel service still needs to send messages the "classic" way, even though messaging is only growing in popularity as a form of communication.
- Bot API - the ability to talk to a bot across several channels at once, voice and text. If you start talking with the bot on the phone and it becomes too frustrating to continue, you can switch to a messenger and carry on the conversation there.
- Smartcalls - the ability for people without a technical background to build a call scenario in the visual editor, upload a contact list, and see the results of automated calling in their CRM system.
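The channel switching described for the Bot API relies on conversation state being tied to the user rather than to the channel. The sketch below shows one way this could work; the `ConversationStore` class and its methods are hypothetical and are not part of the Voximplant API.

```python
from collections import defaultdict


class ConversationStore:
    """Keeps one shared history per user, whatever channel a turn came from."""

    def __init__(self):
        self._history = defaultdict(list)  # user_id -> [(channel, text), ...]

    def add(self, user_id, channel, text):
        """Record one user turn together with the channel it arrived on."""
        self._history[user_id].append((channel, text))

    def history(self, user_id):
        """Return a copy of all turns for the user, oldest first."""
        return list(self._history.get(user_id, []))

    def last_channel(self, user_id):
        """Return the channel of the most recent turn, or None if no turns."""
        turns = self._history.get(user_id)
        return turns[-1][0] if turns else None


store = ConversationStore()
# The user starts on the phone, then continues in a messenger.
store.add("user-42", "voice", "I want to book a test drive")
store.add("user-42", "messenger", "Saturday morning works for me")

# The bot sees the whole conversation, regardless of the current channel.
for channel, text in store.history("user-42"):
    print(f"[{channel}] {text}")
```

Because the history is keyed by user, the bot answering in the messenger already knows what was said on the phone.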
But that is not all the news for today: because we built the Dialogflow Connector ahead of all similar solutions on the global market, we became an official technology partner of Google Cloud. This will give us access to advanced technologies in machine learning and artificial intelligence, which we can pass on to you for simple and effective use in your company.