In our industry of real-time cloud communications, we’ve kept a close eye on the active development of voice bots that converse with people in natural language. Today, these technologies are widely used by large businesses - banks, airlines, online stores - to communicate with customers. In this article, we’ll look at what has changed since the first assistant appeared eight years ago.

Popular Assistants

Siri. In 2011, Apple’s assistant, integrated into the iPhone 4S, reached a mass audience. At the time, Siri could interact with 12 applications, including reminders, weather, maps, email, and calendar scheduling. Since then, the list of skills has expanded, and Siri has become the smart assistant across all Apple products.

Nuance’s breakthrough in Siri’s speech recognition technology spurred the development of the voice assistant industry.


Google Assistant. In 2012, the Google Now service appeared; four years later, it evolved into Google Assistant and Google Home. The company wanted a bot that could search for information on the Internet, work with its own cloud services, and, eventually, integrate with the Android operating system.


Cortana. Microsoft’s assistant for personal computers entered the market in 2013; the technology was later distributed via Xbox.


Alexa. Amazon released its smart assistant in 2014, betting on dedicated devices with Alexa built in. The company, a world leader in e-commerce, let customers buy products more quickly and easily through a bot.


Alice. The Russian development from Yandex was introduced in October 2017. First, Alice was added to the Yandex browser; last summer, she became the “brain” of the Yandex.Station smart speaker.


None of the major industry players wanted to lag behind, seeing great potential in voice assistant technology.

Development: Software and Hardware

The first direction is software: skills built in by the vendor or added by other services through an open API. With contributions from external developers, the list of skills a bot can perform grows significantly as the ecosystem expands. For example, integrating the Spotify music app with Google Assistant means you are no longer limited to just Google Music tracks.
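The open-API model can be illustrated with a toy sketch. All names here (`SkillRegistry`, the intents, "MusicApp") are hypothetical, not any vendor’s actual API: the idea is simply that the platform routes a recognized intent to whichever handler a vendor or an external developer registered for it.

```python
from typing import Callable, Dict

class SkillRegistry:
    """Maps intent names to handler functions (a toy stand-in for an
    assistant platform's open API for third-party skills)."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[dict], str]] = {}

    def register(self, intent: str, handler: Callable[[dict], str]) -> None:
        self._handlers[intent] = handler

    def dispatch(self, intent: str, slots: dict) -> str:
        handler = self._handlers.get(intent)
        if handler is None:
            # Unknown intents degrade gracefully instead of crashing.
            return "Sorry, I can't do that yet."
        return handler(slots)

registry = SkillRegistry()

# A built-in vendor skill.
registry.register("weather", lambda slots: f"It is sunny in {slots['city']}.")

# A skill added later by an external developer, e.g. a music service.
registry.register("play_music", lambda slots: f"Playing {slots['track']} on MusicApp.")

print(registry.dispatch("play_music", {"track": "Blue in Green"}))
print(registry.dispatch("order_pizza", {}))
```

Every new registration extends what the same bot can do, which is exactly why opening the API to outside developers makes the skill list grow so quickly.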


In addition, work is constantly ongoing to improve the quality of speech synthesis and recognition, as well as to distinguish different voices. The latter is useful for smart speakers: for example, I can ask Google Assistant to turn on my favorite music, and a Google Home speaker will open my playlist rather than another family member’s.
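Voice identification of this kind typically compares a speaker embedding against enrolled profiles. The sketch below is a minimal, made-up illustration of that matching step (the toy vectors and the `identify` helper are assumptions; real systems derive embeddings from a trained speaker-recognition model):

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical enrolled voiceprints for two household members.
profiles = {"me": [0.9, 0.1, 0.3], "sibling": [0.1, 0.8, 0.5]}

def identify(embedding, threshold=0.8):
    """Return the best-matching profile, or 'guest' if nothing is close."""
    best = max(profiles, key=lambda name: cosine(embedding, profiles[name]))
    return best if cosine(embedding, profiles[best]) >= threshold else "guest"

print(identify([0.85, 0.15, 0.35]))  # close to the "me" voiceprint
print(identify([0.0, 0.0, 1.0]))     # matches no one well enough
```

The threshold matters: too low, and the speaker opens a sibling’s playlist; too high, and the owner gets treated as a guest.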

The second direction of development is hardware. Initially, voice assistants appeared on smartphones. Then the bots moved to personal computers and, finally, to speakers. The latter have become an essential link in the smart-home ecosystem - an interface for managing other devices. A Forrester study demonstrates the direct influence that the spread of smart speakers has on the growth of the IoT sector as a whole.

The Benefits of Voice Assistants

  • Supporting natural language: With an assistant, we can talk in much the same way as we do with each other.
  • Convenience factor: This is true only for situations with a small number of choices. For example, when ordering pizza, the bot can list several types, but obviously no one would listen to a list of over 50 options.
  • Situational awareness: The assistant takes a lot of data into account to improve the quality of its work, including information it already knows about you. Google leads here thanks to the array of user data its assistant has available; Amazon knows your shopping history, so Alexa can place new orders based on previous ones.
  • Teachable: An assistant can always be trained in additional skills, within reasonable limits.

Cons of Voice Assistants

  • Background noise: This is especially true for noisy rooms or situations where several people speak at the same time. These problems are addressed at both the software and hardware levels. For example, smart speakers are equipped with at least four or five microphones, with one filtering noise and another cancelling echo.
  • Artificial voices: Five years ago, the voice in IVR sounded mechanical, but thanks to machine learning and neural networks, a significant breakthrough is now taking place. Today, WaveNet technology from Google allows a bot to be trained on voice recordings from a living person, making the synthesis almost indistinguishable from natural speech, with its pauses, intonations, breaths, and exhalations.
  • Unnatural communication: It’s not possible to interrupt the bot or supplement words with gestures. Alice, say, accepts a request, recognizes it, and tries to complete the task, but if the assistant is loaded with new information during this time, it can get confused.
  • Wi-Fi reliance: Today, the Internet is almost everywhere, but in its absence, a smart assistant becomes rather dumb, working only with local data.
  • Limited features: Almost all developers associated with voice artificial intelligence are now working on expanding the skills of assistants.
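The multi-microphone point above can be sketched with a classic delay-and-sum beamformer: if the same speech reaches several microphones with known delays, aligning and averaging the channels reinforces the common signal while each microphone’s independent noise partially cancels. This is a toy model under made-up delays and noise, not how any particular speaker implements it:

```python
import random

def delay_and_sum(channels, delays):
    """Align each microphone channel by its arrival delay (in samples)
    and average the channels, boosting the shared signal over noise."""
    length = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[d + i] for ch, d in zip(channels, delays)) / len(channels)
        for i in range(length)
    ]

random.seed(0)
signal = [1.0 if i % 8 < 4 else -1.0 for i in range(64)]  # toy "speech"
delays = [0, 2, 5, 7]  # per-microphone arrival delays, in samples

# Each mic hears a delayed copy of the signal plus its own noise.
channels = [
    [0.0] * d + [s + random.gauss(0, 0.8) for s in signal]
    for d in delays
]

aligned = delay_and_sum(channels, delays)
noisy = channels[0]  # what a single microphone would capture

# Mean squared error vs. the clean signal: averaging four mics
# should roughly quarter the noise power of one mic.
mse = lambda est: sum((e - s) ** 2 for e, s in zip(est, signal)) / len(aligned)
print(mse(noisy), mse(aligned))
```

Real devices add echo cancellation and adaptive filtering on top, but the core intuition - more microphones mean more chances to separate speech from noise - is the same.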


Further Concerns


The issue of information privacy remains unresolved. Smart devices record our speech and surrounding sounds, and law enforcement agencies may be interested in this data. At the moment, there is no generally accepted practice for how the owner of these databases should behave when approached by the authorities: Amazon, for example, refused to provide information in some cases and chose to cooperate in others.

Future Developments

  • Deeper integration of an assistant within a smart home.
  • Proactivity. The assistant’s potential ability to contact you on its own. Most likely, this will be optional, but if the bot is smart, why shouldn’t it start the conversation first?
  • Emotion detection. For example, if the user sounds annoyed, the assistant will be able to change its algorithm. These developments will be actively used both for consumer voice assistants and in B2B cases - namely, for virtual call-center operators.
  • Development of neural networks and models. This is an ongoing process that opens up even greater prospects for the use of assistants.
  • Visualization. When the voice interface is inconvenient - for example, when choosing from many options - there is a demand for visual accompaniment. How exactly this will be implemented is still an open question. Most likely, in the future we will see a hologram of an assistant or a smart screen.