High quality Speech Recognition is now available

2016-11-11 21:27:20

Hi everyone! We've been busy working on cool new features for you. Many developers have asked us about high-quality speech recognition for both call recording transcription and real-time recognition scenarios, and we are happy to announce that this functionality is now available to all our developers.

Key benefits to developers:

  • Call recording transcription. Voximplant developers can produce high-fidelity transcripts of their calls.
  • Real-time voice recognition for smart interactive voice response (IVR) scenarios. Customers can now make use of IVR systems offering computerized voice interactions through the Voximplant platform.
  • Rapid speech recognition and processing. Fast recognition reduces delays between when a person stops speaking and when the software receives the speech as text in order to act upon it.
  • Superior end-of-speech detection. Accurate end-of-speech detection enables more effective parsing of conversations for both transcription and software processing purposes.
  • Robust language support. More than 80 languages are now supported.

We use the Google Speech API to provide this functionality, and we're eager to see the exciting new communication applications and scenarios Voximplant developers will create with it. You can find examples below.

How to use

Use the following code to record a call and have the recording transcribed into text:

require(Modules.ASR);
// Record two-way audio (stereo is optional) and transcribe it after
// recording is stopped
call.record({
  language: ASRProfileList.Google.en_US,
  transcribe: true,
  stereo: true
});

Use the following code to build an IVR with real-time recognition of words or phrases from a specified array:

require(Modules.ASR);
//...
call.say("Choose your color", { "language": VoiceList.Amazon.en_US_Joanna });
call.addEventListener(CallEvents.PlaybackFinished, () => {
  call.sendMediaTo(asr);
});
//...
const asr = VoxEngine.createASR(
  ASRProfileList.Google.en_US,
  ["Yellow", "Green", "Red", "Blue", "White", "Black"]);
asr.addEventListener(ASREvents.Result, e => {
  if (e.confidence > 0) {
    call.say(
      `You have chosen ${e.text} color, confidence is ${e.confidence}`,
      { "language": VoiceList.Amazon.en_US_Joanna });
  }
  else {
    call.say("Couldn't recognize your answer",
      { "language": VoiceList.Amazon.en_US_Joanna });
  }
});
asr.addEventListener(ASREvents.SpeechCaptured, () => {
  call.stopMediaTo(asr);
});
//...
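
The Result handler above simply reads the recognized text back to the caller. Before branching on the answer, an IVR would usually map that text onto one of the dictionary entries. Here is a minimal sketch in plain JavaScript. `pickOption` and its `minConfidence` threshold are hypothetical names used for illustration and are not part of the Voximplant API; the sketch assumes only that a result carries `text` and `confidence`, as in the example above, and the confidence values in the demo are arbitrary.

```javascript
// Hypothetical helper (not part of the Voximplant API): maps a recognition
// result onto one of the expected dictionary entries, or returns null.
function pickOption(text, confidence, options, minConfidence = 0) {
  // Treat missing text or low confidence as "not recognized"
  if (typeof text !== "string" || !(confidence > minConfidence)) {
    return null;
  }
  const heard = text.trim().toLowerCase();
  // Case-insensitive exact match against the dictionary
  const match = options.find(option => option.toLowerCase() === heard);
  return match === undefined ? null : match;
}

const colors = ["Yellow", "Green", "Red", "Blue", "White", "Black"];
console.log(pickOption("green", 87, colors));  // matches a dictionary entry
console.log(pickOption("purple", 90, colors)); // not in the dictionary
console.log(pickOption("Red", 0, colors));     // rejected: zero confidence
```

Inside the Result handler you could call `pickOption(e.text, e.confidence, ...)` and fall back to the "Couldn't recognize your answer" prompt whenever it returns `null`.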

Use the following code if you want to recognize call audio on-the-fly (streaming mode):

require(Modules.ASR);
let full_result = "";
let ts = null;

//...
call.say("Please start saying something", { "language": VoiceList.Amazon.en_US_Joanna });
call.addEventListener(CallEvents.PlaybackFinished, () => {
  call.sendMediaTo(asr);
});
//...
// Removing the dictionary to use freeform recognition
const asr = VoxEngine.createASR(ASRProfileList.Google.en_US);
asr.addEventListener(ASREvents.Result, e => {
  // Recognition results arrive here
  full_result += (e.text + " ");
  // Restart the silence timer: if CaptureStarted isn't fired within
  // 5 seconds of this result, stop recognition
  clearTimeout(ts);
  ts = setTimeout(() => asr.stop(), 5000);
});
asr.addEventListener(ASREvents.SpeechCaptured, () => {
  // After speech has been captured - don't stop sending media to ASR
  // call.stopMediaTo(asr);
});
asr.addEventListener(ASREvents.CaptureStarted, () => {
  // Clear timeout if CaptureStarted has been fired
  clearTimeout(ts);
});
//...
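
The accumulate-and-timeout logic in the streaming example can be pulled out into a small helper so it is easier to reason about and test outside a call. The following is a sketch in plain JavaScript, not Voximplant API: `createTranscriptCollector` and the injectable `timers` object are hypothetical names introduced here. The helper keeps at most one pending timer, restarts it on every result, and cancels it when speech starts again.

```javascript
// Hypothetical helper (not part of the Voximplant API): collects streaming
// results and calls onSilence once no new speech follows within timeoutMs.
function createTranscriptCollector(onSilence, timeoutMs = 5000, timers = {
  set: (fn, ms) => setTimeout(fn, ms),
  clear: id => clearTimeout(id)
}) {
  const parts = [];
  let timer = null;

  function cancelTimer() {
    if (timer !== null) {
      timers.clear(timer);
      timer = null;
    }
  }

  return {
    // Call from the ASREvents.Result handler
    addResult(text) {
      parts.push(text);
      cancelTimer(); // never keep more than one pending timer
      timer = timers.set(() => {
        timer = null;
        onSilence(parts.join(" "));
      }, timeoutMs);
    },
    // Call from the ASREvents.CaptureStarted handler
    speechStarted() {
      cancelTimer();
    },
    transcript() {
      return parts.join(" ");
    }
  };
}

// Demo with a fake timer so it runs synchronously
let pending = null;
const fakeTimers = {
  set: fn => { pending = fn; return 1; },
  clear: () => { pending = null; }
};
const collector = createTranscriptCollector(
  text => console.log("Final:", text), 5000, fakeTimers);
collector.addResult("hello");
collector.speechStarted(); // caller keeps talking, timer is cancelled
collector.addResult("world");
pending();                 // simulate 5 seconds of silence
```

In a scenario, the `onSilence` callback is where you would call `asr.stop()` and hand the full transcript to the rest of your logic.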

There is one important caveat in streaming mode: the CaptureStarted event can be triggered by background noise, so it's a good idea to use our voice activity detection (VAD) in addition to ASR:

call.handleMicStatus(true);
call.addEventListener(CallEvents.MicStatusChange, e => {
  if (e.active) {
    // speech started
  } else {
    // speech ended
  }
});
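
One possible way to combine the two signals is to remember the latest VAD state and trust CaptureStarted only while the microphone is reported active. The sketch below is plain JavaScript, and `createSpeechGate` is a hypothetical helper, not part of the Voximplant API. It also assumes that the MicStatusChange event arrives before, or close to, the matching CaptureStarted event, which you should verify for your own scenario.

```javascript
// Hypothetical helper (not part of the Voximplant API): tracks the latest
// VAD state and reports whether a CaptureStarted event should be trusted.
function createSpeechGate() {
  let micActive = false;
  return {
    // Call from the CallEvents.MicStatusChange handler with e.active
    setMicActive(active) {
      micActive = Boolean(active);
    },
    // Call from the ASREvents.CaptureStarted handler
    isRealSpeech() {
      return micActive;
    }
  };
}

const gate = createSpeechGate();
console.log(gate.isRealSpeech()); // false: no VAD activity seen yet
gate.setMicActive(true);
console.log(gate.isRealSpeech()); // true: VAD reports active speech
gate.setMicActive(false);
console.log(gate.isRealSpeech()); // false: speech ended
```

With a gate like this, the CaptureStarted handler from the streaming example would clear the silence timeout only when `gate.isRealSpeech()` returns true, so that background noise alone does not keep recognition running.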