Voximplant now has speech synthesis support for Microsoft Azure Text-to-Speech (TTS). With this addition, Voximplant developers can choose from more than 425 voice options covering 40 languages and 60 dialects from 6 distinct speech synthesis engine providers. As with Voximplant’s other speech options, you can add these voices to telephony apps in seconds for automated prompts, IVR, virtual agent applications, and more. The new Azure TTS engine adds 116 voice options covering 35 languages and 49 unique dialects. These include 36 lifelike neural voices based on the latest deep learning technology, several of which offer more advanced functionality through Microsoft’s proprietary speaking styles.
The variety of choices, with SSML support, gives Voximplant developers many options for choosing a voice that matches their brand and feature needs. Existing Azure Text-to-Speech users can quickly move their standard and neural TTS implementations to VoxEngine, Voximplant’s serverless platform for enabling a wide variety of telephony features.
Use the new Microsoft voices in seconds
Convenient access in VoxEngine
You can find the Microsoft Azure TTS voices under the VoiceList.Microsoft reference map, alongside Voximplant’s other supported voices inside VoxEngine. Neural voices can be found under VoiceList.Microsoft.Neural. See here for the full list with their corresponding language codes. Pass a voice as the language property of the sayOptions parameter object for call.say, or of the ttsPlayerOptions parameter object for the VoxEngine.createTTSPlayer method.
Using the new voices in your VoxEngine scripts is as simple as:
VoxEngine.createTTSPlayer("Hello, you have created a TTSPlayer, have fun!", {"language": VoiceList.Microsoft.Neural.en_GB_LibbyNeural});
See our TTS documentation for more details.
Voximplant Kit
Microsoft Speech Synthesis is coming soon to Voximplant Kit. Look out for Microsoft options in the Synth Language dropdown in the Text to Speech, Interactive Menu, and Dialogflow Connector blocks.
Pricing
Microsoft TTS is priced at $5 USD per one million characters for standard voices and $20 USD per one million characters for neural voices. Phrases are rounded up to the next 10-character increment.
As an example, let’s use the phrase “let's do some tests with Microsoft Text-to-Speech” spoken in a neural voice. This phrase contains 49 characters. This is rounded up to the next 10 character increment, so 50 characters are charged. Neural voices are priced at $20/1M chars, or $0.00002 per character, resulting in a total charge of $0.001 (50 * $0.00002).
All characters sent to the speech engine are included in the cost calculation, including unspoken SSML tags. Chinese, Japanese, and Korean characters count as two characters.
See our pricing page for more pricing information.
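The pricing rules above can be sketched as a small estimator. This is only an illustration, not Voximplant's billing logic: the function name estimateTtsCost is hypothetical, the CJK check approximates "Chinese, Japanese, and Korean characters" with four Unicode scripts, and a phrase whose length is already a multiple of 10 is assumed not to be rounded up further.

```javascript
// Rates quoted in this article, in USD per one million characters.
const RATE_PER_MILLION_USD = { standard: 5, neural: 20 };

// Assumption: CJK characters are approximated by these Unicode scripts;
// the real billing rule may classify characters differently.
const CJK = /[\p{Script=Han}\p{Script=Hiragana}\p{Script=Katakana}\p{Script=Hangul}]/u;

// Hypothetical helper: estimates the charge for one phrase.
// Pass the full string sent to the engine, including any SSML tags.
function estimateTtsCost(text, voiceType = "neural") {
  const rate = RATE_PER_MILLION_USD[voiceType];
  if (rate === undefined) {
    throw new Error(`Unknown voice type: ${voiceType}`);
  }
  let rawChars = 0;
  for (const ch of text) {
    // Iterating with for...of walks code points, not UTF-16 units.
    rawChars += CJK.test(ch) ? 2 : 1;
  }
  // Round up to the next 10-character increment.
  const billableChars = Math.ceil(rawChars / 10) * 10;
  const costUsd = (billableChars * rate) / 1e6;
  return { rawChars, billableChars, costUsd };
}

const example = estimateTtsCost(
  "let's do some tests with Microsoft Text-to-Speech",
  "neural"
);
console.log(example); // { rawChars: 49, billableChars: 50, costUsd: 0.001 }
```

Because unspoken SSML tags are billed too, running the estimator on the final SSML string rather than on the visible text gives a closer figure.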
Advanced Microsoft Speech Synthesis Options
SSML Support with Multi-voice capabilities
Speech Synthesis Markup Language (SSML) is supported on all Microsoft voices. Microsoft TTS also supports switching voices, even with different languages, in the same SSML statement. This is helpful for multilingual responses and for simulating multiple virtual agents using the same command.
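As a rough illustration of voice switching, the sketch below assembles one SSML statement from several voice segments. The helper names buildMultiVoiceSsml and escapeXml are hypothetical, and the zh-CN voice name is an assumption that follows the naming pattern used in the example script in this article; check the voice list for the exact names.

```javascript
// Escape characters that are special in XML text and attribute values.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical helper: builds one <speak> statement from segments
// of the form { voice, text }, one <voice> element per segment.
function buildMultiVoiceSsml(segments, defaultLang = "en-US") {
  const voices = segments
    .map(({ voice, text }) =>
      `<voice name="${escapeXml(voice)}">${escapeXml(text)}</voice>`)
    .join("");
  return `<speak xmlns="http://www.w3.org/2001/10/synthesis" version="1.0" ` +
    `xml:lang="${escapeXml(defaultLang)}">${voices}</speak>`;
}

const ssml = buildMultiVoiceSsml([
  {
    voice: "Microsoft Server Speech Text to Speech Voice (en-US, GuyNeural)",
    text: "Your order has shipped.",
  },
  {
    // Assumed name, following the pattern of the en-US voice above.
    voice: "Microsoft Server Speech Text to Speech Voice (zh-CN, XiaoxiaoNeural)",
    text: "您的订单已发货。",
  },
]);
console.log(ssml);
```

The resulting string can be passed as the phrase to call.say in the same way as the SSML phrase in the VoxEngine example in this article.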
Adjust speaking styles
In addition, the following neural voices support distinct speaking styles, allowing you a quick way to adjust the way the voice sounds for specific circumstances, such as customer service or sounding empathetic:
Voice | Style | Description |
en-US-AriaNeural | style="newscast-formal" | A formal, confident and authoritative tone for news delivery |
en-US-AriaNeural | style="newscast-casual" | A versatile and casual tone for general news delivery |
en-US-AriaNeural | style="customerservice" | Expresses a friendly and helpful tone for customer support |
en-US-AriaNeural | style="chat" | Expresses a casual and relaxed tone |
en-US-AriaNeural | style="cheerful" | Expresses a positive and happy tone |
en-US-AriaNeural | style="empathetic" | Expresses a sense of caring and understanding |
zh-CN-XiaoxiaoNeural | style="newscast" | Expresses a formal and professional tone for narrating news |
zh-CN-XiaoxiaoNeural | style="customerservice" | Expresses a friendly and helpful tone for customer support |
zh-CN-XiaoxiaoNeural | style="assistant" | Expresses a warm and relaxed tone for digital assistants |
zh-CN-XiaoxiaoNeural | style="lyrical" | Expresses emotions in a melodic and sentimental way |
zh-CN-YunyangNeural | style="customerservice" | Expresses a friendly and helpful tone for customer support |
See Voximplant’s SSML HowTo and Microsoft’s Azure Speech services SSML support page for full usage details.
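To make the style table concrete, here is a minimal sketch that wraps text in an mstts:express-as element and rejects styles the table does not list for a voice. The helper expressAs is hypothetical. It follows the table in writing the attribute as style, whereas the Speech Studio export in the example script in this article uses type, so treat the attribute name as something to confirm against Microsoft's SSML documentation.

```javascript
// Speaking styles per voice, copied from the table in this article.
const SPEAKING_STYLES = new Map([
  ["en-US-AriaNeural", ["newscast-formal", "newscast-casual",
    "customerservice", "chat", "cheerful", "empathetic"]],
  ["zh-CN-XiaoxiaoNeural", ["newscast", "customerservice",
    "assistant", "lyrical"]],
  ["zh-CN-YunyangNeural", ["customerservice"]],
]);

// Escape characters that are special in XML text content.
function escapeXmlText(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical helper: returns an <mstts:express-as> fragment, or throws
// if the table above does not list the style for the given voice.
function expressAs(voice, style, text) {
  const styles = SPEAKING_STYLES.get(voice);
  if (!styles || !styles.includes(style)) {
    throw new Error(`Style "${style}" is not listed for voice "${voice}"`);
  }
  return `<mstts:express-as style="${style}">${escapeXmlText(text)}</mstts:express-as>`;
}

console.log(expressAs("en-US-AriaNeural", "empathetic",
  "I'm sorry to hear that. Let me help."));
```

The fragment belongs inside a voice element for the same voice, within a speak element that declares the mstts namespace.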
SSML Creation Tool
Microsoft Speech Studio is a visual tool for adjusting voice parameters and outputting SSML, including tools for setting the speaking style and adjusting intonation. Just copy and paste the <speak>..</speak> section of the SSML output as your TTS phrase in VoxEngine.
Learn more about this tool on Microsoft’s Speech Studio page.
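If you would rather not trim the output by hand, a short helper can pull out the speak element. This is a sketch only: the function name extractSpeakElement is hypothetical, and the sample input is invented for illustration rather than copied from real Speech Studio output.

```javascript
// Hypothetical helper: returns the <speak>...</speak> section of a string,
// from the first opening <speak tag to the last closing </speak> tag.
function extractSpeakElement(studioOutput) {
  const match = studioOutput.match(/<speak\b[\s\S]*<\/speak>/);
  if (!match) {
    throw new Error("No <speak>...</speak> section found");
  }
  return match[0];
}

// Invented sample input: an XML declaration followed by the SSML itself.
const sampleOutput = [
  '<?xml version="1.0" encoding="UTF-8"?>',
  '<speak xmlns="http://www.w3.org/2001/10/synthesis" version="1.0" xml:lang="en-US">',
  '<voice name="Microsoft Server Speech Text to Speech Voice (en-US, AriaNeural)">Hello!</voice>',
  "</speak>",
  "",
].join("\n");

const phrase = extractSpeakElement(sampleOutput);
console.log(phrase);
```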
VoxEngine Example
The following is an example VoxEngine script showing different Microsoft voices. The phrases array is taken from the Speech Studio example shown above. Note the use of different voices and speaking styles in the same SSML statement:
// Microsoft TTS Example
let call; // the incoming call, assigned in the CallAlerting handler below

// Plays each phrase in turn once the call is connected
function onCallConnected(e) {
  let phrases = [];
  phrases[0] = "let's do some tests with Microsoft Text-to-Speech";
  phrases[1] = "Let's start without SSML";
  phrases[2] = "This is the Microsoft Azure Text-to-Speech service that has been added to Voximplant. Just select one of the Microsoft voices from VoiceList to use these. Remember SSML is supported.";
  // Exported from Microsoft Speech Studio
  phrases[3] =
`<speak xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="http://www.w3.org/2001/mstts" xmlns:emo="http://www.w3.org/2009/10/emotionml" version="1.0" xml:lang="en-US"><voice name="Microsoft Server Speech Text to Speech Voice (en-US, GuyNeural)">Now let's test Microsoft's SSML support.</voice><voice name="Microsoft Server Speech Text to Speech Voice (en-US, AriaNeural)"><mstts:express-as type="cheerful">This is the Microsoft Azure <prosody contour="(53%, -59%)">Text-to-Speech</prosody> service that has been added to Voximplant. </mstts:express-as><mstts:express-as type="newscast-formal">Just select one of the Microsoft voices from VoiceList to use these. </mstts:express-as><mstts:express-as type="customerservice">Remember <say-as interpret-as="spell">SSML</say-as> is supported.</mstts:express-as></voice></speak>`;
  // Speak the first phrase with the Guy voice; the remaining phrases use Aria
  call.say(phrases[0], {"language": VoiceList.Microsoft.Neural.en_US_GuyNeural});
  let i = 1;
  // Each time playback finishes, say the next phrase or end the session
  call.addEventListener(CallEvents.PlaybackFinished, () => {
    if (i === phrases.length) {
      VoxEngine.terminate();
      return;
    }
    call.say(phrases[i], {"language": VoiceList.Microsoft.Neural.en_US_AriaNeural});
    i++;
  });
}

// Answer incoming calls and start the demo once connected
VoxEngine.addEventListener(AppEvents.CallAlerting, e => {
  call = e.call;
  call.addEventListener(CallEvents.Connected, onCallConnected);
  call.addEventListener(CallEvents.Disconnected, VoxEngine.terminate);
  call.answer();
});
Learn More
Ready to add speech synthesis to your telephony application? Check out our documentation here, sign up for an account here, or contact us now to discuss your needs.