
How to use SSML

Speech Synthesis Markup Language (SSML) is used to control aspects of speech such as pronunciation, volume, pitch, and rate. These aspects are specified in the ttsOptions of the Call.say and createTTSPlayer methods.

Available options

Currently, there are three built-in attributes:

  • Pitch (voice pitch) accepts either a number followed by "Hz" in the range from 0.5Hz to 2Hz, or one of the values x-low, low, medium, high, x-high, default.

  • Rate (speech speed) with the following possible values: x-slow, slow, medium, fast, x-fast, default.

  • Volume (speech volume) with the possible values: silent, x-soft, soft, medium, loud, x-loud, default.
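The value lists above can be checked before a request is sent. Here is a minimal sketch of such a check; the helper names (isValidPitch, isValidRate, isValidVolume) are hypothetical and not part of the platform, and we assume that the 0.5Hz and 2Hz bounds are inclusive and that the "Hz" suffix is written exactly as shown:

```javascript
// Allowed keyword values, copied from the lists above.
const PITCH_KEYWORDS = ["x-low", "low", "medium", "high", "x-high", "default"];
const RATE_KEYWORDS = ["x-slow", "slow", "medium", "fast", "x-fast", "default"];
const VOLUME_KEYWORDS = ["silent", "x-soft", "soft", "medium", "loud", "x-loud", "default"];

// Hypothetical helper: true if `value` is a listed keyword
// or a number followed by "Hz" within 0.5..2 (bounds assumed inclusive).
function isValidPitch(value) {
  if (PITCH_KEYWORDS.includes(value)) return true;
  const match = /^(\d+(?:\.\d+)?)Hz$/.exec(value);
  if (!match) return false;
  const hz = Number(match[1]);
  return hz >= 0.5 && hz <= 2;
}

// Hypothetical helpers: rate and volume accept keywords only.
function isValidRate(value) {
  return RATE_KEYWORDS.includes(value);
}

function isValidVolume(value) {
  return VOLUME_KEYWORDS.includes(value);
}
```

With these helpers, isValidPitch("1.5Hz") and isValidPitch("high") pass, while isValidPitch("3Hz") is rejected because it falls outside the documented range.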

To apply one of them to the whole text in the Call.say method, you do not need the speak tag; just specify the ttsOptions:

ttsOptions
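As a rough sketch, the options can be collected in a plain object and passed along with the text. The `call` object below is a stand-in so the snippet runs on its own; the lower-case key names (pitch, rate, volume) and the shape of the second argument are assumptions, so check the Call.say reference for the exact signature:

```javascript
// Stand-in for a Call object so the sketch runs outside the platform.
// In a real scenario, `call` is provided by the platform instead.
const call = {
  lastSay: null,
  say(text, parameters) {
    // Record the arguments instead of synthesizing speech.
    this.lastSay = { text, parameters };
  },
};

// The three built-in attributes, using values from the lists above.
// Key names are an assumption.
const ttsOptions = { pitch: "high", rate: "slow", volume: "loud" };

// Assumed argument shape: the text first, then an object carrying ttsOptions.
call.say("Thank you for calling. Please stay on the line.", { ttsOptions });
```

The options apply to the whole text, so no speak tag is needed here.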


To use other attributes for the whole text or a part of it, add the speak tag manually. Note that the supported tags and attributes depend on the language provider; each provider lists them on its official website. For unsupported combinations, the PlaybackFinished event is triggered with error 400.

For example, if we choose Amazon, we have to use the prosody tag to control volume, rate, or pitch of the selected text fragment. Here is how we make this fragment sound higher:

prosody tag for pitch
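A minimal sketch of such a string is shown below. The helpers escapeXml and withPitch are hypothetical names introduced for illustration; the resulting SSML would then be passed as the text to say, which we assume accepts a speak-wrapped string for the chosen provider:

```javascript
// Escape characters that have special meaning in XML so plain text is safe inside SSML.
function escapeXml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical helper: wrap a text fragment in a prosody tag that sets its pitch.
function withPitch(fragment, pitch) {
  return `<prosody pitch="${escapeXml(pitch)}">${escapeXml(fragment)}</prosody>`;
}

// Only the wrapped fragment is affected; the rest of the sentence keeps the default pitch.
const pitchSsml = `<speak>Hello, ${withPitch("this part sounds higher", "high")}.</speak>`;
```

Escaping matters as soon as the fragment comes from user input, because a stray `<` or `&` would make the markup invalid.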


The same goes for attributes we have not mentioned: they are set through other SSML tags. For example, the interpret-as attribute belongs to the say-as tag:

say-as tag for interpret-as
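As an illustration, the sketch below wraps a number in a say-as tag so that it is read digit by digit. The sayAs helper is a hypothetical name, and we assume the Amazon provider, whose SSML supports the "digits" value of interpret-as; other providers may accept a different set of values:

```javascript
// Hypothetical helper: wrap a fragment in a say-as tag with the given interpret-as value.
// The caller is expected to pass text that is already safe to embed in XML.
function sayAs(fragment, interpretAs) {
  return `<say-as interpret-as="${interpretAs}">${fragment}</say-as>`;
}

// "digits" asks the provider to read each digit separately instead of as one number.
const codeSsml = `<speak>Your code is ${sayAs("4821", "digits")}.</speak>`;
```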
