For now, there are 3 built-in attributes:
Pitch (voice pitch) with the following acceptable values: 1) a number followed by "Hz", from 0.5Hz to 2Hz; 2) x-low, low, medium, high, x-high, default.
Rate (speech speed) with the following possible values: x-slow, slow, medium, fast, x-fast, default.
Volume (speech volume) with the following possible values: silent, x-soft, soft, medium, loud, x-loud, default.
If you want to set one of them for the whole text in the call.say method, you don’t have to use the speak tag; just specify the ttsOptions.
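As a rough illustration of the value lists above, the following sketch validates attribute values before they are placed into a ttsOptions object. The helper names isValidTtsValue and buildTtsOptions are hypothetical and not part of the platform API, and the call.say invocation shown in the trailing comment assumes an options-object signature that should be checked against the platform reference.

```javascript
// Allowed keyword values, copied from the attribute lists above.
const KEYWORDS = {
  pitch: ["x-low", "low", "medium", "high", "x-high", "default"],
  rate: ["x-slow", "slow", "medium", "fast", "x-fast", "default"],
  volume: ["silent", "x-soft", "soft", "medium", "loud", "x-loud", "default"],
};

// Hypothetical helper (not a platform function): checks a single attribute value.
function isValidTtsValue(name, value) {
  if (!Object.prototype.hasOwnProperty.call(KEYWORDS, name)) return false;
  if (KEYWORDS[name].includes(value)) return true;
  if (name === "pitch") {
    // Pitch also accepts a number followed by "Hz", from 0.5Hz to 2Hz.
    const match = /^(\d+(?:\.\d+)?)Hz$/.exec(value);
    if (match) {
      const hz = Number(match[1]);
      return hz >= 0.5 && hz <= 2;
    }
  }
  return false;
}

// Hypothetical helper: returns a plain object with only validated attributes.
function buildTtsOptions(attrs) {
  const options = {};
  for (const [name, value] of Object.entries(attrs)) {
    if (!isValidTtsValue(name, value)) {
      throw new RangeError(`Unsupported ${name} value: ${value}`);
    }
    options[name] = value;
  }
  return options;
}

// Assumed usage inside a scenario (signature not confirmed by this article):
// call.say("Hello!", { ttsOptions: buildTtsOptions({ rate: "slow" }) });
```

Validating up front is a design choice of this sketch: it surfaces a bad value when the options are built, before any playback is attempted.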
If you want to use other attributes for the whole text or a part of it, specify the speak tag manually. Please note that the lists of supported tags and attributes depend on the language provider. You can find these lists on the providers' official websites. For unsupported combinations, the PlaybackFinished event is triggered with error 400.
For example, if we choose Amazon, we have to use the prosody tag to control the volume, rate, or pitch of a selected text fragment. Raising the pitch inside that tag makes the fragment sound higher.
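The sketch below shows one way to assemble such markup as a string: a fragment is wrapped in a prosody tag with pitch set to high, and the result is placed inside a speak tag. The helpers escapeXml and prosody are hypothetical names introduced for illustration, and the sample sentence is made up. How the resulting string is passed to call.say is left as an assumption in the final comment.

```javascript
// Escapes characters that are special in XML so plain text is safe inside SSML.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical helper: wraps a text fragment in a <prosody> tag
// with the given attributes (for example pitch, rate, or volume).
function prosody(text, attrs) {
  const attrString = Object.entries(attrs)
    .map(([name, value]) => `${name}="${escapeXml(value)}"`)
    .join(" ");
  return `<prosody ${attrString}>${escapeXml(text)}</prosody>`;
}

// Only "ready for pickup" is spoken with a higher pitch.
const ssml =
  `<speak>Your order is ${prosody("ready for pickup", { pitch: "high" })}.</speak>`;

// Assumed usage inside a scenario (signature not confirmed by this article):
// call.say(ssml, { /* language or voice of the chosen provider */ });
```

Escaping the fragment text keeps characters such as & or < from breaking the markup, which would otherwise be an easy way to end up with an unsupported request.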
The same goes for some other attributes we haven’t mentioned: they are supported by other SSML tags.