TTS output optimization

 

Here are a few tips to help you optimise and fine-tune your voice output.

Read the instructions, listen to the voice samples, apply the tips to your content, and enjoy the audio result!

1. Add pauses to fine-tune the intonation and rhythm of the generated output
2. Combine speed and pause tags to make the important information stand out
3. Use alternative selection tags to fine-tune the default output according to your expectations
4. Make sure you are using the right format (for hours, dates, etc.)
5. Use the pronunciation editor to create a specific entry for a word such as a proper name
6. Use alternative transcriptions (allophones) for full satisfaction

1. Add pauses to fine-tune the intonation and rhythm of the generated output

An efficient way to improve the output of a TTS is to tune your text with pauses in order to modify the intonation and/or the rhythm of the generated output.

Let’s take the following example:

You wish to talk with a counselor concerning dental, optical or hospital reimbursements, press 2.

Pauses can be inserted in different ways:

The first is simply to use punctuation marks: the TTS automatically inserts a pause wherever you put a punctuation mark.

You wish to talk with a counselor, concerning dental, optical, or hospital reimbursements, press 2.

A potential problem with punctuation marks is that the resulting pause can be too long. Another way is to insert a \pau=XXXX\ tag instead of a punctuation mark, where XXXX sets the duration of the pause.

You wish to talk with a counselor \pau=100\ concerning dental, optical \pau=50\ or hospital reimbursements, press 2.

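As a minimal sketch of the tagging step above, the following Python helper builds \pau=XXXX\ tags and splices them between chunks of a sentence. The function names `pau` and `with_pauses` are hypothetical and not part of any TTS API. The sketch assumes the tag value is a non-negative whole number, as in the example above; check your language manual for the unit and the allowed range.

```python
from typing import List


def pau(duration: int) -> str:
    """Return a pause tag such as \\pau=100\\ (syntax taken from the text above)."""
    if duration < 0:
        raise ValueError("pause duration must not be negative")
    return f"\\pau={duration}\\"


def with_pauses(chunks: List[str], durations: List[int]) -> str:
    """Join text chunks, placing one pause tag in each gap between chunks."""
    if len(durations) != len(chunks) - 1:
        raise ValueError("need exactly one duration per gap between chunks")
    parts = [chunks[0]]
    for duration, chunk in zip(durations, chunks[1:]):
        parts.append(pau(duration))
        parts.append(chunk)
    return " ".join(parts)


tagged = with_pauses(
    [
        "You wish to talk with a counselor",
        "concerning dental, optical",
        "or hospital reimbursements, press 2.",
    ],
    [100, 50],
)
print(tagged)
```

Keeping the durations in a separate list makes it easy to try several values and compare the resulting audio.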

Punctuation marks not only introduce a pause but they also locally change the intonation of the sentence. A comma causes a rising intonation, a full stop a downward one.

You wish to talk with a counselor \pau=100\, concerning dental, optical or hospital reimbursements, press 2.

You wish to talk with a counselor \pau=100\. concerning dental, optical or hospital reimbursements, press 2.

2. Combine speed and pause tags to make the important information stand out

When you create a message with a TTS, some parts of the message carry the key information that the listener has to understand. The relative speed tag (\rspd=XXX\) combined with a pause tag (\pau=XXX\) is a good way to make this information stand out.

Please call 911 Monday through Friday from 9 AM to 8 PM.

Please \pau=200\ \rspd=80\ call 911 \rspd=100\ \pau=200\ Monday through Friday from 9 AM to 8 PM.

Please \pau=200\ \rspd=80\ call 911 \rspd=100\ \pau=200\ Monday through Friday \pau=300\ from 9 AM to 8 PM.

When you use the \rspd=XXX\ tag, don’t forget to close it when it is no longer needed. To close it, use \rspd=100\.

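The combination of speed and pause tags can be sketched in Python as follows. The helpers `emphasize` and `speed_is_reset` are hypothetical names used only for illustration; the default values 80 and 200 are simply the ones from the example above. The second helper checks the closing rule: the last \rspd tag in a message should set the value back to 100.

```python
import re


def emphasize(text: str, speed: int = 80, pause: int = 200) -> str:
    """Wrap text in pause and relative-speed tags, then reset the speed to 100."""
    return f"\\pau={pause}\\ \\rspd={speed}\\ {text} \\rspd=100\\ \\pau={pause}\\"


def speed_is_reset(tagged_text: str) -> bool:
    """Return True if the last \\rspd tag (if there is one) sets the value to 100."""
    values = re.findall(r"\\rspd=(\d+)\\", tagged_text)
    return not values or values[-1] == "100"


message = f"Please {emphasize('call 911')} Monday through Friday from 9 AM to 8 PM."
print(message)
print(speed_is_reset(message))
```

Running a check like `speed_is_reset` over every prompt before synthesis is one way to catch a speed change that was never closed.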

3. Use alternative selection tags to fine-tune the default output according to your expectations

When the default output of the TTS does not completely match your expectations, you can get alternative outputs by using the alternative selection tag (\sel=altN\). This lets you obtain a different rendering of the same word, group of words or sentence. The tag must be placed before each word that you want rendered differently.

Please hold on for more information.

Please hold on for more \sel=alt2\ information.

\sel=alt1\ Please hold on for more \sel=alt2\ information.

\sel=alt20\ Please hold \sel=alt20\ on for more \sel=alt20\ information.

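Since finding the best alternative usually means listening to several candidates, a small script can generate them. The sketch below is only an illustration: `sel_variants` is a hypothetical helper, and which alternative numbers are actually available depends on the voice, so the values passed in are assumptions.

```python
import re
from typing import Iterable, List


def sel_variants(sentence: str, word: str, alternatives: Iterable[int]) -> List[str]:
    """Return one copy of the sentence per alternative, tagging the first match of word."""
    pattern = re.compile(rf"\b{re.escape(word)}\b")
    if not pattern.search(sentence):
        raise ValueError(f"{word!r} does not occur in the sentence")
    variants = []
    for n in alternatives:
        tag = f"\\sel=alt{n}\\ "
        # A function replacement avoids backslash escaping issues in re.sub.
        variants.append(pattern.sub(lambda m: tag + m.group(0), sentence, count=1))
    return variants


for variant in sel_variants("Please hold on for more information.", "information", [1, 2]):
    print(variant)
```

Each printed line can then be sent to the TTS and the preferred rendering kept.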

4. Make sure you are using the right format (for hours, dates, etc.)

An important thing to keep in mind when you are using a TTS system is the set of formats that the system accepts for different kinds of information, such as hours, dates and numbers. These can be found in the language manual.

Here are some examples of time formats:

  • 2:20  or  10:20
  • 2:40 AM  or  10:40 AM
  • 2:40 PM  or  10:40 PM
  • 2.40 AM  or  10.40 AM
  • 2.40 PM  or  10.40 PM
  • 10:00  -> ten o’clock
  • 2:00 AM  -> two AM
  • 2:20:45 or  2:20’45″ -> two twenty and forty-five seconds
  • 3-4 PM    -> three to four PM

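When the times come from a program rather than from hand-written text, it helps to format them into one of the listed patterns before sending them to the TTS. The following sketch produces the "2:40 PM" style; `tts_time` is a hypothetical helper, and whether a given format is accepted for your language should be checked in the language manual.

```python
from datetime import time


def tts_time(t: time) -> str:
    """Format a time as H:MM AM/PM, without a leading zero on the hour."""
    hour12 = t.hour % 12 or 12  # 0 and 12 both map to 12 on a 12-hour clock
    suffix = "AM" if t.hour < 12 else "PM"
    return f"{hour12}:{t.minute:02d} {suffix}"


print(tts_time(time(14, 40)))
print(tts_time(time(10, 40)))
```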

5. Use the pronunciation editor to create a specific entry for a word such as a proper name

A typical issue when using TTS is the wrong pronunciation of a word.

Most of the time this occurs with proper names, which often do not follow standard pronunciation rules.

The best way to solve this kind of problem is to use the pronunciation editor and to create an entry in the user lexicon with the proper name and the appropriate phonetic transcription.

A phonetic tag can also be used if the pronunciation needs to be changed locally only. The different phonetic alphabets can be found in the language manual.

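For local changes, the phonetic tag can be applied by a small preprocessing step. The sketch below assumes the \prx= ...\ tag syntax used in the examples of tip 6 and borrows the transcription of "Clinton" from there. The dictionary `LEXICON` and the function `apply_lexicon` are hypothetical; they stand in for a local word list and are not the pronunciation editor's user lexicon.

```python
import re
from typing import Dict

# Hypothetical local word list: word -> phonetic transcription.
LEXICON: Dict[str, str] = {"Clinton": "k l I1 n t @ n"}


def apply_lexicon(text: str, lexicon: Dict[str, str]) -> str:
    """Replace each whole-word match with a phonetic tag carrying its transcription."""
    if not lexicon:
        return text
    # Try longer entries first so that a short word never shadows a longer one.
    words = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(w) for w in words) + r")\b")
    return pattern.sub(lambda m: f"\\prx= {lexicon[m.group(1)]}\\", text)


print(apply_lexicon("Clinton was president of the United States.", LEXICON))
```

A user lexicon entry remains the better choice when the word occurs in many messages, because it does not require tagging each text.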

6. Use alternative transcriptions (allophones) for full satisfaction

Sometimes the official transcription of a word does not give full satisfaction. In that case, alternative transcriptions built from ‘allophones’ can help.

Here is a set of examples of phoneme replacements for American English.

Normally, /t/, /p/ and /k/ are aspirated when followed by a stressed vowel. This is not always the case, but forcing aspiration can change the pronunciation.

t => t_h

They outweigh you.

They \prx= aU t_h w EI1\ you.

p => p_h

The hurricane uprooted the trees.

The hurricane \prx= V p_h r u1 t @ d\ the trees.

k => k_h

The democrats voted today.

The \prx= d E1 m @ k_h r { t s\ voted today.

“Flapping” is a reduction of /t/ that is frequent in American English, mainly between a stressed and an unstressed vowel. The flap can be changed back to a /t/, which sounds a bit more British.

4 => t

The city comes to life.

The \prx= s I1 (t) i\ comes to life.

In other cases, a /t/ can be replaced by a flap.

t => 4

Autism is a serious handicap.

\prx= O1 4 I z @ m\ is a serious handicap.

A /t/ in American English can also be “swallowed” into a glottal stop, which in turn can be replaced by a /t/ or by a flap.

? => t

Clinton was president of the United States.

\prx= k l I1 n t @ n\ was president of the United States.

? => 4

Climb up the mountaintop.

Climb up the \prx= m aU1 n 4 n= n t O1 p\.

Climb up the \prx= m aU1 n t n= n t O1 p\.

A user can enhance the /N/ sound by adding /g/ after it.

N => N g

Camping is fun.

\prx= k {1 m p I N g\ is fun.

Simple replacements:

tS => t S

I like chatting with you.

I like \prx= t S {1 t I N\ with you.

dZ => d Z

He’ll join the army.

He’ll \prx= d Z OI1 n\ the army.

T => D or D => T

A nice toothy grin.

A nice \prx= t u1 D i\ grin.

or

The smooth surface.

The \prx= s m u1 T\ surface.

j => i

señor.

\prx= s i n i O1 r\.

r= => @ r or @ r => r=

greater.

\prx= g r EI1 4 @ r\.

or

generation.

\prx= dZ E n r= EI1 S @ n\.

O => A or A => O

sorry.

\prx= s A1 r i\.

or

swat team.

\prx= s w O1 t\ team.

i => I or I => i

city traffic.

\prx= s I1 4 I\ traffic.

\prx= s i1 4 i\ traffic.

w => u

That’s wasting time.

That’s \prx= u EI1 s t I N\ time.

V => @ or @ => V

It’s in my eardrum.

It’s in my \prx= I1 r d r @ m\.

or

Don’t dramatize.

Don’t \prx= d r {1 m V t AI z\.

U => u or u => U

He had a good education.

He had a good \prx= E dZ u k EI1 S @ n\.

or

The room is big.

The \prx= r U1 m\ is big.

{ => E

I had to go.

I \prx= h E d\ to go.

EI => E j or E i

He hit pay-dirt.

He hit \prx= p E1 j\ dirt.

He hit \prx= p E1 i\ dirt.

AI => A j or A i

The typhoon hit.

The \prx= t A j f u1 n\ hit.

The \prx= t A i f u1 n\ hit.

OI => O j or O i

He heard a strange noise there.

He heard a strange \prx= n O1 j z\ there.

He heard a strange \prx= n O1 i z\ there.

aU => { w or { u

The mouse ran.

The \prx= m {1 w s\ ran.

The \prx= m {1 u s\ ran.

l= => @ l

The battle ground.

The \prx= b {1 4 @ l\ ground.

n= => @ n

The fountain sang.

The \prx= f aU1 n ? @ n\ sang.

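The replacements listed above all follow the same pattern: one phoneme of a space-separated transcription is swapped for one or more others. A minimal Python sketch of that operation is given below. The helpers `replace_phoneme` and `prx` are hypothetical, and the sketch assumes that stress is written as a trailing "1" on a phoneme, which is the only stress mark that appears in the examples. The starting transcriptions are reconstructed by undoing the replacements shown above; they are not taken from the engine.

```python
import re
from typing import List


def replace_phoneme(transcription: str, source: str, target: str) -> str:
    """Swap every source phoneme for target, keeping a trailing stress mark."""
    result: List[str] = []
    for token in transcription.split():
        # Split the token into the phoneme and an optional trailing "1" (assumed stress mark).
        match = re.fullmatch(r"(.+?)(1?)", token)
        base, stress = match.group(1), match.group(2)
        if base == source:
            # The stress mark goes back on the first phoneme of the replacement.
            first, *rest = target.split()
            result.extend([first + stress, *rest])
        else:
            result.append(token)
    return " ".join(result)


def prx(transcription: str) -> str:
    """Wrap a transcription in the phonetic tag used in the examples above."""
    return f"\\prx= {transcription}\\"


print(prx(replace_phoneme("aU t w EI1", "t", "t_h")))
print(prx(replace_phoneme("p EI1", "EI", "E j")))
```

This handles only the one-to-one and one-to-many cases; a replacement that merges two phonemes into one, such as @ r => r=, would need sequence matching instead.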
