Voice Tuning
Fine-tune the voice output, add voice smileys, sounds, exclamations and more to breathe life into your message!
While the core technology is Text-To-Speech (TTS), which converts any written text into audio using pleasant, natural, high-quality voices, other technologies such as Concept To Speech can also be used to optimize the audio result.
A wide palette of features is available to optimize the result of the vocalization.
Here are some examples of fine-tuning that can be easily done.
Add vocal smileys for enhanced expressiveness!
Nestle is pronounced \prn= n E1 s l EI \ when you’re talking about the Swiss brand.
Phonetic Tag
You can change \rspd=60\ the speed of the voice.
Speed Tag
You can make the voice \vct=90\ seem older, \vct=100\ or if you like, \vct=110\ younger as well.
Voice Shaping Tag
The speed tag can come in handy, as the \vct=110\ higher pitch increases the speed, so we can \rspd=80\ counter that effect using the speed tag.
Speed + VCT Tags
The spelling tag will say every single \rms=1\ letter \rms=0\ in the word.
Spelling Tag
The \rmw=1\ word by word \rmw=0\ tag speaks for itself.
Word by Word Tag
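The spelling and word-by-word tags work as on/off pairs (\rms=1\ and \rms=0\, \rmw=1\ and \rmw=0\). If you generate prompts programmatically, a pair of small helpers keeps the switches balanced. This is a minimal sketch: the function names are hypothetical, and the code only builds strings; it does not call any Acapela API.

```python
def spell_out(text: str) -> str:
    """Wrap text in the spelling tag pair so each letter is read separately."""
    return f"\\rms=1\\ {text} \\rms=0\\"


def word_by_word(text: str) -> str:
    """Wrap text in the word-by-word tag pair."""
    return f"\\rmw=1\\ {text} \\rmw=0\\"


sentence = "The spelling tag will say every single " + spell_out("letter") + " in the word."
print(sentence)
print(word_by_word("word by word"))
```

Because each helper emits both the opening and the closing tag, the spelling or word-by-word mode cannot accidentally stay on for the rest of the message.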
Sometimes a short pause \pau=300\ can improve the voice output.
Pause Tag
You can change the intonation of a word if you think it doesn’t \sel=alt\ sound right.
Alternative Selection Tag - 1
Alternative Selection Tag - 2
You can insert sounds, like “Sending email to John” \aud="pathway+filename"\
Audio Tag
“You have a new message from John Smith. Do you want Rod to read it?
\vce=speaker=Rod\ Hey Dave, really sorry, but I need to cancel our meeting this afternoon. I’ll call you later to reschedule. Cheers, John
\vce=speaker=Lily\ Would you like to respond?”
VCE Tag
Most of our voices include exclamations like “Please try again!” or “Goodbye!”
Exclamations
Many of our voices include what we call Voice Smileys, like laughter #LAUGH01# or sneezing #SNEEZE01#.
Voice Smileys Tag
A few of our voices come with additional emotional states, like sadness, or happiness.
For example, my friend Will can be quite emotional. \vce=speaker=Will\
Hello, my name is Will. \vce=speaker=Will-Sad\
Sometimes I get a little bit down. #CRY01# \vce=speaker=Will-Happy\
But I can also be the life of the party! #LAUGH03#
Expressive Voices
An efficient way to improve TTS output is to tune your text with pauses, which modify the intonation and/or the rhythm of the generated speech.
Let’s take the following example:
You wish to talk with a counselor concerning dental, optical or hospital reimbursements, press 2.
Pauses can be inserted in different ways:
The first is simply to use punctuation marks: a pause is automatically inserted wherever you put one.
You wish to talk with a counselor, concerning dental, optical, or hospital reimbursements, press 2.
A potential problem with punctuation marks is that the resulting pause may be too long. Another way is to insert a \pau=XXXX\ tag instead of a punctuation mark.
You wish to talk with a counselor \pau=100\ concerning dental, optical \pau=50\ or hospital reimbursements, press 2.
Punctuation marks not only introduce a pause but also locally change the intonation of the sentence: a comma causes a rising intonation, a full stop a falling one. A pause tag can also be combined with a punctuation mark:
You wish to talk with a counselor \pau=100\, concerning dental, optical or hospital reimbursements, press 2.
You wish to talk with a counselor \pau=100\. concerning dental, optical or hospital reimbursements, press 2.
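When prompts are assembled in code, it can help to keep the pause durations next to the text they separate. The sketch below rebuilds the tagged counselor example from plain fragments; `pause` and `with_pauses` are hypothetical helper names, and the numeric value is passed through unchanged because this page does not state the unit of \pau.

```python
def pause(ms: int) -> str:
    """Return a pause tag; the value is copied into the tag as-is."""
    if ms < 0:
        raise ValueError("pause duration must be non-negative")
    return f"\\pau={ms}\\"


def with_pauses(parts: list) -> str:
    """Join text fragments with spaces, turning every int in the list into a pause tag."""
    return " ".join(pause(p) if isinstance(p, int) else p for p in parts)


prompt = with_pauses([
    "You wish to talk with a counselor", 100,
    "concerning dental, optical", 50,
    "or hospital reimbursements, press 2.",
])
print(prompt)
```

Tuning the prompt then means editing numbers in a list rather than hand-editing backslash-delimited tags.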
When you create a message with TTS, some parts of it carry the key information that the listener has to understand. The relative speed tag (\rspd=XXX\) combined with a pause tag (\pau=XXX\) is a good way to make that information stand out.
Please call 911 Monday through Friday from 9 AM to 8 PM.
Please \pau=200\ \rspd=80\ call 911 \rspd=100\ \pau=200\ Monday through Friday from 9 AM to 8 PM.
Please \pau=200\ \rspd=80\ call 911 \rspd=100\ \pau=200\ Monday through Friday \pau=300\ from 9 AM to 8 PM.
When you use the \rspd=XXX\ tag, don’t forget to close it when it’s no longer needed by inserting \rspd=100\.
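Forgetting the closing \rspd=100\ is an easy mistake, so one option is a helper that always emits it. This is a sketch under assumptions: `emphasize` is a hypothetical name, and the defaults (rate 80, pause 200) are simply the values used in the example above, not recommended settings.

```python
DEFAULT_RATE = 100  # value used on this page to close the relative speed tag


def emphasize(phrase: str, rate: int = 80, pause_ms: int = 200) -> str:
    """Slow a phrase down, surround it with pauses, and always restore the default rate."""
    return (
        f"\\pau={pause_ms}\\ \\rspd={rate}\\ {phrase} "
        f"\\rspd={DEFAULT_RATE}\\ \\pau={pause_ms}\\"
    )


message = "Please " + emphasize("call 911") + " Monday through Friday from 9 AM to 8 PM."
print(message)
```

Since the reset is part of the returned string, the rest of the message is spoken at the normal rate whatever phrase is emphasized.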
When the default TTS output does not fully match your expectations, you can get alternative outputs with the alternative selection tag. It gives you different renderings of the same word, group of words or sentence. The tag must be placed before each word you want rendered differently.
Please hold on for more information.
Please hold on for more \sel=alt2\ information.
\sel=alt1\ Please hold on for more \sel=alt2\ information.
\sel=alt20\ Please hold \sel=alt20\ on for more \sel=alt20\ information.
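Because the tag goes before each word you want changed, it is convenient to describe the request as "word position → alternative number". The sketch below does that; `select_alternative` is a hypothetical helper, word positions are 0-based, and the sentence is split on whitespace (so runs of spaces collapse to one).

```python
def select_alternative(sentence: str, targets: dict) -> str:
    """Insert an alternative selection tag before selected words.

    targets maps a 0-based word index to the alternative number N,
    producing a tag of the form sel=altN before that word.
    """
    out = []
    for i, word in enumerate(sentence.split()):
        if i in targets:
            out.append(f"\\sel=alt{targets[i]}\\")
        out.append(word)
    return " ".join(out)


base = "Please hold on for more information."
print(select_alternative(base, {0: 1, 5: 2}))
print(select_alternative(base, {0: 20, 2: 20, 5: 20}))
```

Keeping the untagged sentence separate from the mapping makes it easy to try several alternatives for the same prompt.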
When using a TTS system, keep in mind the formats it accepts for different kinds of information such as times, dates and numbers. These are listed in the language manual.
Here are some examples of accepted time formats, with the way some of them are read shown after the arrow:
2:20 or 10:20
2:40 AM or 10:40 AM
2:40 PM or 10:40 PM
2.40 AM or 10.40 AM
2.40 PM or 10.40 PM
10:00 -> ten o’clock
2:00 AM -> two AM
2:20:45 or 2:20’45” -> two twenty and forty-five seconds
3-4 PM -> three to four PM
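If times come from a program rather than being typed by hand, you can format them into one of the listed shapes before sending the text to the engine. This sketch assumes the 12-hour "H:MM AM/PM" form shown above (an American English example; other languages may expect different formats, so check the language manual). The helper names are hypothetical, and writing midnight and noon as 12:xx is a convention chosen here.

```python
from datetime import time


def tts_time(t: time) -> str:
    """Format a time as 'H:MM AM' or 'H:MM PM', one of the formats listed above."""
    hour12 = t.hour % 12 or 12
    suffix = "AM" if t.hour < 12 else "PM"
    return f"{hour12}:{t.minute:02d} {suffix}"


def tts_hour_range(start: int, end: int, suffix: str) -> str:
    """Format an hour range such as '3-4 PM'."""
    return f"{start}-{end} {suffix}"


print(tts_time(time(14, 40)))
print(tts_hour_range(3, 4, "PM"))
```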
A typical issue with TTS is the mispronunciation of a word.
This mostly happens with proper names, which often do not follow standard pronunciation rules.
The best way to solve this kind of problem is to use the pronunciation editor to create an entry in the user lexicon containing the proper name and its phonetic transcription.
If the pronunciation only needs to be changed locally, a phonetic tag can be used instead. The different phonetic alphabets are listed in the language manual.
Sometimes the official transcription of a word is not fully satisfactory. In that case, alternative transcriptions built with ‘allophones’ can help.
Here is a set of examples of phoneme replacements for American English.
Normally, /t/, /p/ and /k/ are aspirated when followed by a stressed vowel. This is not always the case, but forcing aspiration can change the pronunciation.
They outweigh you.
They \prx= aU t_h w EI1\ you.
The hurricane uprooted the trees.
The hurricane \prx= V p_h r u1 t @ d\ the trees.
The democrats voted today.
The \prx= d E1 m @ k_h r { t s\ voted today.
“Flapping” is a reduction of /t/ that is frequent in American English, mainly between a stressed and an unstressed vowel. It can be changed to a /t/, which sounds a bit more British.
The city comes to life.
The \prx= s I1 (t) i\ comes to life.
A /t/ in American English can also be “swallowed” into a glottal stop, which in turn can be replaced by a flap.
Clinton was president of the United States.
\prx= k l I1 n t @ n\ was president of the United States.
Climb up the mountaintop.
Climb up the \prx= m aU1 n 4 n= n t O1 p\.
Climb up the \prx= m aU1 n t n= n t O1 p\.
A user can enhance the /N/ sound by adding /g/ after it.
Camping is fun.
\prx= k {1 m p I N g \ is fun.
Simple replacements:
I like chatting with you.
I like \prx= t S {1 t I N\ with you.
He’ll join the army.
He’ll \prx= d Z OI1 n\ the army.
A nice toothy grin.
A nice \prx= t u1 D i\ grin.
or
The smooth surface.
The \prx= s m u1 T\ surface.
señor.
\prx= s i n i O1 r\.
greater.
\prx= g r EI1 4 @ r\.
or
generation.
\prx= dZ E n r= EI1 S @ n\.
sorry.
\prx= s A1 r i\.
or
swat team.
\prx= s w O1 t \ team.
city traffic.
\prx= s I1 4 I\ traffic.
\prx= s i1 4 i\ traffic.
That’s wasting time.
That’s \prx= u EI1 s t I N\ time.
It’s in my eardrum.
It’s in my \prx= I1 r d r @ m\.
or
Don’t dramatize.
Don’t \prx= d r {1 m V t AI z \.
He had a good education.
He had a good \prx= E dZ u k EI1 S @ n\.
or
The room is big.
The \prx= r U1 m \ is big.
I had to go.
I \prx= h E d\ to go.
He hit pay-dirt.
He hit \prx= p E1 j \ dirt.
He hit \prx= p E1 i \ dirt.
The typhoon hit.
The \prx= t A j f u1 n\ hit.
The \prx= t A i f u1 n\ hit.
He heard a strange noise there.
He heard a strange \prx= n O1 j z\ there.
He heard a strange \prx= n O1 i z\ there.
The mouse ran.
The \prx= m {1 w s\ ran.
The \prx= m {1 u s\ ran.
The battle ground.
The \prx= b {1 4 @ l\ ground.
The fountain sang.
The \prx= f aU1 n ? @ n\ sang.
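All of the replacements above follow one pattern: a single word is swapped for an inline phonetic tag. A small helper can make that swap in generated text. This is a sketch with assumptions: `phonetic_tag` and `retranscribe` are hypothetical names; the tag name is a parameter because this page shows both \prn= and \prx= without saying when to prefer each; and permanent fixes are better made in the user lexicon through the pronunciation editor, which this code does not cover.

```python
import re


def phonetic_tag(transcription: str, tag: str = "prx") -> str:
    """Build an inline phonetic tag, for example prx= s I1 (t) i between backslashes."""
    return f"\\{tag}= {transcription}\\"


def retranscribe(sentence: str, word: str, transcription: str, tag: str = "prx") -> str:
    """Replace the first whole-word occurrence of word with a phonetic tag."""
    pattern = rf"\b{re.escape(word)}\b"
    # A function replacement stops re.sub from interpreting the backslashes in the tag.
    return re.sub(pattern, lambda _m: phonetic_tag(transcription, tag), sentence, count=1)


print(retranscribe("The city comes to life.", "city", "s I1 (t) i"))
```

Matching on word boundaries avoids rewriting the target inside a longer word (for example "city" inside "velocity").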
Sounds are produced with the speaker’s own voice. They include laughing, breathing, sneezing, coughing and other sounds our voices can produce to mimic the ones we make in daily life. A sound is always written in capital letters between two hash signs, with the name sometimes followed by a number when there is more than one of the same sound, as in #LAUGH01#. The children’s voices have more sounds than the adult voices because, as you well know, children are way more playful :-).
Exclamations were a bit trickier to select, so we kept only the most commonly used ones. An exclamation is always followed by an exclamation mark (!), quite obviously, with no blank between the word and the mark. If a blank is left in between, the exclamation is ignored. You may notice that in the document some exclamations are repeated in quotation marks (“); this is simply to avoid extended pauses.
In both cases, simply insert the sounds and exclamations into your text; they will be uttered correctly as long as you are using the right voice.
Look at the documentation.
Please note that not all voices have sounds and exclamations. In some cases we can make additional recordings; in others we can provide substitutes (marked by (S) after the text string, which of course should not be copied into your text). Unfortunately, sometimes the voice cannot be processed further.
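The two writing rules above (sounds in capitals between hash signs, exclamations with no blank before "!") are easy to check automatically before synthesis. This is a minimal lint sketch: the regular expressions encode only the rules stated on this page, the function names are hypothetical, and the code does not know which sounds a given voice actually has (that list is in the documentation).

```python
import re

# Sound names in capitals, optionally numbered, between two hash signs, e.g. #LAUGH01#
SOUND_RE = re.compile(r"#[A-Z]+\d*#")
# A word followed by whitespace and then "!": such exclamations are ignored by the engine
SPACED_EXCLAMATION_RE = re.compile(r"(\w+)\s+!")


def find_sounds(text: str) -> list:
    """Return the sound tokens found in text, in order."""
    return SOUND_RE.findall(text)


def spaced_exclamations(text: str) -> list:
    """Return words that have a blank before their exclamation mark."""
    return SPACED_EXCLAMATION_RE.findall(text)


sample = "Sometimes I get a little bit down. #CRY01# Goodbye !"
print(find_sounds(sample))
print(spaced_exclamations(sample))
```

A non-empty result from `spaced_exclamations` points at exclamations that would be silently skipped.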
To switch voices within a text, use the \vce=speaker=VoiceName\ tag, as in the following example: “Good morning, ladies and gentlemen, \vce=speaker=Julie\ Bonjour mesdames et messieurs.”
Just use the name of the voice you want, and the text-to-speech engine immediately switches to that voice after the tag. For special voices, such as voices with emotions or variants of a specific voice, type the name without any spaces, parentheses or underscores.
For example, Will (LittleCreature) becomes \vce=speaker=willlittlecreature\. Enjoy!
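The naming rule for special voices can be applied mechanically. This sketch removes spaces, parentheses and underscores as the page instructs, and also lowercases the result to mirror the willlittlecreature example; the lowercasing is an assumption (the page also writes names like Julie and Will-Sad with capitals, which suggests case may not matter). `voice_tag` and `dialogue` are hypothetical helpers that only build strings.

```python
import re


def voice_tag(voice_name: str) -> str:
    """Build a voice-switch tag from a voice's display name.

    Spaces, parentheses and underscores are removed as the page instructs;
    lowercasing mirrors the page's willlittlecreature example (an assumption).
    """
    compact = re.sub(r"[\s()_]", "", voice_name).lower()
    return f"\\vce=speaker={compact}\\"


def dialogue(turns: list) -> str:
    """Join (voice name, text) pairs, switching voice before each turn."""
    return " ".join(f"{voice_tag(name)} {text}" for name, text in turns)


print(voice_tag("Will (LittleCreature)"))
print(dialogue([("Rod", "Hey Dave, really sorry."), ("Lily", "Would you like to respond?")]))
```

Hyphens are left alone because the page only lists spaces, parentheses and underscores as characters to drop.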