R&D Projects

Pushing the Boundaries of Voice.

R&D @ Acapela Group

Involvement in funded R&D projects and collaboration with high-profile partners, organizations, universities and laboratories are essential to meeting our goals and to fulfilling our role as ground-breaking inventors of leading-edge speech solutions. As an expert and innovative bespoke player, we push the boundaries of voice.

Projects involving Acapela Group since 2001:

Humanoid intelligent companions, multilingual conversations, singing synthesis, expressive reading and the transmission of emotions, the Internet of Things, biometrics and multimodal man-machine interaction are some of the domains we have been deeply involved in for over a decade, partnering with experts worldwide.

2014 – ChaNTeR
2013 – PATH
2013 – I-Treasures
2012 – DBOX
2012 – Mardi
2012 – Content4All
2011 – DIYSE
2010 – EMOSPEECH
2010 – BIOSPEAK
2010 – ROMEO
2009 – GVLEX
2009 – Franel
2008 – HMMTTS
2007 – INDIGO
2005 – BOON Companion
2004 – DIVINES
2003 – E! 2990- MAJORCALL Majordome CRM Call Centers
2003 – STOP
2003 – ULYCES
2002 – EVASY
2001 – SYNFACE

2013 – PATH

  • Project description:People with autism have significant communication problems that undermine their integration into society. Autism is a severe and persistent neurobiological disability.
    The diversity of needs of each person with autism calls for flexible and individualized communication tools. PATH aims to provide individuals with autism, their families and therapists with custom tools to generate or enhance communication via a collaborative platform.
    PATH combines a technological dimension (speech synthesis – recognition – eye-movement tracking – embedded technologies) with a participatory dimension (cloud computing – sharing – “custom” adaptation).

2013 – I-Treasures


  • Project description:Intangible Treasures – Capturing the Intangible Cultural Heritage and Learning the Rare Know-How of Living Human Treasures.
    Cultural expression is not limited to architecture, monuments or collections of artifacts. It also includes fragile, intangible live expressions which involve knowledge and skills, such as music, dance, singing, theatre, human skills and craftsmanship. These manifestations of human intelligence and creativeness constitute our Intangible Cultural Heritage (ICH). The main objective of i-Treasures is to develop an open and extendable platform that provides access to ICH resources, enables knowledge exchange between researchers and contributes to the transmission of rare know-how from Living Human Treasures to apprentices. To this end, the project aims to go beyond the mere digitization of cultural content. By combining conventional learning procedures with advanced services, such as singing-voice synthesis and sensorimotor learning through an interactive 3D environment, i-Treasures is expected to break new ground in the education and knowledge transfer of ICH.

2012 – DBOX

  • Project description:A generic dialogue box for multilingual conversational applications.
    D-Box’s main goal is to develop and test an innovative architecture for conversational agents that supports multilingual collaboration between users on a common problem in an interactive application. The agent will enable typewritten and/or spoken collaboration in each user’s native language by mediating communication: all user interactions will be transmitted through the D-Box multilingual agent.

2012 – Mardi: Man-Robot Dialogue

  • Project description:In this project, we intend to study human-computer interaction in a situated manner. We believe that interaction must have a physical realization, anchored in the real world, to be natural and effective. To embody interactive systems, we propose to use humanoid robots. Robots, endowed with perception but also with the means to act on the environment, allow a physical context to be integrated into the interaction, for the machine as well as for humans.
  • Partners:SUPELEC (http://www.supelec.fr/), LIA (http://www.univ-avignon.fr/), LAAS (http://www.laas.fr/)

2012 – Content4All

  • Project description:The main goal of the Content4All project is to improve text-to-speech synthesis for long texts (storytelling, newspapers). The basic concept behind the project is to allow anyone, including people with visual disabilities (elderly or blind people), to access the same information as everyone else.

2011 – DIYSE (Do-it-Yourself Smart Experiences)

  • Project description:The Do-it-Yourself Smart Experiences project (DiYSE) aims at enabling ordinary people to easily create, setup and control applications in their smart living environments as well as in the public Internet-of-Things space, allowing them to leverage aware services and smart objects for obtaining highly personalised, social, interactive, flowing experiences at home and in the city.

2010 – EMOSPEECH (Eurostars project)

  • Project description:Virtual worlds are a new way of socializing. They allow users to embody an avatar evolving in a three-dimensional representation of a real or imaginary place, in which they can usually meet and interact with other users. Such applications digitally extend users’ social lives.
    E-learning solutions can be as simple as forms to fill in, or be developed using technologies from the computer-entertainment field. The latter type of e-learning solution is known as a serious game: it merges educational content with game design, allowing users to actively learn and improve their skills.
    Virtual worlds and serious games offer a good technological answer to this challenge, since they give users virtual experiences of real situations. While existing solutions have reached a satisfying level of physical immersion, the next steps consist in providing users with a higher level of interaction, both with other users and with the virtual humans populating these digital environments. Today’s applications lack verbal and emotional interaction, and filling this gap would make the virtual experience more realistic. For instance, an avatar’s lip and face animation should be coherent not only with the phrasing but also with the emotional message (anger, pity, etc.). Spoken interactions (in other words, dialogues) are therefore an important aspect to focus on in order to improve the user experience. More precisely, synthesized speech and face animations should take verbal and non-verbal components (mainly emotions) into account to fully represent the speaker’s intentions. Allowing users’ avatars, but also virtual humans, to handle emotions will definitely improve the immersiveness of virtual worlds and serious games.
  • Partners:INRIA Lorraine (Parole and Talaris) (http://www.inria.fr/centre/nancy), Artefacto (http://www.artefacto-ar.com/)

2010 – BIOSPEAK (Eurostars project)

  • Project description:BioSpeak aims to turn the ALIZE open-source software into a commercial-quality toolkit for real-time voice-biometric validation at variable risk levels. The project will create robust and scalable tools for Interactive Voice Response (IVR) systems, able to process thousands of channels in parallel using state-of-the-art algorithms. These tools will allow multilingual interoperability and will be designed to work in security- and telephony-focused environments. The biometric tools will be based on ALIZE, an open-source library designed for research and experimentation on the signal-processing and statistical algorithms used in biometric authentication. Although very complete, ALIZE is not ready to be used in large-scale commercial applications with real-time, multi-channel audio processing needs. BioSpeak partner companies will benefit from state-of-the-art speaker-validation algorithms integrated into their products.

2010 – ROMEO

  • Project description:Romeo is a project led by the French Cap Digital business cluster and funded by the Île-de-France region, the General Directorate for Competitiveness, Industry and Services (DGCIS, formerly DGE) and the City of Paris. This 10-million-euro project is subsidized up to 4.9 million.
    The project’s objective is to develop a humanoid robot that can act as a comprehensive assistant for persons suffering from loss of autonomy. With this target in mind, the robot has to be able to interact with the most familiar objects and movements (open and close a door, grasp a glass, a bottle, a bunch of keys…). It will also have to assist people who need to move around their home, and be able to help them should they fall to the ground.
    Beyond its physical abilities, Romeo has to come with a very “human-friendly” interface, voice and gestures being the principal means of communication with the robot. It will have to understand what is said to it, carry out simple conversations and even sense the intentions and emotions of its interlocutor in order to deduce the actions it has to perform.
  • Partners:ALDEBARAN (http://www.aldebaran-robotics.com/), VOXLER (http://www.voxler.eu/), SpirOps (http://www.spirops.com/), AsAnAngel (http://www.asanangel.net/), LISV (http://www.lisv.uvsq.fr/), LIMSI (http://www.limsi.fr/), LAAS (http://www.laas.fr/), CEA-LIST (http://www-list.cea.fr/), Telecom ParisTech (http://www.telecom-paristech.fr/), INRIA (http://www.inria.fr/), LPPA (Collège de France), Institut de la Vision (http://www.fondave.org/-Institut-de-la-Vision-.html)

2009 – GVLEX, Gesture and Voice for expressive reading

  • Project description:GV-LEx is subsidized by the French National Research Agency (ANR) in the scope of the 2009 “Content and Interaction” call. Members of the consortium are ALDEBARAN Robotics (project leader), Acapela, CNRS/LIMSI and Telecom ParisTech. Its aim is to make the robot NAO and the avatar Greta capable of reading texts for several minutes without boring the listener with a monotonous computer voice. To reach this objective, we propose to bring expressiveness into the speech synthesis itself, as well as to take advantage of the robot or virtual human: both are capable of performing expressive gestures while talking.

2009 – Franel

  • Project description:Franel is an innovative language-teaching project which offers the people of West Flanders, Wallonia and France (Nord/Pas-de-Calais) a unique opportunity to learn their neighbour’s language. Specifically, learning activities were developed from reports by the regional television stations WTV (West Flanders), C9 (Nord-Pas-de-Calais) and NoTV (Hainaut) and by three universities: KU Leuven Campus Kortrijk on the Flemish side, the University Lille III Charles de Gaulle on the French side, and the Polytechnic Faculty of Mons in Wallonia.

2008 – HMMTTS

  • Project description:Intelligibility and expressivity have become the keywords in speech synthesis. Here, a system (HTS) based on the statistical generation of voice parameters from Hidden Markov Models has recently shown its potential efficiency and flexibility. Nevertheless, this approach has not yet reached maturity and is limited by the buzziness it produces, a drawback undoubtedly due to the parametric representation of speech, which induces a loss of voice quality. The first part of this thesis is consequently devoted to the high-quality analysis of speech. In the future, applications oriented towards voice conversion and expressive speech synthesis could also be carried out.
  • Partner:
    FPMs (http://tcts.fpms.ac.be/)

2007 – INDIGO

  • Project description:FP6 European project: Interaction with Personality and Dialogue Enabled Robots.
    A key enabling technology for next-generation robots for the service, domestic and entertainment markets is human-robot interaction. A robot that collaborates with humans on a daily basis – be it in care applications, in a professional or private context – requires interactive skills that go beyond keyboards, button clicks or metallic voices. For this class of robots, human-like interactivity is a fundamental part of their functionality.
    INDIGO aims to develop human-robot communication technology for intelligent mobile robots that operate and serve tasks in populated environments. In doing so, the project will draw on technologies from various sectors and attempt to introduce advances in the respective areas, i.e. natural language interaction, autonomous navigation, visual perception, dialogue systems and virtual emotions. The project will address human-robot communication from two sides: by enabling robots to correctly perceive and understand natural human behaviour, and by making them act in ways that are familiar to humans.
  • Partners:FORTH-ICS (http://www.ics.forth.gr/cvrl), Univ Edinburgh (http://www.iccs.inf.ed.ac.uk/), Albert Ludwigs University of Freiburg (http://ais.informatik.uni-freiburg.de/), University of Athens (http://ais.informatik.uni-freiburg.de/), University of Geneva (http://www.miralab.ch/), NEOGPS (http://www.neobotix.de/en/), HANSON (http://hansonrobotics.wordpress.com/), Foundation of the Hellenic World (http://www.fhw.gr/index_en.html), NCSR (http://www.iit.demokritos.gr/skel/)

2005 – BOON Companion (ITEA project)

  • Project description:The Boon Companion project aims at investigating and demonstrating an autonomous cognitive system (ACS) integrating perception, reasoning, and learning. The consortium’s interest in ACS is motivated by the desire to develop intelligent companions and domestic assistants that could exhibit some human-like cognitive abilities (e.g. adaptiveness to the interaction context, adaptiveness to the user) and thus gain in acceptance.
  • Partners :
    BERCHET, France (http://www.groupe-berchet.com/), CEA List, France (http://www-list.cea.fr/), Wany Robotics, France (http://www.wanyrobotics.com/), Eurecom, France (http://www.eurecom.fr/fr), Generation 5, France (http://www.generation5.fr/), Thales, France (https://www.thalesgroup.com/fr), Philips, Netherlands (http://www.philips.nl/fr), Sound Intelligence, Netherlands (http://www.soundintel.com/en/home-en), University of Groningen, Netherlands (http://www.rug.nl/?lang=en), University of Utrecht, Netherlands (http://www.uu.nl/en/pages/default.aspx), CRIFA, Belgium (http://www.crifa.ulg.ac.be/)

2004 – DIVINES

  • Project description:

European Project FP6-IST-2002-002034 DIVINES: Diagnostic and Intrinsic Variabilities in Natural Speech

The goal of DIVINES is to develop new knowledge towards renewed feature-extraction and modelling techniques with better capabilities, particularly in handling the intrinsic variabilities of speech. First, human and machine performance and the effect of intrinsic variabilities will be compared using a diagnostic procedure. The outcomes of this analysis will then be exploited to target feature extraction and acoustic and lexical modelling. Compatibility with noise-handling techniques and integration within current systems are also part of the objectives.

The project is relevant to the “multimodal interfaces” objective as it concerns more accurate and adaptable recognition of spoken language. This is central to the concept of multimodal man-machine interaction where the speech understanding service is likely to remain an independent component in a modular design. Advances in this field could be decisive in realizing the vision of natural interactivity.

  • Partners:THE ROYAL INSTITUTION FOR THE ADVANCEMENT OF LEARNING (MCGILL UNIVERSITY) (http://www.mcgill.ca/fr), FRANCE TELECOM SA (http://www.orange.fr/), LOQUENDO SPA (http://www.loquendo.com/en/), UNIVERSITE D’AVIGNON ET DU PAYS-VAUCLUSE (http://www.univ-avignon.fr/), INSTITUT EURECOM (http://www.eurecom.fr/), CARL VON OSSIETZKY UNIVERSITAET OLDENBURG (http://www.uni-oldenburg.de/), POLITECNICO DI TORINO (http://www.polito.it/)

2003 – E! 2990- MAJORCALL Majordome CRM Call Centers

  • Partners:MULTITEL, Belgium (http://www.multitel.be/), Software 602, Czech Republic (http://www.602.cz/), GVZ, Turkey, Vecsys, France (http://www.vecsys.fr/), ENST, France (http://www.telecom-paristech.fr/), Knowledge S.A., Greece (http://www.knowledge.gr/ksa/shared/index.jsp?context=101), University of Patras, Greece (http://www.wcl.ece.upatras.gr/), Harpax, Italy (http://www.harpax.com/)

2003 – STOP (http://tcts.fpms.ac.be/projects/stop/index.html)

  • Project description:
    Speech dynamics and voice quality analysis for improved speech synthesis.
    The STOP project aims at studying the relationship between speech dynamics and voice quality, based on in-house tools for efficient source-tract separation. STOP stands for “Synthèse Technologique Optimisée de la Parole” (Optimized Technological Speech Synthesis).
    It aims at improving speech-synthesis technologies by exploiting speech dynamics, a field that has remained largely unexplored until now. The goal is to develop a software library to modify dynamics in concatenative speech synthesis (diphones and non-uniform units). For this, not only the modification of prosody is envisaged; the voice quality should also be adapted to the desired perceived phonation.
  • Partners:“5e Saison”, a French company specialized in digital sound processing, located in Boulogne-Billancourt (France); TCTS Lab, the Circuits Theory and Signal Processing laboratory of the Faculté Polytechnique de Mons (FPMs) (http://tcts.fpms.ac.be/)
  • Outcomes: development of the Arabic TTS system; a new voice, Bruno, recorded in this project; the MixLP method for separating the signal source and the vocal tract.


  • Project description:The project aims to strengthen the French contribution to international standardization bodies in the field of speech technologies, to represent the French industrial and academic community on standards committees, and to keep that community regularly informed. To achieve this objective, the project consortium brings together seven industrial and academic partners in the field of speech technologies, covering four complementary angles: upstream research, speech-technology vendors, voice-platform vendors and component vendors. The sub-project is divided into two phases: the first, lasting one year, is more general and covers all standards relevant to speech technology; the second, over a period of two years, concentrates on the main standard, VoiceXML.


2003 – ULYCES

  • Project description:BABEL Technologies (now Acapela Group) and EZOS (with Twin Development and Automobile Gillet) worked together on the UlyCEs project, which aimed to develop a telematics platform for the automotive industry based on Windows CE technology.


2002 – EVASY

  • Project description:The EVASY project is dedicated to the evaluation of speech-synthesis systems for the French language. The project is financed by the French Ministry of Research in the context of the Technolangue programme. This evaluation campaign is intended to expand upon the ARC-AUPELF (now AUF) campaign of 1996–1999, the only previous evaluation campaign for text-to-speech systems for the French language. The EvaSy campaign is subdivided into three components:

– evaluation of the grapheme-to-phoneme module,
– evaluation of prosody and expressivity,
– global evaluation of the quality of the synthesised speech.

Six systems were evaluated: three based on diphones (labelled D1, D2 and D3) and three based on the unit-selection method (S1, S2 and S3). They came from the following entities: Acapela Group, Multitel, CRISCO, ELAN, ICP, LIMSI-CNRS.

  • Partners:

ELDA (Evaluations and Language resources Distribution Agency) (http://www.elda.org), LIMSI (Laboratoire d’Informatique pour la Mécanique et les Sciences de l’Ingénieur) (http://www.limsi.fr/), DELIC research team (Description Linguistique Informatisée sur Corpus), Université de Provence (http://www.up.univ-mrs.fr/delic/), CRISCO (Centre de Recherches Inter-langues sur la Signification en Contexte) (http://elsap1.unicaen.fr/), ICP (Institut de la Communication Parlée), LIA (Laboratoire Informatique d’Avignon), MULTITEL ASBL (http://www.multitel.be/)

2001 – SYNFACE
Synthesised talking face derived from speech for hearing disabled users of voice channels

  • Project description:Many hard-of-hearing people rely on lip-reading during conversations, which makes it difficult for them to communicate over the telephone. SynFace is a technology that allows an animated talking face to be controlled by the incoming telephone speech signal in real time. The talking face facilitates speech understanding by providing lip-reading support. The method works with any telephone and is cost-effective compared with video telephony and text telephony, which need compatible equipment at both ends. The SynFace technology has many other potential areas of application, for example in education, entertainment and public information systems.
    Development of the SynFace technology was funded by the EU IST programme (project number IST-2001-33327) between 2001 and 2004. During this time, a multilingual, real-time prototype of the SynFace talking-head telephone was developed and evaluated with hearing-impaired users in Sweden, the Netherlands and England, with positive results. A summary report of the project is available.