Acapela Group is actively working on Deep Neural Networks (DNN) and we are very enthusiastic and proud to present the first achievements of our research in this fascinating field, creating new opportunities for voice interfaces.
Humanoid intelligent companions, multilingual conversations, singing speech, expressive reading and transmission of emotions, Internet of Things, biometrics and multimodal man-machine interaction are some of the domains we have been seriously involved with for over a decade, partnering with experts worldwide.
2014 – ChaNTeR
2013 – PATH
2012 – DBOX
2012 – Mardi
2012 – Content4All
2011 – DIYSE
2010 – ROMEO
2009 – GVLEX
2009 – FRANEL
2008 – HMMTTS
2007 – INDIGO
2005 – BOON Companion
2004 – DIVINES
2003 – E! 2990- MAJORCALL Majordome CRM Call Centers
2003 – STOP
2003 – NORMALANGUE
2003 – ULYCES
The goal of the ChaNTeR project (Chant Numérique Temps-Réel, French for “real-time digital singing”) is to create a high-quality song synthesis system that can be used by the general public. The system will sing the words of a song, and the envisaged synthesizer will work in two modes: ‘song from text’ and ‘virtual singer’. In the first mode, the user enters a text to be sung along with a score (times and pitches), and the machine transforms it into sound. In the second, ‘virtual singer’ mode, the user controls the song synthesizer in real time via specific interfaces, just like playing an instrument.
To build the synthesizer, the project will combine advanced voice transformation techniques, including analysis and processing of the parameters of the vocal tract and the glottal source, with state-of-the-art know-how in unit selection for speech synthesis, rule-based singing synthesis systems, and innovative gesture control interfaces. The project focuses on capturing and reproducing a variety of vocal styles (e.g. lyrical/classical, popular/song).
A prototype singing synthesis system will be developed for use by the project partners, either to offer synthesized singing voice and singing instrument products that are currently lacking or to improve the functions of existing products. The project will offer musicians and performers a new artistic approach to synthesized song, and new means of creation that make interactive experiences with a sung voice possible.
ANR (The French National Research Agency), LIMSI, IRCAM and DUALO.
People with autism have significant communication problems undermining their integration into society. Autism is a severe and persistent neurobiological disability.
Because the needs of each person with autism are so diverse, communication tools must be flexible and individualized. PATH aims to provide individuals with autism, their families and therapists with custom tools to generate or enhance communication via a collaborative platform.
PATH combines a technological dimension (speech synthesis, recognition, eye movement tracking, embedded technologies) with a participatory dimension (cloud computing, sharing, “custom” adaptation).
Mons University (SUSA), ULG, TRIPTYK, MULTITEL
Intangible Treasures – Capturing the Intangible Cultural Heritage and Learning the Rare Know-How of Living Human Treasures.
Cultural expression is not limited to architecture, monuments or collections of artifacts. It also includes fragile intangible live expressions, which involve knowledge and skills. Such expressions include music, dance, singing, theatre, human skills and craftsmanship. These manifestations of human intelligence and creativeness constitute our Intangible Cultural Heritage (ICH).
The main objective of i-Treasures is to develop an open and extendable platform to provide access to ICH resources, enable knowledge exchange between researchers and contribute to the transmission of rare know-how from Living Human Treasures to apprentices. To this end, the project aims to go beyond the mere digitization of cultural content.
By combining conventional learning procedures with advanced services, such as Singing Voice Synthesis and sensorimotor learning through an interactive 3D environment, i-Treasures is expected to break new ground in education and knowledge transfer of ICH.
Centre for Research and Technology Hellas, Université Pierre et Marie Curie, Centre National de la Recherche Scientifique, Université de Mons, Consiglio Nazionale delle Ricerche, University College London, Turk Telekom Company, University System of Maryland, Aristotle University of Thessaloniki, University of Macedonia.
A generic dialog box for multilingual conversational applications.
D-Box’s main goal is to develop an architecture to support collaboration between users through a multilingual conversational agent embedded in an interactive application.
More specifically, the project aims to develop and test an innovative architecture for conversational agents whose purpose is to support multilingual collaboration between users on a common problem in an interactive application. The interactive agent will enable typed and/or spoken collaboration in the users’ native languages by mediating communication: all user interactions will be transmitted through the D-Box multilingual agent.
Mipumi, IDIAP, KOMEI, Saarland University
In this project, we intend to study human-computer interaction in a situated manner.
We believe that, to be natural and effective, interaction must have a physical realization anchored in the real world. To embody interactive systems, we propose to use humanoid robots. Robots, endowed with perception as well as the means to act on their environment, allow a physical context to be integrated into the interaction, for the machine as well as for humans.
SUPELEC, LIA, LAAS
The main goal of the Content4ALL project is to improve text-to-speech synthesis for long texts (storytelling, newspapers).
The basic concept behind this project is to allow anyone, including people with visual disabilities (the elderly or the blind), to access the same information as other people.
The Do-it-Yourself Smart Experiences project (DiYSE) aims to enable ordinary people to easily create, set up and control applications in their smart living environments as well as in the public Internet-of-Things space, allowing them to leverage aware services and smart objects to obtain highly personalised, social, interactive, flowing experiences at home and in the city.
The partners come from France, Belgium, Spain, Greece, Turkey, Finland and Ireland.
Alcatel-Lucent Bell Labs France, AnswareTech, Archos, Atos Origin, Catholic University of Leuven – Distrinet, Catholic University of Leuven – CUO, ENSIIE, FeedHenry, Finwe, Forthnet, Geniem, Geosparc, Information & Image Management Systems (IMS), Institut TELECOM Sud Paris, Mobilera, Neotiq, Philips Innovative Applications, Pozitim, Rinnekoti-Säätiö, Tecnalia-European Software Institute (ESI), Tecnalia-Robotiker, Thales Communications, There Corporation, Turkcell Teknoloji, Universidad Politécnica de Madrid, University of Alcalá, University of Applied Sciences LAUREA, University of Mons, University of Oulu, University of Tampere, Videra, Vrije Universiteit Brussel – SOFT, Vrije Universiteit Brussel – SMIT, Vrije Universiteit Brussel – Starlab, VTT – Technical Research Centre of Finland, Waterford Institute of Technology, Wiktio.
Virtual worlds are a very new way of socializing. They allow users to embody an avatar evolving in a three-dimensional representation of a real or imaginary place, in which they can usually meet other users and interact with them.
In this way, such applications digitally extend users’ social lives. E-learning solutions can be as simple as forms to fill in, or they can be developed using technologies from the computer entertainment field. The latter type of e-learning solution is known as a serious game. Serious games aim to merge educational content into a game design, allowing users to actively learn and improve their skills. Virtual worlds and serious games offer a good technological answer to this challenge since they give users virtual experiences of real situations. While existing solutions have reached a satisfactory level of physical immersion, the next step is to provide users with a higher level of interaction, both with other users and with the virtual humans populating the digital environments.
Today’s applications lack verbal and emotional interactions. Filling this gap would make the virtual experience more realistic. For instance, an avatar’s lip and face animation should be coherent not only with the phrasing but also with the emotional message (anger, pity, etc.). Spoken interactions (in other words, dialogs) are therefore an important aspect to focus on in order to improve the user experience. More precisely, synthesized speech and face animations should take into account verbal and non-verbal components (mainly emotions) to fully represent the speaker’s intentions. Allowing both users’ avatars and virtual humans to handle emotions will improve the immersiveness of virtual worlds and serious games.
INRIA Lorraine (Parole and Talaris), Artefacto
BioSpeak aims to improve the ALIZE open source software in order to produce a commercial-quality toolkit for real-time voice biometrics validation at variable risk levels.
BioSpeak partner companies will benefit from state-of-the-art speaker validation algorithms integrated into their products. The BioSpeak project aims to create robust and scalable tools for Interactive Voice Response (IVR) systems, able to process thousands of channels in parallel. These tools will allow multilingual interoperability and will be designed to work in security- and telephony-focused environments.
This project will develop biometric tools based on ALIZE, an open source library designed for research and experimentation with the signal processing algorithms and statistical methods used in biometric authentication. Although very complete, ALIZE is not ready to be used in a large-scale commercial application that requires real-time processing of multiple audio channels.
University of Swansea, ValidSoft, Multitel, Calistel, University of Avignon
Romeo is a project led by the French Cap Digital business cluster and funded by the Ile-de-France region, the General Directorate for Competitiveness, Industry and Services (DGCIS ex DGE) and the City of Paris.
This 10 million euro project receives up to 4.9 million euros in subsidies. The project’s objective is to develop a humanoid robot that can act as a comprehensive assistant for people suffering from loss of autonomy. With this target in mind, the robot has to be able to handle the most familiar objects and movements (opening and closing a door, grasping a glass, a bottle, a bunch of keys…). It will also have to assist people who need to move around their home and be able to help them should they fall. Beyond its physical abilities, Romeo has to come with a very “human-friendly” interface, with voice and gestures being the principal means of communication with the robot. It will have to understand what is said to it, hold simple conversations and even sense the intentions and emotions of its interlocutor in order to deduce the actions it has to perform.
ALDEBARAN, VOXLER, SpirOps, AsAnAngel, LISV, LIMSI, LAAS, CEA-LIST, Paris Telecom, INRIA, LPPA (Collège de France), Institut de la Vision.
GV-LEX is subsidized by the French National Agency for Research (ANR) in the scope of the 2009 project “Content and Interaction”. Members of the consortium are ALDEBARAN Robotics (holder of the project), Acapela, CNRS/LIMSI and Telecom Paris Tech. Its aim is to make the robot NAO and the avatar Greta capable of reading texts for several minutes without boring the listener with a monotonous computer voice. To reach this objective, we propose to bring expressiveness into the speech synthesis itself and to take advantage of the robot's or virtual human's ability to perform expressive gestures while talking.
Aldebaran Robotics, LIMSI, Telecom Paris Tech.
Franel is an innovative language teaching project that offers the people of West Flanders, Wallonia and France (Nord-Pas-de-Calais) a unique opportunity to learn their neighbors’ language. Specifically, learning activities are developed from reports by the regional television stations WTV (West Flanders), C9 (Nord-Pas-de-Calais) and NoTélé (Hainaut), in collaboration with three universities: KULeuven Campus Kortrijk on the Flemish side, the University Lille III Charles de Gaulle on the French side and the Polytechnic Faculty of Mons in Wallonia.
K.U.Leuven Campus Kortrijk, Lille3 Charles De Gaulle, Faculté Polytechnique de Mons, WTV, C9, NoTélé, Televic, BLCC, VDAB, Forem, AVnet, ILT
Intelligibility and expressivity have become the keywords in speech synthesis. In this context, HTS, a system based on the statistical generation of voice parameters from Hidden Markov Models, has recently shown its potential for efficiency and flexibility.
Nevertheless, this approach has not yet reached maturity and is limited by the buzziness it produces. This drawback is undoubtedly due to the parametric representation of speech, which induces a lack of voice quality. The first part of this thesis is consequently devoted to the high-quality analysis of speech. In the future, applications oriented towards voice conversion and expressive speech synthesis could also be carried out.
FP6 European Project: Interaction with Personality and Dialogue Enabled Robots. A key enabling technology for next-generation robots for the service, domestic and entertainment markets is Human-Robot Interaction. A robot that collaborates with humans on a daily basis, whether in care applications or in a professional or private context, requires interactive skills that go beyond keyboards, button clicks or metallic voices.
For this class of robots, human-like interactivity is a fundamental part of their functionality. INDIGO aims to develop human-robot communication technology for intelligent mobile robots that operate and perform tasks in populated environments. In doing so, the project will involve technologies from various sectors and will attempt to introduce advances in the respective areas, i.e. natural language interaction, autonomous navigation, visual perception, dialogue systems, and virtual emotions.
The project will address human-robot communication from two sides: by enabling robots to correctly perceive and understand natural human behaviour and by making them act in ways that are familiar to humans.
FORTH-ICS, Univ Edinburgh, Uni Albert Ludwigs of Freiburg, University of Athens, University of Geneva, NEOGPS, HANSON ROBOTICS, Fondation Hellenic World, NCSR.
The Boon Companion project aims at investigating and demonstrating an autonomous cognitive system (ACS) integrating perception, reasoning, and learning. The consortium’s interest in ACS is motivated by the desire to develop intelligent companions and domestic assistants that could exhibit some human-like cognitive abilities (e.g. adaptiveness to the interaction context, adaptiveness to the user) and thus gain in acceptance.
BERCHET, CEA, Wany Robotics, Eurecom, Generation 5, Thales, Philips, Sound Intelligence, University of Groningen, University of Utrecht, CRIFA
European Project FP6-IST-2002-002034 DIVINES: Diagnostic and Intrinsic Variabilities in Natural Speech.
The goal of DIVINES is to develop new knowledge leading to improved feature extraction and modelling techniques, in particular techniques better able to handle the intrinsic variabilities of speech. First, human and machine performance and the effect of intrinsic variabilities will be compared using a diagnostic procedure. The outcomes of this analysis will then be exploited to target feature extraction and acoustic and lexical modelling. Compatibility with techniques dealing with noise and integration within current systems are also part of the objectives.
The project is relevant to the “multimodal interfaces” objective as it concerns more accurate and adaptable recognition of spoken language. This is central to the concept of multimodal man-machine interaction where the speech understanding service is likely to remain an independent component in a modular design. Advances in this field could be decisive in realizing the vision of natural interactivity.
THE ROYAL INSTITUTION FOR THE ADVANCEMENT OF LEARNING (MCGILL UNIVERSITY), FRANCE TELECOM SA, LOQUENDO SPA, UNIVERSITE D’AVIGNON ET DU PAYS-VAUCLUSE, INSTITUT EURECOM, CARL VON OSSIETZKY UNIVERSITAET OLDENBURG, POLITECNICO DI TORINO
Integration of speech technology with communication, marketing and customer related services in a single comfortable process enabling instantaneous mobile access to crucial business information.
MULTITEL, Software 602, GVZ, Vecsys, ENST, Knowledge S.A., University of Patras, Harpax, Italy
Speech dynamics and voice quality analysis for improved speech synthesis.
The STOP Project aims to study the relationship between speech dynamics and voice quality, based on in-house tools for efficient source-tract separation. STOP stands for “Synthèse Technologique Optimisée de la Parole” (Optimized Technological Speech Synthesis).
It aims to improve speech synthesis technologies by exploiting speech dynamics, a field that has remained unexplored until now. The aim of the project is to develop a software library for modifying dynamics in concatenative speech synthesis (diphones and Non Uniform Units). To this end, the project envisages not only modifying the prosody but also adapting the voice quality to the desired perceived phonation.
“5e Saison”, a French company specializing in digital sound processing (France).
Development of the Arabic TTS system. New voice: Bruno was recorded in this project. MixLP method: separation of the source signal and the vocal tract. TCTS lab, the Circuits Theory and Signal Processing laboratory of the Faculté Polytechnique de Mons (FPMs).
The project aims to strengthen the French contribution to international standardization bodies in the field of speech technologies, to take part in standards committees on behalf of the French industrial and academic community, and to keep that community regularly informed.
To achieve this objective, the project consortium brings together seven industrial and academic partners in the field of speech technologies, covering four complementary angles: upstream research, speech technology vendors, voice platform vendors and component vendors.
This sub-project is divided into two phases: the first, lasting 1 year, is more general and covers all relevant standards for speech technology; the second concentrates on the main standard, VoiceXML, over a period of 2 years.
SIEMENS, TELISMA, IDYLIC, ST Microelectr., LORIA, ENST Paris
The UlyCEs project aimed to develop a telematics platform for the automotive industry, based on Windows CE technology.
EZOS, TWIN DEVELOPMENT, GILLET Automobile
The EvaSy project is dedicated to the evaluation of speech synthesis systems for the French language.
The project is financed by the French Ministry of Research in the context of the Technolangue programme.
This evaluation campaign is intended to expand upon the ARC-AUPELF (now AUF) campaign of 1996-1999, the only previous evaluation campaign for text-to-speech systems for the French language. The EvaSy campaign is subdivided into three components:
– evaluation of the grapheme-to-phoneme module,
– evaluation of prosody and expressivity,
– global evaluation of the quality of the synthesised speech.
ELDA (Evaluations and Language Resources distribution Agency), LIMSI, Equipe de recherche DELIC (Description Linguistique Informatisée sur Corpus), Université de Provence, CRISCO (Centre de Recherches Inter-langues sur la Signification en Contexte), ICP (Institut de la Communication Parlée), LIA (Laboratoire Informatique d’Avignon), MULTITEL ASBL