As an expert and innovative bespoke player, we aim every day to push the boundaries of voice.
Acapela Group is actively working on Deep Neural Networks (DNN) and we are very enthusiastic and proud to present the first achievements of our research in this fascinating field, creating new opportunities for voice interfaces.
Humanoid intelligent companions, multilingual conversations, singing speech, expressive reading and transmission of emotions, Internet of Things, biometrics and multimodal man-machine interaction are some of the domains we have been seriously involved with for over a decade, partnering with experts worldwide.
2014 – ChaNTeR
2013 – PATH
2012 – DBOX
2012 – Mardi
2012 – Content4All
2011 – DIYSE
2010 – ROMEO
2009 – GVLEX
2009 – FRANEL
2008 – HMMTTS
2007 – INDIGO
2005 – BOON Companion
2004 – DIVINES
2003 – E! 2990- MAJORCALL Majordome CRM Call Centers
2003 – STOP
2003 – NORMALANGUE
2003 – ULYCES
VIADUCT (Voice Interface for Autonomous Driving based on User experienCe Techniques) is a new project under the 23rd Project Call of the Pôle de compétitivité MecaTech, granted by the Walloon Region. This project aims to design, develop and validate a multimodal, adaptive, speech-centric human-machine interface for driving semi-autonomous cars, with a focus on elderly people.
The product resulting from the VIADUCT project is a multimodal, adaptive, voice-centric human-machine interface for driving semi-autonomous cars (MultiModal Voice-centric HMI).
This product integrates two innovative technological bricks:
– A multimodal conversational agent based on new voice technologies optimized for vehicles: Automatic Speech Recognition (ASR) and Text-To-Speech (TTS). This agent organizes effective communication between the driver (or a passenger) and the vehicle, and is able to adapt to the driver’s profile, especially that of elderly drivers, taking into account the decline of their visual and auditory abilities.
– A driver monitoring system (DMS) based on technology already sold by AW, augmented with software functions that detect the physical, psychological, physiological and cognitive state of the driver or passengers in order to dynamically adapt the behaviour of the conversational agent.
The dynamic adaptability of the VIADUCT HMI will be applied to the situation of older drivers, but it is also applicable in any other situation where the driver’s capacity is impaired (malaise, disability, etc.).
This project will help to finance our R&D efforts on related ASR and TTS topics and reinforce the position of Acapela in the automotive sector. Additionally, a new collaboration with AW Europe for the exploitation of the project is already planned.
This 3-year project will mobilize 6 people at Acapela (2 additional recruitments are forecast) to develop in French:
The VIADUCT project is the result of an action plan developed by AW Technical Center and Acapela Group to address the challenges of voice interfaces in cars. With their expertise in automotive technologies, vehicle information systems, Artificial Intelligence and Voice Technologies, AWTCE and Acapela have mobilized the best skills available in Wallonia for the achievement of this project:
The VOICI (VOIce Crew Interaction) project aims to develop an intelligent voice crew interaction system and is part of the H2020 programme.
‘VOICI’ is part of ‘Clean Sky 2’ (CS2), which targets European aeronautics research and innovation and aims to make the global aviation industry ‘future proof’, that is, providing safe, seamless and sustainable air mobility to meet the needs of citizens. The first call of CS2 includes 29 topics and has a total funding budget of €205m from Horizon 2020.
Within the 6th call of Clean Sky 2, the Voice Crew Interaction (VOICI) project aims to develop the technology that implements an intelligent voice crew interaction system as a “natural crew assistant” in a cockpit environment up to TRL 3.
The main goal of the project is to provide a proof-of-concept demonstrator of a natural crew assistant that is capable of listening to all communications occurring in the cockpit, either between crew members or between crew and ATC, recognizing and interpreting speech content, interacting with the crew and fulfilling crew requests, so as to simplify crew tasks and reduce workload.
The topic leader has predefined sound recording, voice recognition and artificial intelligence as the three main technology components of the system, which should fulfil specific requirements such as robustness against noisy environments, a high recognition rate and request interpretation. An audio evaluation environment will be developed to allow the evaluation of the sound recording and voice recognition systems and of the natural crew assistant, according to evaluation scenarios provided by the topic manager.
Acapela will work on the development of a specific voice for the cockpit environment to provide clear and understandable vocal information to the crew using different technologies: CTS, TTS, DNN.
The Empathic project focuses on Personalised Virtual Coaches to assist elderly people living independently at and around their home. Acapela Group is working on the voice synthesis part, to provide users with an advanced voice-first interface based on Deep Learning.
The EMPATHIC Research & Innovation project will research, innovate, explore and validate new paradigms and platforms, laying the foundation for future generations of Personalised Virtual coaches.
The project is part of the Horizon 2020 programme, the biggest EU Research and Innovation programme ever, with nearly €80 billion of funding available over 7 years (2014 to 2020). The project consortium is made up of 10 partners: health-maintenance end-user organizations, technology developers, academic and research institutes, and system integrators.
Innovative multimodal face analytics, adaptive spoken dialogue systems and natural language interfaces are part of what the project will research and innovate, in order to help dependent aging persons and their carers.
Acapela will provide a new TTS technology based on Deep Neural Networks, and adapted expressive speech which will enhance the expressive possibilities of the dialogue system and adapt it to the user’s emotions and mood to improve the believability, naturalness and adaptability of the interaction. Four languages are targeted: English, French, Spanish and Norwegian.
The project will use remote non-intrusive technologies to extract physiological markers of emotional states in real-time for online adaptive responses of the coach, and advance holistic modelling of behavioral, computational, physical and social aspects of a personalized expressive virtual coach.
It will include a demonstration and validation phase with clearly-defined realistic use cases. It will focus on evidence-based, user-validated research and integration of intelligent user and context sensing methods through voice, eye and facial analysis, intelligent heuristics (complex interaction, user intention detection, distraction estimation, system decision), visual and spoken dialogue system, and system reaction capabilities. Through measurable end-user validation, to be performed in 3 different countries (Spain, Norway and France) with 3 distinct languages and cultures (plus English for R&D), the proposed methods and solutions will ensure usefulness, reliability, flexibility and robustness.
ARCHIBALD (ARCHIves Breeding by Automated Language Description):
This project aims to intensify the way we use digital audiovisual content by accelerating its availability. While remaining focused on the major needs of the audiovisual sector, the Archibald project sees the Sonuma archives as an opportunity for incubating projects that meet the expectations of application fields such as voice technologies, research and education.
Those objectives will be achieved by combining the leading-edge expertise in voice technologies available in Wallonia (Acapela and Cental), the professional experience and needs of the targeted users (media, Acapela, Sonuma SA, Universities and High schools, etc.) and the audio/text and metadata content available in the 140,000 hours of audiovisual records already digitized by Sonuma SA.
The project will result in the delivery of technological modules and two pilot experiments. The scientific context covers several fields of application: audio, automatic language processing and indexing/classification of digital documents.
The recent development of Deep Neural Network technologies has made their use possible in these fields.
The goals of this project are therefore to:
These technological modules are important for the industrial developments of Sonuma SA and Acapela as well as for the international positioning of Wallonia as a major digital player.
The goal of the ChaNTeR project (Chant Numérique Temps-Réel, French for “real-time digital singing”) is to create a high-quality system for synthesizing songs that can be used by the general public. The system will sing the words of a song, and the envisaged synthesizer will work in two modes: ‘song from text’ or ‘virtual singer’. In the first mode, the user enters a text to be sung along with a score (times and pitches), and the machine transforms it into sound. In the second, the ‘virtual singer’ mode, the user controls the song synthesizer in real time via specific interfaces, just like playing an instrument.
To build the synthesizer, the project will combine advanced voice transformation techniques, including analysis and processing of the parameters of the vocal tract and the glottal source, with state-of-the-art know-how about unit selection for speech synthesis, rule-based singing synthesis systems, and innovative gesture control interfaces. The project focuses on capturing and reproducing a variety of vocal styles (e.g. lyrical/classical, popular/song).
A prototype singing synthesis system will be developed for use by the project partners, to offer synthesized singing voice and singing instrument products that are currently lacking, or to improve the functions of existing products. The project will offer musicians and performers a new artistic approach to synthesized song, and new means of creation that make interactive experiences with a sung voice possible.
ANR (The French National Research Agency), LIMSI, IRCAM and DUALO.
People with autism have significant communication problems that undermine their integration into society. Autism is a severe and persistent neurobiological disability.
The diversity of needs among people with autism calls for flexible and individualized communication tools. PATH aims to provide individuals with autism, families and therapists with custom tools to generate or enhance communication via a collaborative platform.
PATH combines a technological dimension (speech synthesis – recognition – eye movement tracking – embedded technologies) with a participatory dimension (cloud computing – sharing – “custom” adaptation).
Mons University (SUSA), ULG, TRIPTYK, MULTITEL
Intangible Treasures – Capturing the Intangible Cultural Heritage and Learning the Rare Know-How of Living Human Treasures.
Cultural expression is not limited to architecture, monuments or collections of artifacts. It also includes fragile intangible live expressions, which involve knowledge and skills. Such expressions include music, dance, singing, theatre, human skills and craftsmanship. These manifestations of human intelligence and creativeness constitute our Intangible Cultural Heritage (ICH).
The main objective of i-Treasures is to develop an open and extendable platform to provide access to ICH resources, enable knowledge exchange between researchers and contribute to the transmission of rare know-how from Living Human Treasures to apprentices. To this end, the project aims to go beyond the mere digitization of cultural content.
Combining conventional learning procedures and advanced services, such as Singing Voice Synthesis and sensorimotor learning through an interactive 3D environment, i-Treasures is expected to break new ground in education and knowledge transfer of ICH.
Centre for Research and Technology Hellas, Université Pierre et Marie Curie, Centre National de la Recherche Scientifique, Université de Mons, Consiglio Nazionale delle Ricerche, University College London, Turk Telekom Company, University System of Maryland, Aristotle University of Thessaloniki, University of Macedonia.
A generic dialog box for multilingual conversational applications.
D-Box’s main goal is to develop and test an innovative architecture for conversational agents that support multilingual collaboration between users working on a common problem in an interactive application. The agent, embedded in the application, will enable typed and/or spoken collaboration in each user’s native language by mediating communication: all user interactions will be transmitted through the D-Box multilingual agent.
Mipumi, IDIAP, KOMEI, Saarland University
In this project, we intend to study human-computer interaction in a situated manner.
We believe that the interaction must have a physical realization, anchored in the real world to be natural and effective. In order to embody interactive systems, we propose to use humanoid robots. Robots, endowed with perceptions, but also means to act in the environment, allow the integration of a physical context in the interaction for the machine as well as for humans.
SUPELEC, LIA, LAAS
The main goal of the Content4ALL project is to improve text-to-speech synthesis systems for long texts (storytelling, newspapers).
The basic concept behind this project is to allow anyone, including people with visual disabilities (elderly or blind people), to access the same information as other people.
The Do-it-Yourself Smart Experiences project (DiYSE) aims at enabling ordinary people to easily create, set up and control applications in their smart living environments as well as in the public Internet-of-Things space, allowing them to leverage aware services and smart objects for obtaining highly personalised, social, interactive, flowing experiences at home and in the city.
The partners come from France, Belgium, Spain, Greece, Turkey, Finland and Ireland.
Alcatel-Lucent Bell Labs France, AnswareTech, Archos, Atos Origin, Catholic University of Leuven – Distrinet, Catholic University of Leuven – CUO, ENSIIE, FeedHenry, Finwe, Forthnet, Geniem, Geosparc, Information & Image Management Systems (IMS), Institut TELECOM Sud Paris, Mobilera, Neotiq, Philips Innovative Applications, Pozitim, Rinnekoti-Säätiö, Tecnalia-European Software Institute (ESI), Tecnalia-Robotiker, Thales Communications, There Corporation, Turkcell Teknoloji, Universidad Politécnica de Madrid, University of Alcalá, University of Applied Sciences LAUREA, University of Mons, University of Oulu, University of Tampere, Videra, Vrije Universiteit Brussel – SOFT, Vrije Universiteit Brussel – SMIT, Vrije Universiteit Brussel – Starlab, VTT – Technical Research Centre of Finland, Waterford Institute of Technology, Wiktio.
Virtual worlds are a very new way of socializing. They allow users to embody an avatar evolving in a three-dimensional representation of a real or imaginary place, in which they can usually meet other users and interact with them.
In this case, such applications digitally extend users’ social lives. E-learning solutions can be as simple as forms to fill in, or can be developed using technologies from the computer entertainment field. The latter type of e-learning solution is known as a serious game. Serious games aim at merging educational content into a game design, allowing users to actively learn and improve their skills. Virtual worlds and serious games offer a good technological answer to this challenge, since they give users virtual experiences of real situations. While existing solutions have reached a satisfactory level of physical immersion, the next step is to provide users with a higher level of interaction, both with other users and with the virtual humans populating the digital environments.
Today’s applications lack verbal and emotional interactions. Filling this gap would make the virtual experience more realistic. For instance, an avatar’s lip and face animation should be consistent not only with the phrasing but also with the emotional message (anger, pity, etc.). Spoken interactions (in other words, dialogs) are therefore an important aspect to focus on in order to improve the user experience. More precisely, synthesized speech and face animations should take into account verbal and non-verbal components (mainly emotions) to fully represent the speaker’s intentions. Allowing users’ avatars, as well as virtual humans, to handle emotions will improve the immersiveness of virtual worlds and serious games.
INRIA Lorraine (Parole and Talaris), Artefacto
BioSpeak aims to improve the ALIZE open source software to produce a commercial-quality toolkit for real-time voice biometrics validation at variable risk levels.
BioSpeak partner companies will benefit from state-of-the-art speaker validation algorithms integrated into their products. The BioSpeak project aims to create robust and scalable tools for Interactive Voice Response (IVR) systems, able to process thousands of channels in parallel. These tools will allow multilingual interoperability and will be designed to work in security- and telephony-focused environments.
This project will develop biometric tools based on ALIZE, an open source library designed for research and experimentation on the signal processing algorithms and statistics used in biometric authentication. Although very complete, ALIZE is not ready to be used in a large-scale commercial application that requires real-time processing of multiple audio channels.
University of Swansea, ValidSoft, Multitel, Calistel, University of Avignon
Romeo is a project led by the French Cap Digital business cluster and funded by the Ile-de-France region, the General Directorate for Competitiveness, Industry and Services (DGCIS, formerly DGE) and the City of Paris.
This 10 million euro project is subsidized up to 4.9 million euros. The project’s objective is to develop a humanoid robot that can act as a comprehensive assistant for persons suffering from loss of autonomy. With this target in mind, the robot has to be able to handle the most familiar objects and movements (open and close a door, grasp a glass, a bottle, a bunch of keys…). It will also have to assist people who need help moving around their home and be able to help them should they fall on the ground. Beyond its physical abilities, Romeo has to come with a very “human-friendly” interface, voice and gestures being the principal means of communication with the robot. It will have to understand what is said to it, hold simple conversations and even sense the intentions and emotions of its interlocutor in order to deduce the actions it has to perform.
ALDEBARAN, VOXLER, SpirOps, AsAnAngel, LISV, LIMSI, LAAS, CEA-LIST, Paris Telecom, INRIA, LPPA (Collège de France), Institut de la Vision.
GV-LEX is subsidized by the French National Agency for Research (ANR) in the scope of the 2009 project “Content and Interaction”. Members of the consortium are ALDEBARAN Robotics (holder of the project), Acapela, CNRS/LIMSI and Telecom ParisTech. Its aim is to make the robot NAO and the avatar Greta capable of reading texts for several minutes without boring the listener with a monotonous computer voice. To reach this objective, we propose to bring expressiveness into the speech synthesis itself, as well as to take advantage of the robot or virtual human being: they are capable of performing expressive gestures while talking.
Aldebaran Robotics, LIMSI, Telecom ParisTech.
Franel is an innovative project in language teaching, which offers a unique opportunity to the people of West Flanders, Wallonia and France (Nord-Pas-de-Calais) to learn their neighbors’ language. Specifically, learning activities are developed from reports of the regional television stations WTV (West Flanders), C9 (Nord-Pas-de-Calais) and NoTélé (Hainaut), with three universities: KULeuven Campus Kortrijk on the Flemish side, the University Lille III Charles de Gaulle on the French side and the Polytechnic Faculty of Mons in Wallonia.
K.U.Leuven Campus Kortrijk, Lille3 Charles De Gaulle, Faculté Polytechnique de Mons, WTV, C9, NoTélé, Televic, BLCC, VDAB, Forem, AVnet, ILT
Intelligibility and expressivity have become the keywords in speech synthesis. In this respect, a system (HTS) based on the statistical generation of voice parameters from Hidden Markov Models has recently shown its potential efficiency and flexibility.
Nevertheless, this approach has not yet reached maturity and is limited by the buzziness it produces. This drawback is undoubtedly due to the parametric representation of speech, which induces a lack of voice quality. The first part of this thesis is consequently devoted to the high-quality analysis of speech. In the future, applications oriented towards voice conversion and expressive speech synthesis could also be carried out.
FP6 European Project: Interaction with Personality and Dialogue Enabled Robots. A key enabling technology for next-generation robots for the service, domestic and entertainment market is Human-Robot-Interaction. A robot that collaborates with humans on a daily basis – be this in care applications, in a professional or private context – requires interactive skills that go beyond keyboards, button clicks or metallic voices.
For this class of robots, human-like interactivity is a fundamental part of their functionality. INDIGO aims to develop human-robot communication technology for intelligent mobile robots that operate and perform tasks in populated environments. In doing so, the project will involve technologies from various sectors and will attempt to introduce advances in the respective areas, i.e. natural language interaction, autonomous navigation, visual perception, dialogue systems, and virtual emotions.
The project will address human-robot communication from two sides: by enabling robots to correctly perceive and understand natural human behaviour and by making them act in ways that are familiar to humans.
FORTH-ICS, University of Edinburgh, Albert Ludwigs University of Freiburg, University of Athens, University of Geneva, NEOGPS, HANSON ROBOTICS, Fondation Hellenic World, NCSR.
The Boon Companion project aims at investigating and demonstrating an autonomous cognitive system (ACS) integrating perception, reasoning, and learning. The consortium’s interest in ACS is motivated by the desire to develop intelligent companions and domestic assistants that could exhibit some human-like cognitive abilities (e.g. adaptiveness to the interaction context, adaptiveness to the user) and thus gain in acceptance.
BERCHET, CEA, Wany Robotics, Eurecom, Generation 5, Thales, Philips, Sound Intelligence, University of Groningen, University of Utrecht, CRIFA
European Project FP6-IST-2002-002034 DIVINES: Diagnostic and Intrinsic Variabilities in Natural Speech.
The goal of DIVINES is to develop new knowledge leading to improved feature extraction and modelling techniques that are better able to handle the intrinsic variabilities of speech. First, human and machine performance and the effect of intrinsic variabilities will be compared based on a diagnostic procedure. The outcomes of this analysis will then be exploited to target feature extraction, acoustic and lexical modelling. Compatibility with techniques dealing with noise and integration within current systems are also part of the objectives.
The project is relevant to the “multimodal interfaces” objective as it concerns more accurate and adaptable recognition of spoken language. This is central to the concept of multimodal man-machine interaction where the speech understanding service is likely to remain an independent component in a modular design. Advances in this field could be decisive in realizing the vision of natural interactivity.
THE ROYAL INSTITUTION FOR THE ADVANCEMENT OF LEARNING (MCGILL UNIVERSITY), FRANCE TELECOM SA, LOQUENDO SPA, UNIVERSITE D’AVIGNON ET DU PAYS-VAUCLUSE, INSTITUT EURECOM, CARL VON OSSIETZKY UNIVERSITAET OLDENBURG, POLITECNICO DI TORINO
Integration of speech technology with communication, marketing and customer related services in a single comfortable process enabling instantaneous mobile access to crucial business information.
MULTITEL, Software 602, GVZ, Vecsys, ENST, Knowledge S.A., University of Patras, Harpax, Italy
Speech dynamics and voice Quality analysis for improved speech synthesis.
The STOP project aims at studying the relationship between speech dynamics and voice quality, based on in-house tools for efficient source-tract separation. STOP stands for “Synthèse Technologique Optimisée de la Parole” (Optimized Technological Speech Synthesis).
It aims at improving speech synthesis technologies by exploiting speech dynamics, a field that has been unexplored until now. The aim of the project is to build a software library to modify dynamics in concatenative speech synthesis (diphones and non-uniform units). For this, not only is the modification of prosody envisaged, but the voice quality should also be adapted to the desired perceived phonation.
“5e Saison”, a French company specializing in digital sound processing.
Development of the Arabic TTS system. New voice: Bruno was recorded in this project. MixLP method: separation of signal source and vocal tract. TCTS lab, the Circuits Theory and Signal Processing laboratory of the Faculté Polytechnique de Mons (FPMs).
The project aims to strengthen the French contribution to international standardization bodies in the field of speech technologies, to involve the French industrial and academic community in standards committees, and to keep this community regularly informed.
To achieve this objective, the project consortium brings together seven industrial and academic partners in the field of speech technologies, covering four complementary angles: upstream research, speech technology vendors, sellers of voice platforms, and component vendors.
This sub-project is divided into two phases: the first, lasting 1 year, is more general and covers all relevant standards for speech technology; the second, lasting 2 years, concentrates on the main standard, VoiceXML.
SIEMENS, TELISMA, IDYLIC, ST Microelectr., LORIA, ENST Paris
The UlyCEs project aimed to develop a telematics platform for the automotive industry, based on Win CE technology.
EZOS, TWIN DEVELOPMENT, GILLET Automobile
The EVASY project is dedicated to the evaluation of speech synthesis systems for the French language.
The project is financed by the French Ministry of Research in the context of the Technolangue programme.
This evaluation campaign is intended to expand upon the ARC-AUPELF (now AUF) campaign of 1996-1999, the only previous evaluation campaign for text-to-speech systems for the French language. The EvaSy campaign is subdivided into three components:
– evaluation of the grapheme-to-phoneme module,
– evaluation of prosody and expressivity,
– global evaluation of the quality of the synthesised speech.
ELDA (Evaluations and Language Resources distribution Agency), LIMSI, Equipe de recherche DELIC (Description Linguistique Informatisée sur Corpus), Université de Provence, CRISCO (Centre de Recherches Inter-langues sur la Signification en Contexte), ICP (Institut de la Communication Parlée), LIA (Laboratoire Informatique d’Avignon), MULTITEL ASBL