Exploiting sensor fusion architectures and stimuli complementarity in AV speech recognition

  • Jordi Robert-Ribes
  • Michel Piquemal
  • Jean-Luc Schwartz
  • Pierre Escudier
Part of the NATO ASI Series book series (NATO ASI F, volume 150)

Abstract

This paper aims to strengthen the bridge between automatic audiovisual (AV) speech recognition and models from cognitive psychology. To this end, it is necessary to better define and exploit the possible architectures for sensor fusion, and to better understand the content of auditory (A) and visual (V) speech stimuli. We define four models organized around three basic questions about AV speech perception, and show that most recognition systems are based on only two of these models and ignore one that happens to be the most compatible with experimental data. We then present a series of new experimental data that show the deep complementarity of the A and V sensors, in both the configurational and the temporal domains, and the optimal use of this complementarity by the human AV fusion system. We submit the four models to a benchmark test on the identification of French vowels in noise, and show that only three of them exploit the AV complementarity well. Finally, we propose a general architecture for dealing with AV speech, namely the Timing-Target Model of speech perception, and present elements of implementation for some of its constituent modules.

Keywords

  • Speech Perception
  • Direct Identification
  • Audiovisual Speech
  • Phonetic Feature
  • Versus Stimulus
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 1996

Authors and Affiliations

  • Jordi Robert-Ribes (1)
  • Michel Piquemal (1)
  • Jean-Luc Schwartz (1)
  • Pierre Escudier (1)
  1. Institut de la Communication Parlée, Unité de Recherche Associée N° 368, INPG/ENSERG, Grenoble Cedex 1, France
