Multimodal Authentication Using Asynchronous HMMs

  • Samy Bengio
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2688)


It has often been shown that using multiple modalities to authenticate the identity of a person is more robust than using only one. Various combination techniques exist, and the combination is often performed at the level of the output scores of each modality system. In this paper, we present a novel HMM architecture able to model the joint probability distribution of pairs of asynchronous sequences (such as speech and video streams) describing the same event. We show how this model can be used for audio-visual person authentication. Results on the M2VTS database show that the system performs robustly under various audio noise conditions when compared with other state-of-the-art techniques.
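
To make the idea of a joint distribution over two asynchronous sequences concrete, the following is a minimal sketch of a forward recursion for a toy model in which every state emits one symbol of the longer stream at each step and, with some state-dependent probability, also emits the next symbol of the shorter stream. It is an illustration under stated assumptions, not the formulation used in the paper: the discrete symbols, the factorised emission tables `bx` and `by`, the per-state probability `eps`, and all function and parameter names are hypothetical.

```python
from itertools import product


def async_forward(x, y, pi, A, bx, by, eps):
    """Joint likelihood p(x, y) of a long sequence x and a shorter sequence y.

    Assumed toy model (hypothetical, for illustration only):
      pi[i]     initial probability of state i
      A[j][i]   transition probability from state j to state i
      bx[i][a]  probability that state i emits symbol a on the long stream
      by[i][b]  probability that state i emits symbol b on the short stream
      eps[i]    probability that state i also emits the next short-stream symbol
    """
    T, S, N = len(x), len(y), len(pi)
    if T == 0 or S > T:
        return 0.0  # at most one short-stream symbol per time step
    # alpha[s][i]: probability of x[:t+1], y[:s] and being in state i at time t
    alpha = [[0.0] * N for _ in range(S + 1)]
    for i in range(N):
        alpha[0][i] = pi[i] * (1 - eps[i]) * bx[i][x[0]]
        if S >= 1:
            alpha[1][i] = pi[i] * eps[i] * bx[i][x[0]] * by[i][y[0]]
    for t in range(1, T):
        new = [[0.0] * N for _ in range(S + 1)]
        for s in range(S + 1):
            for i in range(N):
                # Case 1: only the long stream advances at time t.
                stay = sum(alpha[s][j] * A[j][i] for j in range(N))
                total = (1 - eps[i]) * stay
                # Case 2: both streams advance, consuming y[s - 1].
                if s >= 1:
                    adv = sum(alpha[s - 1][j] * A[j][i] for j in range(N))
                    total += eps[i] * by[i][y[s - 1]] * adv
                new[s][i] = bx[i][x[t]] * total
        alpha = new
    return sum(alpha[S])


def total_mass(T, params, n_x=2, n_y=2):
    """Sum p(x, y) over every x of length T and every y of length 0..T."""
    total = 0.0
    for x in product(range(n_x), repeat=T):
        for S in range(T + 1):
            for y in product(range(n_y), repeat=S):
                total += async_forward(x, y, *params)
    return total


# Hypothetical two-state, binary-symbol parameters.
PI = [0.6, 0.4]
A = [[0.7, 0.3], [0.2, 0.8]]
BX = [[0.9, 0.1], [0.3, 0.7]]
BY = [[0.5, 0.5], [0.1, 0.9]]
EPS = [0.2, 0.6]
PARAMS = (PI, A, BX, BY, EPS)

likelihood = async_forward([0, 1, 1], [1], *PARAMS)
```

The helper `total_mass` is a sanity check: for a fixed length of the long stream, the likelihoods of all possible sequence pairs should sum to one. In an authentication setting, a likelihood of this kind would typically be compared between a client model and a background model; that decision rule is a common convention assumed here, not something stated in the abstract.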





Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Samy Bengio (1)
  1. Dalle Molle Institute for Perceptual Artificial Intelligence (IDIAP), Martigny, Switzerland
