Scalability Analysis of Audio-Visual Person Identity Verification

  • Jacek Czyz
  • Samy Bengio
  • Christine Marcel
  • Luc Vandendorpe
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2688)


In this work, we present a multimodal identity verification system based on the fusion of a person's face image and text-independent speech data. The system combines monomodal face and speaker verification algorithms by fusing their respective scores. To analyze its scalability, the system is evaluated at various sizes of the face and speech user templates. The user template size is a key parameter when storage space is limited, as on a smart card. Our experimental results show that multimodal fusion makes it possible to significantly reduce the user template size while maintaining a satisfactory level of performance. Experiments are performed on the newly recorded multimodal database BANCA.
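
The score-level fusion mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which fusion rule the authors use, so the weighted sum below is only an assumption chosen for simplicity. The function names, the logistic normalization, the weight, and the threshold are all hypothetical and do not come from the paper.

```python
import math


def normalize_score(raw_score, mean, std):
    """Map a raw matcher score to (0, 1) with a logistic function of its z-score.

    The mean and std would be estimated on development data; this
    normalization is an illustrative assumption, not the paper's method.
    """
    return 1.0 / (1.0 + math.exp(-(raw_score - mean) / std))


def fuse_scores(face_score, speech_score, face_weight=0.5):
    """Weighted-sum fusion of two scores that already lie in [0, 1].

    face_weight is a hypothetical tuning parameter; the speech score
    receives the remaining weight.
    """
    if not 0.0 <= face_weight <= 1.0:
        raise ValueError("face_weight must lie in [0, 1]")
    return face_weight * face_score + (1.0 - face_weight) * speech_score


def verify(face_score, speech_score, threshold=0.5, face_weight=0.5):
    """Accept the identity claim when the fused score reaches the threshold."""
    return fuse_scores(face_score, speech_score, face_weight) >= threshold
```

In this sketch a weak score from one modality can be compensated by a strong score from the other, which gives some intuition for why fusion might tolerate smaller per-modality templates.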


Linear Discriminant Analysis; Face Image; Gaussian Mixture Model; Smart Card; Speaker Verification
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Jacek Czyz (1)
  • Samy Bengio (2)
  • Christine Marcel (2)
  • Luc Vandendorpe (1)
  1. Communications Laboratory, Université catholique de Louvain, Belgium
  2. IDIAP, Martigny, Switzerland
