Speaker and Digit Recognition by Audio-Visual Lip Biometrics

  • Maycel Isaac Faraj
  • Josef Bigun
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4642)


This paper proposes a robust bimodal audio-visual digit and speaker recognition system based on lip-motion and speech biometrics. To increase the robustness of digit and speaker recognition, the method uses speaker lip-motion information extracted from low-resolution video sequences (128 × 128 pixels). We investigate a biometric system for digit recognition and speaker identification that combines line-motion estimation with speech information and Support Vector Machines. The acoustic and visual features are fused at the feature level, showing favourable results on the XM2VTS database: digit recognition rates of 83% to 100% and a speaker recognition rate of 100%.
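As a rough illustration of what fusion at the feature level can look like, the sketch below concatenates each acoustic frame vector with the visual vector of the same frame, so that a single classifier (for example a Support Vector Machine) would see one joint vector per frame. This is not the authors' implementation: the function name `fuse_features`, the toy values, and the assumption that both streams have already been aligned to the same number of frames are hypothetical and chosen only for the example.

```python
from typing import List, Sequence


def fuse_features(acoustic: Sequence[Sequence[float]],
                  visual: Sequence[Sequence[float]]) -> List[List[float]]:
    """Concatenate per-frame acoustic and visual vectors (feature-level fusion).

    Assumes both streams were already resampled to the same frame count.
    """
    if len(acoustic) != len(visual):
        raise ValueError("streams must be aligned to the same number of frames")
    # One joint vector per frame: acoustic coefficients followed by visual ones.
    return [list(a) + list(v) for a, v in zip(acoustic, visual)]


# Toy data (made up): 2 frames, 3 acoustic and 2 visual coefficients per frame.
acoustic_frames = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
visual_frames = [[1.0, 2.0], [3.0, 4.0]]

fused = fuse_features(acoustic_frames, visual_frames)
print(len(fused), "frames of dimension", len(fused[0]))
```

The fused vectors would then be passed to a classifier for training and testing. Concatenation is only the simplest form of feature-level fusion; scaling each modality before joining is a common refinement that this sketch leaves out.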


Support Vector Machine, Recognition Rate, Gaussian Mixture Model, Speaker Recognition, Audiovisual Speech



Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Maycel Isaac Faraj (1)
  • Josef Bigun (1)
  1. Halmstad University, School of Information Science, Computer and Electrical Engineering (IDE), Box 823, SE-301 18 Halmstad
