Abstract
This paper proposes a robust bimodal audio-visual digit and speaker recognition system based on lip-motion and speech biometrics. To increase the robustness of digit and speaker recognition, we use speaker lip-motion information extracted from low-resolution video sequences (128 × 128 pixels). We investigate a biometric system for digit recognition and speaker identification that combines line-motion estimation with speech information and Support Vector Machines. The acoustic and visual features are fused at the feature level, giving favourable results on the XM2VTS database: digit recognition rates of 83% to 100% and a speaker recognition rate of 100%.
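Feature-level fusion, as mentioned in the abstract, is commonly realised by concatenating the per-frame acoustic and visual feature vectors before classification. The following is a minimal sketch of that idea only, assuming the two streams are already synchronised frame by frame. The function names, vector dimensions, and toy values are hypothetical and are not taken from the paper; the fused vectors would then be passed to a classifier such as a Support Vector Machine.

```python
from typing import List, Sequence

Vector = List[float]


def fuse_frame(audio: Sequence[float], visual: Sequence[float]) -> Vector:
    """Fuse one time step by concatenating the acoustic and visual vectors."""
    return list(audio) + list(visual)


def fuse_sequence(
    audio_frames: Sequence[Sequence[float]],
    visual_frames: Sequence[Sequence[float]],
) -> List[Vector]:
    """Fuse two synchronised streams frame by frame.

    Assumption: both streams have the same number of frames. Real audio and
    video are usually sampled at different rates and need alignment first.
    """
    if len(audio_frames) != len(visual_frames):
        raise ValueError("audio and visual streams must have the same number of frames")
    return [fuse_frame(a, v) for a, v in zip(audio_frames, visual_frames)]


# Toy data (hypothetical): 2 frames, 3 acoustic and 2 visual features each.
audio = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
visual = [[1.0, 2.0], [3.0, 4.0]]
fused = fuse_sequence(audio, visual)
```

Each fused vector has as many components as the two modalities combined (here 3 + 2 = 5), so a single classifier sees both sources of evidence at once.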
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Isaac Faraj, M., Bigun, J. (2007). Speaker and Digit Recognition by Audio-Visual Lip Biometrics. In: Lee, SW., Li, S.Z. (eds) Advances in Biometrics. ICB 2007. Lecture Notes in Computer Science, vol 4642. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74549-5_106
DOI: https://doi.org/10.1007/978-3-540-74549-5_106
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74548-8
Online ISBN: 978-3-540-74549-5
eBook Packages: Computer Science (R0)