Speaker and Digit Recognition by Audio-Visual Lip Biometrics

Isaac Faraj, Maycel; Bigun, Josef

doi:10.1007/978-3-540-74549-5_106


Part of the book series: Lecture Notes in Computer Science (LNIP, volume 4642)

Included in the following conference series:

International Conference on Biometrics


Abstract

This paper proposes a new, robust bimodal audio-visual digit and speaker recognition system that combines lip-motion and speech biometrics. To increase the robustness of digit and speaker recognition, we propose a method that uses speaker lip-motion information extracted from low-resolution video sequences (128 × 128 pixels). We investigate a biometric system for digit recognition and speaker identification based on line-motion estimation combined with speech information and Support Vector Machines. The acoustic and visual features are fused at the feature level, showing favourable results on the XM2VTS database: digit recognition rates of 83% to 100% and a speaker recognition rate of 100%.
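As an illustration of what fusion at the feature level can look like, the following minimal Python sketch normalises two time-aligned feature streams and concatenates them frame by frame. It is not the authors' implementation: the function names (`zscore_columns`, `fuse_features`), the z-score normalisation step, the feature dimensions, and the toy data are all assumptions made for this example, since the abstract does not specify them. The fused vectors would then be handed to a classifier such as the Support Vector Machines named in the abstract; that step is omitted here.

```python
from statistics import mean, pstdev
from typing import List, Sequence

Vector = List[float]


def zscore_columns(frames: Sequence[Sequence[float]]) -> List[Vector]:
    """Normalise each feature dimension to zero mean and unit variance."""
    columns = list(zip(*frames))
    means = [mean(c) for c in columns]
    # A constant dimension has zero deviation; fall back to 1.0 to avoid
    # dividing by zero (assumption: such dimensions are simply centred).
    stds = [pstdev(c) or 1.0 for c in columns]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in frames]


def fuse_features(
    audio: Sequence[Sequence[float]],
    visual: Sequence[Sequence[float]],
) -> List[Vector]:
    """Feature-level fusion: concatenate time-aligned audio and visual vectors."""
    if len(audio) != len(visual):
        raise ValueError("audio and visual streams must have the same number of frames")
    normalised_audio = zscore_columns(audio)
    normalised_visual = zscore_columns(visual)
    return [a + v for a, v in zip(normalised_audio, normalised_visual)]


# Hypothetical toy data: three frames with 2 acoustic and 3 lip-motion values each.
audio_frames = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
visual_frames = [[0.1, 0.5, 0.0], [0.2, 0.5, 1.0], [0.3, 0.5, 2.0]]

fused = fuse_features(audio_frames, visual_frames)
print(len(fused), len(fused[0]))  # prints: 3 5
```

Normalising each stream before concatenation keeps one modality from dominating the fused vector purely because of its numeric range, which matters for margin-based classifiers.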



Author information

Authors and Affiliations

School of Information Science, Computer and Electrical Engineering (IDE), Halmstad University, Box 823, SE-301 18 Halmstad, Sweden
Maycel Isaac Faraj & Josef Bigun


Editor information

Seong-Whan Lee, Stan Z. Li


Cite this paper

Isaac Faraj, M., Bigun, J. (2007). Speaker and Digit Recognition by Audio-Visual Lip Biometrics. In: Lee, SW., Li, S.Z. (eds) Advances in Biometrics. ICB 2007. Lecture Notes in Computer Science, vol 4642. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74549-5_106


DOI: https://doi.org/10.1007/978-3-540-74549-5_106
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74548-8
Online ISBN: 978-3-540-74549-5
eBook Packages: Computer Science (R0)
