Audio-Visual Feature Fusion for Speaker Identification

Almaadeed, Noor; Aggoun, Amar; Amira, Abbes

doi:10.1007/978-3-642-34475-6_8

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7663)

Included in the following conference series:

International Conference on Neural Information Processing

Abstract

Conventional speaker identification systems analyse facial and audio features separately. Herein, we propose a robust algorithm for text-independent speaker identification based on decision-level and feature-level fusion of facial and audio features. The approach uses Mel-frequency cepstral coefficients (MFCCs) for audio signal processing, the Viola-Jones Haar cascade algorithm for face detection from video, and eigenface features (EFF) and Gaussian mixture models (GMMs) for feature-level and decision-level fusion of audio and video. Decision-level fusion is carried out using PCA for the face and a GMM for the audio, combined through AND voting. Feature-level fusion is investigated by combining MFCC (audio) and PCA (face) features to construct a hybrid GMM for each speaker. Testing on GRID, a multi-speaker audio-visual database, shows that decision-level fusion of PCA (face) and GMM (audio) achieves 98.2% accuracy and is almost 15% more efficient than feature-level fusion.
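
The two fusion strategies named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes that "AND voting" means an identity is accepted only when the audio and face classifiers pick the same speaker, and that feature-level fusion concatenates a face's PCA weights onto each MFCC frame before GMM training. The function names, array shapes, and score values are hypothetical.

```python
import numpy as np

def and_vote(audio_loglik, face_dist):
    """Decision-level fusion (assumed rule): accept only if both modalities agree."""
    audio_id = int(np.argmax(audio_loglik))  # audio GMM: highest log-likelihood wins
    face_id = int(np.argmin(face_dist))      # eigenfaces: smallest distance wins
    return audio_id if audio_id == face_id else None

def fuse_features(mfcc_frames, face_weights):
    """Feature-level fusion (assumed form): append PCA weights to every MFCC frame."""
    tiled = np.tile(face_weights, (mfcc_frames.shape[0], 1))
    return np.hstack([mfcc_frames, tiled])

# Hypothetical per-speaker scores for three enrolled speakers.
audio_loglik = np.array([-310.0, -295.5, -320.1])
face_dist = np.array([4.2, 1.3, 5.0])
decision = and_vote(audio_loglik, face_dist)

# Hypothetical shapes: 5 frames of 13 MFCCs, 10 eigenface weights.
hybrid = fuse_features(np.zeros((5, 13)), np.ones(10))
```

Under this reading, a disagreement between modalities yields `None` (a rejection) rather than a forced choice, and the hybrid vectors would serve as training data for the per-speaker GMM.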

Author information

Authors and Affiliations

Department of Computer Engineering, Brunel University, London, UK
Noor Almaadeed & Amar Aggoun
NIBEC, University of Ulster, Jordanstown, BT37 0QB, UK
Abbes Amira
College of Engineering, Qatar University, Qatar
Abbes Amira

Editor information

Editors and Affiliations

Texas A&M University at Qatar, Education City, P.O. Box 23874, Doha, Qatar
Tingwen Huang
Department of Control Science and Engineering, Huazhong University of Science and Technology, 1037 Luoyu Road, 430074, Wuhan, Hubei, China
Zhigang Zeng
College of Computer Science, Chongqing University, 174 Shazhengjie Street, 400044, Chongqing, China
Chuandong Li
Department of Electronic Engineering, City University of Hong Kong, 83 Tat Chee Avenue, Kowloon, Hong Kong, China
Chi Sing Leung

About this paper

Cite this paper

Almaadeed, N., Aggoun, A., Amira, A. (2012). Audio-Visual Feature Fusion for Speaker Identification. In: Huang, T., Zeng, Z., Li, C., Leung, C.S. (eds) Neural Information Processing. ICONIP 2012. Lecture Notes in Computer Science, vol 7663. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34475-6_8

DOI: https://doi.org/10.1007/978-3-642-34475-6_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34474-9
Online ISBN: 978-3-642-34475-6
eBook Packages: Computer Science, Computer Science (R0)
