Abstract
Speech-based person recognition by machine has not reached the level of technological maturity required by some of its potential applications. The deficiencies stem from sub-optimal pre-processing, feature extraction or selection, and classification, particularly when the input data are variable. The joint use of audible and visible manifestations of speech aims to alleviate these shortcomings, but developing effective combination techniques is challenging. This paper proposes a combination approach for speaker identification based on fuzzy modelling of acoustic and visual speaker characteristics, and evaluates it experimentally on a speaker identification task. The results show that the joint audio-visual model identifies speakers more accurately than its isolated components and, in particular, that cross-modal coupling of the audio and visual streams improves identification accuracy.
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chibelushi, C.C. (2004). Fuzzy Audio-Visual Feature Maps for Speaker Identification. In: Lotfi, A., Garibaldi, J.M. (eds) Applications and Science in Soft Computing. Advances in Soft Computing, vol 24. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45240-9_43
DOI: https://doi.org/10.1007/978-3-540-45240-9_43
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40856-7
Online ISBN: 978-3-540-45240-9
eBook Packages: Springer Book Archive