Fuzzy Audio-Visual Feature Maps for Speaker Identification

Chibelushi, Claude C.

doi:10.1007/978-3-540-45240-9_43

Part of the book series: Advances in Soft Computing (AINSC, volume 24)

Abstract

Speech-based person recognition by machine has not reached the level of technological maturity required by some of its potential applications. The deficiencies revolve around sub-optimal pre-processing, feature extraction or selection, and classification, particularly under conditions of input data variability. The joint use of audible and visible manifestations of speech aims to alleviate these shortcomings, but the development of effective combination techniques is challenging. This paper proposes and evaluates a combination approach for speaker identification based on fuzzy modelling of acoustic and visual speaker characteristics. The proposed audio-visual model has been evaluated experimentally on a speaker identification task. The results show that the joint model outperforms its isolated components in terms of identification accuracy. In particular, the cross-modal coupling of audio-visual streams is shown to improve identification accuracy.

Author information

Authors and Affiliations

School of Computing, Staffordshire University, Beaconside, Stafford, ST18 0DG, UK
Claude C. Chibelushi

Editor information

Editors and Affiliations

School of Computing and Technology, Nottingham Trent University, Burton Street, NG1 4BU, Nottingham, UK
Ahamad Lotfi
Automated Scheduling, Optimisation and Planning Group, School of Computer Science and IT, University of Nottingham, Jubilee Campus, Wollaton Road, NG8 1BB, Nottingham, UK
Jonathan M. Garibaldi

About this paper

Cite this paper

Chibelushi, C.C. (2004). Fuzzy Audio-Visual Feature Maps for Speaker Identification. In: Lotfi, A., Garibaldi, J.M. (eds) Applications and Science in Soft Computing. Advances in Soft Computing, vol 24. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45240-9_43

DOI: https://doi.org/10.1007/978-3-540-45240-9_43
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40856-7
Online ISBN: 978-3-540-45240-9
eBook Packages: Springer Book Archive
