Fuzzy Audio-Visual Feature Maps for Speaker Identification

  • Conference paper
Applications and Science in Soft Computing

Part of the book series: Advances in Soft Computing (AINSC, volume 24)

Abstract

Speech-based person recognition by machine has not reached the level of technological maturity required by some of its potential applications. The deficiencies revolve around sub-optimal pre-processing, feature extraction or selection, and classification, particularly under conditions of input data variability. The joint use of audible and visible manifestations of speech aims to alleviate these shortcomings, but the development of effective combination techniques is challenging. This paper proposes and evaluates a combination approach for speaker identification based on fuzzy modelling of acoustic and visual speaker characteristics. The proposed audio-visual model has been evaluated experimentally on a speaker identification task. The results show that the joint model outperforms its isolated components in terms of identification accuracy. In particular, the cross-modal coupling of audio-visual streams is shown to improve identification accuracy.

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chibelushi, C.C. (2004). Fuzzy Audio-Visual Feature Maps for Speaker Identification. In: Lotfi, A., Garibaldi, J.M. (eds) Applications and Science in Soft Computing. Advances in Soft Computing, vol 24. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45240-9_43

  • DOI: https://doi.org/10.1007/978-3-540-45240-9_43

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40856-7

  • Online ISBN: 978-3-540-45240-9

  • eBook Packages: Springer Book Archive
