Weight Estimation for Audio-Visual Multi-level Fusion in Bimodal Speaker Identification

Wu, Zhiyong; Cai, Lianhong; Meng, Helen M.

doi:10.1007/978-3-540-37258-5_144

Part of the book series: Lecture Notes in Control and Information Sciences ((LNCIS,volume 345))

Abstract

This paper investigates how to estimate fusion weights under varying acoustic noise conditions for an audio-visual multi-level hybrid fusion strategy in speaker identification. The multi-level fusion combines model-level and decision-level fusion via dynamic Bayesian networks (DBNs). A novel methodology based on support vector regression (SVR) is used to estimate the fusion weights directly from audio features, and a Sigma-Pi network sampling method is incorporated to reduce the feature dimensionality. Experiments on a homegrown Chinese database and the CMU English database both show that the method improves the accuracy of audio-visual bimodal speaker identification under dynamically varying acoustic noise conditions.
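To make the role of a fusion weight concrete, the following is a minimal sketch of weighted decision-level fusion. It is not the authors' implementation: the abstract does not give the fusion formula, so the convex combination of per-speaker log-likelihoods is an assumption, and the names fuse_scores, identify and audio_weight, as well as all score values, are hypothetical. In the paper the weight would come from an SVR model driven by audio features; here it is simply passed in as a number.

```python
from typing import Dict


def fuse_scores(audio_ll: Dict[str, float],
                visual_ll: Dict[str, float],
                audio_weight: float) -> Dict[str, float]:
    """Combine per-speaker log-likelihoods from the two modalities.

    Assumed (hypothetical) fusion rule:
        fused = w * audio + (1 - w) * visual, with 0 <= w <= 1.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    if audio_ll.keys() != visual_ll.keys():
        raise ValueError("both modalities must score the same speakers")
    return {
        spk: audio_weight * audio_ll[spk] + (1.0 - audio_weight) * visual_ll[spk]
        for spk in audio_ll
    }


def identify(audio_ll: Dict[str, float],
             visual_ll: Dict[str, float],
             audio_weight: float) -> str:
    """Return the speaker with the highest fused score."""
    fused = fuse_scores(audio_ll, visual_ll, audio_weight)
    return max(fused, key=fused.get)


# Illustrative scores only: the audio stream favours spk_a,
# the visual stream favours spk_b.
audio = {"spk_a": -10.0, "spk_b": -12.0}
visual = {"spk_a": -9.0, "spk_b": -5.0}

# A high audio weight (as might be chosen for clean speech) trusts audio.
clean_choice = identify(audio, visual, audio_weight=0.9)
# A low audio weight (as might be chosen for noisy speech) trusts video.
noisy_choice = identify(audio, visual, audio_weight=0.2)
```

With these made-up scores the decision flips from spk_a to spk_b as the audio weight drops, which is why the weight needs to follow the acoustic noise level rather than stay fixed.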

Author information

Authors and Affiliations

Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong SAR, China
Zhiyong Wu & Helen M. Meng
Department of Computer, Tsinghua University, Beijing, 100084, China
Zhiyong Wu & Lianhong Cai

Editor information

Editors and Affiliations

Institute of Intelligent Machines, Chinese Academy of Sciences, Hefei, Anhui, China
De-Shuang Huang
Queen’s University, Belfast, UK
Kang Li & George William Irwin

Cite this chapter

Wu, Z., Cai, L., Meng, H.M. (2006). Weight Estimation for Audio-Visual Multi-level Fusion in Bimodal Speaker Identification. In: Huang, DS., Li, K., Irwin, G.W. (eds) Intelligent Computing in Signal Processing and Pattern Recognition. Lecture Notes in Control and Information Sciences, vol 345. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-37258-5_144

DOI: https://doi.org/10.1007/978-3-540-37258-5_144
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-37257-8
Online ISBN: 978-3-540-37258-5
eBook Packages: Engineering, Engineering (R0)
