Speaker Tracking Based on Audio-Visual Fusion with Unknown Noise

  • Conference paper
Proceedings of 2013 Chinese Intelligent Automation Conference

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 256)

Abstract

To meet the demands for high precision and strong robustness in speaker tracking systems, this paper proposes a new particle filter algorithm for the case of unknown noise statistics. The proposed algorithm estimates and corrects the statistics of the unknown noise online with an improved Sage-Husa estimator, and uses an unscented Kalman filter to generate an optimal proposal distribution. Speaker tracking based on audio-visual fusion is then realized within the framework of the new algorithm. Experimental results show that the proposed method improves the accuracy and robustness of the speaker tracking system.
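
The full text is not reproduced here, but the abstract names the two ingredients the algorithm combines: a Sage-Husa estimator that adapts the unknown measurement-noise statistics online, and an unscented Kalman filter (UKF) that supplies the proposal distribution of a particle filter fusing audio and visual measurements. The code below is only a minimal sketch of that combination under assumed conditions: a toy scalar speaker position with random-walk dynamics, a bearing-style audio measurement plus a direct visual position measurement, illustrative parameter values, simplified importance weights, and a standard Sage-Husa-style fading-memory recursion standing in for the paper's improved estimator. None of the model, parameters, or helper names (h, ukf_update, sage_husa_R, track) come from the paper.

```python
"""Minimal illustrative sketch (not the authors' code): a particle filter whose
proposal is built with an unscented Kalman update and whose measurement-noise
covariance is adapted online with a Sage-Husa style recursion. The toy model,
parameter values and helper names below are assumptions made for illustration."""
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy model: scalar speaker position with random-walk dynamics.
# The audio sensor gives a bearing-like nonlinear reading, the camera a position.
MIC_DIST = 2.0   # assumed distance between the mic array and the track line
Q = 0.05         # process-noise variance (taken as known in this sketch)


def h(x):
    """Stacked audio/visual measurement of a scalar position x."""
    return np.array([np.arctan2(x, MIC_DIST),  # audio: bearing angle
                     x])                       # video: detected position


def ukf_update(m, P, z, R, kappa=2.0):
    """One unscented measurement update for a scalar state.

    Returns the updated mean and variance (used as the proposal moments) plus
    the innovation and its covariance (used by the Sage-Husa recursion)."""
    c = 1.0 + kappa                                              # n + kappa, n = 1
    sig = np.array([m, m + np.sqrt(c * P), m - np.sqrt(c * P)])  # sigma points
    Wm = np.array([kappa / c, 0.5 / c, 0.5 / c])                 # unscented weights
    Z = np.array([h(s) for s in sig])                            # propagated points
    z_hat = Wm @ Z
    dZ = Z - z_hat
    S = dZ.T @ (Wm[:, None] * dZ) + R                            # innovation covariance
    Pxz = (Wm * (sig - m)) @ dZ                                  # cross covariance
    K = Pxz @ np.linalg.inv(S)                                   # Kalman gain
    innov = z - z_hat
    return float(m + K @ innov), float(P - K @ S @ K), innov, S


def sage_husa_R(R, innov, S, k, b=0.98):
    """Fading-memory Sage-Husa style update of the measurement-noise covariance."""
    d = (1.0 - b) / (1.0 - b ** (k + 1))                 # forgetting weight, k >= 1
    R_new = (1.0 - d) * R + d * (np.outer(innov, innov) - (S - R))
    # sketch-level simplification: assume independent audio/video noise and
    # keep the estimated variances positive
    return np.diag(np.maximum(np.diag(R_new), 1e-4))


def track(measurements, n_particles=200):
    """Track the speaker from a sequence of stacked audio/visual measurements."""
    parts = rng.normal(0.0, 1.0, n_particles)       # initial particle cloud
    var = np.full(n_particles, 1.0)                 # per-particle variance
    w = np.full(n_particles, 1.0 / n_particles)     # importance weights
    R = np.diag([0.1, 0.1])                         # rough initial noise guess
    estimates = []
    for k, z in enumerate(measurements, start=1):
        # adapt R once per frame from the innovation at the weighted mean state
        m_bar = float(w @ parts)
        P_bar = float(w @ (parts - m_bar) ** 2) + Q
        _, _, innov, S = ukf_update(m_bar, P_bar, z, R)
        R = sage_husa_R(R, innov, S, k)
        for i in range(n_particles):
            # the UKF run on each particle supplies the proposal; sample from it
            m_i, P_i, _, _ = ukf_update(parts[i], var[i] + Q, z, R)
            P_i = max(P_i, 1e-9)
            parts[i], var[i] = rng.normal(m_i, np.sqrt(P_i)), P_i
            # simplified importance weight: measurement likelihood only
            # (a full unscented particle filter also uses the prior/proposal ratio)
            e = z - h(parts[i])
            w[i] *= np.exp(-0.5 * e @ np.linalg.solve(R, e))
        w += 1e-300
        w /= w.sum()
        if 1.0 / np.sum(w ** 2) < n_particles / 2:  # resample on low effective size
            idx = rng.choice(n_particles, n_particles, p=w)
            parts, var = parts[idx], var[idx]
            w[:] = 1.0 / n_particles
        estimates.append(float(w @ parts))
    return estimates


if __name__ == "__main__":
    truth = np.cumsum(rng.normal(0.0, np.sqrt(Q), 100))        # simulated speaker path
    zs = [h(x) + rng.normal(0.0, [0.05, 0.3]) for x in truth]  # noisy audio + video
    est = track(zs)
    print("RMSE:", float(np.sqrt(np.mean((np.array(est) - truth) ** 2))))
```

Running the script prints the RMSE of the fused track on the simulated path; in a real audio-visual system the toy measurement model would be replaced by bearing or TDOA estimates from the microphone array and detections from the camera.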

Acknowledgments

The research was supported by the National Natural Science Foundation of China (61263031) and the Natural Science Foundation of Gansu Province of China (1010RJZA046).

Author information

Corresponding author

Correspondence to Jun Li.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Cao, J., Li, J., Li, W. (2013). Speaker Tracking Based on Audio-Visual Fusion with Unknown Noise. In: Sun, Z., Deng, Z. (eds) Proceedings of 2013 Chinese Intelligent Automation Conference. Lecture Notes in Electrical Engineering, vol 256. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38466-0_25

  • DOI: https://doi.org/10.1007/978-3-642-38466-0_25

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-38465-3

  • Online ISBN: 978-3-642-38466-0

  • eBook Packages: Engineering, Engineering (R0)
