Speaker Tracking Based on Audio-Visual Fusion with Unknown Noise

  • Conference paper
Proceedings of 2013 Chinese Intelligent Automation Conference

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 256)

Abstract

To meet the demands for high precision and strong robustness in speaker tracking systems, this paper proposes a new particle filter algorithm for the case of unknown noise statistics. The proposed algorithm estimates and corrects the statistics of the unknown noise online with an improved Sage-Husa estimator, and uses an unscented Kalman filter to generate an optimal proposal distribution. Speaker tracking based on audio-visual fusion is then realized within the framework of the new algorithm. Experimental results show that the proposed method improves the accuracy and robustness of the speaker tracking system.
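
The full text is not reproduced here, but the abstract names the two ingredients the algorithm combines: a Sage-Husa estimator that adapts the unknown measurement-noise statistics online, and an unscented Kalman filter (UKF) that supplies the proposal distribution of a particle filter fusing audio and visual measurements. The code below is only a minimal sketch of that combination under assumed conditions: a toy scalar speaker position with random-walk dynamics, a bearing-style audio measurement plus a direct visual position measurement, illustrative parameter values, simplified importance weights, and a standard Sage-Husa-style fading-memory recursion standing in for the paper's improved estimator. None of the model, parameters, or helper names (h, ukf_update, sage_husa_R, track) come from the paper.

```python
"""Minimal illustrative sketch (not the authors' code): a particle filter whose
proposal is built with an unscented Kalman update and whose measurement-noise
covariance is adapted online with a Sage-Husa style recursion. The toy model,
parameter values and helper names below are assumptions made for illustration."""
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy model: scalar speaker position with random-walk dynamics.
# The audio sensor gives a bearing-like nonlinear reading, the camera a position.
MIC_DIST = 2.0   # assumed distance between the mic array and the track line
Q = 0.05         # process-noise variance (taken as known in this sketch)


def h(x):
    """Stacked audio/visual measurement of a scalar position x."""
    return np.array([np.arctan2(x, MIC_DIST),  # audio: bearing angle
                     x])                       # video: detected position


def ukf_update(m, P, z, R, kappa=2.0):
    """One unscented measurement update for a scalar state.

    Returns the updated mean and variance (used as the proposal moments) plus
    the innovation and its covariance (used by the Sage-Husa recursion)."""
    c = 1.0 + kappa                                              # n + kappa, n = 1
    sig = np.array([m, m + np.sqrt(c * P), m - np.sqrt(c * P)])  # sigma points
    Wm = np.array([kappa / c, 0.5 / c, 0.5 / c])                 # unscented weights
    Z = np.array([h(s) for s in sig])                            # propagated points
    z_hat = Wm @ Z
    dZ = Z - z_hat
    S = dZ.T @ (Wm[:, None] * dZ) + R                            # innovation covariance
    Pxz = (Wm * (sig - m)) @ dZ                                  # cross covariance
    K = Pxz @ np.linalg.inv(S)                                   # Kalman gain
    innov = z - z_hat
    return float(m + K @ innov), float(P - K @ S @ K), innov, S


def sage_husa_R(R, innov, S, k, b=0.98):
    """Fading-memory Sage-Husa style update of the measurement-noise covariance."""
    d = (1.0 - b) / (1.0 - b ** (k + 1))                 # forgetting weight, k >= 1
    R_new = (1.0 - d) * R + d * (np.outer(innov, innov) - (S - R))
    # sketch-level simplification: assume independent audio/video noise and
    # keep the estimated variances positive
    return np.diag(np.maximum(np.diag(R_new), 1e-4))


def track(measurements, n_particles=200):
    """Track the speaker from a sequence of stacked audio/visual measurements."""
    parts = rng.normal(0.0, 1.0, n_particles)       # initial particle cloud
    var = np.full(n_particles, 1.0)                 # per-particle variance
    w = np.full(n_particles, 1.0 / n_particles)     # importance weights
    R = np.diag([0.1, 0.1])                         # rough initial noise guess
    estimates = []
    for k, z in enumerate(measurements, start=1):
        # adapt R once per frame from the innovation at the weighted mean state
        m_bar = float(w @ parts)
        P_bar = float(w @ (parts - m_bar) ** 2) + Q
        _, _, innov, S = ukf_update(m_bar, P_bar, z, R)
        R = sage_husa_R(R, innov, S, k)
        for i in range(n_particles):
            # the UKF run on each particle supplies the proposal; sample from it
            m_i, P_i, _, _ = ukf_update(parts[i], var[i] + Q, z, R)
            P_i = max(P_i, 1e-9)
            parts[i], var[i] = rng.normal(m_i, np.sqrt(P_i)), P_i
            # simplified importance weight: measurement likelihood only
            # (a full unscented particle filter also uses the prior/proposal ratio)
            e = z - h(parts[i])
            w[i] *= np.exp(-0.5 * e @ np.linalg.solve(R, e))
        w += 1e-300
        w /= w.sum()
        if 1.0 / np.sum(w ** 2) < n_particles / 2:  # resample on low effective size
            idx = rng.choice(n_particles, n_particles, p=w)
            parts, var = parts[idx], var[idx]
            w[:] = 1.0 / n_particles
        estimates.append(float(w @ parts))
    return estimates


if __name__ == "__main__":
    truth = np.cumsum(rng.normal(0.0, np.sqrt(Q), 100))        # simulated speaker path
    zs = [h(x) + rng.normal(0.0, [0.05, 0.3]) for x in truth]  # noisy audio + video
    est = track(zs)
    print("RMSE:", float(np.sqrt(np.mean((np.array(est) - truth) ** 2))))
```

Running the script prints the RMSE of the fused track on the simulated path; in a real audio-visual system the toy measurement model would be replaced by bearing or TDOA estimates from the microphone array and detections from the camera.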

Acknowledgments

The research was supported by the National Natural Science Foundation of China (61263031) and the Natural Science Foundation of Gansu Province of China (1010RJZA046).

Author information

Corresponding author

Correspondence to Jun Li.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Cao, J., Li, J., Li, W. (2013). Speaker Tracking Based on Audio-Visual Fusion with Unknown Noise. In: Sun, Z., Deng, Z. (eds) Proceedings of 2013 Chinese Intelligent Automation Conference. Lecture Notes in Electrical Engineering, vol 256. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38466-0_25

  • DOI: https://doi.org/10.1007/978-3-642-38466-0_25

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-38465-3

  • Online ISBN: 978-3-642-38466-0

  • eBook Packages: Engineering, Engineering (R0)
