Online Clustering of Narrowband Position Estimates with Application to Multi-speaker Detection and Tracking

  • Conference paper
Advances in Machine Learning and Signal Processing

Abstract

Speaker detection, localization, and tracking are required in systems that involve, for example, hands-free speech acquisition or blind source separation. Localization can be performed in the time-frequency (TF) domain, where location features extracted using microphone arrays are used to cluster the TF bins that correspond to the same source. TF clustering approaches provide an alternative to Bayesian tracking approaches based on Kalman and particle filters. In this work, we propose a maximum-likelihood approach in which detection, localization, and tracking are achieved by online clustering of narrowband position estimates, while incorporating the speech presence probability at each TF bin in a unified manner.
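
The idea of clustering position estimates online while weighting each one by its speech presence probability can be illustrated with a generic recursive EM update for a Gaussian mixture. The sketch below is not the algorithm of the paper, whose details are not given in this abstract. The class name `OnlineWeightedGMM`, the fixed step size, the 2-D positions, and the initial covariance are all assumptions made for illustration.

```python
import numpy as np


class OnlineWeightedGMM:
    """Illustrative recursive EM for a 2-D Gaussian mixture (hypothetical sketch).

    Each cluster stands for one speaker position. Every narrowband position
    estimate x arrives with a speech presence probability (SPP) in [0, 1]
    that scales how strongly the estimate updates the model.
    """

    def __init__(self, means, var=0.25, step=0.05):
        self.means = np.asarray(means, dtype=float)           # shape (K, 2)
        k = len(self.means)
        self.covs = np.stack([np.eye(2) * var for _ in range(k)])
        self.priors = np.full(k, 1.0 / k)                     # mixing weights
        self.step = step                                      # assumed fixed step size

    def responsibilities(self, x):
        """Posterior probability of each cluster given one position estimate."""
        diff = x - self.means
        inv = np.linalg.inv(self.covs)
        maha = np.einsum("ki,kij,kj->k", diff, inv, diff)
        det = np.linalg.det(self.covs)
        lik = self.priors * np.exp(-0.5 * maha) / (2.0 * np.pi * np.sqrt(det))
        total = lik.sum()
        if total <= 0.0:
            # Likelihoods underflowed: fall back to a uniform assignment.
            return np.full(len(lik), 1.0 / len(lik))
        return lik / total

    def update(self, x, spp):
        """One recursive EM step; spp = 0 leaves the model unchanged."""
        x = np.asarray(x, dtype=float)
        r = self.responsibilities(x)
        gain = self.step * spp * r                            # per-cluster step size
        diff = x - self.means
        outer = np.einsum("ki,kj->kij", diff, diff)
        self.means += gain[:, None] * diff
        self.covs += gain[:, None, None] * (outer - self.covs)
        self.priors += self.step * spp * (r - self.priors)    # priors still sum to 1
        return r


# Usage: two assumed initial speaker positions, one estimate with high SPP.
model = OnlineWeightedGMM(means=[[0.5, 0.5], [3.5, 2.5]])
model.update([1.0, 1.1], spp=0.9)
```

Scaling the step size by the speech presence probability means that estimates from TF bins dominated by noise barely move the clusters, which is one simple way to fold the probability into the clustering step.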

A joint institution of the University Erlangen-Nuremberg and Fraunhofer IIS.

Author information

Corresponding author

Correspondence to Maja Taseska.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Taseska, M., Lamani, G., Habets, E.A.P. (2016). Online Clustering of Narrowband Position Estimates with Application to Multi-speaker Detection and Tracking. In: Soh, P., Woo, W., Sulaiman, H., Othman, M., Saat, M. (eds) Advances in Machine Learning and Signal Processing. Lecture Notes in Electrical Engineering, vol 387. Springer, Cham. https://doi.org/10.1007/978-3-319-32213-1_6

  • DOI: https://doi.org/10.1007/978-3-319-32213-1_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-32212-4

  • Online ISBN: 978-3-319-32213-1

  • eBook Packages: Engineering (R0)
