Abstract
Speaker detection, localization, and tracking are required in systems that involve, for example, hands-free speech acquisition or blind source separation. Localization can be performed in the time-frequency (TF) domain, where location features extracted using microphone arrays are used to cluster the TF bins that correspond to the same source. TF clustering approaches provide an alternative to Bayesian tracking approaches based on Kalman and particle filters. In this work, we propose a maximum-likelihood approach in which detection, localization, and tracking are achieved by online clustering of narrowband position estimates, while the speech presence probability at each TF bin is incorporated in a unified manner.
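To make the idea of online clustering weighted by speech presence probability more concrete, the following is a minimal sketch and not the algorithm of the chapter. It assumes 2-D position estimates, a fixed number of isotropic Gaussian clusters, and a stochastic-approximation style update whose step is scaled by the speech presence probability (SPP). The class name `OnlineSppClusterer`, the helper `run_demo`, and all parameter values (step size, initial variance, simulated speaker positions, SPP values) are hypothetical choices made for illustration.

```python
import numpy as np


class OnlineSppClusterer:
    """Hypothetical online mixture of isotropic 2-D Gaussians (illustrative only)."""

    def __init__(self, init_means, init_var=1.0, step=0.05):
        self.means = np.array(init_means, dtype=float)  # shape (K, 2)
        k = len(self.means)
        self.vars = np.full(k, float(init_var))         # one variance per cluster
        self.weights = np.full(k, 1.0 / k)              # mixture weights
        self.step = step                                # assumed forgetting step

    def responsibilities(self, x):
        """Posterior probability of each cluster for one position estimate x."""
        d2 = np.sum((self.means - x) ** 2, axis=1)
        log_p = (np.log(self.weights)
                 - np.log(2.0 * np.pi * self.vars)
                 - d2 / (2.0 * self.vars))
        log_p -= log_p.max()                            # numerical stability
        p = np.exp(log_p)
        return p / p.sum()

    def update(self, x, spp):
        """Update the model with one narrowband estimate x and its SPP in [0, 1]."""
        x = np.asarray(x, dtype=float)
        r = self.responsibilities(x)
        gain = self.step * spp * r                      # low SPP means a small update
        diff = x - self.means
        d2 = np.sum(diff ** 2, axis=1)
        self.means += gain[:, None] * diff
        # Per-dimension variance of an isotropic 2-D Gaussian is E[d2] / 2.
        self.vars = np.maximum(self.vars + gain * (d2 / 2.0 - self.vars), 1e-4)
        self.weights += self.step * spp * (r - self.weights)
        self.weights /= self.weights.sum()
        return r


def run_demo(seed=0, n_bins=3000):
    """Simulate TF bins from two assumed speakers plus low-SPP noise bins."""
    rng = np.random.default_rng(seed)
    speakers = np.array([[1.0, 1.0], [3.0, 2.0]])       # assumed true positions (m)
    model = OnlineSppClusterer(init_means=[[0.0, 0.0], [4.0, 4.0]])
    for _ in range(n_bins):
        if rng.random() < 0.7:                          # speech-dominated bin
            src = rng.integers(len(speakers))
            x = speakers[src] + 0.2 * rng.standard_normal(2)
            spp = 0.9
        else:                                           # noise-dominated bin
            x = rng.uniform(0.0, 5.0, size=2)
            spp = 0.05
        model.update(x, spp)
    return model.means


if __name__ == "__main__":
    print(run_demo())
```

In this sketch, the constant step size lets the cluster means follow slowly moving sources, and scaling the step by the SPP keeps noise-dominated bins from pulling the clusters away from the speakers. How the chapter itself derives its updates and handles a changing number of speakers is not shown here.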
A joint institution of the University Erlangen-Nuremberg and Fraunhofer IIS.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Taseska, M., Lamani, G., Habets, E.A.P. (2016). Online Clustering of Narrowband Position Estimates with Application to Multi-speaker Detection and Tracking. In: Soh, P., Woo, W., Sulaiman, H., Othman, M., Saat, M. (eds) Advances in Machine Learning and Signal Processing. Lecture Notes in Electrical Engineering, vol 387. Springer, Cham. https://doi.org/10.1007/978-3-319-32213-1_6
DOI: https://doi.org/10.1007/978-3-319-32213-1_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-32212-4
Online ISBN: 978-3-319-32213-1
eBook Packages: Engineering, Engineering (R0)