Concurrent speakers localization using blind source separation and microphone array geometry

Abstract

Speaker localization has been an active research topic because of its wide range of applications in multimedia and communication technologies. While traditional blind source separation (BSS) algorithms are robust in reverberant environments, they are generally unable to localize more than two concurrent speakers. This paper presents a novel method for localizing concurrent speakers using blind source separation that exploits the microphone array geometry. The TRINICON BSS algorithm (Buchner et al., in: 2004 IEEE international conference on acoustics, speech, and signal processing, IEEE, 2004) is used as the baseline for determining the raw direction of arrival (DOA) estimates. The results show that the proposed algorithm can successfully localize up to three concurrent speakers by exploiting the redundancy in the microphone array. The algorithm is evaluated in real-world environments with background noise and reverberation, such as computer labs and meeting rooms. The localization results are compared with those of the well-known Steered-Response Power Phase Transform (SRP-PHAT) algorithm, using the root mean square error as the evaluation metric. The results for the two-speaker and three-speaker concurrent scenarios show that the proposed algorithm is more stable and robust than SRP-PHAT. Moreover, the proposed algorithm shows the potential to track multiple moving speakers simultaneously, so it can be used as a front end for a speaker tracking algorithm.
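As an illustration of the evaluation metric named above, the following is a minimal sketch of a root mean square error over direction of arrival estimates. It assumes the estimates and ground-truth values are azimuth angles in degrees and that errors are wrapped to the shortest angular difference; the abstract does not state either detail, and the function names `angular_error_deg` and `doa_rmse_deg` are hypothetical.

```python
import math


def angular_error_deg(estimate_deg, truth_deg):
    """Signed shortest difference between two angles, in degrees, in [-180, 180).

    Assumption: angles are azimuths in degrees, so 350 and 10 are 20 degrees apart.
    """
    return (estimate_deg - truth_deg + 180.0) % 360.0 - 180.0


def doa_rmse_deg(estimates_deg, truths_deg):
    """Root mean square error between paired DOA estimates and ground truth."""
    if len(estimates_deg) != len(truths_deg) or len(estimates_deg) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [
        angular_error_deg(est, ref) ** 2
        for est, ref in zip(estimates_deg, truths_deg)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


if __name__ == "__main__":
    # Illustrative values only, not data from the paper.
    estimates = [358.0, 2.0, 91.0]
    truths = [2.0, 358.0, 90.0]
    print(f"RMSE: {doa_rmse_deg(estimates, truths):.2f} degrees")
```

Wrapping the error avoids penalizing an estimate of 358 degrees against a true direction of 2 degrees as if it were off by 356 degrees. If the angles of interest never cross the 0/360 boundary, the wrapped and unwrapped errors coincide.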

Notes

  1. https://github.com/UmairKhanUET/bss-concurrentspeaker-localization/tree/master/corpus

  2. http://wiki.seeedstudio.com/ReSpeaker_Core_v2.0/

References

  1. Brendel, A., Gannot, S., & Kellermann, W. (2018). Localization of multiple simultaneously active speakers in an acoustic sensor network. In 2018 IEEE 10th sensor array and multichannel signal processing workshop (SAM) (pp. 450–454). IEEE.

  2. Brendel, A., & Kellermann, W. (2017). Localization of multiple simultaneously active sources in acoustic sensor networks using ADP. In 2017 IEEE 7th international workshop on computational advances in multi-sensor adaptive processing (CAMSAP) (pp. 1–5). IEEE.

  3. Buchner, H., Aichner, R., & Kellermann, W. (2004). TRINICON: A versatile framework for multichannel blind signal processing. In 2004 IEEE international conference on acoustics, speech, and signal processing (Vol. 3, pp. 889–892). IEEE.

  4. DiBiase, J., Silverman, H., & Brandstein, M. (2001). Robust localization in reverberant rooms. In Microphone arrays: Signal processing techniques and applications (pp. 157–180). Springer.

  5. Ester, M., Kriegel, H.-P., Sander, J., & Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. In Proc. 1996 Int. Conf. knowledge discovery and data mining (KDD’96) (pp. 226–231).

  6. Evers, C., Dorfan, Y., Gannot, S., & Naylor, P. A. (2017). Source tracking using moving microphone arrays for robot audition. In 2017 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 6145–6149). IEEE.

  7. Firdaus, S., & Uddin, M. A. (2015). A survey on clustering algorithms and complexity analysis. International Journal of Computer Science Issues, 12(2), 62.

  8. Jian, M., Kot, A. C., & Er, M. (1998). DOA estimation of speech source with microphone arrays. In Proceedings of the 1998 IEEE International Symposium on Circuits and Systems (ISCAS’98) (Cat. No. 98CH36187) (Vol. 5, pp. 293–296). IEEE.

  9. Kim, U. H., Nakadai, K., & Okuno, H. G. (2013). Improved sound source localization and front-back disambiguation for humanoid robots with two ears. In International conference on industrial, engineering and other applications of applied intelligent systems (pp. 282–291). Springer.

  10. Kondo, K., Mizuno, Y., Nishino, T., & Takeda, K. (2012). Practically efficient blind speech separation using frequency band selection based on magnitude squared coherence and a small dodecahedral microphone array. Journal of Electrical and Computer Engineering, 2012, 1–11.

  11. Lombard, A., Zheng, Y., Buchner, H., & Kellermann, W. (2010). TDOA estimation for multiple sound sources in noisy and reverberant environments using broadband independent component analysis. IEEE Transactions on Audio, Speech, and Language Processing, 19(6), 1490–1503.

  12. Lu, Y. C., & Cooke, M. (2011). Motion strategies for binaural localisation of speech sources in azimuth and distance by artificial listeners. Speech Communication, 53(5), 622–642.

  13. Makino, S., Lee, T. W., & Sawada, H. (2007). Blind speech separation. Springer.

  14. Mandel, M. I., & Barker, J. (2016). Multichannel spatial clustering for robust far-field automatic speech recognition in mismatched conditions. In INTERSPEECH, ISCA (pp. 1991–1995).

  15. Marković, I., & Petrović, I. (2010). Speaker localization and tracking with a microphone array on a mobile robot using von Mises distribution and particle filtering. Robotics and Autonomous Systems, 58(11), 1185–1196.

  16. McDonough Jr, J. W., Leutnant, V. S., Krishna, S. V. S. S. R., Matsoukas, S., et al. (2017). Determining speaker direction using a spherical microphone array. US Patent 9,560,441.

  17. Nadiri, O., & Rafaely, B. (2014). Localization of multiple speakers under high reverberation using a spherical microphone array and the direct-path dominance test. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22(10), 1494–1505.

  18. Nogueira, L. C., & Petraglia, M. R. (2015). Robust localization of multiple sound sources based on BSS algorithms. In 2015 IEEE 24th international symposium on industrial electronics (ISIE) (pp. 579–583). IEEE.

  19. Rickard, S. (2006). Sparse sources are separated sources. In 2006 14th European signal processing conference (pp. 1–5). IEEE.

  20. Schwartz, O., Dorfan, Y., Habets, E. A., & Gannot, S. (2016). Multi-speaker DOA estimation in reverberation conditions using expectation-maximization. In 2016 IEEE international workshop on acoustic signal enhancement (IWAENC) (pp. 1–5). IEEE.

  21. Schwartz, O., Dorfan, Y., Taseska, M., Habets, E. A., & Gannot, S. (2017). DOA estimation in noisy environment with unknown noise power using the EM algorithm. In 2017 Hands-free speech communications and microphone arrays (HSCMA) (pp. 86–90). IEEE.

  22. Schwartz, O., & Gannot, S. (2013). Speaker tracking using recursive EM algorithms. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22(2), 392–402.

  23. Strobel, N., Spors, S., & Rabenstein, R. (2001). Joint audio-video object localization and tracking. IEEE Signal Processing Magazine, 18(1), 22–31.

  24. Wang, L., Reiss, J. D., & Cavallaro, A. (2016). Over-determined source separation and localization using distributed microphones. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24(9), 1573–1588.

  25. Zohourian, M., & Martin, R. (2016). Binaural speaker localization and separation based on a joint ITD/ILD model and head movement tracking. In 2016 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 430–434). IEEE.

Author information

Corresponding author

Correspondence to Muhammad Umair Khan.

About this article

Cite this article

Khan, M.U., Habib, T. Concurrent speakers localization using blind source separation and microphone array geometry. Multidim Syst Sign Process 32, 1159–1184 (2021). https://doi.org/10.1007/s11045-021-00776-x

Keywords

  • Source localization
  • Blind source separation
  • Microphone array
  • Direction of arrival
  • Time difference of arrival