
A Review on Sound Source Localization Systems

  • Review article
Archives of Computational Methods in Engineering

Abstract

Sound Source Localization (SSL) systems focus on finding the direction of a sound source. Sound source localization is an essential feature in robots and humanoids. Research has been carried out for two decades to optimize SSL techniques and enhance their accuracy. In this review, we categorize the proposed SSL techniques into four main types. The first type is based on conventional algorithms such as Generalized Cross Correlation (GCC), Multiple Signal Classification (MUSIC), and Time Difference of Arrival (TDOA), applied to multiple-microphone array configurations. The second type involves binaural signal processing using the same conventional algorithms (GCC, MUSIC, TDOA). The third and fourth types were developed in the last decade with the rise of Convolutional Neural Network (CNN) algorithms. The third type applies CNNs to multiple-microphone array configurations, and the fourth, the most recently evolved, applies CNNs to binaural signal processing. This review presents SSL techniques based on multiaural and binaural signals, using both conventional algorithms and CNNs. It provides an overview of SSL systems in terms of the number of microphones used, the layout of the microphone arrays, the algorithms used to perform SSL, and localization in 3D space (azimuth, elevation and distance). From the review, we found that, among all SSL techniques, the CNN-based approach is the emerging and optimized one. Using CNNs in SSL systems, an error rate as low as 0.1% was achieved.
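
As a rough illustration of the conventional GCC/TDOA family mentioned above, the following minimal Python sketch estimates the time difference of arrival between two microphone signals with GCC-PHAT and converts it to an azimuth for a single microphone pair. It is not taken from the reviewed works. The function names (`gcc_phat`, `tdoa_to_azimuth`), the far-field assumption, the 343 m/s speed of sound and the 10 cm microphone spacing are all assumptions made for this example.

```python
import numpy as np


def gcc_phat(sig, ref, fs, max_tau=None):
    """Estimate the delay (seconds) of `sig` relative to `ref` with GCC-PHAT.

    A positive result means `sig` arrives later than `ref`.
    """
    n = len(sig) + len(ref)  # zero-pad so circular correlation does not wrap
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    cross = SIG * np.conj(REF)
    cross /= np.abs(cross) + 1e-12  # PHAT weighting: keep phase, drop magnitude
    cc = np.fft.irfft(cross, n=n)

    max_shift = n // 2
    if max_tau is not None:
        # Restrict the search to physically possible lags (at least 1 sample).
        max_shift = max(1, min(int(fs * max_tau), max_shift))
    # Reorder so that index 0 corresponds to lag -max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs


def tdoa_to_azimuth(tau, mic_distance, c=343.0):
    """Far-field azimuth (degrees) for one microphone pair (assumed geometry)."""
    s = np.clip(c * tau / mic_distance, -1.0, 1.0)
    return float(np.degrees(np.arcsin(s)))


if __name__ == "__main__":
    fs = 16000
    rng = np.random.default_rng(0)
    source = rng.standard_normal(1024)
    delay = 3  # samples; hypothetical delay between the two microphones
    mic_a = np.concatenate((source, np.zeros(delay)))
    mic_b = np.concatenate((np.zeros(delay), source))
    tau = gcc_phat(mic_b, mic_a, fs)
    print("estimated delay (samples):", round(tau * fs))
    print("azimuth (degrees):", tdoa_to_azimuth(tau, mic_distance=0.1))
```

The PHAT weighting discards the magnitude of the cross-spectrum, which tends to sharpen the correlation peak in reverberant conditions. A single pair only resolves one angle, so array-based systems combine several pairs to recover azimuth and elevation.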


Data Availability

Not applicable.

Code availability

Not applicable.


Acknowledgements

The authors would like to thank the University of Mumbai for the funding and K. J. Somaiya College of Engineering for the facilities and infrastructure provided.

Funding

Funding was received from the University of Mumbai under the Minor Research Project scheme ATD/ICD/2019-20/762, project number 1070, dated 17 March 2020.

Author information

Contributions

Conceptualization was done by Ninad Mehendale (NM). The formal analysis was performed by Dhwani Desai (DD) and NM. The original draft of the manuscript was prepared by DD. Review and editing were done by NM. Visualization work was carried out by DD and NM.

Corresponding author

Correspondence to Ninad Mehendale.

Ethics declarations

Conflict of interest

The authors, D. Desai and N. Mehendale, declare that they have no conflict of interest.

Ethical approval

All authors consciously assure that the manuscript fulfills the following statements: (1) This material is the authors’ own original work, which has not been previously published elsewhere. (2) The paper is not currently being considered for publication elsewhere. (3) The paper reflects the authors’ own research and analysis in a truthful and complete manner. (4) The paper properly credits the meaningful contributions of co-authors and co-researchers. (5) The results are appropriately placed in the context of prior and existing research.

Consent to participate

This article does not contain any studies with animals or humans performed by any of the authors. Informed consent was not required as there were no human participants. All the necessary permissions were obtained from the Institute Ethical Committee and the concerned authorities.

Consent for publication

The authors have obtained all the necessary consents for publication from participants wherever required.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Desai, D., Mehendale, N. A Review on Sound Source Localization Systems. Arch Computat Methods Eng 29, 4631–4642 (2022). https://doi.org/10.1007/s11831-022-09747-2
