Performance of speaker localization using microphone array

International Journal of Speech Technology

Abstract

Speaker localization is a technique for locating and tracking an active speaker among multiple acoustic sources using a microphone array. Microphone arrays are used to improve the quality of speech recorded in meeting rooms and similar environments. In this work, the time delay between the source and each microphone is estimated as part of a localization method based on time differences of arrival (TDOA). TDOA localization consists of two steps: (a) a time delay estimator and (b) a localization estimator. For time delay estimation, the generalized cross-correlation using phase transform, the generalized cross-correlation using maximum likelihood, the linear prediction (LP) residual and the Hilbert envelope of the LP residual are used in estimating the location of a person. A new speaker localization algorithm based on group search optimization (GSO) is proposed. Its performance is analyzed and compared with the Gauss–Newton nonlinear least squares method and a genetic algorithm. Experimental results show that the proposed GSO method outperforms the other methods in terms of mean square error, root mean square error, mean absolute error, mean absolute percentage error, Euclidean distance and mean absolute relative error.
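
As an illustration of the first TDOA step (the time delay estimator), the following is a minimal sketch of generalized cross-correlation with phase transform for a single microphone pair. It is not the authors' implementation: the function name gcc_phat, its parameters, the sampling rate and the synthetic white-noise test signal are all assumptions made for this example, and NumPy is assumed to be available.

```python
import numpy as np


def gcc_phat(sig, ref, fs, max_tau=None):
    """Estimate the delay (seconds) of `sig` relative to `ref` with GCC-PHAT.

    Hypothetical helper for illustration; a positive result means `sig`
    arrives later than `ref`.
    """
    # Zero-pad so the circular correlation behaves like a linear one.
    n = len(sig) + len(ref)
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)

    # Cross-power spectrum with PHAT weighting: discard magnitude, keep phase.
    cross = SIG * np.conj(REF)
    cross /= np.abs(cross) + 1e-12  # small constant avoids division by zero

    cc = np.fft.irfft(cross, n=n)

    # Limit the search to physically plausible lags if max_tau is given.
    max_shift = n // 2
    if max_tau is not None:
        max_shift = max(1, min(int(fs * max_tau), max_shift))

    # Reorder so that index `max_shift` corresponds to zero lag.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs


if __name__ == "__main__":
    fs = 16000                      # assumed sampling rate in Hz
    true_delay = 12                 # assumed delay in samples
    rng = np.random.default_rng(0)
    ref = rng.standard_normal(4096)  # synthetic source at the reference mic
    sig = np.concatenate((np.zeros(true_delay), ref[:-true_delay]))

    tau = gcc_phat(sig, ref, fs)
    print("estimated delay in samples:", round(tau * fs))
    # Range difference for this pair, assuming a speed of sound of 343 m/s.
    print("range difference in metres:", tau * 343.0)
```

The delay estimated for each microphone pair, multiplied by the speed of sound, gives a range difference. A set of such range differences is what the second step (the localization estimator, for example a nonlinear least squares or population-based search) would take as input.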

Corresponding author

Correspondence to R. Visalakshi.

Cite this article

Visalakshi, R., Dhanalakshmi, P. & Palanivel, S. Performance of speaker localization using microphone array. Int J Speech Technol 19, 467–483 (2016). https://doi.org/10.1007/s10772-016-9341-9

  • DOI: https://doi.org/10.1007/s10772-016-9341-9
