Finding Speaker Position Under Difficult Acoustic Conditions

  • Evgeniy Shuranov
  • Aleksandr Lavrentyev
  • Alexey Kozlyaev
  • Galina Lavrentyeva
  • Valeriya Volkovaya
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9811)

Abstract

This paper presents different approaches to speaker position identification using a microphone array and known voice models. Speaker positioning is compared by means of acoustic maps based on FBF and PHAT. The goal of the experiments is to find the best algorithm parameters and to validate them under different types of noise. The proposed approaches achieve better results for automatic positioning in noisy conditions and make it possible to identify a target speaker whose speech lasts longer than 10 s.
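The abstract refers to acoustic maps based on FBF and PHAT. As background only (the paper itself gives no code), the sketch below shows a minimal GCC-PHAT time-delay estimator, the correlation measure that PHAT-based acoustic maps are commonly built on; all function names and parameter values are illustrative assumptions, not the authors' implementation.

import numpy as np

def gcc_phat(sig, ref, fs, max_tau=None, interp=16):
    """Estimate the TDOA (in seconds) of `sig` relative to `ref` via GCC-PHAT."""
    n = sig.shape[0] + ref.shape[0]
    # Cross-power spectrum of the two channels.
    R = np.fft.rfft(sig, n=n) * np.conj(np.fft.rfft(ref, n=n))
    # Phase transform: whiten the spectrum so only the phase (i.e. the delay) remains.
    cc = np.fft.irfft(R / (np.abs(R) + 1e-15), n=interp * n)
    max_shift = interp * n // 2
    if max_tau is not None:
        max_shift = min(int(interp * fs * max_tau), max_shift)
    # Rearrange so negative lags come first, then pick the peak.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / float(interp * fs)

# Toy check: white noise delayed by 5 samples at 16 kHz should give a TDOA
# close to 5 / 16000 s.
fs = 16000
x = np.random.default_rng(0).standard_normal(4096)
delayed = np.concatenate((np.zeros(5), x[:-5]))
print(gcc_phat(delayed, x, fs))

Evaluating such a delay (or the PHAT correlation itself) over a grid of candidate positions is one common way to build an acoustic map from which the speaker position is read off as the maximum.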

Keywords

Microphone array · Acoustic map · Speech enhancement

Notes

Acknowledgements

This work was partially financially supported by the Government of the Russian Federation, Grant 074-U01.


Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Evgeniy Shuranov (1)
  • Aleksandr Lavrentyev (1)
  • Alexey Kozlyaev (1)
  • Galina Lavrentyeva (1)
  • Valeriya Volkovaya (1, 2)
  1. Speech Technology Center, St. Petersburg, Russia
  2. ITMO University, St. Petersburg, Russia