Finding Speaker Position Under Difficult Acoustic Conditions

  • Conference paper
Speech and Computer (SPECOM 2016)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9811)

Abstract

This paper presents different approaches to speaker position identification that use a microphone array and known voice models. Speaker positioning is compared using acoustic maps based on FBF and PHAT. The goal of the experiments is to find the best algorithm parameters and to test them on different types of noise. The proposed approaches give better results in automatic positioning under noisy conditions and make it possible to identify a target speaker whose speech lasts longer than 10 s.
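
The abstract does not describe how the PHAT-based acoustic maps are built, and the full text is not available in this preview. As a rough illustration only, and assuming that PHAT refers to the phase transform weighting used in generalized cross-correlation, the sketch below estimates the time delay between two microphone signals. The function name `gcc_phat` and its parameters are hypothetical and are not taken from the paper. An acoustic map would typically combine such pairwise estimates over many candidate positions, which is not shown here.

```python
import numpy as np

def gcc_phat(sig, ref, fs, max_tau=None):
    """Estimate the delay of `sig` relative to `ref`, in seconds.

    Generic GCC-PHAT sketch; not the authors' implementation.
    A positive result means `sig` lags behind `ref`.
    """
    # Zero-pad so that the circular correlation does not wrap around.
    n = len(sig) + len(ref)
    sig_spec = np.fft.rfft(sig, n=n)
    ref_spec = np.fft.rfft(ref, n=n)

    # Cross-spectrum, then PHAT weighting: discard magnitude, keep phase.
    cross = sig_spec * np.conj(ref_spec)
    cross /= np.maximum(np.abs(cross), 1e-12)
    cc = np.fft.irfft(cross, n=n)

    # Limit the search to physically plausible lags if max_tau is given.
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)

    # Reorder so that index `max_shift` corresponds to zero lag.
    cc = np.concatenate((cc[n - max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs
```

For a microphone pair with spacing d, `max_tau` can be set to d divided by the speed of sound, since larger delays cannot come from a single direct path.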

Acknowledgements

This work was partially financially supported by the Government of the Russian Federation, Grant 074-U01.

Author information

Corresponding author

Correspondence to Aleksandr Lavrentyev.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Shuranov, E., Lavrentyev, A., Kozlyaev, A., Lavrentyeva, G., Volkovaya, V. (2016). Finding Speaker Position Under Difficult Acoustic Conditions. In: Ronzhin, A., Potapova, R., Németh, G. (eds) Speech and Computer. SPECOM 2016. Lecture Notes in Computer Science, vol 9811. Springer, Cham. https://doi.org/10.1007/978-3-319-43958-7_38

  • DOI: https://doi.org/10.1007/978-3-319-43958-7_38

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-43957-0

  • Online ISBN: 978-3-319-43958-7

  • eBook Packages: Computer Science, Computer Science (R0)
