Analysis of Monaural and Binaural Statistical Properties for the Estimation of Distance of a Target Speaker

  • R. Venkatesan
  • A. Balaji Ganesh


The paper presents an auditory distance perception model based on statistical properties extracted from monaural and binaural features in a reverberant room environment. The framework considers both mono and stereo speech signals originating from different distances under various reverberation times. Accordingly, two models are discussed: single-channel monaural statistics and binaural-channel monaural statistics. Distance-dependent statistical features derived from fused monaural coefficients, namely cepstral and envelope features, are supplied as input to several classification algorithms (Gaussian mixture model with expectation maximization, support vector machine and random forest) to estimate the distance of a desired target user. For binaural speech signals, the monaural coefficients are extracted in addition to the binaural cues, namely the interaural time difference (ITD), interaural level difference (ILD) and interaural coherence (IC), and the combined features are then applied to distance estimation. On average, the proposed monaural and binaural models achieve results more than 5% better than existing baseline techniques, even at a low signal-to-noise ratio of 0 dB.
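As an illustration of the binaural cues named above, the following is a minimal sketch and not the authors' implementation. It assumes common textbook definitions: ILD as the energy ratio of the two channels in dB, ITD as the lag that maximises the normalised interaural cross-correlation, and IC as the value of that maximum. The function name `binaural_cues`, its parameters, the 1 ms lag search range and the sign conventions are all hypothetical choices made for this example.

```python
import math


def binaural_cues(left, right, fs, max_lag_s=1e-3):
    """Return (itd_seconds, ild_db, ic) for one frame of a two-channel signal.

    Assumed conventions (not taken from the paper):
    - ILD is 10*log10(E_left / E_right), so it is positive when left is louder.
    - ITD is positive when the right channel lags the left channel.
    - IC is the peak of the normalised cross-correlation within +/- max_lag_s.
    """
    n = min(len(left), len(right))
    left, right = left[:n], right[:n]

    e_l = sum(x * x for x in left)
    e_r = sum(x * x for x in right)
    if e_l == 0.0 or e_r == 0.0:
        raise ValueError("both channels need non-zero energy")

    ild_db = 10.0 * math.log10(e_l / e_r)
    norm = math.sqrt(e_l * e_r)

    max_lag = int(round(max_lag_s * fs))
    best_lag, best_val = 0, -1.0
    for lag in range(-max_lag, max_lag + 1):
        # Correlate left[t] with right[t + lag] over the overlapping samples.
        acc = 0.0
        for t in range(max(0, -lag), min(n, n - lag)):
            acc += left[t] * right[t + lag]
        val = acc / norm
        if val > best_val:
            best_lag, best_val = lag, val

    return best_lag / fs, ild_db, best_val
```

In a distance-estimation setting such cues would typically be computed frame by frame (and often per frequency band) before any statistics are taken over them; this sketch works on a single broadband frame only.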


Monaural features; Room acoustics; Distance-dependent statistical properties; Hilbert envelope features; Binaural cues; Classification models



The authors wish to thank the Department of Science and Technology for awarding a project under the Cognitive Science Initiative Programme (DST File No.: SR/CSI/09/2011), through which this work was carried out. The authors are also very grateful to the anonymous reviewers for their valuable and constructive suggestions.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Authors and Affiliations

  1. Electronic System Design Laboratory, Velammal Engineering College, Chennai, India
