Early reflection detection using autocorrelation to improve robustness of speaker verification in reverberant conditions

  • Khamis A. Al-Karawi
  • Duraid Y. Mohammed


A speech signal captured by a distant microphone is generally smeared by reverberation, which severely degrades automatic speaker recognition performance. To improve system performance, an effective and robust feature-extraction method is proposed. In this paper, a room impulse response is assumed to comprise three parts: a direct-path response, early reflections and late reverberation. Although late reverberation is known to be a major cause of performance degradation, this paper focuses on the effect of early reflections, because early reflections and their properties play an essential role in the acoustics of an enclosure. In the first stage, the proposed method estimates the early reflections from the speech signals using the autocorrelation function; in the second stage, the estimates are combined with an anechoic signal and used to train the system. The method appears promising, achieving a substantial improvement in system performance in terms of reduced equal error rate and detection error trade-off, especially at longer reverberation times.
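The abstract does not spell out the estimation procedure, so the following is only a minimal sketch of the general idea of locating a reflection with the autocorrelation function, not the authors' implementation. It assumes a synthetic case: white noise stands in for the anechoic signal, and a single delayed, attenuated copy stands in for one early reflection. The function names, the 16 kHz sampling rate, the 15 ms delay and the search window are all hypothetical choices made for illustration. Real speech has its own autocorrelation structure (for example pitch peaks), which a practical detector would have to account for.

```python
import numpy as np


def autocorrelation(x):
    """Normalized autocorrelation for non-negative lags, computed via the FFT."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    # Zero-pad to at least 2n - 1 samples so the circular correlation
    # computed by the FFT equals the linear correlation.
    nfft = 1 << (2 * n - 1).bit_length()
    spectrum = np.fft.rfft(x, nfft)
    acf = np.fft.irfft(spectrum * np.conj(spectrum), nfft)[:n]
    return acf / acf[0]


def strongest_reflection_lag(x, min_lag, max_lag):
    """Return the lag (in samples) of the largest autocorrelation peak
    inside the search window [min_lag, max_lag]."""
    acf = autocorrelation(x)
    window = acf[min_lag:max_lag + 1]
    return min_lag + int(np.argmax(window))


# Synthetic example (all values below are illustrative assumptions).
rng = np.random.default_rng(0)
fs = 16000                       # sampling rate in Hz
dry = rng.standard_normal(fs)    # 1 s of white noise as a stand-in "anechoic" signal
delay = 240                      # one reflection arriving 15 ms after the direct path
gain = 0.6                       # reflection attenuation

wet = dry.copy()
wet[delay:] += gain * dry[:-delay]   # direct path plus one delayed copy

# Search between 1 ms and 50 ms, skipping the zero-lag peak.
lag = strongest_reflection_lag(wet, min_lag=16, max_lag=800)
print(f"estimated reflection delay: {lag} samples ({1000 * lag / fs:.1f} ms)")
```

In this toy setting the zero-lag peak is excluded by `min_lag`, and the remaining maximum falls at the delay of the simulated reflection. How such an estimate would then be combined with an anechoic signal for training is described in the paper itself and is not reproduced here.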


Speaker recognition; Early reflection; Autocorrelation function; GMM; GFCC




Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. University of Diyala, Baqubah, Iraq
  2. School of Engineering, Al-Iraqia University, Baghdad, Iraq
