Mitigate the Reverberant Effects on Speaker Recognition via Multi-training

Mohammed, Duraid Y.; Al-Karawi, Khamis A.; Husien, Idress Mohammed; Ghulam, Marwah Abdullah

doi:10.1007/978-3-030-38752-5_8

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1174)

Included in the following conference series:

International Conference on Applied Computing to Support Industry: Innovation and Technology

Abstract

Speaker recognition techniques have matured considerably over the past few decades through continuous research and development. Existing methods typically use robust features extracted from clean speech signals and can therefore achieve very high recognition accuracy under idealized conditions. For critical applications such as security and forensics, the robustness and reliability of the system are crucial. The reverberation condition can be characterized by two main parameters: Reverberation Time (RT) and Direct to Reverberation Ratio (DRR), the latter reflecting the distance between the microphone and the source. This paper presents an efficient method to mitigate, or at least alleviate, the impact of reverberation on speaker verification (SV). Three multi-condition training methods are investigated for this purpose. The first uses matched train/test speaker models based on estimated RT values. The second uses two-condition training, in which clean and reverberant models are used. The third is a proposed four-condition training setup intended to further improve system performance. The data used for the SV experiments, including the training and test speech material, are obtained from the University of Salford Anechoic Chamber database (SALU-AC). Experimental results show that the first and third types of multi-condition training provide significant gains in performance relative to the baseline.
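
The multi-condition idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the reverberant material is produced, which RT values are used, or how the matched model is chosen, so the synthetic impulse response (Gaussian noise under an exponential decay), the RT values 0.3, 0.6 and 0.9 s, the nearest-RT selection rule, and all function names below are assumptions made purely for illustration.

```python
import numpy as np

def synthetic_rir(rt60, fs=16000, seed=0):
    """Toy room impulse response (hypothetical helper): Gaussian noise under
    an exponential envelope whose level falls by 60 dB after rt60 seconds."""
    rng = np.random.default_rng(seed)
    n = int(rt60 * fs)
    t = np.arange(n) / fs
    envelope = np.exp(-3.0 * np.log(10.0) * t / rt60)  # -60 dB at t = rt60
    rir = rng.standard_normal(n) * envelope
    rir[0] = 1.0  # unit direct-path tap
    return rir

def reverberate(clean, rir):
    """Convolve clean speech with an RIR and trim to the original length."""
    return np.convolve(clean, rir)[: len(clean)]

def build_training_sets(clean, rt60_values, fs=16000):
    """Return one training signal per condition; the key None denotes clean."""
    sets = {None: clean}
    for rt in rt60_values:
        sets[rt] = reverberate(clean, synthetic_rir(rt, fs))
    return sets

def select_matched_model(estimated_rt, trained_rts):
    """Pick the trained condition whose RT is closest to the estimate."""
    return min(trained_rts, key=lambda rt: abs(rt - estimated_rt))

# Stand-in for one second of anechoic speech (random noise, illustration only).
clean = np.random.default_rng(1).standard_normal(16000)

# Assumed layout: clean plus three reverberant conditions.
sets = build_training_sets(clean, [0.3, 0.6, 0.9])

# Matched train/test selection for an assumed RT estimate of 0.55 s.
matched = select_matched_model(0.55, [0.3, 0.6, 0.9])
```

In this sketch each entry of `sets` would feed a separate speaker model (or be pooled into one), and `matched` names the condition whose model would score a test utterance with the given RT estimate. Estimating RT from the test signal itself is outside the scope of the example.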

Author information

Authors and Affiliations

School of Education for Women, Al-Iraqia University, Baghdad, Iraq
Duraid Y. Mohammed
University of Diyala, Diyala, Iraq
Khamis A. Al-Karawi
School of Computer Science, Kirkuk University, Kirkuk, Iraq
Idress Mohammed Husien
Ministry of Higher Education and Scientific, Baghdad, Iraq
Marwah Abdullah Ghulam

Corresponding author

Correspondence to Duraid Y. Mohammed.

Editor information

Editors and Affiliations

Al Maaref University College, Ramadi, Iraq
Mohammed I. Khalaf
Liverpool John Moores University, Liverpool, UK
Dhiya Al-Jumeily
University of Liverpool, Liverpool, UK
Alexei Lisitsa

About this paper

Cite this paper

Mohammed, D.Y., Al-Karawi, K.A., Husien, I.M., Ghulam, M.A. (2020). Mitigate the Reverberant Effects on Speaker Recognition via Multi-training. In: Khalaf, M., Al-Jumeily, D., Lisitsa, A. (eds) Applied Computing to Support Industry: Innovation and Technology. ACRIT 2019. Communications in Computer and Information Science, vol 1174. Springer, Cham. https://doi.org/10.1007/978-3-030-38752-5_8

DOI: https://doi.org/10.1007/978-3-030-38752-5_8
Published: 08 January 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-38751-8
Online ISBN: 978-3-030-38752-5
eBook Packages: Computer Science (R0)