Adapting to Noise in Forensic Speaker Verification Using GMM-UBM I-Vector Method in High-Noise Backgrounds

Aljinu Khadar, K. V.; Sunil Kumar, R. K.; Sreekanth, N. S.

doi:10.1007/978-3-031-50993-3_22

K. V. Aljinu Khadar, R. K. Sunil Kumar & N. S. Sreekanth

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1973)

Included in the following conference series:

International Conference on Computational Sciences and Sustainable Technologies

Abstract

The performance of the GMM-UBM i-vector method in a forensic speaker verification system was examined on noisy speech samples. The analysis used both Mel-frequency cepstral coefficients (MFCC) and MFCCs computed from autocorrelated speech signals. The noisy signal's autocorrelation coefficients are concentrated around the lower lags, whereas the autocorrelation coefficients at higher lags are very small. Thus, in addition to retaining the periodic nature of the signal, autocorrelation-based MFCC is also robust for analyzing speech signals in intense background noise. The performance of both MFCC and autocorrelation-based MFCC depends heavily on the quality of the sample: both work best with noise-free data and degrade on real-world, i.e., noisy, data. The speaker verification experiment for forensic purposes involved adding white Gaussian noise, red noise, and pink noise at signal-to-noise ratios (SNR) ranging from −20 dB to +20 dB. The performance of both methods was affected drastically in all cases, but autocorrelation-based MFCC gave better results than MFCC. Thus, autocorrelation-based MFCC is a valuable method for robust feature extraction, compared with MFCC, for speaker verification in intense background noise. Even at very high noise levels (−20 dB), the verification accuracy of our method is higher than that reported in previous research.
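To make the noise-mixing and autocorrelation steps concrete, the following is a minimal sketch and not the authors' pipeline. It mixes white Gaussian noise into a signal at a chosen SNR and computes a biased autocorrelation estimate. The function names, the 8 kHz sampling rate, and the 200 Hz tone standing in for voiced speech are all assumptions made for illustration. MFCC extraction from the autocorrelation sequence is omitted.

```python
import math
import random


def add_white_noise(signal, snr_db, rng):
    """Return signal plus white Gaussian noise scaled to the requested SNR (dB)."""
    signal_power = sum(s * s for s in signal) / len(signal)
    # SNR_dB = 10*log10(P_signal / P_noise)  =>  P_noise = P_signal / 10^(SNR_dB/10)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    # Rescale the drawn noise so its empirical power equals noise_power.
    drawn_power = sum(n * n for n in noise) / len(noise)
    scale = math.sqrt(noise_power / drawn_power)
    return [s + scale * n for s, n in zip(signal, noise)]


def autocorrelation(frame, max_lag):
    """Biased autocorrelation estimate r[k] for lags 0..max_lag."""
    n = len(frame)
    return [
        sum(frame[i] * frame[i + k] for i in range(n - k)) / n
        for k in range(max_lag + 1)
    ]


def measured_snr_db(clean, noisy):
    """SNR in dB recovered from a clean signal and its noisy version."""
    noise = [y - x for x, y in zip(clean, noisy)]
    p_signal = sum(x * x for x in clean) / len(clean)
    p_noise = sum(n * n for n in noise) / len(noise)
    return 10.0 * math.log10(p_signal / p_noise)


# Illustrative input (assumed): 200 Hz tone at 8 kHz, so one period is 40 samples.
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 200 * i / 8000) for i in range(2000)]
noisy = add_white_noise(clean, 0.0, rng)

r = autocorrelation(noisy, 40)
print("lag 0:", r[0], "lag 40 (one period):", r[40])
```

In this sketch the white-noise energy lands mostly at lag 0, while the periodic component still produces a clear peak at the lag of one period. This is the property the abstract relies on. At much lower SNRs the estimate at higher lags becomes noisier unless more samples are averaged.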

Author information

Authors and Affiliations

Department of Information Technology, Kannur University, Kannur, India
K. V. Aljinu Khadar, R. K. Sunil Kumar & N. S. Sreekanth

Corresponding author

Correspondence to K. V. Aljinu Khadar.

Editor information

Editors and Affiliations

CHRIST University, Bangalore, India
Sagaya Aurelia
CHRIST University, Bangalore, India
Chandra J.
CHRIST University, Bangalore, India
Ashok Immanuel
Modern College of Business and Science, Muscat, Oman
Joseph Mani
Modern College of Business and Science, Muscat, Oman
Vijaya Padmanabha

About this paper

Cite this paper

Aljinu Khadar, K.V., Sunil Kumar, R.K., Sreekanth, N.S. (2024). Adapting to Noise in Forensic Speaker Verification Using GMM-UBM I-Vector Method in High-Noise Backgrounds. In: Aurelia, S., J., C., Immanuel, A., Mani, J., Padmanabha, V. (eds) Computational Sciences and Sustainable Technologies. ICCSST 2023. Communications in Computer and Information Science, vol 1973. Springer, Cham. https://doi.org/10.1007/978-3-031-50993-3_22

DOI: https://doi.org/10.1007/978-3-031-50993-3_22
Published: 03 February 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-50992-6
Online ISBN: 978-3-031-50993-3
eBook Packages: Computer Science, Computer Science (R0)
