Generative and Discriminative Modelling of Linear Energy Sub-bands for Spoof Detection in Speaker Verification Systems

Rupesh Kumar, Suvidha; Bharathi, B.

doi:10.1007/s00034-022-01957-0

Published: 20 January 2022

Circuits, Systems, and Signal Processing, Volume 41, pages 3811–3831 (2022)



Abstract

Classifying utterances as genuine or spoofed is the basis of most countermeasures that detect spoofing attacks on automatic speaker verification (ASV) systems. Choosing a discriminative feature and a complementary classifier improves the robustness of the countermeasure. Cepstral coefficients derived from linearly spaced sub-band energies have proved effective against unknown attacks in the literature. This work assesses the behaviour of a spoof detection countermeasure that uses linear frequency cepstral coefficients (LFCC) with both generative and discriminative classifiers; these systems serve as baselines for further analysis. In parallel, the paper proposes modifying the traditional weighting function used to extract sub-band energies on a linear scale, in order to exploit its full potential for spoof detection. Because the modified weighting function is Gaussian, the resulting feature is referred to as GaussFCC. All analyses are carried out on utterances without pre-emphasis. The classifiers are a Gaussian mixture model (GMM, generative) and a bidirectional long short-term memory network (BiLSTM, discriminative). Empirical results show that the generative classifier performs well in detecting spoofing attacks under the logical access (LA) condition, whereas the discriminative classifier substantially outperforms the generative model under the physical access (PA) condition. The tandem detection cost function (t-DCF) in the LA scenario with the GMM classifier is 0.000 on the development data and 0.113 on the evaluation data; in the PA scenario with the BiLSTM classifier, it is 0.030 on the development data and 0.044 on the evaluation data. A detailed comparative analysis of countermeasure performance is carried out across attack types, features, classifiers, and female and male speakers.
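The abstract names the ingredients of the front end but not their parameters. The Python sketch below is a minimal illustration, not the authors' implementation: it contrasts a conventional LFCC front end (triangular weights on linearly spaced sub-bands) with a Gaussian-weighted variant in the spirit of GaussFCC, computed without pre-emphasis. The function names are hypothetical, and the filter count, FFT size, hop length, number of cepstra and Gaussian width (`sigma = spacing / 2`) are all assumptions chosen for illustration.

```python
import numpy as np


def linear_filterbank(n_filters, n_fft, sr, shape="triangular"):
    """Return an (n_filters, n_fft // 2 + 1) matrix of sub-band weights.

    Centres are spaced linearly from 0 Hz to the Nyquist frequency.
    shape="triangular" gives the conventional LFCC filterbank;
    shape="gaussian" is a HYPOTHETICAL Gaussian weighting in the spirit of
    GaussFCC (the width rule below is an assumption, not the paper's).
    """
    n_bins = n_fft // 2 + 1
    freqs = np.linspace(0.0, sr / 2.0, n_bins)         # FFT bin frequencies (Hz)
    edges = np.linspace(0.0, sr / 2.0, n_filters + 2)  # band edges; inner points are centres
    spacing = edges[1] - edges[0]
    fb = np.zeros((n_filters, n_bins))
    for m in range(n_filters):
        lo, centre, hi = edges[m], edges[m + 1], edges[m + 2]
        if shape == "triangular":
            rising = (freqs - lo) / (centre - lo)
            falling = (hi - freqs) / (hi - centre)
            fb[m] = np.clip(np.minimum(rising, falling), 0.0, None)
        elif shape == "gaussian":
            sigma = spacing / 2.0  # assumed width so that neighbouring filters overlap
            fb[m] = np.exp(-0.5 * ((freqs - centre) / sigma) ** 2)
        else:
            raise ValueError(f"unknown shape: {shape}")
    return fb


def dct_ii(x, n_coeffs):
    """Orthonormal DCT-II along the last axis, keeping the first n_coeffs terms."""
    n = x.shape[-1]
    k = np.arange(n_coeffs)[:, None]
    i = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    scale = np.full(n_coeffs, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return (x @ basis.T) * scale


def cepstral_features(signal, sr, n_fft=512, hop=160, n_filters=20,
                      n_ceps=20, shape="triangular"):
    """Frame-level cepstra from linearly spaced sub-band energies (no pre-emphasis)."""
    window = np.hamming(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[t * hop:t * hop + n_fft] * window
                       for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2   # power spectrum per frame
    energies = power @ linear_filterbank(n_filters, n_fft, sr, shape).T
    log_energies = np.log(energies + 1e-10)             # floor avoids log(0)
    return dct_ii(log_energies, n_ceps)


rng = np.random.default_rng(0)
signal = rng.standard_normal(16000)  # 1 s of synthetic noise at 16 kHz, placeholder audio
lfcc = cepstral_features(signal, sr=16000)
gauss = cepstral_features(signal, sr=16000, shape="gaussian")
print(lfcc.shape, gauss.shape)  # (97, 20) (97, 20)
```

Only the weighting function differs between the two calls; framing, log compression and the DCT are shared, so any difference between the two feature matrices comes from the sub-band weighting alone. Frame-level features of this kind would then be passed to a GMM or BiLSTM classifier.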

Acknowledgements

We extend our thanks to SSN College of Engineering for providing us with the required infrastructure to carry out our research work.

Author information

Authors and Affiliations

Department of CSE, SSN College of Engineering, Chennai, Tamil Nadu, India
Suvidha Rupesh Kumar & B. Bharathi

Corresponding author

Correspondence to Suvidha Rupesh Kumar.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Rupesh Kumar, S., Bharathi, B. Generative and Discriminative Modelling of Linear Energy Sub-bands for Spoof Detection in Speaker Verification Systems. Circuits Syst Signal Process 41, 3811–3831 (2022). https://doi.org/10.1007/s00034-022-01957-0

Received: 05 April 2021
Revised: 24 December 2021
Accepted: 28 December 2021
Published: 20 January 2022
Issue Date: July 2022
DOI: https://doi.org/10.1007/s00034-022-01957-0
