On the performance of empirical mode decomposition-based replay spoofing detection in speaker verification systems

Mankad, Sapan H.; Garg, Sanjay

doi:10.1007/s13748-020-00216-0

Regular Paper
Published: 29 August 2020

Volume 9, pages 325–339 (2020)

Progress in Artificial Intelligence

Abstract

Replay spoofing attacks pose the greatest threat to automatic speaker verification (ASV) systems, and the high-frequency regions of an audio signal carry evidence of their presence. It is therefore useful to decompose the signal into frequency bands or regions for analysis. In this paper, an empirical mode decomposition (EMD)-based replay spoofing detection system is presented. Using EMD, each signal is decomposed into several oscillatory intrinsic mode functions (IMFs). For spoofing detection, the signal is then reconstructed and represented using different combinations (subsets) of these IMFs. Results on the ASVspoof 2017 version 2.0 and AVspoof benchmark replay attack datasets indicate that the initial IMFs have the potential to carry replay attack patterns, and that processing them is sufficient, rather than processing the entire signal. The proposed approach can also serve as a preprocessing technique that acts as a dimension-reduction strategy. Cross-corpus experiments expose the limitations of ASV antispoofing systems under mismatched conditions.
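The decomposition and subset reconstruction described in the abstract can be sketched in Python. This is a minimal, simplified sketch, not the authors' implementation: the helper names `emd` and `reconstruct`, the iteration limits, and the stopping tolerance are hypothetical, and the upper and lower envelopes are built by linear interpolation (`numpy.interp`) instead of the cubic splines used in standard EMD.

```python
import numpy as np


def _extrema(x):
    """Return indices of interior local maxima and minima of a 1-D signal."""
    d = np.diff(x)
    maxima = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] >= 0))[0] + 1
    return maxima, minima


def _envelope(x, idx):
    """Piecewise-linear envelope through x[idx], pinned to both endpoints.

    Assumption: linear interpolation replaces the cubic splines of standard EMD,
    and pinning the ends is a crude boundary treatment.
    """
    knots = np.concatenate(([0], idx, [len(x) - 1]))
    return np.interp(np.arange(len(x)), knots, x[knots])


def emd(x, max_imfs=8, max_sifts=50, tol=0.05):
    """Simplified EMD: returns (imfs, residual), IMFs roughly ordered high to low frequency."""
    residual = np.asarray(x, dtype=float).copy()
    imfs = []
    for _ in range(max_imfs):
        maxima, minima = _extrema(residual)
        if len(maxima) < 2 or len(minima) < 2:
            break  # residual is (close to) monotonic, so no further IMF is extracted
        h = residual.copy()
        for _ in range(max_sifts):
            maxima, minima = _extrema(h)
            if len(maxima) < 2 or len(minima) < 2:
                break
            # One sifting step: subtract the mean of the upper and lower envelopes.
            mean_env = 0.5 * (_envelope(h, maxima) + _envelope(h, minima))
            h_new = h - mean_env
            # Normalized SD-style stopping rule (tolerance is a hypothetical choice).
            sd = np.sum((h - h_new) ** 2) / (np.sum(h ** 2) + 1e-12)
            h = h_new
            if sd < tol:
                break
        imfs.append(h)
        residual = residual - h
    return np.array(imfs), residual


def reconstruct(imfs, indices):
    """Rebuild a signal from a chosen subset of IMFs (residual excluded)."""
    return np.sum(imfs[list(indices)], axis=0)


if __name__ == "__main__":
    t = np.linspace(0, 1, 2000)
    x = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 5 * t)
    imfs, res = emd(x)
    print("IMFs extracted:", len(imfs))
    # Candidate representations built from the first k (initial) IMFs;
    # which subsets the paper evaluated is not stated here.
    for k in range(1, len(imfs) + 1):
        x_k = reconstruct(imfs, range(k))
        print(k, "IMF(s), energy fraction:", np.sum(x_k ** 2) / np.sum(x ** 2))
```

Because each IMF is subtracted from the running residual, the extracted IMFs plus the final residual always sum back to the input, so any subset passed to `reconstruct` is a partial, frequency-limited view of the original signal that could then be fed to a feature extractor.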

Notes

https://www.bbc.com/news/technology-39965545.
The last two IMFs do not include the residual.
https://www.idiap.ch/dataset/avspoof.
https://www.asvspoof.org/.
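The note on the last two IMFs can be made concrete with the standard EMD reconstruction identity. In this sketch the notation is assumed rather than taken from the paper: c_1(t) to c_N(t) are the IMFs ordered from highest to lowest frequency, r_N(t) is the final residual, and S is the subset of IMF indices used to represent the signal.

```latex
\begin{aligned}
x(t) &= \sum_{i=1}^{N} c_i(t) + r_N(t), \\
\hat{x}_S(t) &= \sum_{i \in S} c_i(t), \qquad S \subseteq \{1, \dots, N\}, \\
\hat{x}_{\{N-1,\,N\}}(t) &= c_{N-1}(t) + c_N(t) \qquad (r_N \text{ excluded}).
\end{aligned}
```

Under this notation, a representation built from the initial IMFs corresponds to S = {1, ..., k} for some k, while the last-two-IMF representation drops the residual, as the note states.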
https://www.asvspoof.org/asvspoof2019/ASVspoof_2019_baseline_CM_v1.zip.

Author information

Authors and Affiliations

CSE Department, Institute of Technology, Nirma University, Ahmedabad, India
Sapan H. Mankad & Sanjay Garg

Corresponding author

Correspondence to Sanjay Garg.

About this article

Cite this article

Mankad, S.H., Garg, S. On the performance of empirical mode decomposition-based replay spoofing detection in speaker verification systems. Prog Artif Intell 9, 325–339 (2020). https://doi.org/10.1007/s13748-020-00216-0

Received: 19 January 2020
Accepted: 14 August 2020
Published: 29 August 2020
Issue Date: December 2020
DOI: https://doi.org/10.1007/s13748-020-00216-0