Abstract
Replay spoofing attacks pose the greatest threat to automatic speaker verification (ASV) systems. The high-frequency regions of the audio signal reveal the presence of such attacks, so it is useful to decompose the signal into frequency bands or regions for analysis. In this paper, an empirical mode decomposition (EMD)-based replay spoofing detection system is presented. Using EMD, each signal is decomposed into several intrinsic mode functions (IMFs), oscillatory components extracted in order of decreasing frequency, plus a monotonic residue. For spoofing detection, the signal is then reconstructed from different subsets (combinations) of these IMFs. Results on the ASVspoof 2017 version 2.0 and AVspoof replay attack benchmark datasets indicate that the initial IMFs can carry replay attack patterns, and that processing them is sufficient, without processing the entire signal. The proposed approach can also serve as a preprocessing step in a dimensionality reduction strategy. Cross-corpus experiments indicate the limitations of ASV antispoofing systems under mismatched conditions.
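The decompose-then-reconstruct idea can be illustrated with a minimal sketch. It assumes NumPy and a deliberately simplified sifting procedure: piecewise-linear envelopes instead of the cubic splines of standard EMD, end samples used as envelope knots, and an assumed stopping threshold `tol`. The helper names `emd`, `reconstruct` and `candidate_subsets`, and the choice to enumerate every combination of the first three IMFs, are hypothetical illustrations. They are not the authors' implementation, and the feature extraction and classifier that would follow are not shown.

```python
from itertools import combinations

import numpy as np


def _extrema(x):
    """Indices of interior local maxima and minima of a 1-D array."""
    d = np.diff(x)
    maxima = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] >= 0))[0] + 1
    return maxima, minima


def _envelope_mean(x):
    """Mean of upper and lower envelopes, or None if x has too few extrema."""
    maxima, minima = _extrema(x)
    if len(maxima) < 2 or len(minima) < 2:
        return None
    t = np.arange(len(x))
    # Simplification: piecewise-linear envelopes (standard EMD uses cubic
    # splines); both end samples are added as knots to anchor the edges.
    up = np.concatenate(([0], maxima, [len(x) - 1]))
    lo = np.concatenate(([0], minima, [len(x) - 1]))
    upper = np.interp(t, up, x[up])
    lower = np.interp(t, lo, x[lo])
    return (upper + lower) / 2.0


def emd(x, max_imfs=8, max_sifts=50, tol=0.2):
    """Sift x into IMFs (highest frequency first) plus a residue."""
    residue = np.asarray(x, dtype=float).copy()
    imfs = []
    for _ in range(max_imfs):
        if _envelope_mean(residue) is None:
            break  # residue is (near-)monotonic: stop decomposing
        h = residue.copy()
        for _ in range(max_sifts):
            m = _envelope_mean(h)
            if m is None:
                break
            h_new = h - m
            # Assumed Cauchy-type stopping rule on the relative change.
            sd = np.sum((h - h_new) ** 2) / (np.sum(h ** 2) + 1e-12)
            h = h_new
            if sd < tol:
                break
        imfs.append(h)
        residue = residue - h
    return np.reshape(np.array(imfs, dtype=float), (-1, len(residue))), residue


def reconstruct(imfs, residue, subset, include_residue=False):
    """Sum the IMFs listed in subset (index 0 = first, highest-frequency IMF)."""
    y = imfs[list(subset)].sum(axis=0)
    return y + residue if include_residue else y


def candidate_subsets(num_imfs, k=3):
    """All non-empty combinations of the first k IMFs (hypothetical scheme)."""
    first = range(min(k, num_imfs))
    return [c for r in range(1, len(first) + 1) for c in combinations(first, r)]


# Usage on a synthetic signal: a 40 Hz tone riding on a 3 Hz component.
fs = 1000
t = np.arange(fs) / fs
hf = np.sin(2 * np.pi * 40 * t)
x = hf + 0.5 * np.sin(2 * np.pi * 3 * t)
imfs, residue = emd(x)
subsets = candidate_subsets(len(imfs), k=3)
variants = {s: reconstruct(imfs, residue, s) for s in subsets}
```

Because each sifting pass subtracts the local envelope mean, the first IMF tracks the fastest oscillations (here, the 40 Hz tone). This is why signals rebuilt from the initial IMFs emphasize the high-frequency content that the abstract associates with replay artifacts. For real speech, an EMD implementation with spline envelopes and careful boundary handling would be preferable to this simplified sketch.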
Cite this article
Mankad, S.H., Garg, S. On the performance of empirical mode decomposition-based replay spoofing detection in speaker verification systems. Prog Artif Intell 9, 325–339 (2020). https://doi.org/10.1007/s13748-020-00216-0
DOI: https://doi.org/10.1007/s13748-020-00216-0