
On the performance of empirical mode decomposition-based replay spoofing detection in speaker verification systems

  • Regular Paper
  • Progress in Artificial Intelligence

Abstract

Replay spoofing attacks pose the greatest threat to automatic speaker verification (ASV) systems. Evidence of such attacks is present in the high-frequency regions of the audio signal, so it is useful to decompose the signal into frequency bands or regions for analysis. In this paper, an empirical mode decomposition (EMD)-based replay spoofing detection system is presented. Using EMD, each signal is decomposed into several mono-component intrinsic mode functions (IMFs). The signal is then reconstructed from one or more subsets (combinations) of these IMFs, and each reconstruction is used for spoofing detection. Results on the ASVspoof 2017 version 2.0 and AVspoof benchmark replay attack datasets indicate that the initial IMFs have the potential to carry replay attack patterns, and that processing them is sufficient instead of processing the entire signal. The proposed approach can also serve as a preprocessing technique that acts as a dimension-reduction strategy. Cross-corpus experiments reveal the limitations of ASV antispoofing systems under mismatched conditions.
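The decompose-then-reconstruct idea described in the abstract can be sketched in pure Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the envelopes are natural cubic splines through the local extrema, both envelopes are pinned to the end samples as a crude boundary treatment, and sifting stops after a fixed number of iterations instead of using a standard-deviation criterion (all simplifications of the original EMD of Huang et al.). The helper names (`emd`, `sift`, `reconstruct`, `pearson`) are hypothetical. By construction the signal equals the sum of its IMFs plus the residue, x(t) = Σ_i c_i(t) + r(t), so keeping only a subset of IMF indices such as `[0]` gives a partial reconstruction dominated by the fastest oscillations.

```python
import bisect
import math


def natural_cubic_spline(xs, ys, xq):
    """Evaluate the natural cubic spline through (xs, ys) at points xq; xs strictly increasing."""
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = 3 * (ys[i + 1] - ys[i]) / h[i] - 3 * (ys[i] - ys[i - 1]) / h[i - 1]
    l, mu, z = [1.0] * (n + 1), [0.0] * (n + 1), [0.0] * (n + 1)
    for i in range(1, n):  # forward sweep of the tridiagonal system
        l[i] = 2 * (xs[i + 1] - xs[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / l[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / l[i]
    b, c, d = [0.0] * n, [0.0] * (n + 1), [0.0] * n
    for j in range(n - 1, -1, -1):  # back substitution; c[0] = c[n] = 0 (natural ends)
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (ys[j + 1] - ys[j]) / h[j] - h[j] * (c[j + 1] + 2 * c[j]) / 3
        d[j] = (c[j + 1] - c[j]) / (3 * h[j])
    out = []
    for x in xq:
        j = min(max(bisect.bisect_right(xs, x) - 1, 0), n - 1)
        dx = x - xs[j]
        out.append(ys[j] + b[j] * dx + c[j] * dx ** 2 + d[j] * dx ** 3)
    return out


def local_extrema(x):
    """Indices of interior local maxima and minima."""
    maxima = [i for i in range(1, len(x) - 1) if x[i - 1] < x[i] >= x[i + 1]]
    minima = [i for i in range(1, len(x) - 1) if x[i - 1] > x[i] <= x[i + 1]]
    return maxima, minima


def sift(x, n_iter=10):
    """Extract one IMF; a fixed iteration count stands in for a proper stopping criterion."""
    h, t, last = list(x), list(range(len(x))), len(x) - 1
    for _ in range(n_iter):
        maxima, minima = local_extrema(h)
        # Both envelopes are pinned to the end samples: a crude boundary treatment.
        up_k, lo_k = [0] + maxima + [last], [0] + minima + [last]
        upper = natural_cubic_spline(up_k, [h[i] for i in up_k], t)
        lower = natural_cubic_spline(lo_k, [h[i] for i in lo_k], t)
        h = [v - (u + w) / 2 for v, u, w in zip(h, upper, lower)]  # subtract envelope mean
    return h


def emd(x, max_imfs=6):
    """Decompose x into IMFs and a residue, so that x == sum(imfs) + residue."""
    imfs, residue = [], list(x)
    for _ in range(max_imfs):
        maxima, minima = local_extrema(residue)
        if len(maxima) + len(minima) < 2:  # residue is (near) monotonic: stop
            break
        imf = sift(residue)
        imfs.append(imf)
        residue = [r - c for r, c in zip(residue, imf)]
    return imfs, residue


def reconstruct(imfs, indices):
    """Rebuild a signal from a chosen subset of IMFs, e.g. only the initial ones."""
    return [sum(imfs[k][n] for k in indices) for n in range(len(imfs[0]))]


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


# Toy two-tone signal: a fast component (period 10 samples) plus a slow one (period 100).
N = 400
fast = [math.sin(2 * math.pi * n / 10) for n in range(N)]
slow = [math.sin(2 * math.pi * n / 100) for n in range(N)]
signal = [f + s for f, s in zip(fast, slow)]

imfs, residue = emd(signal)
initial = reconstruct(imfs, [0])            # keep only the first IMF
full = reconstruct(imfs, range(len(imfs)))  # keep all IMFs
err = max(abs(f + r - x) for f, r, x in zip(full, residue, signal))  # close to 0 (round-off)
corr = pearson(initial[100:300], fast[100:300])  # skip edges affected by boundary effects
print(len(imfs), err, corr)
```

On this toy signal the first IMF should track the 10-sample-period component, which mirrors the general EMD property that the initial IMFs hold the highest-frequency content. Real speech would call for a proper sifting stopping rule and better boundary handling (for example, mirrored extrema).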


Figs. 1–12 (images not reproduced)


Notes

  1. https://www.bbc.com/news/technology-39965545.

  2. The last two IMFs do not include the residual.

  3. https://www.idiap.ch/dataset/avspoof.

  4. https://www.asvspoof.org/.

  5. https://www.asvspoof.org/asvspoof2019/ASVspoof_2019_baseline_CM_v1.zip.


Author information

Correspondence to Sanjay Garg.



About this article


Cite this article

Mankad, S.H., Garg, S. On the performance of empirical mode decomposition-based replay spoofing detection in speaker verification systems. Prog Artif Intell 9, 325–339 (2020). https://doi.org/10.1007/s13748-020-00216-0


  • DOI: https://doi.org/10.1007/s13748-020-00216-0
