Combining evidences from Hilbert envelope and residual phase for detecting replay attacks

International Journal of Speech Technology

Abstract

In this work, the Hilbert envelope of the linear prediction (LP) residual and the residual phase are explored for detecting replay attacks. Two source features, namely the LP residual Hilbert envelope mel frequency cepstral coefficients (LPRHEMFCC) and the residual phase cepstral coefficients (RPCC), are used for replay detection. From the signal perspective, the Hilbert envelope represents the amplitude information of the LP residual samples, while the residual phase represents the excitation information present in the sequence of LP residual samples. Hence, both can be considered as two components of the raw LP residual signal. Accordingly, the score-level fusion of the LPRHEMFCC and RPCC features is compared with a third source feature, the residual mel frequency cepstral coefficients (RMFCC), derived from the raw LP residual obtained by LP analysis. The comparative analysis is performed using Gaussian mixture model-universal background model (GMM-UBM) ASV experiments (IITG-MV replay database) and spoof detection experiments (ASVspoof 2017 database). For the IITG-MV database, relative (RFAR-ZFAR) improvements of 86.10% (males), 27.45% (females) and 54.14% (whole set) are achieved for the (LPRHEMFCC + RPCC) + MFCC combination over the RMFCC + MFCC combination, where RFAR and ZFAR stand for the false acceptance rate under replay attacks and under zero-effort impostor attacks, respectively. In terms of the tandem detection cost function (t-DCF) metric, the corresponding relative improvements are 40.50%, 13.13% and 26.16%. For the ASVspoof 2017 database, relative equal error rate (EER) improvements of 11.72% and 6.74% are achieved for (LPRHEMFCC + RPCC) + MFCC and (LPRHEMFCC + RPCC) + CQCC over RMFCC + MFCC and RMFCC + CQCC, respectively. These observations support exploring the Hilbert envelope and residual phase components of the LP residual, rather than processing the LP residual signal directly, for detecting replay attacks. Moreover, score-level fusion of LPRHEMFCC, RPCC and CQCC provides an EER of 8.86%.
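The decomposition described in the abstract can be illustrated with a minimal sketch. It follows the common textbook definitions: the LP residual is the prediction error e(n) = s(n) − Σ a_k s(n − k), the Hilbert envelope is the magnitude of the analytic signal of e(n), and the residual phase is cos θ(n) = e(n) / h_e(n). The frame length, LP order, synthetic test frame and all function names below are illustrative assumptions, not values or code from the paper, and the cepstral stages that would turn these signals into LPRHEMFCC or RPCC features are not shown.

```python
import cmath
import math
import random


def lp_coefficients(frame, order):
    """Autocorrelation-method LP analysis via the Levinson-Durbin recursion."""
    n = len(frame)
    r = [sum(frame[i] * frame[i + k] for i in range(n - k)) for k in range(order + 1)]
    a = [0.0] * (order + 1)  # a[0] is unused; a[k] predicts from s(n - k)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # degenerate frame: stop refining the predictor
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:]


def lp_residual(frame, coeffs):
    """e(n) = s(n) - sum_k a_k s(n - k); samples before the frame count as zero."""
    residual = []
    for n, s in enumerate(frame):
        pred = sum(a * frame[n - k] for k, a in enumerate(coeffs, start=1) if n - k >= 0)
        residual.append(s - pred)
    return residual


def analytic_signal(x):
    """Analytic signal via a naive O(N^2) DFT (adequate for one short frame)."""
    n = len(x)
    spec = [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]
    for k in range(1, n):
        if n % 2 == 0 and k == n // 2:
            continue  # keep the Nyquist bin unchanged for even lengths
        # Double positive frequencies, zero out negative frequencies.
        spec[k] *= 2.0 if k < (n + 1) // 2 else 0.0
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def envelope_and_phase(residual, eps=1e-9):
    """Hilbert envelope h_e(n) = |analytic(n)| and residual phase e(n) / h_e(n)."""
    envelope = [abs(z) for z in analytic_signal(residual)]
    phase = [e / h if h > eps else 0.0 for e, h in zip(residual, envelope)]
    return envelope, phase


# Synthetic stand-in for one speech frame (assumed: 240 samples, LP order 10).
rng = random.Random(0)
frame = [math.sin(2 * math.pi * 0.05 * n) + 0.5 * math.sin(2 * math.pi * 0.12 * n)
         + 0.05 * rng.uniform(-1.0, 1.0) for n in range(240)]
residual = lp_residual(frame, lp_coefficients(frame, order=10))
envelope, phase = envelope_and_phase(residual)
print(len(envelope), len(phase))
```

Multiplying the envelope by the residual phase sample by sample returns the residual, which is the sense in which the two can be treated as complementary components of the same signal.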

Acknowledgements

This research work is funded by the Ministry of Electronics and Information Technology (MeitY), Govt. of India, through the project “Development of Excitation Source Features Based Spoof Resistant and Robust Audio-Visual Person Identification System”. The work was carried out in the Speech Processing and Pattern Recognition (SPARC) laboratory at the National Institute of Technology Nagaland, Dimapur, India.

Author information

Corresponding author

Correspondence to Madhusudan Singh.

Additional information

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Singh, M., Pati, D. Combining evidences from Hilbert envelope and residual phase for detecting replay attacks. Int J Speech Technol 22, 313–326 (2019). https://doi.org/10.1007/s10772-019-09604-x

  • DOI: https://doi.org/10.1007/s10772-019-09604-x
