Replay spoof detection for speaker verification system using magnitude-phase-instantaneous frequency and energy features

Bharath, K. P.; Kumar, M. Rajesh

doi:10.1007/s11042-022-12380-7

Published: 29 April 2022

Volume 81, pages 39343–39366, (2022)

Multimedia Tools and Applications

Abstract

Spoofing attack detection is an essential component of automatic speaker verification (ASV) systems. The success of ASV-2015 showed great promise in detecting voice conversion and speech synthesis spoofs. However, fewer studies address replay attack detection, even though replay attacks are the kind most likely to be used by non-professional impersonators. This paper detects replay attacks on the ASV system using the ASVspoof-2017-v2.0 corpus. The work is divided into two parts. The first part shows the significance of Empirical Mode Decomposition (EMD) and the Hilbert Spectrum (HS) for replay attack detection: the instantaneous frequency (IF) and instantaneous energy (IE) are extracted from the frequency components of the speech signal to differentiate the characteristics of genuine and spoofed speech, and are then passed to rectangular filter cepstral coefficient (RFCC) extraction to obtain the feature set used to decide whether a given speech sample is genuine or spoofed. In the second part, a new score-level fusion system is proposed to increase system performance. Along with the proposed stand-alone method, Constant-Q cepstral coefficients (CQCC) and the All-Pole Group Delay Function (APGDF) are used to extract the magnitude and phase feature sets, respectively. The proposed stand-alone and score-level fusion methods achieve better detection performance than other state-of-the-art techniques.
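
To make the IF and IE quantities in the abstract concrete, the following is a minimal sketch of how instantaneous frequency and instantaneous energy can be read off the analytic signal of a single mono-component signal. It is not the authors' implementation: the EMD step is skipped, a pure cosine stands in for one intrinsic mode function, and the function names `analytic_signal` and `instantaneous_features`, the sampling rate, and the test tone are assumptions chosen for illustration.

```python
import cmath
import math


def analytic_signal(x):
    """Analytic signal of a real sequence, computed with a direct O(N^2) DFT."""
    n = len(x)
    # Forward DFT of the real input.
    spectrum = [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    # Keep DC (and Nyquist when n is even), double the positive
    # frequencies, and zero the negative frequencies.
    for k in range(n):
        if k == 0 or (n % 2 == 0 and k == n // 2):
            gain = 1.0
        elif k < (n + 1) // 2:
            gain = 2.0
        else:
            gain = 0.0
        spectrum[k] *= gain
    # Inverse DFT gives x + j * Hilbert(x).
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def instantaneous_features(x, fs):
    """Return (instantaneous frequency in Hz, instantaneous energy) of x."""
    z = analytic_signal(x)
    # Instantaneous energy: squared envelope of the analytic signal.
    energy = [abs(v) ** 2 for v in z]
    # Instantaneous frequency: phase advance between consecutive samples,
    # taken from z[t+1] * conj(z[t]) so no explicit phase unwrapping is needed.
    freq = [
        fs / (2 * math.pi) * cmath.phase(z[t + 1] * z[t].conjugate())
        for t in range(len(z) - 1)
    ]
    return freq, energy


# Assumed example: a 1 kHz cosine sampled at 8 kHz (8 full cycles in 64 samples).
fs = 8000
tone = [math.cos(2 * math.pi * 1000 * t / fs) for t in range(64)]
freq, energy = instantaneous_features(tone, fs)
```

For this stationary tone the instantaneous frequency stays at 1000 Hz and the instantaneous energy at 1 across all samples. In an EMD-based front end, the same two quantities would be computed per intrinsic mode function before any cepstral processing.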

Acknowledgments

The first author, Bharath K P (CSIR-Senior Research Fellow), would like to thank the Council of Scientific & Industrial Research (CSIR), Human Resource Development Group (HRDG), Govt. of India, for financial assistance (CSIR-SRF, Ack. No.: 143672/2 k18/1, File No.: 09/844(0084)/2019 EMR-I).

Author information

Authors and Affiliations

School of Electronics Engineering, VIT University, Vellore, India
K. P. Bharath & M. Rajesh Kumar

Corresponding author

Correspondence to M. Rajesh Kumar.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Bharath, K.P., Kumar, M.R. Replay spoof detection for speaker verification system using magnitude-phase-instantaneous frequency and energy features. Multimed Tools Appl 81, 39343–39366 (2022). https://doi.org/10.1007/s11042-022-12380-7

Received: 30 August 2020
Revised: 17 January 2022
Accepted: 21 January 2022
Published: 29 April 2022
Issue Date: November 2022
DOI: https://doi.org/10.1007/s11042-022-12380-7
