Texture analysis of edge mapped audio spectrogram for spoofing attack detection

Meriem, Fedila; Messaoud, Bengherabi; Bahia, Yahya-Zoubir

doi:10.1007/s11042-023-15329-6

Published: 26 May 2023

Multimedia Tools and Applications, volume 83, pages 15915–15937 (2024)

This article has been updated

Abstract

The vulnerability of automatic speaker verification (ASV) technology to various presentation attacks has drawn much interest in designing anti-spoofing countermeasures. Following their success in face anti-spoofing, some texture-analysis techniques have recently been carried over to the voice anti-spoofing domain with very promising results. In the same research direction, and motivated by the fact that the local gradient amplitude of the spectrogram image is sensitive to the distortions introduced by spoofing attacks, this paper devises a novel voice anti-spoofing countermeasure based on texture analysis of the spectrogram image. The main novelty of the proposed approach is the application of edge detection techniques to the spectrogram image. The recorded speech is first converted to a 2-D spectrogram image. Then, the spectrogram is mapped using an edge detection method. Finally, different texture descriptors are used to extract visual features from the combination of the original and the edge-mapped spectrograms. Experimental results on the ASVspoof 2017 v2.0 dataset show that the proposed countermeasure clearly outperforms some well-known techniques, achieving an equal error rate (EER) as low as 3.52% and 22.86% on the development and evaluation sets, respectively. On the ASVspoof 2019 dataset, the proposed method achieves up to 27.62% and 29.15% relative EER improvement over the baseline CQCC feature in the Logical Access and Physical Access scenarios, respectively. In addition, the results show that the proposed approach, which uses hand-crafted features with an SVM classifier, can outperform more sophisticated solutions that rely on complex neural network architectures.
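The three stages named in the abstract (spectrogram image, edge mapping, texture description of both images) can be illustrated with a minimal sketch. The abstract does not say which spectrogram type, edge detector, or texture descriptors the authors use, so everything concrete here is an assumption chosen for illustration only: a log-magnitude STFT spectrogram, a Sobel gradient magnitude as the edge map, and a basic 8-neighbour local binary pattern (LBP) histogram as the descriptor. All function names and parameter values are hypothetical, and the SVM classification stage is left out.

```python
import numpy as np


def log_spectrogram(signal, n_fft=256, hop=128):
    """Log-magnitude STFT spectrogram, shape (n_fft // 2 + 1, n_frames).

    Assumption: the paper may use a different time-frequency representation.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop:i * hop + n_fft] * window for i in range(n_frames)]
    )
    magnitude = np.abs(np.fft.rfft(frames, axis=1)).T
    return np.log(magnitude + 1e-8)


def to_uint8(image):
    """Min-max scale an array to an 8-bit grayscale image."""
    lo, hi = image.min(), image.max()
    scaled = (image - lo) / (hi - lo + 1e-12)
    return (255 * scaled).astype(np.uint8)


def sobel_edge_map(image):
    """Gradient magnitude from 3x3 Sobel kernels (illustrative edge detector)."""
    padded = np.pad(image.astype(float), 1, mode="edge")
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    h, w = image.shape
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            patch = padded[dy:dy + h, dx:dx + w]
            gx += kx[dy, dx] * patch
            gy += ky[dy, dx] * patch
    return np.hypot(gx, gy)


def lbp_histogram(image):
    """Normalized 256-bin histogram of basic 8-neighbour LBP codes."""
    img = image.astype(float)
    center = img[1:-1, 1:-1]
    h, w = center.shape
    # Neighbour offsets, clockwise from the top-left pixel.
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    codes = np.zeros((h, w), dtype=np.int32)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = img[dy:dy + h, dx:dx + w]
        codes |= (neighbour >= center).astype(np.int32) << bit
    hist = np.bincount(codes.ravel(), minlength=256).astype(float)
    return hist / hist.sum()


def texture_features(signal):
    """Concatenate descriptors of the original and edge-mapped spectrograms."""
    spec = to_uint8(log_spectrogram(signal))
    edges = to_uint8(sobel_edge_map(spec))
    return np.concatenate([lbp_histogram(spec), lbp_histogram(edges)])


# Synthetic one-second tone with noise, standing in for a recorded utterance.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
signal = np.sin(2 * np.pi * 440 * t) + 0.1 * rng.standard_normal(t.size)
features = texture_features(signal)
print(features.shape)  # (512,)
```

In a full system, feature vectors of this kind would be computed for genuine and spoofed utterances and passed to a classifier such as the SVM mentioned in the abstract.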

Change history

30 November 2023
Author name "Yahya-zoubir" has been changed to "Yahya-Zoubir".

Acknowledgment

This work is financially supported by the DGRSDT (Direction Générale de la Recherche Scientifique et du Développement Technologique) and CDTA (Centre de Développement des Technologies Avancées) Algeria, as part of the CDTA’s triennial research protocol 2019-2021. (Project code: N004-1/BIOSMC/DTELECOM/CDTA/PT 19-21)

Author information

Authors and Affiliations

Centre de Développement des Technologies Avancées (CDTA), P.O. Box 17 Baba-Hassen, City, 16303, Algiers, Algeria
Fedila Meriem, Bengherabi Messaoud & Yahya-Zoubir Bahia

Corresponding author

Correspondence to Fedila Meriem.

Ethics declarations

Conflict of Interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Meriem, F., Messaoud, B. & Bahia, YZ. Texture analysis of edge mapped audio spectrogram for spoofing attack detection. Multimed Tools Appl 83, 15915–15937 (2024). https://doi.org/10.1007/s11042-023-15329-6

Received: 11 August 2022
Revised: 09 February 2023
Accepted: 06 April 2023
Published: 26 May 2023
Issue Date: February 2024
DOI: https://doi.org/10.1007/s11042-023-15329-6
