Texture analysis of edge mapped audio spectrogram for spoofing attack detection

Multimedia Tools and Applications

This article has been updated

Abstract

The vulnerability of automatic speaker verification (ASV) technology to presentation attacks has drawn considerable interest in the design of anti-spoofing countermeasures. Following their success in face anti-spoofing, texture-analysis techniques have recently been carried over to voice anti-spoofing with very promising results. In the same research direction, and motivated by the fact that the local gradient amplitude of a spectrogram image is sensitive to the distortions introduced by spoofing attacks, this paper devises a novel voice anti-spoofing countermeasure based on texture analysis of the spectrogram image. The main novelty of the proposed approach lies in applying edge detection to the spectrogram image. The recorded speech is first converted to a 2-D spectrogram image. The spectrogram is then mapped using an edge detection method. Finally, different texture descriptors are used to extract visual features from the combination of the original and edge-mapped spectrograms. Experimental results on the ASVspoof 2017 v2.0 dataset show that the proposed countermeasure clearly outperforms some well-known techniques, achieving an equal error rate (EER) as low as 3.52% on the development set and 22.86% on the evaluation set. On the ASVspoof 2019 dataset, the proposed method achieves relative EER improvements of up to 27.62% and 29.15% over the baseline CQCC features in the logical access and physical access scenarios, respectively. In addition, the results show that the proposed approach, which pairs hand-crafted features with an SVM classifier, can outperform more sophisticated solutions that rely on complex neural network architectures.
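The three steps named in the abstract (speech to spectrogram image, edge mapping, texture description of both images) can be illustrated with a minimal sketch. The abstract does not say which time-frequency transform, edge detector, or texture descriptors the authors use, so every concrete choice below is an assumption made for illustration only: a short-time DFT log-magnitude spectrogram, a 3x3 Sobel gradient magnitude as the edge map, and a basic 8-neighbour local binary pattern (LBP) histogram as the texture descriptor. The function names and parameters (`frame_len`, `hop`) are hypothetical. The SVM classification stage is not shown.

```python
import cmath
import math


def spectrogram(signal, frame_len=32, hop=16):
    """Log-magnitude spectrogram via a naive DFT; image[freq_bin][frame]."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]  # Hann window
    n_bins = frame_len // 2 + 1
    twiddle = [[cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)] for k in range(n_bins)]
    columns = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        columns.append([
            math.log(abs(sum(x * w for x, w in zip(frame, twiddle[k]))) + 1e-10)
            for k in range(n_bins)
        ])
    # Transpose so that rows are frequency bins and columns are time frames.
    return [list(row) for row in zip(*columns)]


def sobel_edge_map(image):
    """Sobel gradient magnitude (assumed edge detector); borders stay 0."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = (image[r - 1][c + 1] + 2 * image[r][c + 1] + image[r + 1][c + 1]
                  - image[r - 1][c - 1] - 2 * image[r][c - 1] - image[r + 1][c - 1])
            gy = (image[r + 1][c - 1] + 2 * image[r + 1][c] + image[r + 1][c + 1]
                  - image[r - 1][c - 1] - 2 * image[r - 1][c] - image[r - 1][c + 1])
            out[r][c] = math.hypot(gx, gy)
    return out


# Offsets of the 8 neighbours, visited clockwise from the top-left pixel.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_histogram(image):
    """Normalised 256-bin histogram of basic LBP codes (assumed descriptor)."""
    rows, cols = len(image), len(image[0])
    hist = [0] * 256
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            code = 0
            for bit, (dr, dc) in enumerate(NEIGHBOURS):
                if image[r + dr][c + dc] >= image[r][c]:
                    code |= 1 << bit
            hist[code] += 1
    total = (rows - 2) * (cols - 2)
    return [h / total for h in hist]


def extract_features(signal):
    """Concatenate texture histograms of the original and edge-mapped images."""
    spec = spectrogram(signal)
    edges = sobel_edge_map(spec)
    return lbp_histogram(spec) + lbp_histogram(edges)


# Toy two-tone signal standing in for a recorded utterance.
signal = [math.sin(2 * math.pi * 0.05 * n) + 0.5 * math.sin(2 * math.pi * 0.2 * n)
          for n in range(512)]
features = extract_features(signal)
print(len(features))
```

In a full system along these lines, one such feature vector per utterance would be passed to a classifier such as an SVM trained on genuine and spoofed examples. A practical implementation would also rely on optimized signal and image processing libraries rather than the pure-Python loops used here.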

Change history

  • 30 November 2023

    Author name "Yahya-zoubir" has been changed to "Yahya-Zoubir".

Acknowledgment

This work was financially supported by the DGRSDT (Direction Générale de la Recherche Scientifique et du Développement Technologique) and the CDTA (Centre de Développement des Technologies Avancées), Algeria, as part of the CDTA’s triennial research protocol 2019-2021 (project code: N004-1/BIOSMC/DTELECOM/CDTA/PT 19-21).

Author information

Corresponding author

Correspondence to Fedila Meriem.

Ethics declarations

Conflict of Interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Meriem, F., Messaoud, B. & Bahia, YZ. Texture analysis of edge mapped audio spectrogram for spoofing attack detection. Multimed Tools Appl 83, 15915–15937 (2024). https://doi.org/10.1007/s11042-023-15329-6

  • DOI: https://doi.org/10.1007/s11042-023-15329-6
