Mel spectrogram-based audio forgery detection using CNN

  • Original Paper
  • Published in: Signal, Image and Video Processing

Abstract

In the current era of technology, digital speech can be created and falsified with a wide variety of hardware and software tools. Audio copy-move forgery is a forgery technique that aims to create forged audio by hiding undesirable words or repeating desired words within the same speech recording. Audio authentication has therefore become a necessity. In this study, an effective spectral-image-based approach to audio copy-move forgery detection is proposed, using convolutional neural networks (CNN) with data augmentation. Only a few handcrafted methods have been proposed for the detection of audio copy-move forgery, and none of the existing works on audio copy-move forgery detection has proposed deep feature learning from speech recordings with Mel spectrograms. This is the first method to employ deep learning on the Mel spectrogram of audio for the detection of audio copy-move forgery. The proposed CNN architecture classifies suspicious Mel spectrogram images into two classes: original and forged. The proposed CNN is successfully trained on features extracted from these Mel spectrogram images. The proposed algorithm has been tested on our datasets generated from the Arabic Speech Corpus and the TIMIT speech database. The results show the effectiveness, robustness against post-processing operations, and high accuracy of the proposed approach compared to other studies.
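The two building blocks named in the abstract, a copy-move forgery and a Mel spectrogram of the resulting audio, can be illustrated with a minimal sketch, assuming NumPy is available. The function names (`copy_move`, `mel_filterbank`, `log_mel_spectrogram`), the synthetic test tone, and all parameters (16 kHz sampling rate, 512-point FFT, hop of 256 samples, 40 Mel bands) are illustrative assumptions and not the authors' settings, which are not given in this preview. The CNN classifier is omitted because its architecture is not described here.

```python
import numpy as np

def copy_move(signal, src_start, length, dst_start):
    """Return a forged copy in which one segment of the signal overwrites another region."""
    forged = signal.copy()
    forged[dst_start:dst_start + length] = signal[src_start:src_start + length]
    return forged

def hz_to_mel(f):
    # Common HTK-style Mel formula (an assumption; other variants exist).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters spaced evenly on the Mel scale, shape (n_mels, n_fft // 2 + 1)."""
    freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, freqs.size))
    for i in range(n_mels):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def log_mel_spectrogram(signal, sr=16000, n_fft=512, hop=256, n_mels=40):
    """Log-power Mel spectrogram in dB, shape (n_mels, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2      # (n_frames, n_fft // 2 + 1)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T     # (n_frames, n_mels)
    return 10.0 * np.log10(mel + 1e-10).T                 # small offset avoids log(0)

# Usage: a 1-second synthetic tone stands in for a speech recording.
sr = 16000
t = np.arange(sr) / sr
original = np.sin(2 * np.pi * 440 * t) * np.linspace(0.0, 1.0, sr)
forged = copy_move(original, src_start=2000, length=4000, dst_start=10000)
spec = log_mel_spectrogram(forged, sr=sr)
```

In a pipeline like the one the abstract describes, an array such as `spec` would be rendered as an image and passed to a binary classifier that labels it original or forged.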

Availability of data and materials

Not applicable.

Acknowledgements

Not applicable.

Funding

This work was supported by the Scientific and Technological Research Council of Turkey (TUBITAK) with Project No: 122E013.

Author information

Contributions

AU contributed to conceptualization, methodology, and software. BU contributed to data curation, writing—original draft preparation, and software. GU contributed to visualization, investigation, and methodology.

Corresponding author

Correspondence to Beste Ustubioglu.

Ethics declarations

Competing interests

The authors declare the following financial interests/personal relationships which may be considered as potential competing interests.

Consent for publication

Not applicable.

Ethics approval and consent to participate

The authors declare that this article is original, has not been published before, and is not currently being considered for publication elsewhere. The authors confirm that the manuscript has been read and approved by all named authors and that there are no other persons who satisfied the criteria for authorship but are not listed. The authors further confirm that the order of authors listed in the manuscript has been approved by all of them.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ustubioglu, A., Ustubioglu, B. & Ulutas, G. Mel spectrogram-based audio forgery detection using CNN. SIViP 17, 2211–2219 (2023). https://doi.org/10.1007/s11760-022-02436-4

  • DOI: https://doi.org/10.1007/s11760-022-02436-4
