Mel spectrogram-based audio forgery detection using CNN

Ustubioglu, Arda; Ustubioglu, Beste; Ulutas, Guzin

doi:10.1007/s11760-022-02436-4

Original Paper
Published: 19 December 2022

Volume 17, pages 2211–2219 (2023)

Signal, Image and Video Processing

Abstract

Digital speech can now be created and falsified with a wide variety of hardware and software tools. Audio copy-move forgery is a technique that produces forged audio by hiding undesirable words or repeating desired words within the same speech recording, which makes audio authentication a necessity. In this study, an effective audio copy-move forgery detection approach based on spectral images is proposed, using convolutional neural networks (CNN) with data augmentation. Only a few handcrafted methods have been developed for detecting audio copy-move forgery, and none of the existing works has proposed deep feature learning from speech recordings using the Mel spectrogram. This is the first method to employ deep learning on the Mel spectrogram of audio to detect audio copy-move forgery. The proposed CNN architecture classifies suspicious Mel spectrogram images into two classes, original and forged, and is successfully trained on features extracted from these Mel spectrogram images. The proposed algorithm has been tested on our datasets generated from the Arabic Speech Corpus and the TIMIT speech database. The results show the effectiveness of the proposed approach, its robustness against post-processing operations, and its high accuracy compared to other studies.
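The abstract does not state how the Mel spectrogram images are computed or how the forged samples are generated, so the following is only a minimal sketch under assumed settings, not the authors' pipeline. It computes a log-Mel spectrogram with NumPy (assumed parameters: 16 kHz audio, 512-point FFT, hop of 160 samples, 64 HTK-style mel bands) and builds a toy copy-move forgery with a hypothetical `copy_move` helper that pastes one segment of a signal over another part of the same signal. Because the pasted region is sample-identical to its source, the corresponding spectrogram columns repeat, which is one plausible cue a CNN classifier of original versus forged images could pick up.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel scale; an assumption, the paper's exact mel formula is not given here.
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters evenly spaced on the mel scale, shape (n_mels, n_fft // 2 + 1)."""
    n_bins = n_fft // 2 + 1
    fft_freqs = np.linspace(0.0, sr / 2.0, n_bins)  # frequency of each rfft bin
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, n_bins))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel_spectrogram(x, sr, n_fft=512, hop=160, n_mels=64, eps=1e-10):
    """Log-Mel spectrogram in dB, shape (n_mels, n_frames); parameters are assumed defaults."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[t * hop : t * hop + n_fft] * window for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2  # (n_frames, n_fft // 2 + 1)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T  # (n_frames, n_mels)
    return 10.0 * np.log10(mel + eps).T  # transpose so mel bands are image rows


def copy_move(x, src_start, length, dst_start):
    """Hypothetical helper: overwrite x[dst:dst+length] with a copy of x[src:src+length]."""
    y = x.copy()
    y[dst_start : dst_start + length] = x[src_start : src_start + length]
    return y


if __name__ == "__main__":
    sr = 16000
    rng = np.random.default_rng(0)
    original = rng.standard_normal(sr)  # 1 s of noise standing in for a speech recording
    # Offsets are multiples of the hop (160) so whole frames line up with the copied segment.
    forged = copy_move(original, src_start=1600, length=4800, dst_start=9600)
    s_orig = log_mel_spectrogram(original, sr)
    s_forg = log_mel_spectrogram(forged, sr)
    print(s_orig.shape)  # (64, 97)
    print(np.allclose(s_forg[:, 60:87], s_orig[:, 10:37]))  # True
```

With these hop-aligned offsets, frames 60 to 86 of the forged spectrogram equal frames 10 to 36 of the original. Real copy-move forgeries are generally not aligned to the frame grid and may be post-processed, so in practice the duplicated columns would only be approximately similar.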

Availability of data and materials

Not applicable.

Acknowledgements

Not applicable.

Funding

This work was supported by the Scientific and Technological Research Council of Turkey (TUBITAK) with Project No: 122E013.

Author information

Authors and Affiliations

Department of Computer Engineering, Karadeniz Technical University, 61080, Trabzon, Turkey
Beste Ustubioglu & Guzin Ulutas
Department of Management Information Systems, Trabzon University, Trabzon, Turkey
Arda Ustubioglu

Contributions

AU contributed to conceptualization, methodology, and software. BU contributed to data curation, writing (original draft preparation), and software. GU contributed to visualization, investigation, and methodology.

Corresponding author

Correspondence to Beste Ustubioglu.

Ethics declarations

Competing interests

The authors declare the following financial interests/personal relationships which may be considered as potential competing interests.

Consent for publication

Not applicable.

Ethics approval and consent to participate

The authors declare that this article is original, has not been published before, and is not currently being considered for publication elsewhere. The authors confirm that the manuscript has been read and approved by all named authors and that there are no other persons who satisfied the criteria for authorship but are not listed. The authors further confirm that the order of authors listed in the manuscript has been approved by all of them.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ustubioglu, A., Ustubioglu, B. & Ulutas, G. Mel spectrogram-based audio forgery detection using CNN. SIViP 17, 2211–2219 (2023). https://doi.org/10.1007/s11760-022-02436-4

Received: 05 July 2022
Revised: 23 October 2022
Accepted: 04 December 2022
Published: 19 December 2022
Issue Date: July 2023
DOI: https://doi.org/10.1007/s11760-022-02436-4
