Abstract
Given how easily audio data can be obtained, audio recordings are subject to both malicious and benign tampering and manipulation that can compromise their integrity and reliability. Because audio recordings are used in many strategic areas, detecting such tampering and manipulation is critical. Although the literature demonstrates the lack of an accurate, integrated system for detecting copy-move forgery, the field shows great promise for research. Our proposed method therefore supports passive detection of audio copy-move forgery. For our study, forged audio data were generated from the TIMIT dataset, and 4378 audio recordings were used: 2189 original recordings and 2189 recordings created by copy-move forgery. After the voiced and unvoiced regions of each audio signal were determined with the Yet Another Algorithm for Pitch Tracking (YAAPT), features were extracted from the signals using Mel frequency cepstral coefficients (MFCCs), delta (Δ) MFCCs, and ΔΔMFCCs, both individually and in combination, as well as linear prediction coefficients (LPCs). Those features were then classified using artificial neural networks. Our experimental results show detection accuracies of 75.34% with the MFCC method, 73.97% with the ΔMFCC method, 72.37% with the ΔΔMFCC method, 76.48% with the combined MFCC + ΔMFCC + ΔΔMFCC method, and 74.77% with the LPC method. Using the MFCC + ΔMFCC + ΔΔMFCC method, in which the features are used together, the models give markedly better results even with relatively few epochs. The proposed method is also more robust than other methods in the literature because it does not rely on threshold values.
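To illustrate the feature construction the abstract describes, the following is a minimal sketch of the standard regression-based delta computation used to derive ΔMFCCs and ΔΔMFCCs from a static MFCC matrix and stack them into a combined feature vector. This is not the authors' implementation; the half-window size `N = 2` and the stand-in feature matrix are illustrative assumptions.

```python
import numpy as np

def delta(feat, N=2):
    """Regression-based delta coefficients over a (num_frames, num_coeffs)
    feature matrix such as MFCCs. N is the half-window size (an assumed,
    commonly used value)."""
    denom = 2 * sum(n * n for n in range(1, N + 1))
    # Repeat edge frames so delta is defined at the signal boundaries.
    padded = np.pad(feat, ((N, N), (0, 0)), mode="edge")
    out = np.zeros_like(feat, dtype=float)
    for t in range(feat.shape[0]):
        out[t] = sum(
            n * (padded[t + N + n] - padded[t + N - n]) for n in range(1, N + 1)
        ) / denom
    return out

# Stand-in MFCC matrix: 10 frames of 2 coefficients, linearly increasing.
mfcc = np.arange(20, dtype=float).reshape(10, 2)
d1 = delta(mfcc)        # ΔMFCC
d2 = delta(d1)          # ΔΔMFCC
# Combined MFCC + ΔMFCC + ΔΔMFCC feature vector per frame, as in the
# configuration that performed best in the abstract.
features = np.hstack([mfcc, d1, d2])
```

For the linearly increasing stand-in matrix, interior ΔMFCC values recover the per-frame slope (2.0 in the first column), which is the expected behavior of the regression formula.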
Data availability
The data will be made available upon reasonable request.
Acknowledgements
Our study was supported by the Scientific and Technological Research Council of Turkey (Grant No. 121E725).
Funding
We received research funding from the Scientific and Technological Research Council of Turkey (Grant No. 121E725).
Author information
Contributions
FA was involved in the conceptualization, method, software, validation, formal analysis, investigation, resources, and data curation. YB contributed to the software, validation, supervision, conceptualization, review, and editing.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
About this article
Cite this article
Akdeniz, F., Becerikli, Y. Detecting audio copy-move forgery with an artificial neural network. SIViP 18, 2117–2133 (2024). https://doi.org/10.1007/s11760-023-02856-w