Skip to main content
Log in

Noise-aware progressive multi-scale deepfake detection

  • Published:
Multimedia Tools and Applications Aims and scope Submit manuscript

Abstract

The proliferation of fake images generated by deepfake techniques has significantly threatened the trustworthiness of digital information, leading to a pressing need for face forgery detection. However, due to the similarity between human face images and the subtlety of artefact information, most deep face forgery detection methods face certain challenges, such as incomplete extraction of artefact information, limited performance in detecting low-quality forgeries, and insufficient generalization across different datasets. To address these issues, this paper proposes a novel noise-aware multi-scale deepfake detection model. Firstly, a progressive spatial attention module is introduced, which learns two types of spatial feature weights: boosting weight and suppression weight. The boosting weight highlights salient regions, while the suppression weight enables the model to capture more subtle artifact information. Through multiple boosting-suppression stages, the proposed model progressively focuses on different facial regions and extracts multi-scale RGB features. Additionally, a noise-aware two-stream network is introduced, which leverages frequency-domain features and fuses image noise with multi-scale RGB features. This integration enhances the model’s ability to handle image post-processing. Furthermore, the model learns global features from multi-modal features through multiple convolutional layers, which are combined with local similarity features for deepfake detection, thereby improving the model’s robustness. Experimental results on several benchmark databases demonstrate the superiority of our proposed method over state-of-the-art techniques. Our contributions lie in the progressive spatial attention module, which effectively addresses overfitting in CNNs, and the integration of noise-aware features and multi-scale RGB features. These innovations lead to enhanced accuracy and generalization performance in face forgery detection.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5

Similar content being viewed by others

Availability of data and material

The data that support the findings of this study are not openly available due to the sensitivity of face data and are available from the corresponding author upon reasonable request. FF++ and DFD datasets are available through the FaceForensicshttps://github.com/ondyari/FaceForensics, DF1.0 dataset is available through the https://github.com/EndlessSora/DeeperForensics-1.0/tree/master/dataset and DFDC dataset is available through thehttps://www.kaggle.com/competitions/deepfake-detection-challenge/data.

Code availability

Our code is available at https://github.com/PPnostalgia/NPMD

References

  1. Tolosana R, Vera-Rodriguez R, Fierrez J, Morales A, Ortega-Garcia J (2020) Deepfakes and beyond: A survey of face manipulation and fake detection. Information Fusion 64:131–148

    Article  Google Scholar 

  2. Nguyen TT, Nguyen QVH, Nguyen DT, Nguyen DT, Huynh-The T, Nahavandi S, Nguyen TT, Pham QV, Nguyen CM (2022) Deep learning for deepfakes creation and detection: A survey. Comput Vis Image Underst 223:103525

    Article  Google Scholar 

  3. Zhang T (2022) Deepfake generation and detection. Multimedia Tools and Applications 81:6259–6276

    Article  Google Scholar 

  4. Yang X, Li Y, Lyu S (2019) Exposing deep fakes using inconsistent head poses. In: ICASSP 2019–2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 8261–8265. IEEE

  5. Nguyen HH, Yamagishi J, Echizen I (2019) Capsule-forensics: Using capsule networks to detect forged images and videos. In: ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2307–2311. IEEE

  6. Amerini I, Galteri L, Caldelli R, Del Bimbo A (2019) Deepfake video detection through optical flow based cnn. In: Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, pp. 0–0

  7. Dang H, Liu F, Stehouwer J, Liu X, Jain AK (2020) On the detection of digital face manipulation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5781–5790

  8. Afchar D, Nozick V, Yamagishi J, Echizen I (2018) Mesonet: a compact facial video forgery detection network. In: 2018 IEEE International Workshop on Information Forensics and Security (WIFS), pp. 1–7. IEEE

  9. Rössler A, Cozzolino D, Verdoliva L, Riess C, Thies J, Nießner M (2018) Face forensics: A large-scale video dataset for forgery detection in human faces. arXiv:1803.09179

  10. Cozzolino D, Poggi G, Verdoliva L (2017) Recasting residual-based local descriptors as convolutional neural networks: an application to image forgery detection. In: Proceedings of the 5th ACM Workshop on Information Hiding and Multimedia Security, pp. 159–164

  11. Zanardelli M, Guerrini F, Leonardi R, Adami N (2022) Image forgery detection: a survey of recent deep-learning approaches. Multimedia Tools and Applications 82:17521–17566

    Article  Google Scholar 

  12. Wang H, Wu X, Huang Z, Xing EP (2020) High-frequency component helps explain the generalization of convolutional neural networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8684–8694

  13. Qi H, Guo Q, Juefei-Xu F, Xie X, Ma L, Feng W, Liu Y, Zhao J (2020) Deeprhythm: Exposing deepfakes with attentional visual heartbeat rhythms. In: Proceedings of the 28th ACM International Conference on Multimedia, pp. 4318–4327

  14. Nguyen HH, Fang F, Yamagishi J, Echizen I (2019) Multi-task learning for detecting and segmenting manipulated facial images and videos. In: 2019 IEEE 10th International Conference on Biometrics Theory, Applications and Systems (BTAS), pp. 1–8. IEEE

  15. Zhou P, Han X, Morariu VI, Davis LS (2018) Learning rich features for image manipulation detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1053–1061

  16. Masi I, Killekar A, Mascarenhas RM, Gurudatt SP, AbdAlmageed W (2020) Two-branch recurrent network for isolating deepfakes in videos. In: European Conference on Computer Vision, pp. 667–684. Springer

  17. Qian Y, Yin G, Sheng L, Chen Z, Shao J (2020) Thinking in frequency: Face forgery detection by mining frequency-aware clues. In: European Conference on Computer Vision, pp. 86–103. Springer

  18. Fridrich J, Kodovsky J (2012) Rich models for steganalysis of digital images. IEEE Trans Inf Forensics Secur 7(3):868–882

    Article  Google Scholar 

  19. Mayer O, Stamm MC (2019) Forensic similarity for digital images. IEEE Trans Inf Forensics Secur 15:1331–1346

    Article  Google Scholar 

  20. Zhao T, Xu X, Xu M, Ding H, Xiong Y, Xia W (2021) Learning self-consistency for deepfake detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 15023–15033

  21. Chen S, Yao T, Chen Y, Ding S, Li J, Ji R (2021) Local relation learning for face forgery detection. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp. 1081–1088

  22. Kuang L, Wang Y, Hang T, Chen B, Zhao G (2022) A dual-branch neural network for deepfake video detection by detecting spatial and temporal inconsistencies. Multimedia Tools and Applications 81:42591–42606

    Article  Google Scholar 

  23. Durall R, Keuper M, Pfreundt FJ, Keuper J (2019) Unmasking deepfakes with simple features. arXiv:1911.00686

  24. Liu H, Li X, Zhou W, Chen Y, He Y, Xue H, Zhang W, Yu N (2021) Spatial phase shallow learning: rethinking face forgery detection in frequency domain. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 772–781

  25. Li Y, Chang MC, Lyu S (2018) In ictu oculi: Exposing ai generated fake face videos by detecting eye blinking. In: IEEE International Workshop on Information Forensics and Security

  26. Hong C, Yu J, Zhang J, Jin X, Lee K (2018) Multimodal face-pose estimation with multitask manifold deep learning. IEEE Trans Industr Inf 15(7):3952–3961

    Article  Google Scholar 

  27. Yu J, Tao D, Wang M, Rui Y (2015) Learning to rank using user clicks and visual features for image retrieval. IEEE Transactions on Cybernetics 45(4):767–779

    Article  PubMed  Google Scholar 

  28. Yu J, Tan M, Zhang H, Tao D, Rui Y (2019) Hierarchical deep click feature prediction for fine-grained image recognition. IEEE Trans Pattern Anal Mach Intell 44(2):563–578

    Article  Google Scholar 

  29. Hong C, Yu J, Wan J, Tao D, Wang M (2015) Multimodal deep autoen coder for human pose recovery. IEEE Trans Industr Electron 24(12):5659–5670

    Google Scholar 

  30. Hong C, Yu J, Tao D, Wang M (2014) Image-based 3d human pose recovery by multi-view locality sensitive sparse retrieval. IEEE Trans Industr Electron 62(6):3742–3751

    Google Scholar 

  31. Kohli A, Gupta A (2021) Detecting deepfake, faceswap and face2face facial forgeries using frequency cnn. Multimedia Tools and Applications 80:18461–18478. Springer

  32. Woo S, Park J, Lee JY, Kweon IS (2018) Cbam: Convolutional block attention module. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 3–19

  33. Bayar B, Stamm MC (2018) Constrained convolutional neural networks: A new approach towards general purpose image manipulation detection. IEEE Trans Inf Forensics Secur 13(11):2691–2706

    Article  Google Scholar 

  34. Wang X, Girshick R, Gupta A, He K (2018) Non-local neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7794–7803

  35. Newman MEJ (2013) DeepFakes. http://github.com/deepfakes/faceswap

  36. Thies J, Zollhofer M, Stamminger M, Theobalt C, Nießner M (2016) Face2face: Real-time face capture and reenactment of rgb videos. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2387–2395

  37. MarekKowalski: FaceSwap (2013). https://github.com/MarekKowalski/FaceSwap

  38. Thies J, Zollhöfer M, Nießner M (2019) Deferred neural rendering: Image synthesis using neural textures. ACM Transactions on Graphics (TOG) 38(4):1–12

    Article  Google Scholar 

  39. Li Y, Yang X, Sun P, Qi H, Lyu S (2020) Celeb-df: A large-scale challenging dataset for deepfake forensics. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3207–3216

  40. Jiang L, Li R, Wu W, Qian C, Loy CC (2020) Deeperforensics-1.0: A large-scale dataset for real-world face forgery detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2889–2898

  41. Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning, pp. 6105–6114. PMLR

  42. Zhang K, Zhang Z, Li Z, Qiao Y (2016) Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Process Lett 23(10):1499–1503

    Article  ADS  Google Scholar 

  43. Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D (2017) Grad-cam: Visual explanations from deep networks via gradient-based localization. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 618–626

  44. Li L, Bao J, Zhang T, Yang H, Chen D, Wen F, Guo B (2020) Face x-ray for more general face forgery detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5001–5010

  45. Nadimpalli A, Rattani A (2023) Facial forgery-based deepfake detection using fine grained features. arXiv:2310.07028v1

  46. Zhao L, Zhang M, Ding H, Cui X (2023) Fine-grained deepfake detection based on cross-modality attention. Neural computing & applications 35(15):10861–10874

    Article  Google Scholar 

  47. Waseem S, Abu-Bakar SARS, Omar Z, Ahmed BA, Baloch S, Hafeezallah A (2023) Multi-attention-based approach for deepfake face and expression swap detection and localization. EURASIP Journal on Image and Video Processing 2023(1)

Download references

Acknowledgements

This work is partly supported by the National Nature Science Foundation of China (No. 62072286, 61876100, 61572296), Shandong Provincial Science and Technology Support Program of Youth Innovation Team in Colleges (No. 2019KJN041, 2020KJN005) and Shandong Provincial Graduate Education Innovation Plan Project (SDYAL21211).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Wen Guo.

Ethics declarations

Conflicts of interest

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Ding, X., Pang, S. & Guo, W. Noise-aware progressive multi-scale deepfake detection. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-18836-2

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1007/s11042-024-18836-2

Keywords

Navigation