Mining collaborative spatio-temporal clues for face forgery detection

Ding, Bo; Fan, Zhenfeng; Zhao, Zejun; Xia, Shihong

doi:10.1007/s11042-023-16173-4

Published: 26 August 2023

Volume 83, pages 27901–27920, (2024)
Multimedia Tools and Applications

Bo Ding^1,2,
Zhenfeng Fan^1,2,
Zejun Zhao^1,2 &
…
Shihong Xia ORCID: orcid.org/0000-0002-7228-9646^1,2

Abstract

Face forgery detection has become a pressing problem because of the adverse effects of face forgery techniques on social media. State-of-the-art deep-learning-based methods commonly rely on low-level texture features for face forgery detection, since most face forgery methods have difficulty simulating the low-level signals of natural images. However, most existing methods examine these low-level features from only the spatial or only the temporal perspective. In this work, we revisit the face forgery detection problem from a spatio-temporal perspective that covers both, with the aim of better generalization performance. Specifically, we propose a Spatio-Temporal Difference Network (STDN) to mine low-level clues for face forgery detection. The network contains three different but complementary branches: 1) high-frequency channel difference images, 2) inter-frame residual signals, and 3) raw RGB images. It captures face forgery traces through a three-branch collaborative learning framework. Furthermore, we propose a multimodal attention fusion module to effectively fuse the complementary features from the different branches. Through comprehensive experiments on several publicly available datasets, we demonstrate the superior performance of the proposed STDN. The effectiveness of low-level spatio-temporal clues in a collaborative learning framework could potentially guide future work in face forgery detection.
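
To make the three kinds of branch input concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not define how the channel difference images or the inter-frame residuals are computed, so the sketch assumes pairwise colour-channel subtraction and plain consecutive-frame subtraction. The function names (channel_difference, inter_frame_residual, build_branch_inputs) and the clip layout (T x H x W x 3, uint8) are hypothetical, and any high-frequency filtering is omitted.

```python
import numpy as np


def channel_difference(frames: np.ndarray) -> np.ndarray:
    """Pairwise colour-channel differences (R-G, G-B, B-R).

    Assumption: this stands in for the "channel difference images"
    named in the abstract; the paper may define them differently.
    """
    f = frames.astype(np.float32)
    r, g, b = f[..., 0], f[..., 1], f[..., 2]
    return np.stack([r - g, g - b, b - r], axis=-1)


def inter_frame_residual(frames: np.ndarray) -> np.ndarray:
    """Differences between consecutive frames along the time axis.

    Assumption: a simple temporal difference stands in for the
    "inter-frame residual signals" named in the abstract.
    """
    return np.diff(frames.astype(np.float32), axis=0)


def build_branch_inputs(frames: np.ndarray) -> dict:
    """Prepare one input per branch from a T x H x W x 3 uint8 clip."""
    return {
        "rgb": frames.astype(np.float32) / 255.0,   # raw RGB, scaled to [0, 1]
        "channel_diff": channel_difference(frames),  # spatial low-level clue
        "residual": inter_frame_residual(frames),    # temporal low-level clue
    }


# Usage on a small random clip (4 frames of 8 x 8 pixels).
rng = np.random.default_rng(0)
clip = rng.integers(0, 256, size=(4, 8, 8, 3), dtype=np.uint8)
inputs = build_branch_inputs(clip)
```

Under these assumptions the residual input has one frame fewer than the clip, so a real pipeline would need to pad it or align the branches before fusing their features.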

Data availability

The datasets generated or analyzed during the current study are available from the corresponding author upon reasonable request.

Author information

Authors and Affiliations

Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Bo Ding, Zhenfeng Fan, Zejun Zhao & Shihong Xia
University of Chinese Academy of Sciences, Beijing, China
Bo Ding, Zhenfeng Fan, Zejun Zhao & Shihong Xia

Corresponding author

Correspondence to Shihong Xia.

Ethics declarations

Competing interests

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ding, B., Fan, Z., Zhao, Z. et al. Mining collaborative spatio-temporal clues for face forgery detection. Multimed Tools Appl 83, 27901–27920 (2024). https://doi.org/10.1007/s11042-023-16173-4

Received: 13 February 2023
Revised: 15 May 2023
Accepted: 01 July 2023
Published: 26 August 2023
Issue Date: March 2024
DOI: https://doi.org/10.1007/s11042-023-16173-4
