Abstract
Face forgery detection has been a widespread issue recently due to the adverse effects of face forgery techniques on social media. The state-of-the-art deep learning based methods commonly employ low-level texture features for face forgery detection, since most face forgery methods have difficulty simulating low-level signals in natural images. However, most existing methods only visit the low-level features from the spatial or temporal perspective. In this work, we revisit the face forgery detection problem from a spatio-temporal perspective to cover both for better generalization performance. Specifically, we propose a Spatio-Temporal Difference Network (STDN) to mine low-level clues for face forgery detection. The network contains three different but complementary branches 1) high-frequency channel difference images, 2) inter-frame residual signals, and 3) raw RGB images. It is able to capture face forgery traces through a three-branch collaborative learning framework. Furthermore, we propose a multimodal attention fusion module to effectively fuse the complementary features from different branches. Through comprehensive experiments on several publicly available datasets, we demonstrate the superior performance of the proposed STDN. The effectiveness of low-level spatio-temporal clues in a collaborative learning framework could potentially guide future work in face forgery detection.
Similar content being viewed by others
Data availability
The datasets generated or analyzed during the current study are available from the corresponding author upon reasonable request.
References
Afchar D, Nozick V, Yamagishi J, Echizen I (2018) Mesonet: a compact facial video forgery detection network. In: IEEE international workshop on information forensics and security (WIFS), pp 1–7. IEEE. https://doi.org/10.1109/WIFS.2018.8630761
Amerini I, Galteri L, Caldelli R, Del Bimbo A (2019) Deepfake video detection through optical flow based cnn. In: Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, pp 1205–1207. https://doi.org/10.1109/ICCVW.2019.00152
Bahdanau D, Cho K, Bengio Y (2015) Neural machine translation by jointly learning to align and translate. In: International Conference on Learning Representations. https://doi.org/10.48550/arXiv.1409.0473
Bayar B, Stamm MC (2016) A deep learning approach to universal image manipulation detection using a new convolutional layer. In: Proceedings of the 4th ACM Workshop on Information Hiding and Multimedia Security, pp 5–10. https://doi.org/10.1145/2909827.2930786
Brooks R, Yuan Y, Liu Y, Chen H et al (2022) Deepfake and its enabling techniques: a review. APSIPA Transactions on Signal and Information Processing 11(2). https://doi.org/10.1561/116.00000024
Caldelli R, Galteri L, Amerini I, Del Bimbo A (2021) Optical flow based cnn for detection of unlearnt deepfake manipulations. Pattern Recogn Lett 146:31–37. https://doi.org/10.1016/j.patrec.2021.03.005
Chen S, Yao T, Chen Y, Ding S, Li J, Ji R (2021) Local relation learning for face forgery detection. Proceed AAAI Conf Artif Intell 35:1081–1088. https://doi.org/10.48550/arXiv.2105.02577
Chollet F (2017) Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 1800–1807. https://doi.org/10.1109/CVPR.2017.195
Cozzolino D, Poggi G, Verdoliva L (2017) Recasting residual-based local descriptors as convolutional neural networks: an application to image forgery detection. In: Proceedings of the 5th ACM Workshop on Information Hiding and Multimedia Security, pp 159–164. https://doi.org/10.1145/3082031.3083247
Dăscălescu AC, Boriga RE (2013) A novel fast chaos-based algorithm for generating random permutations with high shift factor suitable for image scrambling. Nonlinear Dyn 74(1–2):307–318. https://doi.org/10.1007/s11071-013-0969-6
Deepfakes: Deepfakes github (2018) https://github.com/deepfakes/faceswap. Accessed: 2023-01-03
Deng J, Guo J, Ververas E, Kotsia I, Zafeiriou S (2020) Retinaface: Single-shot multi-level face localisation in the wild. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 5202–5211. https://doi.org/10.1109/CVPR42600.2020.00525
Durall R, Keuper M, Keuper J (2020) Watch your up-convolution: Cnn based generative deep neural networks are failing to reproduce spectral distributions. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 7887–7896. https://doi.org/10.1109/CVPR42600.2020.00791
Dzanic T, Shah K, Witherden FD (2020) Fourier spectrum discrepancies in deep network generated images. In: Annual Conference on Neural Information Processing Systems, pp 3022–3032. https://doi.org/10.48550/arXiv.1911.06465
Etemadi Borujeni S, Eshghi M (2009) Chaotic image encryption design using tompkins-paige algorithm. Math Probl Eng. https://doi.org/10.1155/2009/762652
Faceswap: Faceswap github (2018) https://github.com/MarekKowalski/FaceSwap. Accessed: 2023-01-03
Fei J, Dai Y, Yu P, Shen T, Xia Z, Weng J (2022) Learning second order local anomaly for general face forgery detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 20238–20248. https://doi.org/10.1109/CVPR52688.2022.01963
Fridrich J, Kodovsky J (2012) Rich models for steganalysis of digital images. IEEE Trans Inf Forensic Secur 7(3):868–882. https://doi.org/10.1109/TIFS.2012.2190402
González Fernández E, Sandoval Orozco AL, Garćıa Villalba L, J., Hernandez-Castro, J. (2018) Digital image tamper detection technique based on spectrum analysis of cfa artifacts. Sensors 18(9):2804. https://doi.org/10.3390/s18092804
Gu Z, Chen Y, Yao T, Ding S, Li J, Huang F, Ma L (2021) Spatiotemporal inconsistency learning for deepfake video detection. In: Proceedings of the 29th ACM International Conference on Multimedia, pp 3473–3481. https://doi.org/10.1145/3474085.3475508
Guan J, Zhou H, Hong Z, Ding E, Wang J, Quan C, Zhao Y (2022) Delving into sequential patches for deepfake detection. CoRR abs/2207.02803. https://doi.org/10.48550/arXiv.2207.02803
Gunturk BK, Altunbasak Y, Mersereau RM (2002) Color plane interpolation using alternating projections. IEEE Trans Image Process 11(9):997–1013. https://doi.org/10.1109/TIP.2002.801121
Guo Z, Hu L, Xia M, Yang G (2021) Blind detection of glow-based facial forgery. Multimed Tools Appl 80(5):7687–7710. https://doi.org/10.1007/s11042-020-10098-y
Haliassos A, Vougioukas K, Petridis S, Pantic M (2021) Lips don’t lie: A generalisable and robust approach to face forgery detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 5039–5049. https://doi.org/10.1109/CVPR46437.2021.00500
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 770–778. https://doi.org/10.1109/CVPR.2016.90
Hu J, Shen L, Sun G (2018) Squeeze-and-excitation networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 7132–7141
Kingma DP, Ba J (2015) Adam: a method for stochastic optimization. In: International Conference on Learning Representations. https://doi.org/10.48550/arXiv.1412.6980
Kirchner M (2010) Efficient estimation of cfa pattern configuration in digital camera images. In: Media forensics and security II, vol 7541. SPIE, pp 383–394. https://doi.org/10.1117/12.839102
Kohli A, Gupta A (2022) Light-weight 3dcnn for deepfakes, faceswap and face2face facial forgery detection. Multimed Tools Appl 81(22):31391–31403. https://doi.org/10.1007/s11042-022-12778-3
Kuang L, Wang Y, Hang T, Chen B, Zhao G (2022) A dual-branch neural network for deepfake video detection by detecting spatial and temporal inconsistencies. Multimed Tools Appl 81(29):42591–42606. https://doi.org/10.1007/s11042-021-11539-y
Li Y, Chang M-C, Lyu S (2018) In ictu oculi: exposing ai created fake videos by detecting eye blinking. In: IEEE international workshop on information forensics and security (WIFS), pp 1–7. IEEE. https://doi.org/10.1109/WIFS.2018.8630787
Li L, Bao J, Zhang T, Yang H, Chen D, Wen F, Guo B (2020) Face x-ray for more general face forgery detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 5000–5009. https://doi.org/10.1109/CVPR42600.2020.00505
Li L, Bao J, Yang H, Chen D, Wen F (2020) Advancing high fidelity identity swapping for forgery detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 5073–5082. https://doi.org/10.1109/CVPR42600.2020.00512
Li J, Xie H, Li J, Wang Z, Zhang Y (2021) Frequency-aware discriminative feature learning supervised by single-center loss for face forgery detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 6458–6467. https://doi.org/10.1109/CVPR46437.2021.00639
Loukhaoukha K, Chouinard J-Y, Berdai A (2012) A secure image encryption algorithm based on rubik’s cube principle. J Electrical Comput Eng. https://doi.org/10.1155/2012/173931
Luo Y, Zhang Y, Yan J, Liu W (2021) Generalizing face forgery detection with high-frequency features. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 16317–16326. https://doi.org/10.1109/CVPR46437.2021.01605
Megahed A, Han Q (2022) Identify videos with facial manipulations based on convolution neural network and dynamic texture. Multimed Tools Appl 81(30):43441–43466. https://doi.org/10.1007/s11042-022-13102-9
Mnih V, Heess N, Graves A et al (2014) Recurrent models of visual attention. In: Annual Conference on Neural Information Processing Systems, vol 27
Nick, D, Andrew, G (2019) Deepfake Detection Dataset. https://ai.googleblog.com/2019/09/contributing-data-to-deepfake-detection.html. Accessed: 2022-11-10
Nirkin Y, Wolf L, Keller Y, Hassner T (2021) Deepfake detection based on discrepancies between faces and their context. IEEE Trans Pattern Anal Mach Intell 44(10):6111–6121. https://doi.org/10.1109/TPAMI.2021.3093446
Panda SK, Diwan T, Kakde OG, Tembhurne JV (2022) Improvised detection of deepfakes from visual inputs using light weight deep ensemble model. Multimed Tools Appl, pp 1–18. https://doi.org/10.1007/s11042-022-14307-8
Qian Y, Yin G, Sheng L, Chen Z, Shao J (2020) Thinking in frequency: face forgery detection by mining frequency-aware clues. Eur Conf Comput Vis 12357:86–103. https://doi.org/10.1007/978-3-030-58610-2_6
Rahaman N, Baratin A, Arpit D, Draxler F, Lin M, Hamprecht F, Bengio Y, Courville A (2019) On the spectral bias of neural networks. In: International conference on machine learning, pp 5301–5310. PMLR. https://doi.org/10.48550/arXiv.1806.08734
Rahmouni N, Nozick V, Yamagishi J, Echizen I (2017) Distinguishing computer graphics from natural images using convolution neural networks. In: IEEE international workshop on information forensics and security (WIFS), pp 1–6. IEEE. https://doi.org/10.1109/WIFS.2017.8267647
Rossler A, Cozzolino D, Verdoliva L, Riess C, Thies J, Nießner M (2019) Faceforensics++: Learning to detect manipulated facial images. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp 1–11. https://doi.org/10.1109/ICCV.2019.00009
Saikia P, Dholaria D, Yadav P, Patel V, Roy M (2022) A hybrid cnn-lstm model for video deepfake detection by leveraging optical flow features. In: 2022 international joint conference on neural networks (IJCNN), pp 1–7. IEEE. https://doi.org/10.1109/IJCNN55064.2022.9892905
Shin HJ, Jeon JJ, Eom IK (2017) Color filter array pattern identification using variance of color difference image. J Electron Imaging 26(4):043015. https://doi.org/10.1117/1.JEI.26.4.043015
Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556. https://doi.org/10.48550/arXiv.1409.1556
Sun Z, Han Y, Hua Z, Ruan N, Jia W (2021) Improving the efficiency and robustness of deepfakes detection through precise geometric features. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 3609–3618. https://doi.org/10.1109/CVPR46437.2021.00361
Thies J, Zollhöfer M, Stamminger M, Theobalt C, Nießner M (2016) Face2face: Real-time face capture and reenactment of rgb videos. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 2387–2395. https://doi.org/10.1109/CVPR.2016.262
Thies J, Zollhöfer M, Nießner M (2019) Deferred neural rendering: image synthesis using neural textures. ACM Trans Graph (TOG) 38(4):1–12. https://doi.org/10.1145/3306346.3323035
Van der Maaten L, Hinton G (2008) Visualizing data using t-sne. J Mach Learn Res 9(11):2579–2605
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, Polosukhin I (2017) Attention is all you need. In: Annual Conference on Neural Information Processing Systems. https://doi.org/10.48550/arXiv.1706.03762
Yu Y, Ni R, Li W, Zhao Y (2022) Detection of ai-manipulated fake faces via mining generalized features. ACM Trans Multimed Comput Commun Appl (TOMM) 18(4):1–23. https://doi.org/10.1145/3499026
Zhang Y, Li G, Cao Y, Zhao X (2020) A method for detecting human-face-tampered videos based on interframe difference. J Cyber Secur 5(2):49–72. https://doi.org/10.19363/J.cnki.cn10-1380/tn.2020.02.05
Zhang B, Li S, Feng G, Qian Z, Zhang X (2022) Patch diffusion: a general module for face manipulation detection. Proceed AAAI Conf Artif Intell 36:3243–3251. https://doi.org/10.1609/aaai.v36i3.20233
Zhang D, Zhu W, Ding X, Yang G, Li F, Deng Z, Song Y (2022) Srtnet: a spatial and residual based two-stream neural network for deepfakes detection. Multimed Tools Appl, pp 1–19. https://doi.org/10.1007/s11042-022-13966-x
Zhao T, Xu X, Xu M, Ding H, Xiong Y, Xia W (2021) Learning self-consistency for deepfake detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp 15003–15013. https://doi.org/10.48550/arXiv.2012.09311
Zheng Y, Bao J, Chen D, Zeng M, Wen F (2021) Exploring temporal coherence for more general video face forgery detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp 15044–15054. https://doi.org/10.1109/ICCV48922.2021.01477
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Competing interests
The authors have no competing interests to declare that are relevant to the content of this article.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Ding, B., Fan, Z., Zhao, Z. et al. Mining collaborative spatio-temporal clues for face forgery detection. Multimed Tools Appl 83, 27901–27920 (2024). https://doi.org/10.1007/s11042-023-16173-4
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11042-023-16173-4