
SA\(^3\)WT: Adaptive Wavelet-Based Transformer with Self-Paced Auto Augmentation for Face Forgery Detection

Published in: International Journal of Computer Vision

Abstract

Face forgery detection (FFD) on digital images has become increasingly challenging with the proliferation of sophisticated manipulation techniques. In this study, we propose a novel approach, named Adaptive Wavelet-based Transformer with Self-paced Auto Augmentation (SA\(^3\)WT), which naturally combines the global representation capabilities of vision transformers with adaptive enhancement of fine-grained artifacts in the frequency domain to effectively capture forgery patterns. In particular, to adequately handle diverse clues, the network incorporates a Wavelet-based Mixed Attention (WMA) Transformer block, which better leverages the information residing in all frequency sub-bands, and a Residual Reserve Fine-grained Sampler (RRFS), which enhances detailed forgery artifacts while learning hierarchical global representations. By deeply intertwining the modeling of global representations and fine-grained features throughout the network, the model captures rich forgery clues while bypassing the fusion issues that arise when the two are extracted separately. Furthermore, a Self-paced Auto Augmentation Strategy (SAAS) facilitates model learning by unifying data augmentation and active learning in a coupled manner. Extensive experiments conducted on several benchmarks demonstrate the superiority of SA\(^3\)WT over state-of-the-art methods. Ablation studies and cross-dataset evaluations confirm the effectiveness and generalization ability of the specifically designed modules. Our findings suggest that pure vision transformers also provide a promising direction for advanced forgery detection in real-world scenarios.
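As an illustration rather than the paper's implementation, the following sketch shows a one-level 2D Haar wavelet decomposition in NumPy, the kind of frequency sub-band split that a wavelet-based attention block builds on. The function name `haar_dwt2` and the sub-band labels are ours, not from the paper; sub-band naming conventions vary across libraries.

```python
import numpy as np

def haar_dwt2(x: np.ndarray):
    """One-level 2D Haar wavelet decomposition into four sub-bands.

    Returns (LL, LH, HL, HH), each half the input size per axis;
    input height and width must be even. LL holds the coarse
    approximation, the other three hold directional details.
    """
    s = np.sqrt(2.0)
    # low-pass / high-pass along the width (axis 1)
    lo = (x[:, 0::2] + x[:, 1::2]) / s
    hi = (x[:, 0::2] - x[:, 1::2]) / s
    # then along the height (axis 0), giving the four sub-bands
    ll = (lo[0::2, :] + lo[1::2, :]) / s
    lh = (lo[0::2, :] - lo[1::2, :]) / s
    hl = (hi[0::2, :] + hi[1::2, :]) / s
    hh = (hi[0::2, :] - hi[1::2, :]) / s
    return ll, lh, hl, hh

rng = np.random.default_rng(0)
img = rng.standard_normal((8, 8))
ll, lh, hl, hh = haar_dwt2(img)
# the orthonormal Haar transform preserves total energy across sub-bands
energy_in = np.sum(img ** 2)
energy_out = sum(np.sum(b ** 2) for b in (ll, lh, hl, hh))
assert np.allclose(energy_in, energy_out)
```

Because the transform is orthonormal and invertible, no information is lost: the high-frequency sub-bands isolate exactly the fine-grained residual content where blending and resampling artifacts tend to concentrate.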


Data Availability

All the datasets used in this paper are available online. FaceForensics++ (https://github.com/ondyari/FaceForensics), Celeb-DF (https://github.com/yuezunli/celeb-deepfakeforensics), DeepFake Detection Challenge (https://www.kaggle.com/c/deepfake-detection-challenge), DeepFake-TIMIT (https://www.idiap.ch/en/dataset/deepfaketimit), and DeeperForensics-1.0 (https://github.com/EndlessSora/DeeperForensics-1.0/tree/master/dataset) can be downloaded from their official websites.


Acknowledgements

This work is partly supported by the National Key Research and Development Program of China (2022ZD0161902), the National Natural Science Foundation of China (No. 62176012, 62202031, U20B2069), the Beijing Natural Science Foundation (No. 4222049), and the Fundamental Research Funds for the Central Universities.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Hongyu Yang.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Communicated by Sergio Escalera.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Li, Y., Zhang, Y., Yang, H. et al. SA\(^3\)WT: Adaptive Wavelet-Based Transformer with Self-Paced Auto Augmentation for Face Forgery Detection. Int J Comput Vis (2024). https://doi.org/10.1007/s11263-024-02091-x

