Abstract
Face forgery detection aims to distinguish real from fake facial images or videos by identifying manipulated or forged visual media. The main challenge is model generalization, i.e., achieving satisfactory performance in cross-database scenarios where the training and testing data are produced by different forgery methods. To this end, this paper presents an attention-erasing stripe pyramid network (ASPNet) that utilizes high-frequency noise and exploits both RGB and fine-grained frequency clues. First, since extracting features at different scales and granularities separately ignores their complementarity, we employ a stripe pyramid block (SPB) to learn multi-scale and multi-granularity features simultaneously. Second, to make the model focus on useful information and suppress noise, we introduce a two-stage attention block (TSAB) that combines spatial attention and channel attention to filter out pixel-wise and channel-wise noise in the learned feature maps. Finally, to dynamically guide the model to attend to different areas of the human face, we adopt an attention erasing (AE) scheme that randomly erases units in the attention maps. Extensive experiments demonstrate that ASPNet outperforms \(F^{3}\)-Net on the FaceForensics++ dataset. The area under the receiver operating characteristic curve (AUC) and the accuracy (ACC) of our model reach 77.4% and 70.85%, respectively, improvements of 0.83% and 1.28% over \(F^{3}\)-Net. Our code is available at: https://github.com/NWPU-Zwu.
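As an illustration only, the attention erasing idea applied to a spatial-then-channel attention pass can be sketched in NumPy as follows. This is not the released ASPNet code: the function names, the mean-pooled sigmoid spatial map, the squeeze-and-excitation-style channel gate, the ordering of the two stages, and the `drop_prob` parameter are all assumptions made for the sketch, since the abstract does not specify these details.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def spatial_attention(feat):
    """Assumed spatial stage: one weight per pixel from the channel-wise mean.

    feat has shape (C, H, W); the returned map has shape (H, W).
    """
    return sigmoid(feat.mean(axis=0))


def erase_attention(att_map, drop_prob, rng):
    """Randomly zero units of an attention map with probability drop_prob.

    Intended for training only; at inference the map would be left intact.
    """
    keep = rng.random(att_map.shape) >= drop_prob
    return att_map * keep


def channel_attention(feat, w1, w2):
    """Assumed channel stage: squeeze-and-excitation-style gate.

    w1 has shape (C // r, C) and w2 has shape (C, C // r) for some reduction r.
    """
    squeezed = feat.mean(axis=(1, 2))        # global average pool, shape (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)  # ReLU bottleneck
    gate = sigmoid(w2 @ hidden)              # one weight per channel
    return feat * gate[:, None, None]


def two_stage_attention(feat, w1, w2, drop_prob=0.0, rng=None):
    """Spatial attention (optionally erased), then channel attention."""
    att = spatial_attention(feat)
    if drop_prob > 0.0:
        if rng is None:
            rng = np.random.default_rng()
        att = erase_attention(att, drop_prob, rng)
    return channel_attention(feat * att[None, :, :], w1, w2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feat = rng.standard_normal((8, 4, 6))    # hypothetical (C, H, W) feature map
    w1 = rng.standard_normal((2, 8))         # hypothetical weights, r = 4
    w2 = rng.standard_normal((8, 2))
    out = two_stage_attention(feat, w1, w2, drop_prob=0.3, rng=rng)
    print(out.shape)
```

Because erased units differ from one training step to the next, the network cannot rely on a single facial region, which is the intuition behind guiding attention to different areas of the face.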
Data availability
The original datasets have been published online. The datasets generated during and/or analyzed during the current study are available from the corresponding author upon reasonable request.
Acknowledgements
This work was partly supported by the Fundamental Research Funds for the Central Universities (No. D5000210737), the Key Research and Development Program of Shaanxi Province (No. 2023-ZDLGY-53), and the National Natural Science Foundation of China (No. 62102320). (Corresponding author: Huanjie Tao).
Author information
Contributions
Zhenwu Hu completed the experiment and wrote the main manuscript text. Qianyue Duan provided ideas of improvement. Peiyu Zhang assisted us in writing. Huanjie Tao provided experiment guidance and writing advice. All authors reviewed the manuscript.
Ethics declarations
Conflict of interest
The authors have no relevant financial or non-financial interests to disclose.
Ethical approval
This article does not contain any studies with human participants performed by any of the authors.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Hu, Z., Duan, Q., Zhang, P. et al. An attention-erasing stripe pyramid network for face forgery detection. SIViP 17, 4123–4131 (2023). https://doi.org/10.1007/s11760-023-02644-6
DOI: https://doi.org/10.1007/s11760-023-02644-6