FaceMD: convolutional neural network-based spatiotemporal fusion facial manipulation detection

  • Original Paper
  • Signal, Image and Video Processing

Abstract

Digital videos have become essential to news broadcasts that reach audiences around the world, so it is important to ensure that these videos are reliable. Unfortunately, a digital video can be manipulated by replacing a person’s face or expressions with those of another person without leaving visible traces. Detecting such facial manipulation is challenging because few digital forensic techniques can verify the originality of video content. In this paper, we propose a novel approach, dubbed FaceMD, that fuses three streams of convolutional neural networks to detect facial manipulation. FaceMD incorporates spatiotemporal information by fusing video frames, motion residuals, and 3D gradients to improve facial manipulation detection accuracy. We combine the three streams using different fusion methods and fusion locations to make the best use of this spatiotemporal information and thereby increase detection performance. Experimental results show that FaceMD achieves state-of-the-art accuracy on two facial manipulation data sets.
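The three inputs named in the abstract can be illustrated with a minimal NumPy sketch. It is not the FaceMD implementation: the abstract does not define how motion residuals or 3D gradients are computed, so the temporal-median residual, the central-difference gradient, the score-averaging fusion rule, and all function names below are assumptions chosen only for illustration. The per-stream scores are placeholders standing in for CNN outputs.

```python
from typing import Sequence

import numpy as np


def motion_residuals(frames: np.ndarray, radius: int = 1) -> np.ndarray:
    """Each frame minus the temporal median of its neighbours (assumed definition)."""
    num_frames = frames.shape[0]
    residuals = np.empty_like(frames, dtype=np.float64)
    for t in range(num_frames):
        lo, hi = max(0, t - radius), min(num_frames, t + radius + 1)
        residuals[t] = frames[t] - np.median(frames[lo:hi], axis=0)
    return residuals


def gradient_magnitude_3d(frames: np.ndarray) -> np.ndarray:
    """Magnitude of finite-difference gradients along time, height and width."""
    g_t, g_y, g_x = np.gradient(frames.astype(np.float64))
    return np.sqrt(g_t**2 + g_y**2 + g_x**2)


def late_fusion(stream_scores: Sequence[float]) -> float:
    """Average per-stream scores (one possible fusion rule, assumed here)."""
    return float(np.mean(stream_scores))


# Toy clip: 4 grayscale frames of 8x8 pixels, static except one changed pixel.
clip = np.zeros((4, 8, 8))
clip[2, 3, 3] = 1.0

residual_stream = motion_residuals(clip)       # input to a hypothetical stream 2
gradient_stream = gradient_magnitude_3d(clip)  # input to a hypothetical stream 3
fused_score = late_fusion([0.9, 0.6, 0.3])     # placeholder per-stream scores
```

Averaging scores is a late-fusion choice; fusing intermediate feature maps instead would change where in the pipeline the streams are combined, which is the kind of design decision the abstract refers to.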

Acknowledgements

The researcher would like to thank the Deanship of Scientific Research, Qassim University, for funding the publication of this project.

Author information

Corresponding author

Correspondence to Mohammed Aloraini.

About this article

Cite this article

Aloraini, M. FaceMD: convolutional neural network-based spatiotemporal fusion facial manipulation detection. SIViP 17, 247–255 (2023). https://doi.org/10.1007/s11760-022-02227-x

  • DOI: https://doi.org/10.1007/s11760-022-02227-x
