
Hierarchical Contrastive Inconsistency Learning for Deepfake Video Detection

  • Conference paper
Computer Vision – ECCV 2022 (ECCV 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13672)


Abstract

With the rapid development of Deepfake techniques, the ability to generate hyper-realistic faces has raised public concern in recent years. Temporal inconsistency, which arises from the contrast in facial movements between pristine and forged videos, can serve as an effective cue for identifying Deepfakes. However, most existing approaches impose only binary supervision to model it, which restricts them to category-level discrepancies. In this paper, we propose a novel Hierarchical Contrastive Inconsistency Learning (HCIL) framework with a two-level contrastive paradigm. Specifically, sampling multiple snippets to form the input, HCIL performs contrastive learning from both local and global perspectives to capture more general and intrinsic temporal inconsistency between real and fake videos. Moreover, we incorporate a region-adaptive module for intra-snippet inconsistency mining and an inter-snippet fusion module for cross-snippet information fusion, which further facilitate inconsistency learning. Extensive experiments and visualizations demonstrate the effectiveness of our method against SOTA competitors on four Deepfake video datasets, i.e., FaceForensics++, Celeb-DF, DFDC, and Wild-Deepfake.
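
As a rough illustration of the two-level contrastive paradigm described in the abstract, the sketch below applies an InfoNCE-style loss at both the snippet (local) and video (global) level. This is a minimal sketch under stated assumptions, not the authors' implementation: the tensor names, shapes, temperature, and the info_nce helper are introduced here purely for illustration.

# Minimal sketch (assumptions, not the authors' code): an InfoNCE-style
# contrastive objective applied at two levels, snippet (local) and video
# (global), in the spirit of the hierarchical paradigm described above.
import torch
import torch.nn.functional as F

def info_nce(anchor, positives, negatives, tau=0.07):
    # Pull the anchor toward positive embeddings and away from negatives.
    anchor = F.normalize(anchor, dim=-1)        # (D,)
    positives = F.normalize(positives, dim=-1)  # (P, D)
    negatives = F.normalize(negatives, dim=-1)  # (N, D)
    pos = torch.exp(positives @ anchor / tau).sum()
    neg = torch.exp(negatives @ anchor / tau).sum()
    return -torch.log(pos / (pos + neg))

def hierarchical_contrastive_loss(real_snip, fake_snip, real_vid, fake_vid):
    # real_snip/fake_snip: (B, S, D) snippet embeddings of real/fake videos;
    # real_vid/fake_vid: (B, D) video-level embeddings (requires B >= 2).
    local_term = info_nce(real_snip[0, 0],
                          positives=real_snip.flatten(0, 1)[1:],
                          negatives=fake_snip.flatten(0, 1))
    global_term = info_nce(real_vid[0],
                           positives=real_vid[1:],
                           negatives=fake_vid)
    return local_term + global_term

For instance, with B = 4 videos of S = 8 snippets each and D = 512-dimensional embeddings, the function returns a scalar loss that could be combined with an ordinary binary classification objective.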

Z. Gu and T. Yao—Equal contributions.

This work was done when Zhihao Gu was an intern at Youtu Lab.




Acknowledgements

This research is supported in part by the National Key Research and Development Program of China (No. 2019YFC1521104), National Natural Science Foundation of China (No. 61972157 and No. 72192821), Shanghai Municipal Science and Technology Major Project (2021SHZDZX0102), Shanghai Science and Technology Commission (21511101200 and 21511101200) and Art major project of National Social Science Fund (I8ZD22). We also thank Shen Chen for proofreading our manuscript.

Author information


Corresponding authors

Correspondence to Shouhong Ding or Lizhuang Ma.



Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Gu, Z., Yao, T., Chen, Y., Ding, S., Ma, L. (2022). Hierarchical Contrastive Inconsistency Learning for Deepfake Video Detection. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13672. Springer, Cham. https://doi.org/10.1007/978-3-031-19775-8_35


  • DOI: https://doi.org/10.1007/978-3-031-19775-8_35


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-19774-1

  • Online ISBN: 978-3-031-19775-8

  • eBook Packages: Computer Science, Computer Science (R0)
