SegNet: a network for detecting deepfake facial videos

  • Regular Paper
  • Published in: Multimedia Systems

Abstract

Recent advances in artificial intelligence have made it easy to forge digital images and videos. Deepfake technology uses deep learning to identify and replace faces in images or videos. It can make people distrust digital content, thereby significantly affecting political and social stability. When the training and test data come from different sources, existing solutions for identifying forged images can suffer a considerable drop in accuracy; in many cases, the detection accuracy falls well below 50%. In this study, we propose SegNet, a face-forgery-detection method that determines whether an image or video has been processed with deepfake technology. By focusing on changes in different regions of an image rather than on the characteristics of particular forgery techniques, SegNet addresses this loss of accuracy. SegNet achieves satisfactory detection accuracy by combining recently proposed separable convolutional neural networks, ensemble models, and image segmentation. Moreover, we examine how different image-segmentation methods affect the detection results. A comprehensive comparison between SegNet and existing solutions demonstrates the superior detection capability of SegNet.
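
The abstract describes the approach only at a high level. The following is a minimal, hypothetical sketch (not the authors' implementation), assuming a TensorFlow/Keras environment, of the general idea it names: split a face crop into sub-images, score each sub-image with a small classifier built from separable convolutions, and ensemble the per-region scores. All function and model names below are illustrative placeholders.

```python
# Hypothetical sketch of "segmentation + separable CNN + ensemble".
# Not the authors' code; names and hyperparameters are placeholders.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

def build_region_classifier(input_shape=(64, 64, 3)):
    """Small binary classifier built from depthwise-separable convolutions."""
    inp = layers.Input(shape=input_shape)
    x = inp
    for filters in (32, 64, 128):
        x = layers.SeparableConv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.MaxPooling2D(2)(x)
    x = layers.GlobalAveragePooling2D()(x)
    out = layers.Dense(1, activation="sigmoid")(x)  # P(forged) for one sub-image
    return models.Model(inp, out)

def split_into_grid(face, rows=2, cols=2):
    """Split a face crop (H, W, 3) into rows*cols equally sized sub-images."""
    h, w = face.shape[0] // rows, face.shape[1] // cols
    return [face[r*h:(r+1)*h, c*w:(c+1)*w] for r in range(rows) for c in range(cols)]

def ensemble_predict(face, classifiers, size=(64, 64)):
    """Average the per-region forgery scores into one image-level score."""
    scores = []
    for sub, clf in zip(split_into_grid(face), classifiers):
        batch = tf.image.resize(sub, size)[None]      # add batch dimension
        scores.append(float(clf(batch)[0, 0]))
    return float(np.mean(scores))                      # > 0.5 -> likely forged

if __name__ == "__main__":
    face = np.random.rand(128, 128, 3).astype("float32")   # stand-in face crop
    clfs = [build_region_classifier() for _ in range(4)]    # untrained, for illustration
    print("forgery score:", ensemble_predict(face, clfs))
```

The sketch only illustrates the segmentation-plus-ensemble structure; the per-region networks, segmentation schemes, and ensembling used in the paper are more elaborate.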

Notes

  1. Here, "low-level features" means that, because we only have partial images (sub-images), the features extracted from them are low-level compared with the high-level features extracted from the original images. We are not referring to the usual sense in which the first layers of a CNN extract low-level features and the last layers extract high-level features, although the terminology is similar.

  2. https://github.com/deepfakes/faceswap.

  3. https://github.com/MarekKowalski/FaceSwap.

  4. https://github.com/iperov/DeepFaceLab.

  5. Here, "low-level features" means that, because we only have partial images, i.e., sub-images, the features extracted from the sub-images are low-level compared with the high-level features extracted from the original images. We are not referring to the usual sense in which the first layers of a CNN extract low-level features and the last layers extract high-level features, although the terminology is similar.

  6. https://www.ffmpeg.org/.

  7. In Table 1, each column comprises two numbers: the number of positive samples (genuine images) and the number of negative samples (forged images). The first column (Training) corresponds to the construction of the training set. Specifically, we extracted approximately 5–10 images from each genuine video in FF++ and obtained 7650 genuine images in total for the training set. Because the FF++ dataset provides samples processed using DF, FS, and F2F, we sampled approximately 3300 images for each category, obtaining 10,478 forged images in total for the training set. We followed a similar approach to construct the other three columns. Note that the videos used for the first column (Training) and the second column (Validation) do not overlap, for either the positive or the negative samples. For DeepFaceLab and StyleGAN, we used the same 1439 positive samples extracted from the FF++ dataset. The 2620 negative samples for the third column (DeepFaceLab) and the 2000 negative samples for the fourth column (StyleGAN) were collected from the Internet. (A brief sketch of the frame-sampling step appears after these notes.)

  8. The first row (Training, Testing) = (FF++, FF++ (Training)) of Table 2 shows that the dataset (both positive and negative samples) in the first column of Table 1 is used for both the training and testing processes. Obviously, this results in almost perfect accuracy. The second row (FF++, FF++ (Testing)) of Table 2 shows that the dataset of the first column of Table 1 is used for training, whereas the dataset in the second column of Table 1 is used for testing. As an example, the sixth row (DF and FS) of Table 2 shows that the DF part of the dataset in the first column of Table 1 is used for training, whereas the FS part of the dataset in the second column of Table 1 is used for testing. Note that the DF part of the dataset in the first column of Table 1 contains 7650 positive samples and approximately 3300 negative samples, whereas the FS part of the dataset in the second column of Table 1 contains 1002 positive samples and approximately 550 negative samples. Similar arguments apply to the other rows.
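
As a complement to note 7, the following is a minimal sketch, assuming OpenCV is available, of how a handful of frames might be sampled from each video to build an image-level dataset; FFmpeg (note 6) could be used equivalently. The function name and parameters are illustrative and are not the authors' pipeline.

```python
# Hypothetical frame-sampling step for building an image-level training set.
# Assumes OpenCV (cv2); not the authors' code.
import os
import cv2

def sample_frames(video_path, out_dir, n_frames=8):
    """Extract up to n_frames evenly spaced frames from one video as PNG files."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    if total <= 0:
        cap.release()
        return 0
    step = max(total // n_frames, 1)
    name = os.path.splitext(os.path.basename(video_path))[0]
    saved = 0
    for i in range(0, total, step):
        cap.set(cv2.CAP_PROP_POS_FRAMES, i)   # jump to frame i
        ok, frame = cap.read()
        if not ok:
            break
        cv2.imwrite(os.path.join(out_dir, f"{name}_{i:05d}.png"), frame)
        saved += 1
        if saved >= n_frames:
            break
    cap.release()
    return saved
```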

Acknowledgements

Chia-Mu Yu is supported by MOST 110-2636-E-009-018. We thank the National Center for High-performance Computing (NCHC) of National Applied Research Laboratories (NARLabs), Taiwan, for providing computational and storage resources.

Author information

Corresponding author

Correspondence to Chia-Mu Yu.

Additional information

Communicated by A. Liu.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Yu, CM., Chen, KC., Chang, CT. et al. SegNet: a network for detecting deepfake facial videos. Multimedia Systems 28, 793–814 (2022). https://doi.org/10.1007/s00530-021-00876-5

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00530-021-00876-5
