SegNet: a network for detecting deepfake facial videos

  • Regular Paper
  • Published in: Multimedia Systems

Abstract

Recent advances in artificial intelligence have made it easy to forge digital images and videos. Deepfake technology uses deep learning to identify and replace faces in images or videos. It can make people distrust digital content, thereby significantly affecting political and social stability. When the training and test data come from different sources, existing solutions for identifying forged images can suffer a considerable drop in accuracy; in many cases, the detection accuracy falls well below 50%. In this study, we propose SegNet, a face-forgery-detection method that determines whether an image or video has been processed with deepfake technology. By focusing on changes in different regions of an image rather than on the characteristics of particular forgery techniques, SegNet addresses this loss of accuracy. SegNet achieves satisfactory detection accuracy by combining recently proposed separable convolutional neural networks, ensemble models, and image segmentation. Moreover, we examine how different image-segmentation methods affect the detection results. A comprehensive comparison between SegNet and existing solutions demonstrates the superior detection capability of SegNet.
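
The abstract describes the approach only at a high level. The following is a minimal, hypothetical sketch (not the authors' implementation), assuming a TensorFlow/Keras environment, of the general idea it names: split a face crop into sub-images, score each sub-image with a small classifier built from separable convolutions, and ensemble the per-region scores. All function and model names below are illustrative placeholders.

```python
# Hypothetical sketch of "segmentation + separable CNN + ensemble".
# Not the authors' code; names and hyperparameters are placeholders.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

def build_region_classifier(input_shape=(64, 64, 3)):
    """Small binary classifier built from depthwise-separable convolutions."""
    inp = layers.Input(shape=input_shape)
    x = inp
    for filters in (32, 64, 128):
        x = layers.SeparableConv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.MaxPooling2D(2)(x)
    x = layers.GlobalAveragePooling2D()(x)
    out = layers.Dense(1, activation="sigmoid")(x)  # P(forged) for one sub-image
    return models.Model(inp, out)

def split_into_grid(face, rows=2, cols=2):
    """Split a face crop (H, W, 3) into rows*cols equally sized sub-images."""
    h, w = face.shape[0] // rows, face.shape[1] // cols
    return [face[r*h:(r+1)*h, c*w:(c+1)*w] for r in range(rows) for c in range(cols)]

def ensemble_predict(face, classifiers, size=(64, 64)):
    """Average the per-region forgery scores into one image-level score."""
    scores = []
    for sub, clf in zip(split_into_grid(face), classifiers):
        batch = tf.image.resize(sub, size)[None]      # add batch dimension
        scores.append(float(clf(batch)[0, 0]))
    return float(np.mean(scores))                      # > 0.5 -> likely forged

if __name__ == "__main__":
    face = np.random.rand(128, 128, 3).astype("float32")   # stand-in face crop
    clfs = [build_region_classifier() for _ in range(4)]    # untrained, for illustration
    print("forgery score:", ensemble_predict(face, clfs))
```

The sketch only illustrates the segmentation-plus-ensemble structure; the per-region networks, segmentation schemes, and ensembling used in the paper are more elaborate.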

Notes

  1. Here, "low-level features" means that, because we only have partial images (sub-images), the features extracted from them are low-level compared with the high-level features extracted from the original images. We are not referring to the usual sense in which the first layers of a CNN extract low-level features and the last layers extract high-level features, although the terminology is similar.

  2. https://github.com/deepfakes/faceswap.

  3. https://github.com/MarekKowalski/FaceSwap.

  4. https://github.com/iperov/DeepFaceLab.

  5. Here, "low-level features" means that, because we only have partial images, i.e., sub-images, the features extracted from the sub-images are low-level compared with the high-level features extracted from the original images. We are not referring to the usual sense in which the first layers of a CNN extract low-level features and the last layers extract high-level features, although the terminology is similar.

  6. https://www.ffmpeg.org/.

  7. In Table 1, each column comprises two numbers: the number of positive samples (genuine images) and the number of negative samples (forged images). The first column (Training) corresponds to the construction of the training set. Specifically, we extracted approximately 5–10 images from each genuine video in FF++ and obtained 7650 genuine images in total for the training set. Because the FF++ dataset provides samples processed using DF, FS, and F2F, we sampled approximately 3300 images for each category, obtaining 10,478 forged images in total for the training set. We followed a similar approach to construct the other three columns. Note that the videos used for the first column (Training) and the second column (Validation) do not overlap, for either the positive or the negative samples. For DeepFaceLab and StyleGAN, we used the same 1439 positive samples extracted from the FF++ dataset. The 2620 negative samples for the third column (DeepFaceLab) and the 2000 negative samples for the fourth column (StyleGAN) were collected from the Internet. (A brief sketch of the frame-sampling step appears after these notes.)

  8. The first row (Training, Testing) = (FF++, FF++ (Training)) of Table 2 shows that the dataset (both positive and negative samples) in the first column of Table 1 is used for both the training and testing processes. Obviously, this results in almost perfect accuracy. The second row (FF++, FF++ (Testing)) of Table 2 shows that the dataset of the first column of Table 1 is used for training, whereas the dataset in the second column of Table 1 is used for testing. As an example, the sixth row (DF and FS) of Table 2 shows that the DF part of the dataset in the first column of Table 1 is used for training, whereas the FS part of the dataset in the second column of Table 1 is used for testing. Note that the DF part of the dataset in the first column of Table 1 contains 7650 positive samples and approximately 3300 negative samples, whereas the FS part of the dataset in the second column of Table 1 contains 1002 positive samples and approximately 550 negative samples. Similar arguments apply to the other rows.
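
As a complement to note 7, the following is a minimal sketch, assuming OpenCV is available, of how a handful of frames might be sampled from each video to build an image-level dataset; FFmpeg (note 6) could be used equivalently. The function name and parameters are illustrative and are not the authors' pipeline.

```python
# Hypothetical frame-sampling step for building an image-level training set.
# Assumes OpenCV (cv2); not the authors' code.
import os
import cv2

def sample_frames(video_path, out_dir, n_frames=8):
    """Extract up to n_frames evenly spaced frames from one video as PNG files."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    if total <= 0:
        cap.release()
        return 0
    step = max(total // n_frames, 1)
    name = os.path.splitext(os.path.basename(video_path))[0]
    saved = 0
    for i in range(0, total, step):
        cap.set(cv2.CAP_PROP_POS_FRAMES, i)   # jump to frame i
        ok, frame = cap.read()
        if not ok:
            break
        cv2.imwrite(os.path.join(out_dir, f"{name}_{i:05d}.png"), frame)
        saved += 1
        if saved >= n_frames:
            break
    cap.release()
    return saved
```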

Acknowledgements

Chia-Mu Yu is supported by MOST 110-2636-E-009-018. We thank the National Center for High-performance Computing (NCHC) of National Applied Research Laboratories (NARLabs), Taiwan, for providing computational and storage resources.

Author information

Corresponding author

Correspondence to Chia-Mu Yu.

Additional information

Communicated by A. Liu.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Yu, CM., Chen, KC., Chang, CT. et al. SegNet: a network for detecting deepfake facial videos. Multimedia Systems 28, 793–814 (2022). https://doi.org/10.1007/s00530-021-00876-5

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00530-021-00876-5
