
Fake visual content detection using two-stream convolutional neural networks

  • Original Article
  • Published in Neural Computing and Applications

Abstract

Rapid progress in adversarial learning has enabled the generation of realistic-looking fake visual content. Several detection techniques have been proposed to distinguish fake from real visual content. However, the performance of most of these techniques drops significantly when the test and training data are sampled from different distributions, which motivates efforts to improve the generalization of fake detectors. Since current fake content generation techniques do not accurately model the frequency spectrum of natural images, we observe that the frequency spectrum of fake visual data contains discriminative characteristics that can be used to detect fake content. We also observe that the information captured in the frequency spectrum differs from that of the spatial domain. Using these insights, we propose to complement frequency and spatial domain features using a two-stream convolutional neural network architecture called TwoStreamNet. We demonstrate the improved generalization of the proposed two-stream network to several unseen generation architectures, datasets, and techniques. The proposed detector achieves a significant performance improvement over current state-of-the-art fake content detectors, and fusing the frequency and spatial domain streams also improves the generalization of the detector.
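The frequency-domain observation above can be illustrated with a minimal sketch of how inputs for the two streams might be prepared. This is an assumption-laden illustration, not the paper's actual pipeline: the function names are hypothetical, and the centered log-magnitude 2D FFT is one common choice of frequency representation that exposes the spectral artifacts of generated images.

```python
import numpy as np

def frequency_stream_input(image: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Centered log-magnitude 2D FFT spectrum of a grayscale image.

    Hypothetical preprocessing for the frequency-domain stream; the paper's
    exact transform may differ. fftshift moves the DC component to the center
    so low frequencies sit in the middle of the map.
    """
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    return np.log(np.abs(spectrum) + eps)

def two_stream_features(image: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Return (spatial, frequency) inputs for the two network streams."""
    spatial = image.astype(np.float32)
    freq = frequency_stream_input(spatial).astype(np.float32)
    return spatial, freq

# Toy usage: a synthetic 64x64 image with a periodic pattern, whose
# spectrum concentrates energy at the pattern's frequency.
img = np.sin(2 * np.pi * np.arange(64)[None, :] / 8).repeat(64, axis=0)
spatial, freq = two_stream_features(img)
print(spatial.shape, freq.shape)  # (64, 64) (64, 64)
```

Each of the two maps would then feed its own convolutional stream, with the stream outputs fused (e.g., by feature concatenation) before the final real/fake classifier.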




Notes

  1. The eyes and mouth are treated as the mesoscopic features for forgery detection in Deepfake videos.

  2. https://www.yf.io/p/lsun.

  3. https://peterwang512.github.io/CNNDetection/.


Funding

The authors did not receive support from any organization for the submitted work.

Author information


Corresponding author

Correspondence to Junaid Qadir.

Ethics declarations

Conflict of interest

We wish to confirm that there are no known conflicts of interest associated with this publication.

Ethical approval

We confirm that the manuscript has been read and approved by all named authors and that there are no other persons who satisfied the criteria for authorship but are not listed. We further confirm that the order of authors listed in the manuscript has been approved by all of us. We understand that the Corresponding Author is the sole contact for the Editorial process (including Editorial Manager and direct communications with the office).

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Yousaf, B., Usama, M., Sultani, W. et al. Fake visual content detection using two-stream convolutional neural networks. Neural Comput & Applic 34, 7991–8004 (2022). https://doi.org/10.1007/s00521-022-06902-5

