A literature review and perspectives in deepfakes: generation, detection, and applications

  • Trends and Surveys
  • International Journal of Multimedia Information Retrieval

Abstract

In the last few years, advances in deep learning, especially Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs), have made fabricated content more realistic and believable to the naked eye. Deepfake is one such emerging technology: it allows the creation of highly realistic synthetic content. On the one hand, deepfakes have paved the way for advanced applications in fields such as advertising, creative arts, and film production. On the other hand, they pose a threat to Multimedia Information Retrieval Systems (MIPR) such as face recognition and speech recognition systems, and they have wider societal implications in spreading misleading information. This paper aims to help readers understand deepfake technology and its applications, survey the current state-of-the-art methods, and outline the future direction of the technology. We present a comprehensive literature survey on the applications of deepfakes, followed by a discussion of state-of-the-art methods for deepfake generation and detection in three media: image, video, and audio. Next, we discuss in detail the architectural components and datasets used by the various methods. Furthermore, we discuss the limitations and open challenges of deepfakes to identify research gaps in this field. Finally, we conclude and discuss future directions for exploring the potential of this technology in the coming years.

Notes

  1. AI-Assisted Fake Porn Is Here (vice.com).

  2. Deepfake Detection Challenge | Kaggle.

  3. ASVspoof.

  4. Synthesia Case studies: Reuters.

  5. Digital Doubles: The Deepfake Tech Nourishing New Wave Retail (forbes.com).

References

  1. Güera D, Delp EJ (2018) Deepfake video detection using recurrent neural networks. In: 2018 15th IEEE international conference on advanced video and signal based surveillance (AVSS), Auckland

  2. Strickland E (2019) Facebook AI launches its deepfake detection challenge. IEEE Spectrum, December 2019. https://spectrum.ieee.org/facebook-ai-launches-its-deepfake-detection-challenge

  3. Chesney R, Citron DK (2018) Deep fakes: a looming challenge for privacy, democracy, and national security, 68

  4. Mirsky Y, Lee W (2021) The creation and detection of deepfakes: a survey. ACM Comput Surv 54(1):1–41

  5. Jaiman A (2020) Positive uses of deepfakes. Towards Data Science, 15 Aug 2020. https://towardsdatascience.com/positive-use-cases-of-deepfakes-49f510056387. Accessed 11 April 2021

  6. Damiani J (2019) A voice deepfake was used to scam a CEO out of $243,000. Forbes, 3 September 2019. https://www.forbes.com/sites/jessedamiani/2019/09/03/a-voice-deepfake-was-used-to-scam-a-ceo-out-of-243000/?sh=70583a482241. Accessed 10 July 2021

  7. Jaiman A (2020) Deepfakes harms and threat modeling, 19 Aug 2020. https://towardsdatascience.com/deepfakes-harms-and-threat-modeling-c09cbe0b7883. Accessed 14 April 2021

  8. Rizzotto L (2019) Deepfake ads, 4 Dec 2019. https://medium.com/futurepi/why-deepfakes-will-change-advertising-forever-2949ec3f87ee. Accessed 18 April 2021

  9. Tolosana R, Vera-Rodriguez R, Fierrez J, Morales A, Ortega-Garcia J (2020) Deepfakes and beyond: a survey of face manipulation and fake detection. Inf Fusion 64:131–148

  10. Masood M, Nawaz M, Malik KM, Javed A, Irtaza A (2021) Deepfakes generation and detection: state-of-the-art, open challenges, countermeasures, and way forward. http://arxiv.org/abs/2103.00484v1

  11. Juefei-Xu F, Wang R, Huang Y, Guo Q, Ma L, Liu Y (2021) Countering malicious deepfakes: survey, battleground, and horizon. http://arxiv.org/abs/2103.00218v1

  12. Yu P, Xia Z, Fei J, Lu Y (2021) A survey on deepfake video detection. IET Biometrics 10(6):607–624

  13. Faceswap, https://faceswap.dev/. Accessed 6 April 2021

  14. FakeApp, https://www.malavida.com/en/soft/fakeapp/. Accessed 6 April 2021

  15. deepfakes/Faceswap, GitHub, 2016. https://github.com/deepfakes/faceswap

  16. Rössler A, Cozzolino D, Verdoliva L, Riess C, Thies J, Niessner M (2019) FaceForensics++: learning to detect manipulated facial images. In: IEEE/CVF International Conference on Computer Vision (ICCV), Seoul

  17. Dale K, Sunkavalli K, Johnson MK, Vlasic D, Matusik W, Pfister H (2011) Video face replacement. ACM Trans Gr 30(6):1–10

  18. Li L, Bao J, Yang H, Chen D, Wen F (2020) Advancing high fidelity identity swapping for forgery detection. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle

  19. Nirkin Y, Keller Y, Hassner T (2019) FSGAN: subject agnostic face swapping and reenactment. In: IEEE/CVF International Conference on Computer Vision (ICCV), Seoul

  20. Chen R, Chen X, Ni B, Ge Y (2020) SimSwap: an efficient framework for high fidelity face swapping. In: Proceedings of the 28th ACM International Conference on Multimedia, Seattle

  21. Zhu Y, Li Q, Wang J, Xu C, Sun Z (2021) One shot face swapping on megapixels. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville

  22. Zhang L, Yang H, Qiu T, Li L (2021) AP-GAN: improving attribute preservation in video face swapping. IEEE Trans Circuits Syst Video Technol 32(4):2226–2237

  23. Peng B, Fan H, Wang W, Dong J, Lyu S (2021) A unified framework for high fidelity face swap and expression reenactment. IEEE Trans Circuits Syst Video Technol 32(6):3673–3684

  24. Cao M, Huang H, Wang H, Wang X, Shen L, Wang S, Bao L, Li Z, Luo J (2021) UniFaceGAN: a unified framework for temporally consistent facial video editing. IEEE Trans Image Process 30:6107–6116

  25. Chan C, Ginosar S, Zhou T, Efros A (2019) Everybody dance now. In: IEEE/CVF International Conference on Computer Vision (ICCV), Seoul

  26. Thies J, Zollhöfer M, Stamminger M, Theobalt C, Nießner M (2016) Face2Face: real-time face capture and reenactment of RGB videos. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas

  27. Thies J, Zollhöfer M, Nießner M (2019) Deferred neural rendering: image synthesis using neural textures. ACM Trans Gr 38(4):66

  28. Liu L, Xu W, Zollhöfer M, Kim H, Bernard F, Habermann M, Wang W, Theobalt C (2019) Neural rendering and reenactment of human actor videos. ACM Trans Gr 38(5):1–14

  29. Christos Doukas M, Koujan MR, Sharmanska V, Roussos A, Zafeiriou S (2021) Head2Head++: deep facial attributes re-targeting. IEEE Trans Biometrics Behav Identit Sci 3(1):31–43

  30. Zakharov E, Shysheya A, Burkov E, Lempitsky V (2019) Few-shot adversarial learning of realistic neural talking head models. In: IEEE/CVF International Conference on Computer Vision (ICCV), Seoul

  31. Wang T-C, Liu M-Y, Tao A, Liu G, Kautz J, Catanzaro B (2019) Few-shot video-to-video synthesis. In: Advances in Neural Information Processing Systems (NeurIPS), Vancouver

  32. Gafni O, Ashual O, Wolf L (2021) Single-shot freestyle dance reenactment. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville

  33. Zhang J, Zeng X, Pan Y, Liu Y, Ding Y, Fan C (2019) FaceSwapNet: landmark guided many-to-many face reenactment. http://arxiv.org/abs/1905.11805v1

  34. Zhang Y, Zhang S, He Y, Li C, Loy CC, Liu Z (2019) One-shot face reenactment. http://arxiv.org/abs/1908.03251v1

  35. Gu K, Zhou Y, Huang T (2020) FLNet: landmark driven fetching and learning network for faithful talking facial animation synthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, New York

  36. Lee J, Ramanan D, Girdhar R (2020) MetaPix: few-shot video retargeting. In: International conference on learning representations

  37. Sanchez E, Valstar M (2020) A recurrent cycle consistency loss for progressive face-to-face synthesis. In: IEEE international conference on automatic face and gesture recognition, Buenos Aires

  38. Tripathy S, Kannala J, Rahtu E (2021) FACEGAN: facial attribute controllable rEenactment GAN. In: IEEE winter conference on applications of computer vision (WACV), Waikoloa

  39. Lee C-H, Liu Z, Wu L, Luo P (2020) MaskGAN: towards diverse and interactive facial image manipulation. In: IEEE/CVF conference on computer vision and pattern recognition (CVPR), Seattle

  40. Zhu Z, Huang T, Shi B, Yu M, Wang B, Bai X (2019) Progressive pose attention transfer for person image generation. In: IEEE/CVF conference on computer vision and pattern recognition (CVPR), Long Beach

  41. Aberman K, Shi M, Liao J, Lischinski D, Cohen-Or D, Chen B (2019) Deep video-based performance cloning. In: European association for computer graphics, Genova

  42. Zhou Y, Wang Z, Fang C, Bui T, Berg TL (2019) Dance dance generation: motion transfer for internet videos. In: IEEE/CVF international conference on computer vision workshop (ICCVW), Seoul

  43. Tripathy S, Kannala J, Rahtu E (2020) ICface: interpretable and controllable face reenactment using GANs. In: IEEE Winter Conference on Applications of Computer Vision (WACV), Snowmass

  44. Zablotskaia P, Siarohin A, Zhao B, Sigal L (2019) DwNet: dense warp-based network for pose-guided human video generation. In: British Machine Vision Conference (BMVC), Cardiff

  45. Suwajanakorn S, Seitz SM, Kemelmacher-Shlizerman I (2017) Synthesizing Obama: learning lip sync from audio. ACM Trans Gr 36(4):1–14

  46. Fried O, Tewari A, Zollhöfer M, Finkelstein A, Shechtman E, Goldman DB, Genova K, Jin Z, Theobalt C, Agrawala M (2019) Text-based editing of talking-head video. ACM Trans Gr 38(4):1–14

  47. Lahiri A, Kwatra V, Frueh C, Lewis J, Bregler C (2021) LipSync3D: data-efficient learning of personalized 3D talking faces from video using pose and lighting normalization. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville

  48. Zhang Z, Li L, Ding Y, Fan C (2021) Flow-guided one-shot talking face generation with a high-resolution audio-visual dataset. In: IEEE/CVF conference on computer vision and pattern recognition (CVPR), Nashville

  49. Jamaludin A, Chung JS, Zisserman A (2019) You said that?: Synthesising talking faces from audio. Int J Comput Vis 127:1767–1779

  50. Choi Y, Choi M, Kim M, Ha J-W, Kim S, Choo J (2018) StarGAN: unified generative adversarial networks for multi-domain image-to-image translation. In: IEEE/CVF conference on computer vision and pattern recognition, Salt Lake City

  51. Pumarola A, Agudo A, Martinez AM, Sanfeliu A, Moreno-Noguer F (2019) GANimation: one-shot anatomically consistent facial animation. Int J Comput Vis 128:698–713

  52. Liu M, Ding Y, Xia M, Liu X, Ding E, Zuo W, Wen S (2019) STGAN: a unified selective transfer network for arbitrary image attribute editing. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach

  53. Liang H, Hou X, Shen L (2021) SSFlow: style-guided neural spline flows for face image manipulation. In: Proceedings of the 29th ACM international conference on multimedia, New York

  54. Wang R, Chen J, Yu G, Sun L, Yu C, Gao C, Sang N (2021) Attribute-specific Control Units in StyleGAN for Fine-grained image manipulation. In: Proceedings of the 29th ACM international conference on multimedia, New York

  55. Karras T, Laine S, Aila T (2019) A style-based generator architecture for generative adversarial networks. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach

  56. Zhou H, Liu Y, Liu Z, Luo P, Wang X (2019) Talking face generation by adversarially disentangled audio-visual representation. In: AAAI Conference on Artificial Intelligence (AAAI-19), Honolulu

  57. Chen L, Maddox RK, Duan Z, Xu C (2019) Hierarchical cross-modal talking face generation with dynamic pixel-wise loss. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach

  58. Vougioukas K, Petridis S, Pantic M (2019) Realistic speech-driven facial animation with GANs. Int J Comput Vis 128:1398–1413

  59. Thies J, Elgharib M, Tewari A, Theobalt C, Nießner M (2020) Neural voice puppetry: audio-driven facial reenactment. In: European conference on computer vision (ECCV), Glasgow

  60. Vougioukas K, Petridis S, Pantic M (2019) End-to-end speech-driven realistic facial animation with temporal GANs. In: Computer Vision and Pattern Recognition (CVPR), Long Beach

  61. He Z, Zuo W, Kan M, Shan S, Chen X (2019) AttGAN: facial attribute editing by only changing what you want. IEEE Trans Image Process 28(11):5464–5478

  62. Shen Y, Gu J, Tang X, Zhou B (2020) Interpreting the latent space of GANs for semantic face editing. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle

  63. Jo Y, Park J (2019) SC-FEGAN: face editing generative adversarial network with user’s sketch and color. In: IEEE/CVF international conference on computer vision (ICCV), Seoul

  64. Shen Y, Yang C, Tang X, Zhou B (2020) InterFaceGAN: interpreting the disentangled face representation learned by GANs. IEEE Trans Pattern Anal Mach Intell (Early Access), p 1

  65. Fu C, Hu Y, Wu X, Wang G, Zhang Q, He R (2021) High-fidelity face manipulation with extreme poses and expressions. IEEE Trans Inf Forensics Secur 16:2218–2231

  66. Yang N, Zheng Z, Zhou M, Guo X, Qi L, Wang T (2021) A domain-guided noise-optimization-based inversion method for facial image manipulation. IEEE Trans Image Process 30:6198–6211

  67. Karras T, Aila T, Laine S, Lehtinen J (2018) Progressive Growing of GANs for improved quality, stability, and variation. In: International conference on learning representations (ICLR), Vancouver

  68. Karras T, Laine S, Aittala M, Hellsten J, Lehtinen J, Aila T (2020) Analyzing and improving the image quality of StyleGAN. In: IEEE/CVF Conference on computer vision and pattern recognition (CVPR), Seattle

  69. Zhang H, Goodfellow I, Metaxas D, Odena A (2019) Self-attention generative adversarial networks. http://arxiv.org/abs/1805.08318v2

  70. Brock A, Donahue J, Simonyan K (2019) Large scale GAN training for high fidelity natural image synthesis. In: International Conference on Learning Representations (ICLR), New Orleans

  71. Martin K, Marketing V (2021) What is voice cloning?, ID R&D, https://www.idrnd.ai/what-is-voice-cloning/. Accessed 24 July 2021

  72. Maheshwari H (2021) Basic text to speech, explained. Towards Data Science, https://towardsdatascience.com/text-to-speech-explained-from-basic-498119aa38b5. Accessed 11 July 2021

  73. Maheshwari H (2021) Text to speech system for multi-speaker setting. Towards Data Science, https://towardsdatascience.com/text-to-speech-system-for-multi-speaker-setting-35e83f84e669. Accessed 12 July 2021

  74. Singh J (2018) WaveNet: Google Assistant’s voice synthesizer. Towards Data Science, 7 November 2018. https://towardsdatascience.com/wavenet-google-assistants-voice-synthesizer-a168e9af13b1. Accessed 10 July 2021

  75. Oord AVD, Dieleman S, Zen H, Simonyan K, Vinyals O, Graves A, Kalchbrenner N, Senior A, Kavukcuoglu K (2016) WaveNet: a generative model for raw audio. In: Proceedings of the 9th ISCA Speech Synthesis Workshop, Sunnyvale

  76. Oord A, Li Y, Babuschkin I, Simonyan K, Vinyals O, Kavukcuoglu K, Driessche G, Lockhart E, Cobo L, Stimberg F, Casagrande N, Grewe D, Noury S, Dieleman S, Elsen E, Kalchbrenner N, Zen H, Graves A, King H, Walters T, Belov D, Hassabis D (2018) Parallel WaveNet: fast high-fidelity speech synthesis. In: Proceedings of the 35th international conference on machine learning, Stockholm

  77. Arık SO, Chrzanowski M, Coates A, Diamos G, Gibiansky A, Kang Y, Li X, Miller J, Ng A, Raiman J, Sengupta S, Shoeybi M (2017) Deep voice: real-time neural text-to-speech. In: International conference on machine learning, Sydney

  78. Arık SÖ, Diamos G, Gibiansky A, Miller J, Peng K, Ping W, Raiman J, Zhou Y (2017) Deep voice 2: multi-speaker neural text-to-speech. In: Advances in neural information processing systems, Long Beach

  79. Ping W, Peng K, Gibiansky A, Arık SO, Kannan A, Narang S, Raiman J, Miller J (2018) Deep voice 3: scaling text-to-speech with convolutional sequence learning. In: International conference on learning representations (ICLR), Vancouver

  80. Wang Y, Skerry-Ryan R, Stanton D, Wu Y, Weiss RJ, Jaitly N, Yang Z, Xiao Y, Chen Z, Bengio S, Le Q, Agiomyrgiannakis Y, Clark R, Saurous RA (2017) Tacotron: towards end-to-end speech synthesis. http://arxiv.org/abs/1703.10135v2

  81. Zhang J-X, Ling Z-H, Liu L-J, Jiang Y, Dai L-R (2019) Sequence-to-sequence acoustic modeling for voice conversion. IEEE/ACM Trans Audio Speech Lang Process 27(3):631–644

  82. Veaux C, Yamagishi J, King S (2013) Towards personalized synthesized voices for individuals with vocal disabilities: voice banking and reconstruction. In: Speech and language processing for assistive technologies (SLPAT), Grenoble

  83. Sisman B, Yamagishi J, King S, Li H (2021) An overview of voice conversion and its challenges: from statistical modeling to deep learning. IEEE/ACM Trans Audio Speech Lang Process 29:132–157

  84. Zhang J-X, Ling Z-H, Dai L-R (2019) Non-parallel sequence-to-sequence voice conversion with disentangled linguistic and speaker representations. IEEE/ACM Trans Audio Speech Lang Process 28:540–552

  85. Wang R, Ding Y, Li L, Fan C (2020) One-shot voice conversion using Star-GAN. In: ICASSP 2020 - 2020 IEEE international conference on acoustics, speech and signal processing (ICASSP), Barcelona

  86. Liu R, Chen X, Wen X (2020) Voice conversion with transformer network. In: ICASSP 2020-2020 IEEE international conference on acoustics, speech and signal processing (ICASSP), Barcelona

  87. Yasuda Y, Wang X, Takaki S, Yamagishi J (2019) Investigation of enhanced tacotron text-to-speech synthesis systems with self-attention for pitch accent language. In: IEEE international conference on acoustics, speech and signal processing (ICASSP), Brighton

  88. Chen Y, Assael Y, Shillingford B, Budden D, Reed S, Zen H, Wang Q, Cobo LC, Trask A, Laurie B, Gulcehre C, Oord AVD, Vinyals O, Freitas ND (2019) Sample efficient adaptive text-to-speech. In: International Conference on Learning Representations (ICLR), New Orleans

  89. Liu R, Yang J, Liu M (2019) A new end-to-end long-time speech synthesis system based on Tacotron2. In: International conference proceeding series (ICPS), Beijing

  90. Weiss RJ, Skerry-Ryan R, Battenberg E, Mariooryad S, Kingma DP (2021) Wave-Tacotron: spectrogram-free end-to-end text-to-speech synthesis. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto

  91. He Q, Xiu Z, Koehler T, Wu J (2021) Multi-rate attention architecture for fast streamable text-to-speech spectrum modeling. In: 2021 IEEE international conference on acoustics, speech and signal processing (ICASSP), Toronto

  92. Liu R, Sisman B, Gao G, Li H (2021) Expressive TTS training with frame and style reconstruction loss. IEEE/ACM Trans Audio Speech Lang Process 29:1806–1818

  93. Zhou X, Ling Z-H, Dai L-R (2021) UnitNet: a sequence-to-sequence acoustic model for concatenative speech synthesis. IEEE/ACM Trans Audio Speech Lang Process 29:2643–2655

  94. Tanaka K, Kameoka H, Kaneko T, Hojo N (2019) ATTS2S-VC: sequence-to-sequence voice conversion with attention and context preservation mechanisms. In: ICASSP 2019 - 2019 IEEE international conference on acoustics, speech and signal processing (ICASSP), Brighton

  95. Kameoka H, Kaneko T, Tanaka K, Hojo N (2019) ACVAE-VC: non-parallel voice conversion with auxiliary classifier variational autoencoder. IEEE/ACM Trans Audio Speech Lang Process 27(9):1432–1443

  96. Cong J, Yang S, Xie L, Yu G, Wan G (2020) Data efficient voice cloning from noisy samples with domain adversarial training. In: Interspeech 2020, Shanghai

  97. Zhang M, Sisman B, Zhao L, Li H (2020) DeepConversion: voice conversion with limited parallel training data. Speech Commun 122:31–43

  98. Kameoka H, Tanaka K, Kwaśny D, Kaneko T, Hojo N (2020) ConvS2S-VC: fully convolutional sequence-to-sequence voice conversion. IEEE/ACM Trans Audio Speech Lang Process 28:1849–1863

  99. Ding S, Zhao G, Gutierrez-Osuna R (2020) Improving the speaker identity of non-parallel many-to-many voice conversion with adversarial speaker recognition. In: INTERSPEECH, Shanghai

  100. Lee S, Ko B, Lee K, Yoo I-C, Yook D (2020) Many-to-many voice conversion using conditional cycle-consistent adversarial networks. In: ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona

  101. Zhang M, Zhou Y, Zhao L, Li H (2021) Transfer learning from speech synthesis to voice conversion with non-parallel training data. IEEE/ACM Trans Audio Speech Lang Process 29:1290–1302

  102. Chen M, Shi Y, Hain T (2021) Towards low-resource stargan voice conversion using weight adaptive instance normalization. In: IEEE international conference on acoustics, speech and signal processing (ICASSP), Toronto

  103. Li Z, Tang B, Yin X, Wan Y, Xu L, Shen C, Ma Z (2021) PPG-based singing voice conversion with adversarial representation learning. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto

  104. Kameoka H, Huang W-C, Tanaka K, Kaneko T, Hojo N, Toda T (2021) Many-to-many voice transformer network. IEEE/ACM Trans Audio Speech Lang Process 29:656–670

  105. Li H, Li B, Tan S, Huang J (2020) Identification of deep network generated images using disparities in color components. Signal Process 174:107616

  106. Chen P, Liu J, Liang T, Yu C, Zou S, Dai J, Han J (2021) DLFMNet: end-to-end detection and localization of face manipulation using multi-domain features. In: IEEE international conference on multimedia and expo (ICME), Shenzhen

  107. McCloskey S, Albright M (2018) Detecting GAN-generated imagery using color cues. http://arxiv.org/abs/1812.08247v1

  108. Yu N, Davis L, Fritz M (2019) Attributing fake images to GANs: learning and analyzing GAN fingerprints. In: IEEE/CVF international conference on computer vision (ICCV), Seoul

  109. Koopman M, Rodriguez AM, Geradts Z (2018) Detection of deepfake video manipulation. In: Irish machine vision and image processing conference (IMVIP), Belfast

  110. Li Y, Lyu S (2019) Exposing DeepFake videos by detecting face warping artifacts. In: IEEE/CVF conference on computer vision and pattern recognition (CVPR) workshops, Long Beach

  111. Li L, Bao J, Zhang T, Yang H, Chen D, Wen F, Guo B (2020) Face X-ray for more general face forgery detection. In: IEEE/CVF conference on computer vision and pattern recognition (CVPR), Seattle

  112. Matern F, Riess C, Stamminger M (2019) Exploiting visual artifacts to expose deepfakes and face manipulations. In: IEEE winter applications of computer vision workshops (WACVW), Waikoloa

  113. Zhao Y, Ge W, Li W, Wang R, Zhao L, Ming J (2019) Capturing the persistence of facial expression features for deepfake video detection. In: International Conference on Information and Communications Security, Beijing

  114. Li X, Yu K, Ji S, Wang Y, Wu C, Xue H (2020) Fighting against deepfake: Patch&Pair convolutional neural networks (PPCNN). In: Companion Proceedings of the Web Conference 2020, New York

  115. Lee S, Tariq S, Shin Y, Woo SS (2021) Detecting handcrafted facial image manipulations and GAN-generated facial images using Shallow-FakeFaceNet. Appl Soft Comput 105:107256

  116. Shang Z, Xie H, Zha Z, Yu L, Li Y, Zhang Y (2021) PRRNet: Pixel-Region relation network for face forgery detection. Pattern Recognit 116:107950

  117. Agarwal S, Farid H, Fried O, Agrawala M (2020) Detecting deep-fake videos from phoneme-viseme mismatches. In: IEEE/CVF conference on computer vision and pattern recognition workshops (CVPRW), Seattle

  118. Mittal T, Bhattacharya U, Chandra R, Bera A, Manocha D (2020) Emotions don't lie: an audio-visual deepfake detection method using affective cues. In: ACM international conference on multimedia, New York

  119. Chugh K, Gupta P, Dhall A, Subramanian R (2020) Not made for each other- audio-visual dissonance-based deepfake detection and localization. In: ACM international conference on multimedia, New York

  120. Hosier BC, Stamm MC (2020) Detecting video speed manipulation. In: IEEE/CVF conference on computer vision and pattern recognition workshops (CVPRW), Seattle

  121. Amerini I, Galteri L, Caldelli R, Bimbo AD (2019) Deepfake video detection through optical flow based CNN. In: IEEE/CVF international conference on computer vision workshop (ICCVW), Seoul

  122. Caldelli R, Galteri L, Amerini I, Bimbo AD (2021) Optical Flow based CNN for detection of unlearnt deepfake manipulations. Pattern Recognit Lett 146:31–37

  123. Yang X, Li Y, Lyu S (2019) Exposing deep fakes using inconsistent head poses. In: IEEE international conference on acoustics, speech and signal processing (ICASSP), Brighton

  124. Li Y, Chang M-C, Lyu S (2018) In Ictu Oculi: exposing AI created fake videos by detecting eye blinking. In: IEEE International Workshop on Information Forensics and Security (WIFS), Hong Kong

  125. Qi H, Guo Q, Juefei-Xu F, Xie X, Ma L, Feng W, Liu Y, Zhao J (2020) DeepRhythm: exposing DeepFakes with attentional visual heartbeat rhythms. In: ACM international conference on multimedia, New York

  126. Ciftci UA, Demir I, Yin L (2020) FakeCatcher: detection of synthetic portrait videos using biological signals. IEEE Trans Pattern Anal Mach Intell (Early Access)

  127. Hernandez-Ortega J, Tolosana R, Fierrez J, Morales A (2020) DeepFakesON-Phys: deepfakes detection based on heart rate estimation. http://arxiv.org/abs/2010.00400v3

  128. Yasrab R, Jiang W, Riaz A (2021) Fighting deepfakes using body language analysis. Forecast MDPI Open Access J 3(2):1–19

  129. Khalid H, Woo SS (2020) OC-FakeDect: classifying deepfakes using one-class variational Autoencoder. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle

  130. Xuan X, Peng B, Wang W, Dong J (2019) On the generalization of GAN image forensics. In: Chinese conference on biometric recognition, Zhuzhou

  131. Zhou P, Han X, Morariu VI, Davis LS (2017) Two-stream neural networks for tampered face detection. In: IEEE conference on computer vision and pattern recognition workshops (CVPRW), Honolulu

  132. Jeon H, Bang Y, Woo SS (2019) FakeTalkerDetect: effective and practical realistic neural talking head detection with a highly unbalanced dataset. In: IEEE/CVF international conference on computer vision workshop (ICCVW), Seoul

  133. Wu X, Xie Z, Gao Y, Xiao Y (2020) SSTNet: detecting manipulated faces through spatial, steganalysis and temporal features. In: IEEE International conference on acoustics, speech and signal processing (ICASSP), Barcelona

  134. Tariq S, Lee S, Kim H, Shin Y, Woo SS (2019) GAN is a friend or foe? A framework to detect various fake face images. In: Proceedings of the 34th ACM/SIGAPP symposium on applied computing, Cyprus

  135. Sohrawardi SJ, Chintha A, Thai B, Seng S, Hickerson A, Ptucha R, Wright MK (2019) Poster: towards robust open-world detection of deepfakes. In: ACM SIGSAC conference on computer and communications security, London

  136. Fernando T, Fookes C, Denman S, Sridharan S (2019) Exploiting human social cognition for the detection of fake and fraudulent faces via memory networks. http://arxiv.org/abs/1911.07844v1

  137. Sun X, Wu B, Chen W (2020) Identifying invariant texture violation for robust deepfake detection. http://arxiv.org/abs/2012.10580v1

  138. Ding X, Raziei Z, Larson EC, Olinick EV, Krueger P, Hahsler M (2020) Swapped face detection using deep learning and subjective assessment. EURASIP J Inf Secur, vol. 6

  139. Kumar A, Bhavsar A, Verma R (2020) Detecting deepfakes with metric learning. In: International Workshop on Biometrics and Forensics (IWBF), Porto

  140. Rana MS, Sung AH (2020) DeepfakeStack: a deep ensemble-based learning technique for deepfake detection. In: IEEE international conference on cyber security and cloud computing, New York

  141. Zhou X, Wang Y, Wu P (2020) Detecting deepfake videos via frame serialization learning. In: IEEE 3rd International Conference of Safe Production and Informatization (IICSPI), Chongqing City

  142. Nguyen XH, Tran TS, Le VT, Nguyen KD, Truong D-T (2021) Learning Spatio-temporal features to detect manipulated facial videos created by the Deepfake techniques. Forensic Sci Int Digital Investig 36:301108

  143. Xu Z, Liu J, Lu W, Xu B, Zhao X, Li B, Huang J (2021) Detecting facial manipulated videos based on set convolutional neural networks. J Vis Commun Image Represent 77:103119

  144. Chen Z, Yang H (2021) Attentive semantic exploring for manipulated face detection. In: IEEE International Conference on Acoustics, Speech and Signal Processing, Toronto

  145. Zhang J, Ni J, Xie H (2021) DeepFake videos detection using self-supervised decoupling network. In: IEEE International Conference on Multimedia and Expo (ICME), Shenzhen

  146. Gu Z, Chen Y, Yao T, Ding S, Li J, Huang F, Ma L (2021) Spatiotemporal inconsistency learning for deepfake video detection. In: Proceedings of the 29th ACM international conference on multimedia, New York

  147. Tu Y, Liu Y, Li X (2021) Deepfake video detection by using convolutional gated recurrent unit. In: International conference on machine learning and computing, Shenzhen

  148. Zhuang Y-X, Hsu C-C (2019) Detecting generated image based on a coupled network with two-step pairwise learning. In: IEEE international conference on image processing (ICIP), Taipei

  149. Lima OD, Franklin S, Basu S, Karwoski B, George A (2020) Deepfake detection using spatiotemporal convolutional networks. http://arxiv.org/abs/2006.14749v1

  150. Lang Y, Li X, Chen Y, Mao X, He Y, Wang S, Xue H, Lu Q (2020) Sharp multiple instance learning for deepfake video detection. In: Proceedings of the 28th ACM international conference on multimedia, Seattle WA

  151. Chen B, Ju X, Xiao B, Ding W, Zheng Y, Albuquerque VHCD (2021) Locally GAN-generated face detection based on an improved Xception. Inf Sci 572:16–28

  152. Chen H-S, Rouhsedaghat M, Ghani H, Hu S, You S, Kuo C-CJ (2021) DefakeHop: a light-weight high-performance deepfake detector. In: IEEE International Conference on Multimedia and Expo (ICME), Shenzhen

  153. Das S, Seferbekov S, Datta A, Islam MS, Amin MR (2021) Towards solving the deepfake problem: an analysis on improving deepfake detection using dynamic face augmentation. In: IEEE/CVF international conference on computer vision workshops (ICCVW), Montreal

  154. Nguyen HH, Fang F, Yamagishi J, Echizen I (2019) Multi-task learning for detecting and segmenting manipulated facial images and videos. In: IEEE 10th international conference on biometrics theory, applications and systems (BTAS), Tampa

  155. Du M, Pentyala SK, Li Y, Hu X (2020) Towards generalizable deepfake detection with locality-aware autoencoder. In: ACM international conference on information & knowledge management, Virtual Event Ireland

  156. He P, Li H, Wang H (2019) Detection of fake images via the ensemble of deep representations from multi color spaces. In: IEEE International conference on image processing (ICIP), Taipei

  157. Guo Z, Yang G, Chen J, Sun X (2021) Fake face detection via adaptive manipulation traces extraction network. Comput Vis Image Underst 204:103170

  158. Wang R, Juefei-Xu F, Ma L, Xie X, Huang Y, Wang J, Liu Y (2020) FakeSpotter: a simple yet robust baseline for spotting AI-synthesized fake faces. In: International joint conference on artificial intelligence (IJCAI), Yokohama

  159. Khan SA, Dai H (2021) Video transformer for deepfake detection with incremental learning. In: Proceedings of the 29th ACM international conference on multimedia, New York

  160. Frank J, Eisenhofer T, Schonherr L, Fischer A, Kolossa D, Holz T (2020) Leveraging frequency analysis for deep fake image recognition. Proc of Mach Learn 119:3247–3258

  161. Durall R, Keuper M, Pfreundt F-J, Keuper J (2020) Unmasking deepfakes with simple features. http://arxiv.org/abs/1911.00686v3

  162. Masi I, Killekar A, Mascarenha RM, Gurudatt SP, AbdAlmageed W (2020) Two-branch recurrent network for isolating deepfakes in videos. In: European conference on computer vision, Glasgow

  163. McCloskey S, Albright M (2019) Detecting GAN-generated imagery using saturation cues. In: IEEE International conference on image processing (ICIP), Taipei

  164. Guarnera L, Giudice O, Battiato S (2020) DeepFake detection by analyzing convolutional traces. In: IEEE/CVF conference on computer vision and pattern recognition workshops (CVPRW), Seattle

  165. Wang S-Y, Wang O, Zhang R, Owens A, Efros AA (2020) CNN-generated images are surprisingly easy to spot... for now. In: IEEE/CVF conference on computer vision and pattern recognition (CVPR), Seattle

  166. Lugstein F, Baier S, Bachinger G, Uhl A (2021) PRNU-based deepfake detection. In: Proceedings of the 2021 ACM workshop on information hiding and multimedia security

  167. Nirkin Y, Wolf L, Keller Y, Hassner T (2020) DeepFake detection based on discrepancies between faces and their context. http://arxiv.org/abs/2008.12262v1

  168. Yang J, Xiao S, Li A, Lan G, Wang H (2021) Detecting fake images by identifying potential texture difference. Futur Gener Comput Syst 125:127–135

  169. Li G, Cao Y, Zhao X (2021) Exploiting facial symmetry to expose deepfakes. In: IEEE international conference on image processing (ICIP), Anchorage

  170. Luo Z, Kamata S-I, Sun Z (2021) Transformer and node-compressed dnn based dual-path system for manipulated face detection. In: IEEE international conference on image processing (ICIP), Anchorage

  171. Yang J, Xiao S, Li A, Lu W, Gao X, Li Y (2021) MSTA-net: forgery detection by generating manipulation trace based on multi-scale self-texture attention. IEEE Trans Circuits Syst Video Technol (Early Access), pp 1–1

  172. Bonomi M, Pasquini C, Boato G (2021) Dynamic texture analysis for detecting fake faces in video sequences. J Vis Commun Image Represent 79:103239

  173. Yang J, Li A, Xiao S, Lu W, Gao X (2021) MTD-Net: learning to detect deepfakes images by multi-scale texture difference. IEEE Trans Inf Forensics Secur 16:4234–4245

  174. Agarwal S, Farid H, Gu Y, He M, Nagano K, Li H (2019) Protecting world leaders against deep fakes. In: IEEE/CVF conference on computer vision and pattern recognition (CVPR) Workshops, Long Beach

  175. Yang C-Z, Ma J, Wang S-L, Liew AW-C (2020) Preventing deepfake attacks on speaker authentication by dynamic lip movement analysis. IEEE Trans Inf Forensics Secur 16:1841–1854

  176. Hosler B, Salvi D, Murray A, Antonacci F, Bestagini P, Tubaro S, Stamm MC (2021) Do deepfakes feel emotions? A semantic approach to detecting deepfakes via emotional inconsistencies. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Nashville

  177. Demir İ, Ciftci UA (2021) Where do deep fakes look? Synthetic face detection via gaze. In: ACM symposium on eye tracking research and applications, Germany

  178. Hu S, Li Y, Lyu S (2021) Exposing GAN-generated faces using inconsistent corneal specular highlights. In: IEEE international conference on acoustics, speech and signal processing (ICASSP), Toronto

  179. Agarwal S, Farid H (2021) Detecting deep-fake videos from aural and oral dynamics. In: IEEE/CVF conference on computer vision and pattern recognition workshops (CVPRW), Nashville

  180. Sabir E, Cheng J, Jaiswal A, AbdAlmageed W, Masi I, Natarajan P (2019) Recurrent-convolution approach to deepfake detection – state-of-art results on FaceForensics++. http://arxiv.org/abs/1905.00582v1

  181. Sabir E, Cheng J, Jaiswal A, AbdAlmageed W, Masi I, Natarajan P (2019) Recurrent convolutional strategies for face manipulation detection in videos. In: IEEE/CVF conference on computer vision and pattern recognition (CVPR), Long Beach

  182. Amerini I, Caldelli R (2020) Exploiting prediction error inconsistencies through LSTM-based classifiers to detect deepfake videos. In: ACM workshop on information hiding and multimedia security, New York

  183. Lu C, Liu B, Zhou W, Chu Q, Yu N (2021) Deepfake video detection using 3D-attentional inception convolutional neural network. In: IEEE international conference on image processing (ICIP), Anchorage

  184. Trinh L, Tsang M, Rambhatla S, Liu Y (2021) Interpretable and trustworthy deepfake detection via dynamic prototypes. In: IEEE winter conference on applications of computer vision (WACV), Hawaii

  185. Cozzolino D, Thies J, Rossler A, Riess C, Nießner M, Verdoliva L (2019) ForensicTransfer: weakly-supervised domain adaptation for forgery detection. http://arxiv.org/abs/1812.02510v2

  186. Hsu C-C, Zhuang Y-X, Lee C-Y (2019) Deep fake image detection based on pairwise learning. Appl Sci 10(1):370

  187. Dang LM, Hassan SI, Im S, Moon H (2019) Face image manipulation detection based on a convolutional neural network. Expert Syst Appl 129:156–168

  188. Nguyen HH, Yamagishi J, Echizen I (2019) Capsule-forensics: using capsule networks to detect forged images and videos. In: IEEE international conference on acoustics, speech and signal processing (ICASSP), Brighton

  189. Montserrat DM, Hao H, Yarlagadda SK, Baireddy S, Shao R, Horváth J, Bartusiak E, Yang J, Güera D, Zhu F, Delp EJ (2020) Deepfakes detection with automatic face weighting. In: IEEE/CVF conference on computer vision and pattern recognition workshops, Seattle

  190. Choi DH, Lee HJ, Lee S, Kim JU, Ro YM (2020) Fake video detection with certainty-based attention network. In: IEEE international conference on image processing (ICIP), Abu Dhabi

  191. Chintha A, Thai B, Sohrawardi SJ, Bhatt K, Hickerson A, Wright M, Ptucha R (2020) Recurrent convolutional structures for audio spoof and video deepfake detection. IEEE J Sel Top Signal Process 14(5):1024–1037

  192. Hu J, Wang S, Li X (2021) Improving the generalization ability of deepfake detection via disentangled representation learning. In: IEEE international conference on image processing (ICIP), Anchorage

  193. Hu J, Liao X, Wang W, Qin Z (2021) Detecting compressed deepfake videos in social networks using frame-temporality two-stream convolutional network. IEEE Trans Circuits Syst Video Technol (Early Access) 32(3):1089–1102

  194. Han B, Han X, Zhang H, Li J, Cao X (2021) Fighting fake news: two stream network for deepfake detection via learnable SRM. IEEE Trans Biometrics Behav Ident Sci 3(3):320–331

  195. Kim M, Tariq S, Woo SS (2021) FReTAL: generalizing deepfake detection using knowledge distillation and representation learning. In: IEEE/CVF conference on computer vision and pattern recognition workshops (CVPRW), Nashville

  196. Zhao H, Wei T, Zhou W, Zhang W, Chen D, Yu N (2021) Multi-attentional deepfake detection. In: IEEE/CVF conference on computer vision and pattern recognition (CVPR), Nashville

  197. Sun Z, Han Y, Hua Z, Ruan N, Jia W (2021) Improving the efficiency and robustness of deepfakes detection through precise geometric features. In: IEEE/CVF conference on computer vision and pattern recognition (CVPR), Nashville

  198. Tariq S, Lee S, Woo SS (2021) One detector to rule them all. In: Proceedings of the web conference 2021, New York

  199. Wang R, Juefei-Xu F, Huang Y, Guo Q, Xie X, Ma L, Liu Y (2020) DeepSonar: towards effective and robust detection of AI-synthesized fake voices. In: Proceedings of the 28th ACM international conference on multimedia, Seattle

  200. Balamurali B, Lin KE, Lui S, Chen J-M, Herremans D (2019) Toward robust audio spoofing detection: a detailed comparison of traditional and learned features. IEEE Access

  201. Saranya MS, Padmanabhan R, Murthy HA (2018) Replay attack detection in speaker verification using non-voiced segments and decision level feature switching. In: International conference on signal processing and communications (SPCOM), Bangalore

  202. Witkowski M, Kacprzak S, Zelasko P, Kowalczyk K, Gałka J (2017) Audio replay attack detection using high-frequency features. In: INTERSPEECH, Stockholm

  203. AlBadawy EA, Lyu S, Farid H (2019) Detecting AI-synthesized speech using bispectral analysis. In: IEEE/CVF Conference on computer vision and pattern recognition (CVPR), Long Beach

  204. Patil HA, Kamble MR (2018) A survey on replay attack detection for automatic speaker verification (ASV) system. In: Proceedings of the APSIPA Annual Summit and Conference 2018, Hawaii

  205. Wijethunga R, Matheesha D, Noman AA, Silva KD, Tissera M, Rupasinghe L (2020) Deepfake audio detection: a deep learning based solution for group conversations. In: International conference on advancements in computing (ICAC), Malabe

  206. Chen T, Kumar A, Nagarsheth P, Sivaraman G, Khoury E (2020) Generalization of audio deepfake detection. In: Odyssey 2020 the speaker and language recognition workshop, Tokyo

  207. Shim H-J, Jung J-W, Heo H-S, Yoon S-H, Yu H-J (2018) Replay spoofing detection system for automatic speaker verification using multi-task learning of noise classes. In: Conference on technologies and applications of artificial intelligence (TAAI), Taichung

  208. Yang J, Das RK (2020) Long-term high frequency features for synthetic speech detection. Digital Signal Process 97:102622

  209. Malik H (2019) Securing voice-driven interfaces against Fake (Cloned) Audio Attacks. In: IEEE Conference on Multimedia Information Processing and Retrieval (MIPR), San Jose

  210. Gunendradasan T, Irtza S, Ambikairajah E, Epps J (2019) Transmission line cochlear model based AM-FM features for replay attack detection. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton

  211. Borrelli C, Bestagini P, Antonacci F, Sarti A, Tubaro S (2021) Synthetic speech detection through short-term and long-term prediction traces. EURASIP J Inf Secur, 2

  212. Lai C-I, Abad A, Richmond K, Yamagishi J, Dehak N, King S (2019) Attentive filtering networks for audio replay attack detection. In: IEEE international conference on acoustics, speech and signal processing (ICASSP), Brighton

  213. Huang L, Pun C-M (2019) Audio replay spoof attack detection using segment-based hybrid feature and DenseNet-LSTM network. In: IEEE international conference on acoustics, speech and signal processing (ICASSP), Brighton

  214. Gomez-Alanis A, Peinado AM, Gonzalez JA, Gomez AM (2019) A light convolutional GRU-RNN deep feature extractor for ASV spoofing detection. In: INTERSPEECH, Graz

  215. Gomez-Alanis A, Peinado AM, Gonzalez JA, Gomez AM (2019) A gated recurrent convolutional neural network for robust spoofing detection. IEEE/ACM Trans Audio Speech Lang Process 27(12):1985–1999

  216. Huang L, Pun C-M (2020) Audio replay spoof attack detection by joint segment-based linear filter bank feature extraction and attention-enhanced DenseNet-BiLSTM Network. IEEE/ACM Trans Audio Speech Lang Process 28:1813–1825

  217. Wu Z, Das RK, Yang J, Li H (2020) Light convolutional neural network with feature genuinization for detection of synthetic speech attacks. In: INTERSPEECH, Shanghai

  218. Wang Z, Cui S, Kang X, Sun W, Li Z (2021) Densely connected convolutional network for audio spoofing detection. In: Asia-Pacific signal and information processing association annual summit and conference (APSIPA ASC), Auckland

  219. You CH, Yang J (2020) Device feature extraction based on parallel neural network training for replay spoofing detection. IEEE/ACM Trans Audio Speech Lang Process 28:2308–2318

  220. Luo A, Li E, Liu Y, Kang X, Wang ZJ (2021) A capsule network based approach for detection of audio spoofing attacks. In: IEEE International conference on acoustics, speech and signal processing (ICASSP), Toronto

  221. Ren Y, Liu W, Liu D, Wang L (2021) Recalibrated bandpass filtering on temporal waveform for audio spoof detection. In: IEEE International conference on image processing (ICIP), Anchorage

  222. Huang L, Zhao J (2021) Audio replay spoofing attack detection using deep learning feature and long-short-term memory recurrent neural network. In: The second international conference on artificial intelligence, information processing and cloud computing, Hangzhou

  223. Ouyang M, Das RK, Yang J, Li H (2021) Capsule network based end-to-end system for detection of replay attacks. In: International symposium on chinese spoken language processing (ISCSLP), Hong Kong

  224. Li X, Li N, Weng C, Liu X, Su D, Yu D, Meng H (2021) Replay and synthetic speech detection with Res2Net architecture. In: IEEE International conference on acoustics, speech and signal processing (ICASSP), Toronto

  225. Li Y, Yang X, Sun P, Qi H, Lyu S (2020) Celeb-DF: a large-scale challenging dataset for deepfake forensics. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle

  226. Dolhansky B, Bitton J, Pflaum B, Lu J, Howes R, Wang M, Ferrer CC (2020) The deepfake detection challenge (DFDC) dataset. http://arxiv.org/2006.07397v4

  227. Korshunov P, Marcel S (2018) DeepFakes: a new threat to face recognition? Assessment and detection. http://arxiv.org/1812.08685v1

  228. Rossler A, Cozzolino D, Verdoliva L, Riess C, Thies J, Nießner M (2018) FaceForensics: a large-scale video dataset for forgery detection in human faces. http://arxiv.org/1803.09179v1

  229. Khodabakhsh A, Ramachandra R, Raja K, Wasnik P, Busch C (2018) Fake face detection methods: can they be generalized? In: International Conference of the Biometrics Special Interest Group (BIOSIG), Darmstadt

  230. Dolhansky B, Howes R, Pflaum B, Baram N, Ferrer CC (2019) The deepfake detection challenge (DFDC) preview dataset. http://arxiv.org/1910.08854v2

  231. Contributing Data to Deepfake Detection Research, (2019). https://ai.googleblog.com/2019/09/contributing-data-to-deepfake-detection.html

  232. Jiang L, Li R, Wu W, Qian C, Loy CC (2020) DeeperForensics-1.0: a large-scale dataset for real-world face forgery detection. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle

  233. Zi B, Chang M, Chen J, Ma X, Jiang Y-G (2020) WildDeepfake: a challenging real-world dataset for deepfake detection. In: Proceedings of the 28th ACM international conference on multimedia, Seattle

  234. Dong X, Bao J, Chen D, Zhang W, Yu N, Chen D, Wen F, Guo B (2020) Identity-driven deepfake detection. http://arxiv.org/2012.03930v1

  235. Huang J, Wang X, Du B, Du P, Xu C (2021) DeepFake MNIST+: a DeepFake facial animation dataset. In: IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), Montreal

  236. Kominek J, Black AW (2004) The CMU Arctic speech databases. In: Fifth ISCA Workshop on Speech Synthesis

  237. Panayotov V, Chen G, Povey D, Khudanpur S (2015) Librispeech: an ASR corpus based on public domain audio books. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), South Brisbane

  238. Wu Z, Kinnunen T, Evans N, Yamagishi J, Hanilçi C, Sahidullah M, Sizov A (2015) ASVspoof 2015: the first automatic speaker verification spoofing and countermeasures challenge. In: InterSpeech, Dresden

  239. Ito K, Johnson L (2017) The LJ speech dataset, LibriVox project. https://keithito.com/LJ-Speech-Dataset/. Accessed 28 July 2021

  240. Delgado H, Todisco M, Sahidullah M, Evans N, Kinnunen T, Lee KA, Yamagishi J (2018) ASVspoof 2017 Version 2.0: meta-data analysis and baseline enhancements. In: Odyssey 2018—the speaker and language recognition workshop, Les Sables

  241. Chung JS, Nagrani A, Zisserman A (2018) VoxCeleb2: Deep speaker recognition. In: INTERSPEECH, Hyderabad

  242. Veaux C, Yamagishi J, MacDonald K (2019) CSTR VCTK Corpus: English multi-speaker Corpus for CSTR voice cloning toolkit. The Centre for Speech Technology Research (CSTR), University of Edinburgh

  243. Reimao R, Tzerpos V (2019) FoR: a dataset for synthetic speech detection. In: International conference on speech technology and human-computer dialogue (SpeD), Timisoara

  244. Nagrani A, Chung JS, Xie W, Zisserman A (2020) Voxceleb: Large-scale speaker verification in the wild. Comput Speech Lang 60:101027

  245. GMAIL. The M-AILABS Speech dataset, Caito, https://www.caito.de/2019/01/the-m-ailabs-speech-dataset/. Accessed 28 July 2021

  246. Wang X, Yamagishi J, Todisco M, Delgado H, Nautsch A, Evans N, Sahidullah M, Vestman V, Kinnunen T, Lee KA, Juvela L, Alku P, Peng Y-H, Hwang H-T, Tsao Y, Wang H-M, Maguer SL, Becker M, Henderson F, Clark R, Zhang Y, Wang Q, Jia Y, Onuma K, Mushika K, Kaneda T, Jiang Y, Liu L-J, Wu Y-C, Huang W-C, Toda T, Tanaka K, Kameoka H, Steiner I, Matrouf D, Bonastre J-F, Govender A, Ronanki S, Zhang J-X, Ling Z-H (2020) ASVspoof 2019: a large-scale public database of synthesized, converted and replayed speech. Comput Speech Lang 64:101114

  247. Goodfellow IJ, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. In: Advances in neural information processing systems, Montreal

  248. DeepFaceLab, GitHub, [Online]. Available: https://github.com/iperov/DeepFaceLab. Accessed 6 April 2021

  249. Deepfakes web, [Online]. Available: https://deepfakesweb.com/. Accessed 6 April 2021

  250. FaceApp, [Online]. Available: https://www.faceapp.com/. Accessed 1 April 2021

  251. Zao, [Online]. Available: https://zaodownload.com/. Accessed 6 April 2021

  252. MachineTube, [Online]. Available: https://www.machine.tube/. Accessed 6 April 2021

  253. Doublicat, [Online]. Available: https://reface.app/about/. Accessed 7 April 2021

  254. Resemble AI, [Online]. Available: https://www.resemble.ai/. Accessed 28 August 2021

  255. Rudrabha/Wav2Lip, github, [Online]. Available: https://github.com/Rudrabha/Wav2Lip

  256. Thies J, Zollhöfer M, Theobalt C, Stamminger M, Nießner M (2018) Headon: real-time reenactment of human portrait videos. ACM Trans Gr 37(4):1–13

  257. Nguyen TT, Nguyen CM, Nguyen DT, Nguyen DT, Nahavandi S (2019) Deep learning for deepfakes creation and detection. http://arxiv.org/1909.11573v1

Author information

Corresponding author

Correspondence to Dinesh Kumar Vishwakarma.

Ethics declarations

Conflict of interest

There is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Dagar, D., Vishwakarma, D.K. A literature review and perspectives in deepfakes: generation, detection, and applications. Int J Multimed Info Retr 11, 219–289 (2022). https://doi.org/10.1007/s13735-022-00241-w

  • DOI: https://doi.org/10.1007/s13735-022-00241-w
