
CelebV-HQ: A Large-Scale Video Facial Attributes Dataset

  • Conference paper
Computer Vision – ECCV 2022 (ECCV 2022)

Abstract

Large-scale datasets have played an indispensable role in the recent success of face generation and editing, and have significantly facilitated the advance of emerging research fields. However, the academic community still lacks a video dataset with diverse facial attribute annotations, which is crucial for research on face-related videos. In this work, we propose a large-scale, high-quality, and diverse video dataset with rich facial attribute annotations, named the High-Quality Celebrity Video Dataset (CelebV-HQ). CelebV-HQ contains 35,666 video clips with a resolution of at least \(512\times 512\), involving 15,653 identities. All clips are manually labeled with 83 facial attributes, covering appearance, action, and emotion. We conduct a comprehensive analysis in terms of age, ethnicity, brightness stability, motion smoothness, head pose diversity, and data quality to demonstrate the diversity and temporal coherence of CelebV-HQ. In addition, its versatility and potential are validated on two representative tasks, i.e., unconditional video generation and video facial attribute editing. We finally envision the future potential of CelebV-HQ, as well as the new opportunities and challenges it would bring to related research directions. Data, code, and models are publicly available (Project page: https://celebv-hq.github.io/ Code and models: https://github.com/CelebV-HQ/CelebV-HQ).
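The abstract describes each clip as carrying manual labels from 83 attributes across three categories (appearance, action, emotion). As a minimal sketch of how such per-clip annotations might be queried, the snippet below assumes a hypothetical JSON layout mapping clip IDs to attribute lists; the field names and structure are illustrative only, not the dataset's actual schema (consult the official CelebV-HQ repository for the real annotation format).

```python
import json

# Hypothetical annotation format: a mapping from clip ID to lists of labeled
# attributes per category. Names and structure are illustrative, not the
# real CelebV-HQ schema.
annotations_json = """
{
  "clip_0001": {"appearance": ["beard"], "emotion": ["happy"],   "action": ["talk"]},
  "clip_0002": {"appearance": ["glasses"], "emotion": ["neutral"], "action": ["smile"]},
  "clip_0003": {"appearance": ["beard", "glasses"], "emotion": ["angry"], "action": ["talk"]}
}
"""

def clips_with_attribute(annotations: dict, category: str, attribute: str) -> list:
    """Return IDs of clips whose labels in `category` include `attribute`."""
    return [clip_id for clip_id, attrs in annotations.items()
            if attribute in attrs.get(category, [])]

annotations = json.loads(annotations_json)
talking_clips = clips_with_attribute(annotations, "action", "talk")
print(talking_clips)
```

A flat ID-to-labels mapping like this makes it straightforward to filter subsets (e.g., all clips with a given emotion) when preparing training splits for attribute-conditioned generation or editing.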

H. Zhu and W. Wu contributed equally.




Acknowledgement

This work is supported by Shanghai AI Laboratory and SenseTime Research. It is also supported by NTU NAP, MOE AcRF Tier 1 (2021-T1-001-088), and by the RIE2020 Industry Alignment Fund - Industry Collaboration Projects (IAF-ICP) Funding Initiative, as well as cash and in-kind contributions from the industry partner(s).

Author information


Corresponding author

Correspondence to Wayne Wu.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF 12,009 KB)


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Zhu, H. et al. (2022). CelebV-HQ: A Large-Scale Video Facial Attributes Dataset. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13667. Springer, Cham. https://doi.org/10.1007/978-3-031-20071-7_38


  • DOI: https://doi.org/10.1007/978-3-031-20071-7_38


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20070-0

  • Online ISBN: 978-3-031-20071-7

  • eBook Packages: Computer Science, Computer Science (R0)
