
BRACE: The Breakdancing Competition Dataset for Dance Motion Synthesis

  • Conference paper
Computer Vision – ECCV 2022 (ECCV 2022)


Generative models for audio-conditioned dance motion synthesis map music features to dance movements. Models are trained to associate motion patterns with audio patterns, usually without explicit knowledge of the human body. This approach relies on a few assumptions: strong music-dance correlation, controlled motion data, and relatively simple poses and movements. These characteristics are found in all existing datasets for dance motion synthesis, and indeed recent methods can achieve good results. We introduce a new dataset that aims to challenge these common assumptions by compiling a set of dynamic dance sequences displaying complex human poses. We focus on breakdancing, which features acrobatic moves and tangled postures. We source our data from the Red Bull BC One competition videos. Estimating human keypoints from these videos is difficult due to the complexity of the dance, as well as the recording setup, which uses multiple moving cameras. We adopt a hybrid labelling pipeline that leverages deep estimation models as well as manual annotations to obtain good-quality keypoint sequences at a reduced cost. Our efforts produced the BRACE dataset, which contains over 3 h and 30 min of densely annotated poses. We test state-of-the-art methods on BRACE, showing their limitations when evaluated on complex sequences. Our dataset can readily foster advances in dance motion synthesis. With intricate poses and swift movements, models are forced to go beyond learning a mapping between modalities and to reason more effectively about body structure and movements.
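To make the idea of a hybrid labelling pipeline more concrete, the sketch below shows one way automatically estimated keypoints could be triaged so that only unreliable frames are sent to human annotators. It is an illustration only and not the authors' pipeline: the function name `frames_needing_manual_labels`, the per-keypoint confidence scores, and the thresholds `min_conf` and `max_low` are all assumptions introduced here.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: one keypoint is (x, y, confidence),
# one frame is a list of keypoints, one sequence is a list of frames.
Keypoint = Tuple[float, float, float]
Frame = Sequence[Keypoint]


def frames_needing_manual_labels(
    sequence: Sequence[Frame],
    min_conf: float = 0.5,  # assumed confidence threshold
    max_low: int = 2,       # assumed tolerance for low-confidence keypoints
) -> List[int]:
    """Return indices of frames with more than `max_low` keypoints
    whose confidence is below `min_conf`."""
    flagged = []
    for idx, frame in enumerate(sequence):
        low = sum(1 for (_, _, conf) in frame if conf < min_conf)
        if low > max_low:
            flagged.append(idx)
    return flagged


# Toy sequence with three frames of three keypoints each (made-up values).
demo = [
    [(10.0, 20.0, 0.90), (12.0, 25.0, 0.80), (14.0, 30.0, 0.95)],
    [(11.0, 21.0, 0.20), (13.0, 26.0, 0.30), (15.0, 31.0, 0.90)],
    [(12.0, 22.0, 0.90), (14.0, 27.0, 0.40), (16.0, 32.0, 0.85)],
]

flagged = frames_needing_manual_labels(demo, min_conf=0.5, max_low=1)
print(flagged)
```

In this toy run only the middle frame has two keypoints below the threshold, so it alone would be queued for manual annotation. A real pipeline would need a more careful criterion, since confidence scores can be unreliable on the acrobatic poses the abstract describes.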

D. Moltisanti and B. Dai—Work done while at Nanyang Technological University.

D. Moltisanti and J. Wu—Equal contribution.






This study is supported under the RIE2020 Industry Alignment Fund - Industry Collaboration Projects (IAF-ICP) Funding Initiative, as well as cash and in-kind contribution from the industry partner(s). The project is also supported by Singapore MOE AcRF Tier 1 (RG16/21).

Author information



Corresponding author

Correspondence to Davide Moltisanti.


Electronic supplementary material

Supplementary material 1 (PDF, 1079 KB)


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Moltisanti, D., Wu, J., Dai, B., Loy, C.C. (2022). BRACE: The Breakdancing Competition Dataset for Dance Motion Synthesis. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13668. Springer, Cham.


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20073-1

  • Online ISBN: 978-3-031-20074-8

  • eBook Packages: Computer Science, Computer Science (R0)
