
Choreography cGAN: generating dances with music beats using conditional generative adversarial networks

Original Article · Neural Computing and Applications

Abstract

In recent years, automatic music-driven choreography has emerged as a challenging research problem. In this paper, we propose a music-driven choreography system based on conditional generative adversarial networks. First, we build MF-DS, a dataset integrating MFCC features and dancing skeletons extracted from Japanese dancing videos. The MFCC features are extracted at music beats, and the dancing skeletons are detected from the image frames of each video. We then train the music-driven choreography system as a generative adversarial network: the generator integrates residual blocks with fractionally-strided convolutions, and the discriminator is a conventional CNN. Two indicators, beat loss and choreography diversity, are proposed to evaluate three learning models in the experiments. Finally, we validate that the three models at their best epochs achieve near-zero generator and discriminator losses, generating stable skeletons while preserving choreography diversity.
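The abstract describes the generator as integrating residual blocks with fractionally-strided convolutions, conditioned on beat-aligned music features, but the exact layer configuration and skeleton encoding are in the paywalled body. Below is a minimal sketch of one plausible reading, assuming TensorFlow/Keras; `noise_dim`, `cond_dim`, `n_frames`, and `pose_dim` are all illustrative assumptions, with `pose_dim=36` corresponding to an assumed 18-joint 2D skeleton.

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters):
    # Identity shortcut around two convolutions (standard residual design).
    shortcut = x
    y = layers.Conv1D(filters, 3, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.LeakyReLU(0.2)(y)
    y = layers.Conv1D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    return layers.LeakyReLU(0.2)(layers.Add()([shortcut, y]))

def build_generator(noise_dim=100, cond_dim=20, n_frames=64, pose_dim=36):
    # pose_dim=36 assumes 18 joints x (x, y); all sizes are illustrative.
    noise = layers.Input((noise_dim,))
    cond = layers.Input((cond_dim,))  # beat-aligned MFCC condition vector
    x = layers.Concatenate()([noise, cond])
    x = layers.Dense((n_frames // 8) * 128)(x)
    x = layers.Reshape((n_frames // 8, 128))(x)
    for filters in (128, 64, 32):
        # Fractionally-strided (transposed) convolution doubles the length.
        x = layers.Conv1DTranspose(filters, 4, strides=2, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.LeakyReLU(0.2)(x)
        x = residual_block(x, filters)
    out = layers.Conv1D(pose_dim, 3, padding="same", activation="tanh")(x)
    return tf.keras.Model([noise, cond], out)
```

The tanh output assumes joint coordinates normalized to [-1, 1], a common GAN convention rather than a detail confirmed by the abstract.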

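The two indicators are only named in the abstract, not defined. As a loudly labeled assumption, the sketch below reads "beat loss" as the gap between music beat times and kinematic beats (local minima of overall joint speed) and "diversity" as mean pairwise distance between generated sequences; this is an illustrative interpretation, not the paper's actual metrics.

```python
import numpy as np

def kinematic_beats(poses, fps=30):
    """Candidate motion beats: local minima of total joint speed.

    poses: array of shape (n_frames, n_joints, 2) holding 2D skeletons.
    Returns candidate beat times in seconds.
    """
    speed = np.linalg.norm(np.diff(poses, axis=0), axis=2).sum(axis=1)
    is_min = (speed[1:-1] < speed[:-2]) & (speed[1:-1] < speed[2:])
    return (np.where(is_min)[0] + 1) / fps

def beat_loss(music_beats, motion_beats):
    """Mean gap in seconds from each music beat to the nearest motion beat."""
    if len(motion_beats) == 0:
        return float("inf")
    return float(np.mean([np.abs(motion_beats - b).min() for b in music_beats]))

def diversity(sequences):
    """Mean pairwise L2 distance between flattened generated sequences."""
    flat = [np.asarray(s).reshape(-1) for s in sequences]
    pairs = [(a, b) for i, a in enumerate(flat) for b in flat[i + 1:]]
    return float(np.mean([np.linalg.norm(a - b) for a, b in pairs]))
```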

Data availability

The MF-DS dataset, which integrates MFCC features and dancing skeletons extracted from Japanese dancing videos, was built for this study.
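A minimal sketch of how beat-aligned MFCC features like those in MF-DS might be produced; librosa is an assumption (the excerpt does not name the audio toolkit), and beat-synchronous averaging is one plausible reading of "extracted based on music beats".

```python
import librosa
import numpy as np

def beat_mfcc(audio_path, n_mfcc=20):
    """Extract one MFCC vector per detected music beat (assumed pipeline)."""
    y, sr = librosa.load(audio_path)
    # Both calls default to hop_length=512, so frame indices line up.
    tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    # Average MFCC frames between consecutive beats -> (n_mfcc, ~n_beats).
    return librosa.util.sync(mfcc, beat_frames, aggregate=np.mean)
```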

Code availability

OpenPose is freeware that enables real-time multi-person 2D pose estimation using part affinity fields.
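OpenPose's `--write_json` mode emits one JSON file per video frame, with each detected person's joints stored in `pose_keypoints_2d` as flat (x, y, confidence) triples. A minimal sketch of collecting those skeletons follows; keeping only the first detected person is a simplifying assumption for single-dancer videos.

```python
import json
from pathlib import Path
import numpy as np

def load_openpose_keypoints(json_dir):
    """Collect per-frame 2D skeletons from an OpenPose --write_json folder."""
    frames = []
    for path in sorted(Path(json_dir).glob("*_keypoints.json")):
        with open(path) as f:
            data = json.load(f)
        if not data["people"]:
            continue  # no person detected in this frame
        flat = data["people"][0]["pose_keypoints_2d"]
        frames.append(np.array(flat).reshape(-1, 3))  # (n_joints, 3)
    return frames
```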


Acknowledgements

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Author information


Corresponding author

Correspondence to Yin-Fu Huang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary file 1

Supplementary file 2


About this article


Cite this article

Huang, YF., Liu, WD. Choreography cGAN: generating dances with music beats using conditional generative adversarial networks. Neural Comput & Applic 33, 9817–9833 (2021). https://doi.org/10.1007/s00521-021-05752-x

