Abstract
In recent years, automatic music-driven choreography has emerged as a highly challenging problem. In this paper, we propose a music-driven choreography system based on conditional generative adversarial networks. First, we build a dataset, MF-DS, that integrates MFCC features and Dancing Skeletons extracted from Japanese dancing videos. The MFCC features are extracted at music beats, and the dancing skeletons are detected from the image frames of each video. We then train the music-driven choreography system as a generative adversarial network: the generator integrates residual blocks into fractionally strided convolutions, and the discriminator uses conventional CNNs. Two indicators, beat loss and choreography diversity, are proposed to evaluate three learning models in the experiments. Finally, we validate that the three models, at their best epochs, achieve near-zero generator and discriminator losses, thereby generating stable skeletons while exhibiting choreography diversity.
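To illustrate how a beat-alignment indicator of this kind can be computed, the sketch below measures the mean time gap between music beats and the nearest motion peaks of a generated dance. The function name and the metric's exact form are illustrative assumptions; the paper's own beat loss definition may differ.

```python
import numpy as np

def beat_alignment_loss(beat_times, motion_peak_times):
    """Mean distance (in seconds) from each music beat to the nearest
    motion peak of the dance. Lower values mean movements land closer
    to the beats. NOTE: illustrative metric, not the paper's exact
    'beat loss' formula."""
    beats = np.asarray(beat_times, dtype=float)
    peaks = np.asarray(motion_peak_times, dtype=float)
    # Pairwise |beat - peak| distances: shape (num_beats, num_peaks)
    dists = np.abs(beats[:, None] - peaks[None, :])
    # For each beat, keep only its closest motion peak, then average
    return float(dists.min(axis=1).mean())

# Hypothetical example: beats every 0.5 s, motion peaks 50 ms late
beats = [0.5, 1.0, 1.5, 2.0]
peaks = [0.55, 1.05, 1.55, 2.05]
print(beat_alignment_loss(beats, peaks))  # ≈ 0.05
```

A dance perfectly synchronized with the music would drive this value toward zero, which matches the abstract's use of beat loss as a quality indicator.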
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00521-021-05752-x/MediaObjects/521_2021_5752_Fig1_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00521-021-05752-x/MediaObjects/521_2021_5752_Fig2_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00521-021-05752-x/MediaObjects/521_2021_5752_Fig3_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00521-021-05752-x/MediaObjects/521_2021_5752_Fig4_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00521-021-05752-x/MediaObjects/521_2021_5752_Fig5_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00521-021-05752-x/MediaObjects/521_2021_5752_Fig6_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00521-021-05752-x/MediaObjects/521_2021_5752_Fig7_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00521-021-05752-x/MediaObjects/521_2021_5752_Fig8_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00521-021-05752-x/MediaObjects/521_2021_5752_Fig9_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00521-021-05752-x/MediaObjects/521_2021_5752_Fig10_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00521-021-05752-x/MediaObjects/521_2021_5752_Fig11_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00521-021-05752-x/MediaObjects/521_2021_5752_Fig12_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00521-021-05752-x/MediaObjects/521_2021_5752_Fig13_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00521-021-05752-x/MediaObjects/521_2021_5752_Fig14_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00521-021-05752-x/MediaObjects/521_2021_5752_Fig15_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00521-021-05752-x/MediaObjects/521_2021_5752_Fig16_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00521-021-05752-x/MediaObjects/521_2021_5752_Fig17_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00521-021-05752-x/MediaObjects/521_2021_5752_Fig18_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00521-021-05752-x/MediaObjects/521_2021_5752_Fig19_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00521-021-05752-x/MediaObjects/521_2021_5752_Fig20_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00521-021-05752-x/MediaObjects/521_2021_5752_Fig21_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00521-021-05752-x/MediaObjects/521_2021_5752_Fig22_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00521-021-05752-x/MediaObjects/521_2021_5752_Fig23_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00521-021-05752-x/MediaObjects/521_2021_5752_Fig24_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00521-021-05752-x/MediaObjects/521_2021_5752_Fig25_HTML.png)
Data availability
The dataset MF-DS, which integrates MFCC features and Dancing Skeletons extracted from Japanese dancing videos, was built for this study.
Code availability
OpenPose is free software that enables real-time multi-person 2D pose estimation using part affinity fields.
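OpenPose writes one JSON file per video frame, with each detected person's keypoints stored as a flat array of (x, y, confidence) triplets under `pose_keypoints_2d` (25 keypoints for the BODY_25 model). A minimal sketch of turning that output into a skeleton, assuming a single dancer per frame:

```python
import json

def load_skeleton(json_text, num_keypoints=25):
    """Parse one OpenPose per-frame JSON output into a list of
    (x, y, confidence) triplets for the first detected person.
    Returns None when no person was detected in the frame."""
    data = json.loads(json_text)
    people = data.get("people", [])
    if not people:
        return None
    flat = people[0]["pose_keypoints_2d"]  # [x0, y0, c0, x1, y1, c1, ...]
    return [tuple(flat[i:i + 3]) for i in range(0, 3 * num_keypoints, 3)]

# Minimal two-keypoint frame with hypothetical coordinates
frame = '{"people": [{"pose_keypoints_2d": [10.0, 20.0, 0.9, 30.0, 40.0, 0.8]}]}'
print(load_skeleton(frame, num_keypoints=2))
# [(10.0, 20.0, 0.9), (30.0, 40.0, 0.8)]
```

Dropping low-confidence triplets and stacking the per-frame skeletons over time yields the kind of motion sequence the MF-DS dataset pairs with beat-aligned MFCC features.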
References
Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., et al. (2016). TensorFlow: a system for large-scale machine learning, In Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation, 265–283.
Alemi, O., Françoise, J., & Pasquier, P. (2017). GrooveNet: real-time music-driven dance movement generation using artificial neural networks, In Proceedings of the 23rd SIGKDD Workshop on Machine Learning for Creativity.
Alemi, O., & Pasquier, P. (2019). Machine learning for data-driven movement generation: a review of the state of the art, arXiv:1903.08356v1.
Arjovsky, M., Chintala, S., & Bottou, L. (2017). Wasserstein GAN, arXiv:1701.07875v3.
Cao, Z., Hidalgo, G., Simon, T., Wei, S., & Sheikh, Y. (2019). OpenPose: realtime multi-person 2D pose estimation using part affinity fields, arXiv:1812.08008v2.
Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., & Bengio, Y. (2014). Learning phrase representations using RNN encoder-decoder for statistical machine translation, arXiv:1406.1078v3.
Crnkovic-Friis, L., & Crnkovic-Friis, L. (2016). Generative choreography using deep learning, arXiv:1605.06921v1.
Dumoulin, V., & Visin, F. (2018). A guide to convolution arithmetic for deep learning, arXiv:1603.07285v2.
Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. Adv Neural Inform Processing Syst 11:2672–2680
Gulrajani I, Ahmed F, Arjovsky M, Dumoulin V, Courville A (2017) Improved training of Wasserstein GANs. Adv Neural Inform Processing Syst 12:5769–5779
He, K., Zhang, X., Ren, S.,& Sun, J. (2016). Deep residual learning for image recognition, In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770–778
Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780
Ioffe, S., & Szegedy, C. (2015). Batch normalization: accelerating deep network training by reducing internal covariate shift, arXiv:1502.03167v3.
Kingma, D. P., & Ba, J. (2017). Adam: A method for stochastic optimization, arXiv:1412.6980v9.
LeCun Y, Boser B, Denker JS, Henderson D, Howard RE, Hubbard W, Jackel LD (1989) Backpropagation applied to handwritten zip code recognition. Neural Comput 1(4):541–551
Lee, J., Kim, S., & Lee, K. (2018). Listen to dance: music-driven choreography generation using autoregressive encoder-decoder network, arXiv:1811.00818v1.
Lee HY, Yang X, Liu MY, Wang TC, Lu YD, Yang MH, Kautz J (2019) Dancing to music. Adv Neural Inform Processing Syst 13:3581–3591
Levina, E., & Bickel, P. (2001). The earth mover's distance is the mallows distance: some insights from statistics, In Proceedings of the 8th IEEE International Conference on Computer Vision, 251–256
Liu L, Zhang H, Xu X, Zhang Z, Yan S (2020) Collocating clothes with generative adversarial networks cosupervised by categories and attributes: a multidiscriminator framework. IEEE Trans Neural Netw Learn Syst 31(9):3540–3554
Maas, A. L., Hannun, A. Y., & Ng, A. Y. (2013). Rectifier nonlinearities improve neural network acoustic models, In Proceedings of the 30th International Conference on Machine Learning.
Martinez, J., Hossain, R., Romero, J., & Little, J. J. (2017). A simple yet effective baseline for 3D human pose estimation, In Proceedings of the IEEE International Conference on Computer Vision, 2640–2649
Mehta D, Sridhar S, Sotnychenko O, Rhodin H, Shafiei M, Seidel HP, Xu W, Casas D, Theobalt C (2017) VNect: real-time 3D human pose estimation with a single RGB camera. ACM Trans Graph 36(4):1–14
Mirza, M., & Osindero, S. (2014). Conditional generative adversarial nets, arXiv:1411.1784v1.
Ofli F, Demir Y, Yemez Y, Erzin E, Tekalp AM, Balcı K et al (2008) An audio-driven dancing avatar. J Multimodal User Interfaces 2:93–103
Ofli F, Erzin E, Yemez Y, Tekalp AM (2012) Learn2Dance: learning statistical music-to-dance mappings for choreography synthesis. IEEE Trans Multimedia 14(3):747–759
Oore, S., & Akiyama, Y. (2006). Learning to synthesize arm motion to music by example, In Proceedings of the 14th International Conference in Central Europe on Computer Graphics, Visualization and Computer Vision, 201–208
Papandreou, G., Zhu, T., Kanazawa, N., Toshev, A., Tompson, J., Bregler, C. & Murphy, K. (2017). Towards accurate multi-person pose estimation in the wild, In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 4903–4911
Pavllo, D., Feichtenhofer, C., Grangier, D. & Auli, M. (2019). 3D human pose estimation in video with temporal convolutions and semi-supervised training, In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 7753–7762
Simonyan, K. & Zisserman, A. (2015). Very deep convolutional networks for large-scale image recognition, arXiv:1409.1556v6.
Srivastava, R. K., Greff, K. & Schmidhuber, J. (2015). Highway networks, arXiv:1505.00387v2.
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V. & Rabinovich, A. (2015). Going deeper with convolutions, In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1–9
Tang, T., Jia, J., & Mao, H. (2018). Dance with melody: an LSTM-autoencoder approach to music-oriented dance synthesis, In Proceedings of the 26th ACM International Conference on Multimedia, 1598–1606.
Taylor, G. W., & Hinton, G. E. (2009). Factored conditional restricted Boltzmann machines for modeling motion style, In Proceedings of the 26th Annual International Conference on Machine Learning, 1025–1032.
Virtanen P, Gommers R, Oliphant TE, Haberland M, Reddy T, Cournapeau D et al (2020) SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat Methods 17:261–272
Zhang H, Sun Y, Liu L et al (2020) ClothingOut: a category-supervised GAN model for clothing segmentation and retrieval. Neural Comput Appl 32:4519–4530
Kim, T., et al. (2018). carpedm20/DCGAN-tensorflow. Retrieved from https://github.com/carpedm20/DCGAN-tensorflow/blob/master/ops.py
McDonald, K. (2018). Dance x machine learning: first steps. Retrieved from https://medium.com/@kcimc/discrete-figures-7d9e9c275c47
Ashibuto Penta (2013). Natsukoi Hanabi (Summer Love Fireworks), dance cover [Kumori nochi Hare]. Retrieved from https://youtu.be/E_JrGQdX5vU
Miko (2016). Hoshikuzu Orchestra (Stardust Orchestra), dance cover [Singapore]. Retrieved from https://youtu.be/rjlAxvfmRtE
Riria (2017). Kawaiku Naritai (I Want to Be Cute), dance cover [with three costume changes]. Retrieved from https://youtu.be/r5rePp_2LHk
Acknowledgements
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary file 1
About this article
Cite this article
Huang, YF., Liu, WD. Choreography cGAN: generating dances with music beats using conditional generative adversarial networks. Neural Comput & Applic 33, 9817–9833 (2021). https://doi.org/10.1007/s00521-021-05752-x