Abstract
Realistic music generation has long remained a challenging problem, since generated music can lack structure or coherence. In this work, we propose a deep-learning-based music generation method that produces classic-style music, particularly jazz, with recurring melodic structures, using a Bi-directional Long Short-Term Memory (Bi-LSTM) neural network with attention. Owing to their success in modelling long-term temporal dependencies in sequential data, including video, Bi-LSTMs with attention are a natural, and here early, choice for music generation. Our experiments validate that Bi-LSTMs with attention preserve the richness and technical nuances of the performed music.
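The attention mechanism referred to above can be illustrated with a minimal NumPy sketch: additive-style attention pooling over the hidden states produced by a Bi-LSTM, where a learned scoring vector assigns each time step a weight and the context vector is the weighted sum. This is a generic illustration, not the paper's exact implementation; `attention_pool`, the dimensions, and the random weights are all hypothetical.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention_pool(H, w):
    """Attention pooling over Bi-LSTM outputs.

    H: (T, d) matrix of concatenated forward/backward hidden states.
    w: (d,) scoring vector (learned in a real model; random here).
    Returns the context vector (d,) and the attention weights (T,).
    """
    scores = H @ w            # one scalar relevance score per time step
    alpha = softmax(scores)   # normalise scores into a distribution
    context = alpha @ H       # weighted sum of hidden states
    return context, alpha

rng = np.random.default_rng(0)
T, d = 8, 16                  # e.g. 8 time steps, 16-dim hidden states
H = rng.standard_normal((T, d))
w = rng.standard_normal(d)
context, alpha = attention_pool(H, w)
```

In a full model, the context vector (or the per-step weighted states) would feed a softmax output layer predicting the next note or chord symbol.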
Cite this article
Keerti, G., Vaishnavi, A.N., Mukherjee, P. et al. Attentional networks for music generation. Multimed Tools Appl 81, 5179–5189 (2022). https://doi.org/10.1007/s11042-021-11881-1