Abstract
In this study, we address challenges related to incorrect scores, inconsistent rhythm, and labeling errors in the generation of symbolic music scores, focusing on the use of self-supervised models. We present SymforNet, a model for symbolic music generation based on self-supervision and deep learning. The model incorporates an attention mechanism and is highly effective at recognizing contextual elements across categories. Experimental results indicate that: (1) the SymforNet model achieves 88% accuracy in generating music scores; (2) on both the training and test sets, SymforNet attains significantly lower loss values than all baseline models; (3) an analysis of the multi-track Used Pitch Class data shows that SymforNet exhibits a strong correlation, particularly for sequences comprising three to four tracks; (4) in a comparison of music score quality, SymforNet generates correct scores at a rate of 87%.
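The Used Pitch Class (UPC) metric mentioned in point (3) counts how many distinct pitch classes (C, C#, ..., B) appear in a track, obtained by reducing MIDI pitches modulo 12. A minimal sketch, assuming tracks are represented as lists of MIDI note numbers (the paper's exact pipeline may differ):

```python
# Hypothetical sketch of the Used Pitch Class (UPC) metric for
# symbolic music. A pitch class is a MIDI pitch modulo 12, so UPC
# is the number of distinct values of (pitch % 12) in a track.
# The list-of-MIDI-pitches representation is an illustrative assumption.

def used_pitch_classes(midi_pitches):
    """Return the count of distinct pitch classes in a pitch sequence."""
    return len({p % 12 for p in midi_pitches})

# Example: a C-major arpeggio over two octaves (C4 E4 G4 C5 E5 G5)
# uses only the pitch classes C, E, and G.
print(used_pitch_classes([60, 64, 67, 72, 76, 79]))  # 3
```

Per-track UPC values like this can then be compared between generated and reference music to gauge how closely the model's pitch-class usage tracks real data.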
Data Availability
The datasets generated and analysed during the current study are available from the corresponding author on reasonable request.
Acknowledgements
This work was supported by the National Natural Science Foundation of China under grants No. 61966033, 62241208, and 62366050. We would like to thank Yuning Zhang, Yutao Hou, and Yaqing Shi for their help in revising the paper.
Author information
Authors and Affiliations
Contributions
Abudukelimu Halidanmu: Funding acquisition; Jishang Chen: Data analysis and Writing; Yunze Liang: Paper revision; Abudukelimu Abulizi and Alimujiang Yasen: Project administration.
Corresponding author
Ethics declarations
Competing Interests
No potential conflict of interest was reported by the authors.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Abudukelimu, H., Chen, J., Liang, Y. et al. SymforNet: application of cross-modal information correspondences based on self-supervision in symbolic music generation. Appl Intell (2024). https://doi.org/10.1007/s10489-024-05335-y
Accepted:
Published:
DOI: https://doi.org/10.1007/s10489-024-05335-y