SymforNet: application of cross-modal information correspondences based on self-supervision in symbolic music generation

Abstract

In this study, we address challenges related to incorrect scores, inconsistent rhythm, and labeling errors in the generation of symbolic music scores, focusing on the use of self-supervised models. We present SymforNet, a model for symbolic music generation based on self-supervision and deep learning. The model incorporates an attention mechanism and demonstrates exceptional proficiency in recognizing contextual elements of various categories. Experimental results indicate that: (1) the SymforNet model achieves an impressive 88% accuracy in music score generation; (2) on both the training and test sets, the SymforNet model attains significantly better loss values, surpassing all baseline models; (3) an examination of the multi-track Used Pitch Class data reveals that the SymforNet model displays a strong correlation, particularly for sequences comprising three to four tracks; (4) in a comparison of music score quality, SymforNet generates correct scores at a rate of 87%.
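For context on the Used Pitch Class (UPC) metric mentioned in result (3): UPC counts how many of the twelve pitch classes appear in each track, and is a common objective metric for evaluating symbolic music. The following minimal Python sketch illustrates one typical way to compute it; the (track_id, midi_pitch) input format is an assumption made here for illustration, not the paper's actual data representation.

    from collections import defaultdict

    def used_pitch_classes(notes):
        # notes: iterable of (track_id, midi_pitch) pairs, e.g. parsed from
        # a MIDI file (hypothetical input format, for illustration only).
        classes = defaultdict(set)
        for track_id, pitch in notes:
            classes[track_id].add(pitch % 12)  # fold pitches into the 12 pitch classes
        # UPC per track = number of distinct pitch classes that track uses
        return {track: len(pcs) for track, pcs in classes.items()}

    # Example: track 0 plays a C-major triad across octaves (3 pitch classes),
    # track 1 plays two distinct notes (2 pitch classes).
    notes = [(0, 60), (0, 64), (0, 67), (0, 72), (1, 62), (1, 69)]
    print(used_pitch_classes(notes))  # {0: 3, 1: 2}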

Data Availability

The datasets generated and analysed during the current study are available from the corresponding author on reasonable request.

Notes

  1. https://musescore.com/

Acknowledgements

This work was supported by the National Natural Science Foundation of China under grants No. 61966033, 62241208, and 62366050. We would like to thank Yuning Zhang, Yutao Hou, and Yaqing Shi for their help in revising the paper.

Author information

Authors and Affiliations

Authors

Contributions

Abudukelimu Halidanmu: Funding acquisition; Jishang Chen: Data analysis and writing; Yunze Liang: Paper revision; Abudukelimu Abulizi and Alimujiang Yasen: Project administration.

Corresponding author

Correspondence to Abudukelimu Abulizi.

Ethics declarations

Competing Interests

No potential conflict of interest was reported by the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Abudukelimu, H., Chen, J., Liang, Y. et al. SymforNet: application of cross-modal information correspondences based on self-supervision in symbolic music generation. Appl Intell (2024). https://doi.org/10.1007/s10489-024-05335-y
