Abstract
In this study, we address challenges related to incorrect scores, inconsistent rhythm, and labeling errors in the generation of symbolic music scores, focusing on the use of self-supervised models. We present SymforNet, a model for symbolic music generation based on self-supervision and deep learning. The model incorporates an attention mechanism and is highly effective at recognizing contextual elements across categories. Experimental results indicate that: (1) the SymforNet model achieves 88% accuracy in generating music scores; (2) on both the training and test sets, SymforNet attains significantly lower loss values than all baseline models; (3) an analysis of the multi-track Used Pitch Class data shows that SymforNet exhibits a strong correlation, particularly for sequences comprising three to four tracks; (4) in a comparison of music score quality, SymforNet generates correct scores at a rate of 87%.
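The Used Pitch Class (UPC) metric mentioned in point (3) counts how many distinct pitch classes (C, C#, ..., B) appear in a track, obtained by reducing MIDI pitches modulo 12. A minimal sketch, assuming tracks are represented as lists of MIDI note numbers (the paper's exact pipeline may differ):

```python
# Hypothetical sketch of the Used Pitch Class (UPC) metric for
# symbolic music. A pitch class is a MIDI pitch modulo 12, so UPC
# is the number of distinct values of (pitch % 12) in a track.
# The list-of-MIDI-pitches representation is an illustrative assumption.

def used_pitch_classes(midi_pitches):
    """Return the count of distinct pitch classes in a pitch sequence."""
    return len({p % 12 for p in midi_pitches})

# Example: a C-major arpeggio over two octaves (C4 E4 G4 C5 E5 G5)
# uses only the pitch classes C, E, and G.
print(used_pitch_classes([60, 64, 67, 72, 76, 79]))  # 3
```

Per-track UPC values like this can then be compared between generated and reference music to gauge how closely the model's pitch-class usage tracks real data.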
Data Availability
The datasets generated and analysed during the current study are available from the corresponding author on reasonable request.
Acknowledgements
This work was supported by the National Natural Science Foundation of China under grants No. 61966033, 62241208, and 62366050. We would like to thank Yuning Zhang, Yutao Hou, and Yaqing Shi for their help in revising the paper.
Author information
Authors and Affiliations
Contributions
Abudukelimu Halidanmu: Funding acquisition; Jishang Chen: Data analysis and Writing; Yunze Liang: Paper revision; Abudukelimu Abulizi and Alimujiang Yasen: Project administration.
Corresponding author
Ethics declarations
Competing Interests
No potential conflict of interest was reported by the authors.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Abudukelimu, H., Chen, J., Liang, Y. et al. SymforNet: application of cross-modal information correspondences based on self-supervision in symbolic music generation. Appl Intell (2024). https://doi.org/10.1007/s10489-024-05335-y
Accepted:
Published:
DOI: https://doi.org/10.1007/s10489-024-05335-y