HRPE: Hierarchical Relative Positional Encoding for Transformer-Based Structured Symbolic Music Generation

  • Conference paper
Music Intelligence (SOMI 2023)

Abstract

Musicians often structure their compositions hierarchically to imbue them with rich expressiveness. Consequently, generating musically meaningful pieces with well-organized structures has been a significant research goal. Existing approaches typically rely on multi-step generation pipelines or sophisticated, domain-knowledge-driven model architectures, which increase model complexity and hinder generalization. In this study, we demonstrate that a hierarchical positional encoding adapted to music is sufficient to enhance model performance and generate coherent music with hierarchical structure. We incorporate hierarchical positional information into the Transformer by modifying the attention matrix with relative position biases at different levels, enabling the model to learn long- and short-term dependencies jointly and making it less sensitive to positional shifts of a few notes. We further investigate the design of section-level relative positional encoding through ablation studies. To validate our approach, we annotate two datasets (POP909-S and POP2000-S) with music sections and present evidence on both single-track monophonic and multi-track polyphonic music generation tasks. Experimental results demonstrate that our approach outperforms state-of-the-art Transformer models in both subjective and objective evaluations. We plan to release the source code and annotated datasets upon acceptance.
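The abstract's core mechanism, adding relative position biases at multiple hierarchy levels (e.g., token level and section level) to the attention matrix, can be sketched as follows. This is an illustrative simplification, not the authors' exact formulation: the function name, the bias-table parameterization, and the clipping distance `max_dist` are assumptions for demonstration.

```python
import numpy as np

def hierarchical_attention_scores(q, k, token_ids, section_ids,
                                  token_bias, section_bias, max_dist):
    """Attention logits with token- and section-level relative position biases.

    q, k:          (T, d) query and key matrices for one attention head.
    token_ids:     absolute token positions, shape (T,).
    section_ids:   index of the music section each token belongs to, shape (T,).
    token_bias, section_bias: learned bias tables indexed by the clipped
                   relative distance, each of shape (2 * max_dist + 1,).
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)  # standard content-based term

    # Relative distance at each level, clipped and shifted to a valid index.
    rel_tok = np.clip(token_ids[:, None] - token_ids[None, :],
                      -max_dist, max_dist) + max_dist
    rel_sec = np.clip(section_ids[:, None] - section_ids[None, :],
                      -max_dist, max_dist) + max_dist

    # Add the relative position bias from each hierarchy level.
    return scores + token_bias[rel_tok] + section_bias[rel_sec]

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)
```

Because both bias terms depend only on relative distances, shifting every position by the same offset leaves the attention logits unchanged, which matches the paper's stated goal of reduced sensitivity to positional shifts.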



Author information

Correspondence to Pengfei Li.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Li, P., Wu, J., Ji, Z. (2024). HRPE: Hierarchical Relative Positional Encoding for Transformer-Based Structured Symbolic Music Generation. In: Li, X., Guan, X., Tie, Y., Zhang, X., Zhou, Q. (eds) Music Intelligence. SOMI 2023. Communications in Computer and Information Science, vol 2007. Springer, Singapore. https://doi.org/10.1007/978-981-97-0576-4_9

  • DOI: https://doi.org/10.1007/978-981-97-0576-4_9

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-0575-7

  • Online ISBN: 978-981-97-0576-4

  • eBook Packages: Computer Science, Computer Science (R0)
