From context to concept: exploring semantic relationships in music with word2vec

Chuan, Ching-Hua; Agres, Kat; Herremans, Dorien

doi:10.1007/s00521-018-3923-1

Deep learning for music and audio
Published: 08 December 2018

Volume 32, pages 1023–1036, (2020)

Neural Computing and Applications


Abstract

We explore the potential of a popular distributional semantics vector space model, word2vec, for capturing meaningful relationships in ecological (complex polyphonic) music. More precisely, the skip-gram version of word2vec is used to model slices of music from a large corpus spanning eight musical genres. In this newly learned vector space, a metric based on cosine distance is able to distinguish between functional chord relationships, as well as harmonic associations in the music. Evidence, based on cosine distance between chord-pair vectors, suggests that an implicit circle-of-fifths exists in the vector space. In addition, a comparison between pieces in different keys reveals that key relationships are represented in word2vec space. These results suggest that the newly learned embedded vector representation does in fact capture tonal and harmonic characteristics of music, without receiving explicit information about the musical content of the constituent slices. In order to investigate whether proximity in the discovered space of embeddings is indicative of ‘semantically-related’ slices, we explore a music generation task, by automatically replacing existing slices from a given piece of music with new slices. We propose an algorithm to find substitute slices based on spatial proximity and the pitch class distribution inferred in the chosen subspace. The results indicate that the size of the subspace used has a significant effect on whether slices belonging to the same key are selected. In sum, the proposed word2vec model is able to learn music-vector embeddings that capture meaningful tonal and harmonic relationships in music, thereby providing a useful tool for exploring musical properties and comparisons across pieces, as a potential input representation for deep learning models, and as a music generation device.
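The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes, purely for illustration, that each slice is encoded as a "word" made of its sorted pitch classes, and that skip-gram training pairs are formed from neighbouring slices. The helper names (slice_to_token, skipgram_pairs, cosine_distance) and the toy input are hypothetical.

```python
import math


def slice_to_token(pitches):
    """Encode a slice as a 'word': its sorted set of pitch classes (0-11).

    Assumption: the abstract does not specify the encoding; this is one
    plausible choice for illustration.
    """
    classes = sorted({p % 12 for p in pitches})
    return "-".join(str(c) for c in classes) if classes else "rest"


def skipgram_pairs(tokens, window=2):
    """Return (target, context) pairs for slices within `window` positions."""
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs


def cosine_distance(u, v):
    """Cosine distance, 1 - cos(u, v), between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


# Hypothetical toy input: MIDI pitches sounding in four consecutive slices
# (C major, F major, G major, C major triads).
slices = [[60, 64, 67], [65, 69, 72], [67, 71, 74], [60, 64, 67]]
tokens = [slice_to_token(s) for s in slices]
pairs = skipgram_pairs(tokens, window=1)
```

In a full system, the (target, context) pairs would be used to train a skip-gram model, and cosine_distance would then be applied to the learned slice vectors rather than to hand-written ones.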

Acknowledgements

This research was partly supported through SUTD Grant No. SRG ISTD 2017 129.

Author information

Authors and Affiliations

Department of Cinema and Interactive Media, School of Communication, University of Miami, Coral Gables, USA
Ching-Hua Chuan
Social and Cognitive Computing Department, Institute for High Performance Computing, Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
Kat Agres & Dorien Herremans
Information Systems, Technology, and Design Pillar, Singapore University of Technology and Design, Singapore, Singapore
Dorien Herremans

Corresponding author

Correspondence to Ching-Hua Chuan.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

See Fig. 10.

About this article

Cite this article

Chuan, CH., Agres, K. & Herremans, D. From context to concept: exploring semantic relationships in music with word2vec. Neural Comput & Applic 32, 1023–1036 (2020). https://doi.org/10.1007/s00521-018-3923-1

Received: 22 June 2018
Accepted: 29 November 2018
Published: 08 December 2018
Issue Date: February 2020
DOI: https://doi.org/10.1007/s00521-018-3923-1
