
From context to concept: exploring semantic relationships in music with word2vec

Neural Computing and Applications

Abstract

We explore the potential of a popular distributional semantics vector space model, word2vec, for capturing meaningful relationships in ecological (complex polyphonic) music. More precisely, the skip-gram version of word2vec is used to model slices of music from a large corpus spanning eight musical genres. In this newly learned vector space, a metric based on cosine distance is able to distinguish between functional chord relationships, as well as harmonic associations in the music. Evidence based on the cosine distance between chord-pair vectors suggests that an implicit circle of fifths exists in the vector space. In addition, a comparison between pieces in different keys reveals that key relationships are represented in word2vec space. These results suggest that the learned embedding captures tonal and harmonic characteristics of music without receiving explicit information about the musical content of the constituent slices. To investigate whether proximity in the discovered embedding space is indicative of ‘semantically related’ slices, we explore a music generation task in which existing slices from a given piece of music are automatically replaced with new slices. We propose an algorithm to find substitute slices based on spatial proximity and the pitch class distribution inferred in the chosen subspace. The results indicate that the size of the subspace used has a significant effect on whether slices belonging to the same key are selected. In sum, the proposed word2vec model learns music-vector embeddings that capture meaningful tonal and harmonic relationships in music. It thereby provides a useful tool for exploring musical properties and comparing pieces, a potential input representation for deep learning models, and a music generation device.
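The two building blocks named in the abstract, skip-gram context pairs and cosine distance, can be illustrated with a minimal sketch. It is not the authors' implementation. The slice encoding (a comma-separated string of pitch classes), the window size, and the helper names `skipgram_pairs` and `cosine_distance` are assumptions made purely for illustration.

```python
import math


def skipgram_pairs(slices, window=1):
    """Return (target, context) pairs for each slice and its neighbours.

    Skip-gram training tries to predict the context slices from the target
    slice; this helper only enumerates the pairs, it does not train a model.
    """
    pairs = []
    for i, target in enumerate(slices):
        lo = max(0, i - window)
        hi = min(len(slices), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, slices[j]))
    return pairs


def cosine_distance(u, v):
    """Cosine distance: 1 minus the cosine of the angle between u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


# Hypothetical piece: each slice is written as its pitch classes
# (C major, F major, G major, C major). The encoding is an assumption.
piece = ["0,4,7", "0,5,9", "2,7,11", "0,4,7"]
pairs = skipgram_pairs(piece, window=1)
print(len(pairs), pairs[0])
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))
```

In a trained model, each slice token would map to a learned vector, and `cosine_distance` would then be applied to pairs of those vectors, for example to compare chords a fifth apart with chords further apart.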





This research was partly supported through SUTD Grant No. SRG ISTD 2017 129.

Author information


Corresponding author

Correspondence to Ching-Hua Chuan.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.



See Fig. 10.

Fig. 10. Generated slices and their cosine distance to the original slices from Chopin’s Mazurka Op. 67 No. 4, using (a) top 1, (b) top 5, (c) top 10, and (d) top 20 slices for the search in music word2vec space. Note that as the value of n increases (e.g., moving from panel a down to d), the number of pitches outside of the key (see generated pitches in black) decreases.
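The role of n in the caption can be illustrated with a small sketch of a top-n search followed by a pitch-class filter. This is one possible reading of "spatial proximity and the pitch class distribution inferred in the chosen subspace", not the authors' algorithm. The embeddings are hand-made 2-D toy vectors chosen so that the effect is visible, and the names `nearest_slices` and `pick_substitute` are hypothetical.

```python
import math
from collections import Counter


def cosine_distance(u, v):
    """Cosine distance between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def nearest_slices(target, embeddings, n):
    """Return the n slice tokens closest to `target`, nearest first."""
    others = [tok for tok in embeddings if tok != target]
    others.sort(key=lambda tok: cosine_distance(embeddings[target], embeddings[tok]))
    return others[:n]


def pick_substitute(target, embeddings, n):
    """Pick, among the n nearest slices, the one whose pitch classes are
    most common across that neighbourhood (ties go to the nearest slice)."""
    candidates = nearest_slices(target, embeddings, n)
    # Pitch class histogram over the candidate subspace.
    histogram = Counter(pc for tok in candidates for pc in tok.split(","))

    def fit(tok):
        pcs = tok.split(",")
        return sum(histogram[pc] for pc in pcs) / len(pcs)

    return max(candidates, key=fit)


# Toy embeddings (assumed values, not learned from any corpus).
toy = {
    "0,4,7": [1.0, 0.0],    # original slice: C major triad
    "1,5,8": [0.99, 0.05],  # nearest neighbour, outside C major
    "0,5,9": [0.9, 0.3],    # F major triad
    "0,4,9": [0.8, -0.4],   # A minor triad
}
print(pick_substitute("0,4,7", toy, n=1))
print(pick_substitute("0,4,7", toy, n=3))
```

With this constructed data, n = 1 can only return the single nearest slice, which happens to be out of key, whereas n = 3 lets the neighbourhood's pitch class histogram favour the in-key slice "0,5,9". The toy numbers are arranged to show the mechanism and carry no empirical weight.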


About this article


Cite this article

Chuan, CH., Agres, K. & Herremans, D. From context to concept: exploring semantic relationships in music with word2vec. Neural Comput & Applic 32, 1023–1036 (2020).
