Music Retrieval in Joint Emotion Space Using Audio Features and Emotional Tags

  • James J. Deng
  • C. H. C. Leung
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7732)


Emotion-based music retrieval offers a natural, human-centered way to help people experience music. In this paper, we use the three-dimensional Resonance-Arousal-Valence emotion model to represent the emotions evoked by music, and we establish the relationship between acoustic features and their emotional impact under this model. We also consider emotional tag features, and represent acoustic features and emotional tag features jointly in a low-dimensional embedding space for music emotion. This joint emotion space is optimized through dimensionality reduction by minimizing the joint loss of the acoustic and emotional tag features. Finally, we construct a unified framework for music retrieval in the joint emotion space that supports query-by-music, query-by-tag, or both together, and we apply our proposed ranking algorithm to return a ranked list ordered by emotional similarity to the query. Experimental results show that the joint emotion space and unified framework produce satisfactory results for emotion-based music retrieval.
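
The retrieval flow described above can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not specify the embedding maps or the ranking algorithm, so the sketch assumes random linear projections in place of the learned maps, cosine similarity as the emotional similarity measure, and simple averaging to combine a music query with a tag query. All names (`project`, `build_query`, `rank`, `W_audio`, `W_tag`) are hypothetical.

```python
import math
import random

EMOTION_DIM = 3  # one coordinate each for resonance, arousal, valence


def project(features, weights):
    """Linear map into the emotion space: one output coordinate per weight row."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


def build_query(audio_vec=None, tag_vec=None):
    """Query-by-music, query-by-tag, or both (averaged, which is an assumption)."""
    parts = [v for v in (audio_vec, tag_vec) if v is not None]
    if not parts:
        raise ValueError("need at least one of audio_vec or tag_vec")
    return [sum(coords) / len(parts) for coords in zip(*parts)]


def rank(query, library, top_k=3):
    """Return (track_id, similarity) pairs, most similar first."""
    scored = [(track_id, cosine(query, emb)) for track_id, emb in library.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy data: random weights stand in for maps that would be learned
# by minimizing a joint loss over audio and tag features.
rng = random.Random(0)
N_AUDIO, N_TAGS = 5, 4
W_audio = [[rng.uniform(-1, 1) for _ in range(N_AUDIO)] for _ in range(EMOTION_DIM)]
W_tag = [[rng.uniform(-1, 1) for _ in range(N_TAGS)] for _ in range(EMOTION_DIM)]

# Library of tracks, each embedded from its (synthetic) audio features.
library = {
    f"track_{i}": project([rng.random() for _ in range(N_AUDIO)], W_audio)
    for i in range(6)
}

# Combined query: an audio example plus a tag indicator vector.
query_audio = project([rng.random() for _ in range(N_AUDIO)], W_audio)
query_tags = project([1.0, 0.0, 0.0, 1.0], W_tag)
results = rank(build_query(query_audio, query_tags), library)
```

Passing only `audio_vec` or only `tag_vec` to `build_query` gives the single-modality query modes; because both kinds of query land in the same three-dimensional space, the same `rank` call serves all three.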


Keywords: music retrieval, music emotion, dimensionality reduction, audio features, emotional tag, ranking





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • James J. Deng (1)
  • C. H. C. Leung (1)
  1. Department of Computer Science, Hong Kong Baptist University, Hong Kong
