Integration of Text and Audio Features for Genre Classification in Music Information Retrieval

  • Robert Neumayer
  • Andreas Rauber
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4425)


Multimedia content can be described in versatile ways, as its essence is not limited to a single view. For music data, these multiple views could be a song's audio features as well as its lyrics. Both modalities have their advantages: text may be easier to search and may capture more of a song's 'content semantics', while omitting other types of semantic categorisation. (Psycho)acoustic feature sets, on the other hand, provide the means to identify tracks that 'sound similar', but offer less support for other kinds of semantic categorisation. These distinguishing characteristics of the different feature sets meet users' differing information needs. We explain the nature of text and audio feature sets that describe the same audio tracks. Moreover, we propose the use of textual data on top of low-level audio features for music genre classification, and we show the impact of different combinations of audio features and textual features based on content words.
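The feature combination the abstract proposes can be illustrated with a minimal sketch: represent lyrics as content-word count vectors, concatenate them with the (psycho)acoustic feature vector, and classify in the combined space. The songs, feature values, genres, and the nearest-centroid classifier below are all invented for illustration; the paper's actual audio features (e.g. critical-band based descriptors) and classifiers are not reproduced here.

```python
from collections import Counter

# Toy corpus of (lyrics, audio feature vector, genre) triples.
# All values are fabricated purely to demonstrate early fusion of
# text and audio features; they do not come from the paper.
SONGS = [
    ("love heart tonight baby", [0.2, 0.1], "pop"),
    ("baby love dance tonight", [0.3, 0.2], "pop"),
    ("death steel fire rage",   [0.9, 0.8], "metal"),
    ("rage fire blood steel",   [0.8, 0.9], "metal"),
]

# Fixed content-word vocabulary derived from the training lyrics.
VOCAB = sorted({w for lyrics, _, _ in SONGS for w in lyrics.split()})

def text_vector(lyrics):
    """Bag-of-content-words count vector over the fixed vocabulary."""
    counts = Counter(lyrics.split())
    return [float(counts[w]) for w in VOCAB]

def combined_vector(lyrics, audio, text_weight=1.0):
    """Early fusion: concatenate (optionally weighted) text and audio features."""
    return [text_weight * x for x in text_vector(lyrics)] + list(audio)

def nearest_centroid(vec, training):
    """Classify by squared distance to per-genre mean vectors."""
    by_genre = {}
    for lyrics, audio, genre in training:
        by_genre.setdefault(genre, []).append(combined_vector(lyrics, audio))
    def mean(vectors):
        return [sum(col) / len(vectors) for col in zip(*vectors)]
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min((sqdist(vec, mean(vs)), g) for g, vs in by_genre.items())[1]

query = combined_vector("fire and steel tonight", [0.85, 0.8])
print(nearest_centroid(query, SONGS))  # -> metal
```

The `text_weight` parameter hints at the kind of trade-off the paper studies: how much the textual view should contribute relative to the acoustic view when both describe the same track.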


Keywords: Semantic Categorisation · Content Word · Critical Band · Audio Feature · Music Genre




References

  1. Foote, J.: An overview of audio information retrieval. Multimedia Systems 7(1), 2–10 (1999)
  2. Lidy, T., Rauber, A.: Evaluation of feature extractors and psycho-acoustic transformations for music genre classification. In: Proceedings of the 6th International Conference on Music Information Retrieval (ISMIR 2005), London, UK, September 11–15, 2005, pp. 34–41 (2005)
  3. Logan, B., Kositsky, A., Moreno, P.: Semantic analysis of song lyrics. In: Proceedings of the 2004 IEEE International Conference on Multimedia and Expo (ICME 2004), Taipei, Taiwan, June 27–30, 2004. IEEE Computer Society Press, Los Alamitos (2004)
  4. Mahedero, J.P.G., et al.: Natural language processing of lyrics. In: MULTIMEDIA '05: Proceedings of the 13th Annual ACM International Conference on Multimedia, pp. 475–478. ACM Press, New York (2005)
  5. Rauber, A., Pampalk, E., Merkl, D.: Using psycho-acoustic models and self-organizing maps to create a hierarchical structuring of music by musical styles. In: Proceedings of the 3rd International Symposium on Music Information Retrieval (ISMIR 2002), Paris, France, October 13–17, 2002, pp. 71–80 (2002)
  6. Tzanetakis, G., Cook, P.: MARSYAS: A framework for audio analysis. Organised Sound 4(3) (2000)

Copyright information

© Springer Berlin Heidelberg 2007

Authors and Affiliations

  • Robert Neumayer (1)
  • Andreas Rauber (1)

  1. Vienna University of Technology, Institute for Software Technology and Interactive Systems
