Enhancing Music Information Retrieval by Incorporating Image-Based Local Features

  • Leszek Kaliciak
  • Ben Horsburgh
  • Dawei Song
  • Nirmalie Wiratunga
  • Jeff Pan
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7675)


This paper presents a novel approach to music genre classification. Having represented music tracks as two-dimensional images, we apply the “bag of visual words” method from visual IR to classify the songs into 19 genres. By switching to the visual domain, we can abstract from musical concepts such as melody, timbre and rhythm. We obtained a classification accuracy of 46% (against a theoretical baseline of about 5% for random classification), which is comparable with existing state-of-the-art approaches. Moreover, the novel features characterize different properties of the signal than standard methods do. Therefore, combining the two should further improve the performance of existing techniques.

The motivation behind this work was the hypothesis that 2D images of music tracks (spectrograms) perceived as similar would correspond to the same music genres. Conversely, it is possible to treat real-life images as spectrograms and use music-based features to represent these images in vector form. This points to an interesting interchangeability between visual and music information retrieval.
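
The pipeline outlined in the abstract (track, then spectrogram image, then local features, then visual vocabulary, then histogram) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`spectrogram`, `patches`, `kmeans`, `bovw_histogram`, `tone`) are hypothetical, raw grid patches stand in for the paper's local features, the two synthetic tones stand in for real tracks, and the final genre classifier is omitted.

```python
import cmath
import math
import random

def spectrogram(signal, frame=16, hop=8):
    """Magnitude spectrogram via a naive DFT; rows are time frames, columns are frequency bins."""
    rows = []
    for start in range(0, len(signal) - frame + 1, hop):
        chunk = signal[start:start + frame]
        rows.append([
            abs(sum(x * cmath.exp(-2j * math.pi * k * n / frame) for n, x in enumerate(chunk)))
            for k in range(frame // 2 + 1)
        ])
    return rows

def patches(image, size=3, step=2):
    """Flattened size x size patches on a regular grid (an assumed stand-in for local features)."""
    return [
        [image[r + i][c + j] for i in range(size) for j in range(size)]
        for r in range(0, len(image) - size + 1, step)
        for c in range(0, len(image[0]) - size + 1, step)
    ]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(vec, centroids):
    """Index of the closest centroid, i.e. the 'visual word' assigned to a descriptor."""
    return min(range(len(centroids)), key=lambda i: sq_dist(vec, centroids[i]))

def kmeans(vectors, k, iters=10, seed=0):
    """Plain k-means; the resulting centroids form the visual vocabulary."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in vectors:
            groups[nearest(v, centroids)].append(v)
        for i, group in enumerate(groups):
            if group:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(col) / len(group) for col in zip(*group)]
    return centroids

def bovw_histogram(image, centroids):
    """Normalised histogram of visual-word occurrences for one spectrogram image."""
    hist = [0] * len(centroids)
    for descriptor in patches(image):
        hist[nearest(descriptor, centroids)] += 1
    total = sum(hist)
    return [h / total for h in hist]

def tone(freq, n=256, rate=64):
    """Synthetic sine wave used here in place of a real music track."""
    return [math.sin(2 * math.pi * freq * t / rate) for t in range(n)]

tracks = {"low": tone(4), "high": tone(24)}
images = {name: spectrogram(sig) for name, sig in tracks.items()}
vocab = kmeans([p for img in images.values() for p in patches(img)], k=4)
histograms = {name: bovw_histogram(img, vocab) for name, img in images.items()}
```

Each track ends up as a fixed-length vector in `histograms`, which is the form a standard classifier would take as input for genre prediction.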


Keywords: Local features, Co-occurrence matrix, Colour moments, K-means algorithm, Fourier transform





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Leszek Kaliciak (1)
  • Ben Horsburgh (1)
  • Dawei Song (2, 3)
  • Nirmalie Wiratunga (1)
  • Jeff Pan (4)
  1. The Robert Gordon University, Aberdeen, UK
  2. Tianjin University, Tianjin, China
  3. The Open University, Milton Keynes, UK
  4. Aberdeen University, Aberdeen, UK
