Multimedia Systems, Volume 12, Issue 1, pp 3–13

Support vector machine active learning for music retrieval

  • Michael I. Mandel
  • Graham E. Poliner
  • Daniel P. W. Ellis
Regular Paper

Abstract

Searching and organizing growing digital music collections requires a computational model of music similarity. This paper describes a system for performing flexible music similarity queries using SVM active learning. We evaluated the success of our system by classifying 1210 pop songs according to mood and style (from an online music guide) and by the performing artist. In comparing a number of representations for songs, we found the statistics of mel-frequency cepstral coefficients to perform best in precision-at-20 comparisons. We also show that by choosing training examples intelligently, active learning requires half as many labeled examples to achieve the same accuracy as a standard scheme.
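As a rough illustration of two ideas named in the abstract, the sketch below computes precision-at-k for a ranked result list and picks which unlabeled songs to ask a user about next. The selection rule shown (query the items closest to the decision boundary) is one common heuristic in SVM active learning; the abstract does not say which rule the authors used, so treat it as an assumption. The scores, labels, and function names are hypothetical and are not taken from the paper.

```python
def precision_at_k(ranked_labels, k=20):
    """Fraction of the top-k ranked items that are relevant (True).

    If fewer than k items exist, the fraction is taken over the items available.
    """
    top = ranked_labels[:k]
    if not top:
        return 0.0
    return sum(1 for rel in top if rel) / len(top)


def rank_by_score(scores):
    """Return item indices sorted from highest to lowest classifier score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def select_queries(scores, labeled, n_queries=2):
    """Pick the unlabeled items whose scores lie closest to the boundary (score 0)."""
    candidates = [i for i in range(len(scores)) if i not in labeled]
    candidates.sort(key=lambda i: abs(scores[i]))
    return candidates[:n_queries]


# Hypothetical signed distances from an SVM decision boundary for six songs.
scores = [2.1, -0.1, 0.4, -1.7, 0.05, 1.2]
# Hypothetical ground truth: does each song match the query concept?
relevant = [True, False, True, False, False, True]

ranking = rank_by_score(scores)
ranked_labels = [relevant[i] for i in ranking]
p_at_3 = precision_at_k(ranked_labels, k=3)

# Songs 0 and 3 are already labeled; ask about the two most uncertain others.
to_label = select_queries(scores, labeled={0, 3}, n_queries=2)
```

In a full retrieval loop, the classifier would be retrained after each batch of new labels and the ranking recomputed; that loop is omitted here to keep the sketch short.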

Keywords

  • Support vector machines
  • Active learning
  • Music classification

Copyright information

© Springer-Verlag 2006

Authors and Affiliations

  • Michael I. Mandel
    • 1
  • Graham E. Poliner
    • 1
  • Daniel P. W. Ellis
    • 1
  1. Department of Electrical Engineering, 1312 S.W. Mudd, New York, U.S.A.
