Making Explicit the Formalism Underlying Evaluation in Music Information Retrieval Research: A Look at the MIREX Automatic Mood Classification Task

  • Conference paper
Sound, Music, and Motion (CMMR 2013)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8905)

Abstract

We make explicit the formalism underlying evaluation in music information retrieval research. We define a “system,” what it means to “analyze” one, and make clear the aims, parts, design, execution, interpretation, assumptions and limitations of its “evaluation.” We apply this formalism to discuss the MIREX automatic mood classification task.

This work was supported in part by Independent Postdoc Grant 11-105218 from Det Frie Forskningsråd. Part of this work was undertaken during a visit to the Centre for Digital Music at Queen Mary University of London, supported by EPSRC grant EP/G007144/1 (Plumbley).

Notes

  1. We use the terminology and notation of Bailey [7].

  2. R. Hamming, “You get what you measure”, lecture at Naval Postgraduate School, June 1995. http://www.youtube.com/watch?v=LNhcaVi3zPA.

  3. http://www.music-ir.org/mirex/wiki/2013:Audio_Classification_(Train/Test)_Tasks.

  4. From the MATLAB implementation of the Friedman test, friedman.

References

  1. Aucouturier, J.J.: Sounds like teen spirit: computational insights into the grounding of everyday musical terms. In: Minett, J., Wang, W. (eds.) Language, Evolution and the Brain. Frontiers in Linguistic Series. Academia Sinica Press, Taipei (2009)

  2. Aucouturier, J.J., Bigand, E.: Seven problems that keep MIR from attracting the interest of cognition and neuroscience. J. Intell. Info. Syst. 41(3), 483–497 (2013)

  3. Aucouturier, J.J., Pachet, F.: Representing music genre: a state of the art. J. New Music Res. 32(1), 83–93 (2003)

  4. Aucouturier, J.J., Pachet, F.: Improving timbre similarity: how high is the sky? J. Neg. Results Speech Audio Sci. 1(1), 1–13 (2004)

  5. Aucouturier, J.J., Pampalk, E.: Introduction - from genres to tags: a little epistemology of music information retrieval research. J. New Music Res. 37(2), 87–92 (2008)

  6. Aucouturier, J.J., Pachet, F., Roy, P., Beurivé, A.: Signal + context = better classification. In: ISMIR, pp. 425–430 (2007)

  7. Bailey, R.A.: Design of Comparative Experiments. Cambridge University Press, Cambridge (2008)

  8. Bertin-Mahieux, T., Eck, D., Mandel, M.: Automatic tagging of audio: the state-of-the-art. In: Wang, W. (ed.) Machine Audition: Principles, Algorithms and Systems. IGI Publishing, New York (2010)

  9. Bertin-Mahieux, T., Ellis, D.P., Whitman, B., Lamere, P.: The million song dataset. In: Proceedings of ISMIR (2011). http://labrosa.ee.columbia.edu/millionsong/

  10. Celma, O., Herrera, P., Serra, X.: Bridging the music semantic gap. In: Proceedings of International Conference Semantics and Digital Media Technology (2006)

  11. Craft, A.: The role of culture in the music genre classification task: Human behaviour and its effect on methodology and evaluation. Technical report, Queen Mary University of London, Nov 2007

  12. Craft, A., Wiggins, G.A., Crawford, T.: How many beans make five? The consensus problem in music-genre classification and a new evaluation method for single-genre categorisation systems. In: Proceedings of ISMIR, pp. 73–76 (2007)

  13. Cunningham, S.J., Bainbridge, D., Downie, J.S.: The impact of MIREX on scholarly research. In: Proceedings of ISMIR, pp. 259–264 (2012)

  14. Dougherty, E.R., Dalton, L.A.: Scientific knowledge is possible with small-sample classification. EURASIP J. Bioinform. Syst. Biol. 2013, 10 (2013)

  15. Downie, J., Ehmann, A., Bay, M., Jones, M.: The music information retrieval evaluation exchange: some observations and insights. In: Ras, Z., Wieczorkowska, A. (eds.) Advances in Music Information Retrieval, pp. 93–115. Springer, Heidelberg (2010)

  16. Downie, J.S. (ed.): The MIR/MDL Evaluation Project White Paper Collection (2003). http://www.music-ir.org/evaluation/wp.html

  17. Downie, J.S.: Toward the scientific evaluation of music information retrieval systems. In: Proceedings of ISMIR, Oct 2003

  18. Downie, J.S.: The scientific evaluation of music information retrieval systems: foundations and future. Comput. Music J. 28(2), 12–23 (2004)

  19. Downie, J.S.: The music information retrieval evaluation exchange (2005–2007): A window into music information retrieval research. Acoust. Sci. Tech. 29(4), 247–255 (2008)

  20. Flexer, A.: Statistical evaluation of music information retrieval experiments. J. New Music Res. 35(2), 113–120 (2006)

  21. Friedman, M.: The use of ranks to avoid the assumption of normality in the analysis of variance. J. Am. Statist. Assoc. 32, 675–701 (1937)

  22. Fu, Z., Lu, G., Ting, K.M., Zhang, D.: A survey of audio-based music classification and annotation. IEEE Trans. Multimedia 13(2), 303–319 (2011)

  23. Gouyon, F., Sturm, B.L., Oliveira, J.L., Hespanhol, N., Langlois, T.: On evaluation validity in music autotagging (2014). http://arxiv.org/abs/1410.0001

  24. Hand, D.J.: Deconstructing statistical questions. J. Royal Statist. Soc. A (Statist. Soc.) 157(3), 317–356 (1994)

  25. Hu, X., Downie, J.S., Laurier, C., Bay, M., Ehmann, A.F.: The 2007 MIREX audio mood classification task: lessons learned. In: Proceedings of ISMIR (2008)

  26. Humphrey, E.J., Bello, J.P., LeCun, Y.: Feature learning and deep architectures: new directions for music informatics. J. Intell. Info. Syst. 41(3), 461–481 (2013)

  27. Karydis, I., Radovanovic, M., Nanopoulos, A., Ivanovic, M.: Looking through the “glass ceiling”: a conceptual framework for the problems of spectral similarity. In: ISMIR (2010)

  28. Kimball, A.W.: Errors of the third kind in statistical consulting. J. Am. Stat. Assoc. 52(278), 133–142 (1957)

  29. Marques, G., Domingues, M., Langlois, T., Gouyon, F.: Three current issues in music autotagging. In: Proceedings of ISMIR, pp. 795–800 (2011)

  30. Marques, G., Langlois, T., Gouyon, F., Lopes, M., Sordo, M.: Short-term feature space and music genre classification. J. New Music Res. 40(2), 127–137 (2011)

  31. Marques, G., Lopes, M., Sordo, M., Langlois, T., Gouyon, F.: Additional evidence that common low-level features of individual audio frames are not representative of music genres. In: Proceedings of SMC, Barcelona, Spain, July 2010

  32. McKay, C., Fujinaga, I.: Music genre classification: Is it worth pursuing and how can it be improved? In: Proceedings of ISMIR, pp. 101–106, Oct 2006

  33. MIREX (2012). http://www.music-ir.org/mirex

  34. Pachet, F., Cazaly, D.: A taxonomy of musical genres. In: Proceedings of Content-based Multimedia Information Access Conference, Paris, France, Apr 2000

  35. Pampalk, E., Flexer, A., Widmer, G.: Improvements of audio-based music similarity and genre classification. In: Proceedings of ISMIR, pp. 628–633 (2005)

  36. Peeters, G., Fort, K.: Towards a (better) definition of the description of annotated MIR corpora. In: ISMIR, pp. 25–30 (2012)

  37. Rowe, W.: Why system science and cybernetics? IEEE Trans. Syst. Cybernet. 1, 2–3 (1965)

  38. Saheb-Ettaba, C., McFarland, R.B.: The Alpha-numeric System for Classification of Recordings. Bro-Dart Publishing Company, Williamsport (1969)

  39. Schedl, M., Flexer, A., Urbano, J.: The neglected user in music information retrieval research. J. Intell. Info. Syst. 41(3), 523–539 (2013)

  40. Schindler, A., Mayer, R., Rauber, A.: Facilitating comprehensive benchmarking experiments on the million song dataset. In: Proceedings of ISMIR, Oct 2012

  41. Serra, X., Magas, M., Benetos, E., Chudy, M., Dixon, S., Flexer, A., Gómez, E., Gouyon, F., Herrera, P., Jordà, S., Paytuvi, O., Peeters, G., Schlüter, J., Vinet, H., Widmer, G.: Roadmap for Music Information ReSearch. Creative Commons (2013)

  42. Sturm, B.L.: A survey of evaluation in music genre recognition. In: Proceedings of Adaptive Multimedia Retrieval, Oct 2012

  43. Sturm, B.L.: Two systems for automatic music genre recognition: what are they really recognizing? In: Proceedings of ACM MIRUM Workshop, pp. 69–74, Nov 2012

  44. Sturm, B.L.: Classification accuracy is not enough: on the evaluation of music genre recognition systems. J. Intell. Info. Syst. 41(3), 371–406 (2013)

  45. Sturm, B.L.: Evaluating music emotion recognition: Lessons from music genre recognition? In: Proceedings of ICME (2013)

  46. Sturm, B.L.: The state of the art ten years after a state of the art: future research in music information retrieval. J. New Music Res. 43(2), 147–172 (2014)

  47. Sturm, B.L.: A simple method to determine if a music information retrieval system is a “horse”. IEEE Trans. Multimedia (in press, 2014)

  48. Sturm, B.L., Kereliuk, C., Pikrakis, A.: A closer look at deep learning neural networks with low-level spectral periodicity features. In: Proceedings of International Workshop on Cognitive Information Processing (2014)

  49. Urbano, J.: Information retrieval meta-evaluation: challenges and opportunities in the music domain. In: Proceedings of ISMIR, pp. 609–614 (2011)

  50. Urbano, J.: Evaluation in Audio Music Similarity. Ph.D. thesis, University Carlos III of Madrid (2013)

  51. Urbano, J., McFee, B., Downie, J.S., Schedl, M.: How significant is statistically significant? The case of audio music similarity and retrieval. In: Proceedings of ISMIR, pp. 181–186 (2012)

  52. Urbano, J., Mónica, M., Morato, J.: Audio music similarity and retrieval: evaluation power and stability. In: Proceedings of ISMIR, pp. 597–602 (2011)

  53. Urbano, J., Schedl, M., Serra, X.: Evaluation in music information retrieval. J. Intell. Info. Syst. 41(3), 345–369 (2013)

  54. Venables, W.N., Ripley, B.D.: Modern Applied Statistics with S. Statistics and Computing, 4th edn. Springer, New York (2002)

  55. Wiggins, G.A.: Semantic gap?? Schemantic schmap!! Methodological considerations in the scientific study of music. In: Proceedings of IEEE International Symposium Multimedia, pp. 477–482, Dec 2009

Acknowledgments

Many thanks to Mathieu Barthet for inviting this paper, and to Nick Collins for the fun discussions.

Author information

Correspondence to Bob L. Sturm.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Sturm, B.L. (2014). Making Explicit the Formalism Underlying Evaluation in Music Information Retrieval Research: A Look at the MIREX Automatic Mood Classification Task. In: Aramaki, M., Derrien, O., Kronland-Martinet, R., Ystad, S. (eds) Sound, Music, and Motion. CMMR 2013. Lecture Notes in Computer Science, vol 8905. Springer, Cham. https://doi.org/10.1007/978-3-319-12976-1_6

  • DOI: https://doi.org/10.1007/978-3-319-12976-1_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-12975-4

  • Online ISBN: 978-3-319-12976-1

  • eBook Packages: Computer Science, Computer Science (R0)