Abstract
We make explicit the formalism underlying evaluation in music information retrieval research. We define a “system” and what it means to “analyze” one, and we make clear the aims, parts, design, execution, interpretation, assumptions, and limitations of its “evaluation.” We then apply this formalism to discuss the MIREX automatic mood classification task.
This work was supported in part by Independent Postdoc Grant 11-105218 from Det Frie Forskningsråd. Part of this work was undertaken during a visit to the Centre for Digital Music at Queen Mary University of London, supported by EPSRC grant EP/G007144/1 (Plumbley).
Notes
1. We use the terminology and notation of Bailey [7].
2. R. Hamming, “You get what you measure”, lecture at Naval Postgraduate School, June 1995. http://www.youtube.com/watch?v=LNhcaVi3zPA.
3.
4. From the MATLAB implementation friedman.
References
Aucouturier, J.J.: Sounds like teen spirit: computational insights into the grounding of everyday musical terms. In: Minett, J., Wang, W. (eds.) Language, Evolution and the Brain. Frontiers in Linguistic Series. Academia Sinica Press, Taipei (2009)
Aucouturier, J.J., Bigand, E.: Seven problems that keep MIR from attracting the interest of cognition and neuroscience. J. Intell. Info. Syst. 41(3), 483–497 (2013)
Aucouturier, J.J., Pachet, F.: Representing music genre: a state of the art. J. New Music Res. 32(1), 83–93 (2003)
Aucouturier, J.J., Pachet, F.: Improving timbre similarity: how high is the sky? J. Neg. Results Speech Audio Sci. 1(1), 1–13 (2004)
Aucouturier, J.J., Pampalk, E.: Introduction - from genres to tags: a little epistemology of music information retrieval research. J. New Music Res. 37(2), 87–92 (2008)
Aucouturier, J.J., Pachet, F., Roy, P., Beurivé, A.: Signal + context = better classification. In: ISMIR, pp. 425–430 (2007)
Bailey, R.A.: Design of Comparative Experiments. Cambridge University Press, Cambridge (2008)
Bertin-Mahieux, T., Eck, D., Mandel, M.: Automatic tagging of audio: the state-of-the-art. In: Wang, W. (ed.) Machine Audition: Principles, Algorithms and Systems. IGI Publishing, New York (2010)
Bertin-Mahieux, T., Ellis, D.P., Whitman, B., Lamere, P.: The million song dataset. In: Proceedings of ISMIR (2011). http://labrosa.ee.columbia.edu/millionsong/
Celma, O., Herrera, P., Serra, X.: Bridging the music semantic gap. In: Proceedings of International Conference Semantics and Digital Media Technology (2006)
Craft, A.: The role of culture in the music genre classification task: Human behaviour and its effect on methodology and evaluation. Technical report, Queen Mary University of London, Nov 2007
Craft, A., Wiggins, G.A., Crawford, T.: How many beans make five? The consensus problem in music-genre classification and a new evaluation method for single-genre categorisation systems. In: Proceedings of ISMIR, pp. 73–76 (2007)
Cunningham, S.J., Bainbridge, D., Downie, J.S.: The impact of MIREX on scholarly research. In: Proceedings of ISMIR, pp. 259–264 (2012)
Dougherty, E.R., Dalton, L.A.: Scientific knowledge is possible with small-sample classification. EURASIP J. Bioinform. Syst. Biol. 2013, 10 (2013)
Downie, J., Ehmann, A., Bay, M., Jones, M.: The music information retrieval evaluation exchange: some observations and insights. In: Ras, Z., Wieczorkowska, A. (eds.) Advances in Music Information Retrieval, pp. 93–115. Springer, Heidelberg (2010)
Downie, J.S. (ed.): The MIR/MDL Evaluation Project White Paper Collection (2003). http://www.music-ir.org/evaluation/wp.html
Downie, J.S.: Toward the scientific evaluation of music information retrieval systems. In: Proceedings of ISMIR, Oct 2003
Downie, J.S.: The scientific evaluation of music information retrieval systems: foundations and future. Comput. Music J. 28(2), 12–23 (2004)
Downie, J.S.: The music information retrieval evaluation exchange (2005–2007): A window into music information retrieval research. Acoust. Sci. Tech. 29(4), 247–255 (2008)
Flexer, A.: Statistical evaluation of music information retrieval experiments. J. New Music Res. 35(2), 113–120 (2006)
Friedman, M.: The use of ranks to avoid the assumption of normality in the analysis of variance. J. Am. Statist. Assoc. 32, 675–701 (1937)
Fu, Z., Lu, G., Ting, K.M., Zhang, D.: A survey of audio-based music classification and annotation. IEEE Trans. Multimedia 13(2), 303–319 (2011)
Gouyon, F., Sturm, B.L., Oliveira, J.L., Hespanhol, N., Langlois, T.: On evaluation validity in music autotagging (2014). http://arxiv.org/abs/1410.0001
Hand, D.J.: Deconstructing statistical questions. J. Royal Statist. Soc. A (Statist. Soc.) 157(3), 317–356 (1994)
Hu, X., Downie, J.S., Laurier, C., Bay, M., Ehmann, A.F.: The 2007 MIREX audio mood classification task: lessons learned. In: Proceedings of ISMIR (2008)
Humphrey, E.J., Bello, J.P., LeCun, Y.: Feature learning and deep architectures: new directions for music informatics. J. Intell. Info. Syst. 41(3), 461–481 (2013)
Karydis, I., Radovanovic, M., Nanopoulos, A., Ivanovic, M.: Looking through the “glass ceiling”: a conceptual framework for the problems of spectral similarity. In: ISMIR (2010)
Kimball, A.W.: Errors of the third kind in statistical consulting. J. Am. Stat. Assoc. 52(278), 133–142 (1957)
Marques, G., Domingues, M., Langlois, T., Gouyon, F.: Three current issues in music autotagging. In: Proceedings of ISMIR, pp. 795–800 (2011)
Marques, G., Langlois, T., Gouyon, F., Lopes, M., Sordo, M.: Short-term feature space and music genre classification. J. New Music Res. 40(2), 127–137 (2011)
Marques, G., Lopes, M., Sordo, M., Langlois, T., Gouyon, F.: Additional evidence that common low-level features of individual audio frames are not representative of music genres. In: Proceedings of SMC, Barcelona, Spain, July 2010
McKay, C., Fujinaga, I.: Music genre classification: Is it worth pursuing and how can it be improved? In: Proceedings of ISMIR, pp. 101–106, Oct 2006
MIREX (2012). http://www.music-ir.org/mirex
Pachet, F., Cazaly, D.: A taxonomy of musical genres. In: Proceedings of Content-based Multimedia Information Access Conference, Paris, France, Apr 2000
Pampalk, E., Flexer, A., Widmer, G.: Improvements of audio-based music similarity and genre classification. In: Proceedings of ISMIR, pp. 628–633 (2005)
Peeters, G., Fort, K.: Towards a (better) definition of the description of annotated MIR corpora. In: ISMIR, pp. 25–30 (2012)
Rowe, W.: Why system science and cybernetics? IEEE Trans. Syst. Cybernet. 1, 2–3 (1965)
Saheb-Ettaba, C., McFarland, R.B.: The Alpha-numeric System for Classification of Recordings. Bro-Dart Publishing Company, Williamsport (1969)
Schedl, M., Flexer, A., Urbano, J.: The neglected user in music information retrieval research. J. Intell. Info. Syst. 41(3), 523–539 (2013)
Schindler, A., Mayer, R., Rauber, A.: Facilitating comprehensive benchmarking experiments on the million song dataset. In: Proceedings of ISMIR, Oct 2012
Serra, X., Magas, M., Benetos, E., Chudy, M., Dixon, S., Flexer, A., Gómez, E., Gouyon, F., Herrera, P., Jordà, S., Paytuvi, O., Peeters, G., Schlüter, J., Vinet, H., Widmer, G.: Roadmap for Music Information ReSearch. Creative Commons (2013)
Sturm, B.L.: A survey of evaluation in music genre recognition. In: Proceedings of Adaptive Multimedia Retrieval, Oct 2012
Sturm, B.L.: Two systems for automatic music genre recognition: what are they really recognizing? In: Proceedings of ACM MIRUM Workshop, pp. 69–74, Nov 2012
Sturm, B.L.: Classification accuracy is not enough: on the evaluation of music genre recognition systems. J. Intell. Info. Syst. 41(3), 371–406 (2013)
Sturm, B.L.: Evaluating music emotion recognition: Lessons from music genre recognition? In: Proceedings of ICME (2013)
Sturm, B.L.: The state of the art ten years after a state of the art: future research in music information retrieval. J. New Music Res. 43(2), 147–172 (2014)
Sturm, B.L.: A simple method to determine if a music information retrieval system is a “horse”. IEEE Trans. Multimedia (in press, 2014)
Sturm, B.L., Kereliuk, C., Pikrakis, A.: A closer look at deep learning neural networks with low-level spectral periodicity features. In: Proceedings of International Workshop on Cognitive Information Processing (2014)
Urbano, J.: Information retrieval meta-evaluation: challenges and opportunities in the music domain. In: Proceedings of ISMIR, pp. 609–614 (2011)
Urbano, J.: Evaluation in Audio Music Similarity. Ph.D. thesis, University Carlos III of Madrid (2013)
Urbano, J., McFee, B., Downie, J.S., Schedl, M.: How significant is statistically significant? The case of audio music similarity and retrieval. In: Proceedings of ISMIR, pp. 181–186 (2012)
Urbano, J., Mónica, M., Morato, J.: Audio music similarity and retrieval: evaluation power and stability. In: Proceedings of ISMIR, pp. 597–602 (2011)
Urbano, J., Schedl, M., Serra, X.: Evaluation in music information retrieval. J. Intell. Info. Syst. 41(3), 345–369 (2013)
Venables, W.N., Ripley, B.D.: Modern Applied Statistics with S. Statistics and Computing, 4th edn. Springer, New York (2002)
Wiggins, G.A.: Semantic gap?? Schemantic schmap!! Methodological considerations in the scientific study of music. In: Proceedings of IEEE International Symposium Multimedia, pp. 477–482, Dec 2009
Acknowledgments
Many thanks to Mathieu Barthet for inviting this paper, and to Nick Collins for the fun discussions.
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Sturm, B.L. (2014). Making Explicit the Formalism Underlying Evaluation in Music Information Retrieval Research: A Look at the MIREX Automatic Mood Classification Task. In: Aramaki, M., Derrien, O., Kronland-Martinet, R., Ystad, S. (eds) Sound, Music, and Motion. CMMR 2013. Lecture Notes in Computer Science, vol 8905. Springer, Cham. https://doi.org/10.1007/978-3-319-12976-1_6
DOI: https://doi.org/10.1007/978-3-319-12976-1_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12975-4
Online ISBN: 978-3-319-12976-1
eBook Packages: Computer Science, Computer Science (R0)