Making Explicit the Formalism Underlying Evaluation in Music Information Retrieval Research: A Look at the MIREX Automatic Mood Classification Task

Sturm, Bob L.

doi:10.1007/978-3-319-12976-1_6

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8905)

Included in the following conference series: International Symposium on Computer Music Multidisciplinary Research

Abstract

We make explicit the formalism underlying evaluation in music information retrieval research. We define a “system,” what it means to “analyze” one, and make clear the aims, parts, design, execution, interpretation, assumptions and limitations of its “evaluation.” We apply this formalism to discuss the MIREX automatic mood classification task.

This work was supported in part by Independent Postdoc Grant 11-105218 from Det Frie Forskningsråd. Part of this work was undertaken during a visit to the Centre for Digital Music at Queen Mary University of London, supported by EPSRC grant EP/G007144/1 (Plumbley).

Notes

1. We use the terminology and notation of Bailey [7].
2. R. Hamming, “You get what you measure”, lecture at Naval Postgraduate School, June 1995. http://www.youtube.com/watch?v=LNhcaVi3zPA.
3. http://www.music-ir.org/mirex/wiki/2013:Audio_Classification_(Train/Test)_Tasks.
4. From the MATLAB implementation friedman.

References

1. Aucouturier, J.J.: Sounds like teen spirit: computational insights into the grounding of everyday musical terms. In: Minett, J., Wang, W. (eds.) Language, Evolution and the Brain. Frontiers in Linguistic Series. Academia Sinica Press, Taipei (2009)
2. Aucouturier, J.J., Bigand, E.: Seven problems that keep MIR from attracting the interest of cognition and neuroscience. J. Intell. Info. Syst. 41(3), 483–497 (2013)
3. Aucouturier, J.J., Pachet, F.: Representing music genre: a state of the art. J. New Music Res. 32(1), 83–93 (2003)
4. Aucouturier, J.J., Pachet, F.: Improving timbre similarity: how high is the sky? J. Neg. Results Speech Audio Sci. 1(1), 1–13 (2004)
5. Aucouturier, J.J., Pampalk, E.: Introduction - from genres to tags: a little epistemology of music information retrieval research. J. New Music Res. 37(2), 87–92 (2008)
6. Aucouturier, J.J., Pachet, F., Roy, P., Beurivé, A.: Signal + context = better classification. In: ISMIR, pp. 425–430 (2007)
7. Bailey, R.A.: Design of Comparative Experiments. Cambridge University Press, Cambridge (2008)
8. Bertin-Mahieux, T., Eck, D., Mandel, M.: Automatic tagging of audio: the state-of-the-art. In: Wang, W. (ed.) Machine Audition: Principles, Algorithms and Systems. IGI Publishing, New York (2010)
9. Bertin-Mahieux, T., Ellis, D.P., Whitman, B., Lamere, P.: The million song dataset. In: Proceedings of ISMIR (2011). http://labrosa.ee.columbia.edu/millionsong/
10. Celma, O., Herrera, P., Serra, X.: Bridging the music semantic gap. In: Proceedings of International Conference Semantics and Digital Media Technology (2006)
11. Craft, A.: The role of culture in the music genre classification task: Human behaviour and its effect on methodology and evaluation. Technical report, Queen Mary University of London, Nov 2007
12. Craft, A., Wiggins, G.A., Crawford, T.: How many beans make five? The consensus problem in music-genre classification and a new evaluation method for single-genre categorisation systems. In: Proceedings of ISMIR, pp. 73–76 (2007)
13. Cunningham, S.J., Bainbridge, D., Downie, J.S.: The impact of MIREX on scholarly research. In: Proceedings of ISMIR, pp. 259–264 (2012)
14. Dougherty, E.R., Dalton, L.A.: Scientific knowledge is possible with small-sample classification. EURASIP J. Bioinform. Syst. Biol. 2013, 10 (2013)
15. Downie, J., Ehmann, A., Bay, M., Jones, M.: The music information retrieval evaluation exchange: some observations and insights. In: Ras, Z., Wieczorkowska, A. (eds.) Advances in Music Information Retrieval, pp. 93–115. Springer, Heidelberg (2010)
16. Downie, J.S. (ed.): The MIR/MDL Evaluation Project White Paper Collection (2003). http://www.music-ir.org/evaluation/wp.html
17. Downie, J.S.: Toward the scientific evaluation of music information retrieval systems. In: Proceedings of ISMIR, Oct 2003
18. Downie, J.S.: The scientific evaluation of music information retrieval systems: foundations and future. Comput. Music J. 28(2), 12–23 (2004)
19. Downie, J.S.: The music information retrieval evaluation exchange (2005–2007): A window into music information retrieval research. Acoust. Sci. Tech. 29(4), 247–255 (2008)
20. Flexer, A.: Statistical evaluation of music information retrieval experiments. J. New Music Res. 35(2), 113–120 (2006)
21. Friedman, M.: The use of ranks to avoid the assumption of normality in the analysis of variance. J. Am. Statist. Assoc. 32, 675–701 (1937)
22. Fu, Z., Lu, G., Ting, K.M., Zhang, D.: A survey of audio-based music classification and annotation. IEEE Trans. Multimedia 13(2), 303–319 (2011)
23. Gouyon, F., Sturm, B.L., Oliveira, J.L., Hespanhol, N., Langlois, T.: On evaluation validity in music autotagging (2014). http://arxiv.org/abs/1410.0001
24. Hand, D.J.: Deconstructing statistical questions. J. Royal Statist. Soc. A (Statist. Soc.) 157(3), 317–356 (1994)
25. Hu, X., Downie, J.S., Laurier, C., Bay, M., Ehmann, A.F.: The 2007 MIREX audio mood classification task: lessons learned. In: Proceedings of ISMIR (2008)
26. Humphrey, E.J., Bello, J.P., LeCun, Y.: Feature learning and deep architectures: new directions for music informatics. J. Intell. Info. Syst. 41(3), 461–481 (2013)
27. Karydis, I., Radovanovic, M., Nanopoulos, A., Ivanovic, M.: Looking through the “glass ceiling”: a conceptual framework for the problems of spectral similarity. In: ISMIR (2010)
28. Kimball, A.W.: Errors of the third kind in statistical consulting. J. Am. Stat. Assoc. 52(278), 133–142 (1957)
29. Marques, G., Domingues, M., Langlois, T., Gouyon, F.: Three current issues in music autotagging. In: Proceedings of ISMIR, pp. 795–800 (2011)
30. Marques, G., Langlois, T., Gouyon, F., Lopes, M., Sordo, M.: Short-term feature space and music genre classification. J. New Music Res. 40(2), 127–137 (2011)
31. Marques, G., Lopes, M., Sordo, M., Langlois, T., Gouyon, F.: Additional evidence that common low-level features of individual audio frames are not representative of music genres. In: Proceedings of SMC, Barcelona, Spain, July 2010
32. McKay, C., Fujinaga, I.: Music genre classification: Is it worth pursuing and how can it be improved? In: Proceedings of ISMIR, pp. 101–106, Oct 2006
33. MIREX (2012). http://www.music-ir.org/mirex
34. Pachet, F., Cazaly, D.: A taxonomy of musical genres. In: Proceedings of Content-based Multimedia Information Access Conference, Paris, France, Apr 2000
35. Pampalk, E., Flexer, A., Widmer, G.: Improvements of audio-based music similarity and genre classification. In: Proceedings of ISMIR, pp. 628–633 (2005)
36. Peeters, G., Fort, K.: Towards a (better) definition of the description of annotated MIR corpora. In: ISMIR, pp. 25–30 (2012)
37. Rowe, W.: Why system science and cybernetics? IEEE Trans. Syst. Cybernet. 1, 2–3 (1965)
38. Saheb-Ettaba, C., McFarland, R.B.: The Alpha-numeric System for Classification of Recordings. Bro-Dart Publishing Company, Williamsport (1969)
39. Schedl, M., Flexer, A., Urbano, J.: The neglected user in music information retrieval research. J. Intell. Info. Syst. 41(3), 523–539 (2013)
40. Schindler, A., Mayer, R., Rauber, A.: Facilitating comprehensive benchmarking experiments on the million song dataset. In: Proceedings of ISMIR, Oct 2012
41. Serra, X., Magas, M., Benetos, E., Chudy, M., Dixon, S., Flexer, A., Gómez, E., Gouyon, F., Herrera, P., Jordà, S., Paytuvi, O., Peeters, G., Schlüter, J., Vinet, H., Widmer, G.: Roadmap for Music Information ReSearch. Creative Commons (2013)
42. Sturm, B.L.: A survey of evaluation in music genre recognition. In: Proceedings of Adaptive Multimedia Retrieval, Oct 2012
43. Sturm, B.L.: Two systems for automatic music genre recognition: what are they really recognizing? In: Proceedings of ACM MIRUM Workshop, pp. 69–74, Nov 2012
44. Sturm, B.L.: Classification accuracy is not enough: on the evaluation of music genre recognition systems. J. Intell. Info. Syst. 41(3), 371–406 (2013)
45. Sturm, B.L.: Evaluating music emotion recognition: Lessons from music genre recognition? In: Proceedings of ICME (2013)
46. Sturm, B.L.: The state of the art ten years after a state of the art: future research in music information retrieval. J. New Music Res. 43(2), 147–172 (2014)
47. Sturm, B.L.: A simple method to determine if a music information retrieval system is a “horse”. IEEE Trans. Multimedia (in press, 2014)
48. Sturm, B.L., Kereliuk, C., Pikrakis, A.: A closer look at deep learning neural networks with low-level spectral periodicity features. In: Proceedings of International Workshop on Cognitive Information Processing (2014)
49. Urbano, J.: Information retrieval meta-evaluation: challenges and opportunities in the music domain. In: Proceedings of ISMIR, pp. 609–614 (2011)
50. Urbano, J.: Evaluation in Audio Music Similarity. Ph.D. thesis, University Carlos III of Madrid (2013)
51. Urbano, J., McFee, B., Downie, J.S., Schedl, M.: How significant is statistically significant? the case of audio music similarity and retrieval. In: Proceedings of ISMIR, pp. 181–186 (2012)
52. Urbano, J., Mónica, M., Morato, J.: Audio music similarity and retrieval: evaluation power and stability. In: Proceedings of ISMIR, pp. 597–602 (2011)
53. Urbano, J., Schedl, M., Serra, X.: Evaluation in music information retrieval. J. Intell. Info. Syst. 41(3), 345–369 (2013)
54. Venables, W.N., Ripley, B.D.: Modern Applied Statistics with S. Statistics and Computing, 4th edn. Springer, New York (2002)
55. Wiggins, G.A.: Semantic gap?? Schemantic schmap!! Methodological considerations in the scientific study of music. In: Proceedings of IEEE International Symposium Multimedia, pp. 477–482, Dec 2009

Acknowledgments

Many thanks to Mathieu Barthet for inviting this paper, and to Nick Collins for the fun discussions.

Author information

Authors and Affiliations

Audio Analysis Lab, AD:MT, Aalborg University Copenhagen, A.C. Meyers Vænge 15, 2450, Copenhagen SV, Denmark
Bob L. Sturm

Corresponding author

Correspondence to Bob L. Sturm.

Editor information

Editors and Affiliations

CNRS - LMA, Marseille, France
Mitsuko Aramaki
Toulon-Var University and CNRS - LMA, Marseille, France
Olivier Derrien
CNRS - LMA, Marseille, France
Richard Kronland-Martinet
CNRS - LMA, Marseille, France
Sølvi Ystad

About this paper

Cite this paper

Sturm, B.L. (2014). Making Explicit the Formalism Underlying Evaluation in Music Information Retrieval Research: A Look at the MIREX Automatic Mood Classification Task. In: Aramaki, M., Derrien, O., Kronland-Martinet, R., Ystad, S. (eds) Sound, Music, and Motion. CMMR 2013. Lecture Notes in Computer Science, vol 8905. Springer, Cham. https://doi.org/10.1007/978-3-319-12976-1_6

DOI: https://doi.org/10.1007/978-3-319-12976-1_6
Published: 05 December 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12975-4
Online ISBN: 978-3-319-12976-1
eBook Packages: Computer Science, Computer Science (R0)
