Seven problems that keep MIR from attracting the interest of cognition and neuroscience

Journal of Intelligent Information Systems

Abstract

Despite one and a half decades of research and an impressive body of knowledge on how to represent and process musical audio signals, the discipline of Music Information Retrieval (MIR) still does not enjoy broad recognition outside of computer science. In music cognition and neuroscience in particular, where MIR's contribution could be most needed, MIR technologies are scarcely ever utilized, when they are not simply brushed aside as irrelevant. This, we contend here, is the result of a series of misunderstandings between the two fields, rooted in deeply different methodologies and assumptions that are rarely made explicit. Written as a collaboration between a MIR researcher and a music psychologist, this article attempts to clarify some of these assumptions, and offers suggestions on how to adapt some of MIR's most emblematic signal processing paradigms, evaluation procedures and application scenarios to the new challenges brought forth by the natural sciences of music.


Notes

  1. This work primarily addresses the subset of MIR research concerned with the automatic ranking and classification of audio signals, and not the equally important work based on symbolic musical formats. In the following, we will take the shortcut of referring to audio MIR simply as MIR. This does not presume, of course, that symbolic MIR should take a secondary role in this debate; see e.g. Volk and Honingh (2011).

  2. In the following, we will collectively refer to these disciplines as the “natural sciences of music”, by which we mean the study by experimental methods of the principles of perception and cognition of music; we do not address in this article other areas that either study music as a cultural artefact (e.g. musicology) or as social capital (e.g. anthropology, sociology, economy).

  3. We do need to acknowledge a few, recent positive examples (MacCallum et al. 2012; Serra et al. 2012) which all the more so encouraged us to write this piece.

  4. We use the term modularity in its “modularity of mind” definition, following Fodor (1983).

  5. The chromagram, yet another MIR construct, gives, at every time step, the energy found in the signal's Fourier spectrum in the frequency bands corresponding to each note of the octave (c, c#, d, etc.); a minimal illustrative computation is sketched after these notes.

  6. By “wrong”, we do not mean that the typical attitude of MIR research is mistaken or flawed. If anything, the tasks and methodologies discussed here largely deserve credit for the many important technological successes achieved in the MIR community. By calling these “wrong”, we propose, however, that these attitudes, while arguably beneficial for the engineering purposes of MIR, are also harmful to the interdisciplinary dialog between MIR and the natural sciences of music. These are “the wrong things” for a MIR practitioner to do when addressing a psychologist. See also Problem 6 below.

  7. A valuable topic of investigation in its own right, see e.g. Salganik et al. (2006).

  8. A strong initiative not without precedent: in 2009, for instance, the Python community stopped all changes to the language's syntax for a period of two years from the release of Python 3.1, in order to let non-CPython implementations “catch up” to the core implementation of the language (Cannon et al. 2009).

  9. One reviewer of this article went even further, saying that MIR is merely a category of vocational training, and therefore does not belong to the scientific community and should not be expected to be capable of generating scientific questions. While this view is historically accurate, we believe recent years have seen increasing academic migration between these two extremes, with MIR researchers turning to traditional scientific disciplines such as psychology or neuroscience, and conversely, graduates from such fields coming to MIR in their postgraduate or postdoctoral days. In any case, one is forced to consider that many of the obstacles identified in this article could be addressed by more systematic training for MIR students in the methods of empirical sciences, including experimental design and hypothesis testing.
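To make the chromagram of note 5 concrete, here is a minimal sketch of how such a representation can be computed from an audio signal. It is written in plain Python/numpy; the function name, parameter values and per-frame normalisation are our own illustrative choices, not the authors' implementation nor the API of any particular MIR toolbox.

    import numpy as np

    def chromagram(signal, sr, frame_len=2048, hop=512, fmin=55.0, fmax=4000.0):
        """Return a (12, n_frames) array of per-pitch-class energy over time."""
        window = np.hanning(frame_len)
        freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
        # Map each Fourier bin to a pitch class (0 = c, 1 = c#, ..., 11 = b).
        valid = (freqs >= fmin) & (freqs <= fmax)
        midi = 69 + 12 * np.log2(freqs[valid] / 440.0)   # MIDI note numbers
        pitch_class = np.mod(np.round(midi), 12).astype(int)

        n_frames = 1 + (len(signal) - frame_len) // hop
        chroma = np.zeros((12, n_frames))
        for t in range(n_frames):
            frame = signal[t * hop : t * hop + frame_len] * window
            spectrum = np.abs(np.fft.rfft(frame)) ** 2   # power spectrum
            np.add.at(chroma[:, t], pitch_class, spectrum[valid])
        # Normalise each frame so that its 12 chroma values sum to 1.
        return chroma / (chroma.sum(axis=0, keepdims=True) + 1e-12)

    # Example: a 440 Hz sine tone (note a) should concentrate its energy
    # in pitch class 9.
    sr = 22050
    t = np.arange(sr) / sr
    print(chromagram(np.sin(2 * np.pi * 440 * t), sr)[:, 0].round(2))

The pure sine tone serves as an easy sanity check: virtually all of the energy of the first frame lands in pitch class 9, the octave-folded position of the note a.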

References

  • Alluri, V., & Toiviainen, P. (2010). Exploring perceptual and acoustic correlates of polyphonic timbre. Music Perception, 27(3), 223–241.

  • Aucouturier, J.J. (2009). Sounds like teen spirit: computational insights into the grounding of everyday musical terms. In J. Minett & W. Wang (Eds.), Language, evolution and the brain. Frontiers in linguistics series.

  • Aucouturier, J.J., & Bigand, E. (2012). Mel cepstrum and Ann Ova: the difficult dialogue between MIR and cognitive psychology. In Proc. of the 13th international conference on music information retrieval. Porto, Portugal.

  • Balkwill, L., & Thompson, W.F. (1999). A cross-cultural investigation of the perception of emotion in music: psycho-physical and cultural cues. Music Perception, 17, 43–64.

  • Bertin-Mahieux, T., Eck, D., Maillet, F., Lamere, P. (2008). Autotagger: a model for predicting social tags from acoustic features on large music databases. Journal of New Music Research, 37(2), 151–165.

  • Bigand, E., Delbé, C., Gérard, Y., Tillmann, B. (2011). Categorization of extremely brief auditory stimuli: domain-specific or domain-general processes? PLoS ONE, 6(10), e27024. doi:10.1371/journal.pone.0027024.

  • Bigand, E., Vieillard, S., Madurel, F., Marozeau, J., Dacquet, A. (2005). Multidimensional scaling of emotional responses to music: effect of musical expertise and duration. Cognition & Emotion, 19, 1113–1139.

  • Birmingham, W.P., & Meek, C.J. (2004). A comprehensive trainable error model for sung music queries. Journal of Artificial Intelligence Research, 22, 57–91.

  • Bonini, F. (2009). All the pain and joy of the world in a single melody: a Transylvanian case study on musical emotion. Music Perception, 26(3), 257–261.

  • Bostanov, V., & Kotchoubey, B. (2004). Recognition of affective prosody: continuous wavelet measures of event-related brain potentials to emotional exclamations. Psychophysiology, 41, 259–268.

  • Cannon, B., Noller, J., van Rossum, G. (2009). Python language moratorium. Python Enhancement Proposals (PEPs) 3003, available: http://www.python.org/dev/peps/pep-3003.

  • Chase, A.R. (2001). Music discriminations by carp (Cyprinus carpio). Animal Learning & Behavior, 29(4), 336–353.

  • Chi, T., Ru, P., Shamma, S. (2005). Multi-resolution spectrotemporal analysis of complex sounds. Journal of the Acoustical Society of America, 118(2), 887–906.

  • Crouzet, S.M., Kirchner, H., Thorpe, S.J. (2010). Fast saccades toward faces: face detection in just 100 ms. Journal of Vision, 10(4):16, 1–17. doi:10.1167/10.4.16.

  • De Boer, B., & Kuhl, P. (2003). Investigating the role of infant-directed speech with a computer model. Acoustics Research Letters Online, 4(4), 129–134.

  • Dehaene, S. (1992). Varieties of numerical abilities. Cognition, 44, 1–42.

  • Fiebrink, R., & Fujinaga, I. (2006). Feature selection pitfalls and music classification. In Proc. international conference on music information retrieval.

  • Flexer, A., Schnitzer, D., Schlueter, J. (2012). A MIREX meta-analysis of hubness in audio music similarity. In Proc. 13th international conference on music information retrieval. Porto, Portugal.

  • Fodor, J. (1983). Modularity of mind: an essay on faculty psychology. Cambridge: MIT Press.

  • Ghazanfar, A., & Nicolelis, M. (2001). The structure and function of dynamic cortical and thalamic receptive fields. Cerebral Cortex, 11(3), 183–193.

  • Gigerenzer, G., & Todd, P.M. (1999). Simple heuristics that make us smart. New York: Oxford University Press.

  • Goerlich, K., Witteman, J., Schiller, N., Van Heuven, V., Aleman, A., Martens, S. (2012). The nature of affective priming in music and speech. Journal of Cognitive Neuroscience, 24(8), 1725–1741.

  • Goydke, K., Altenmüller, E., Möller, J., Münte, T. (2004). Changes in emotional tone and instrumental timbre are reflected by the mismatch negativity. Cognitive Brain Research, 21(3), 351–359.

  • Grey, J.M. (1977). Multidimensional perceptual scaling of musical timbres. Journal of the Acoustical Society of America, 61, 1270–1277.

  • Humphrey, E.J., Bello, J.P., LeCun, Y. (2012). Moving beyond feature design: deep architectures and automatic feature learning in music informatics. In Proc. 13th international conference on music information retrieval. Porto, Portugal.

  • Juslin, P., & Sloboda, J. (2010). Handbook of music and emotion. New York: Oxford University Press.

  • Juslin, P., & Västfjäll, D. (2008). Emotional responses to music: the need to consider underlying mechanisms. Behavioural and Brain Sciences, 31, 559–621.

  • Lartillot, O., & Toiviainen, P. (2007). A MATLAB toolbox for musical feature extraction from audio. In Proceedings of the 10th int. conference on digital audio effects. Bordeaux, France.

  • Lewicki, M. (2002). Efficient coding of natural sounds. Nature Neuroscience, 5(4), 356–363.

  • Lima, C.F., & Castro, S.L. (2011). Emotion recognition in music changes across the adult life span. Cognition and Emotion, 25(4), 585–598.

  • Liu, D., & Zhang, H.J. (2006). Automatic mood detection and tracking of music audio signal. IEEE Transactions on Speech and Audio Processing, 14(1), 5–18.

  • Logan, B. (2000). Mel frequency cepstral coefficients for music modeling. In Proc. 1st int. conf. on music information retrieval. Plymouth, MA, USA.

  • MacCallum, B., Mauch, M., Burt, A., Leroi, A.M. (2012). Evolution of music by public choice. Proceedings of the National Academy of Sciences, 109(30), 12081–12086.

  • Mannes, E. (2011). The power of music: pioneering discoveries in the new science of song. Walker & Co.

  • Masataka, N., & Perlovsky, L. (2012). The efficacy of musical emotions provoked by Mozart's music for the reconciliation of cognitive dissonance. Scientific Reports, 2. doi:10.1038/srep00694. Accessed 25 Sept 2012.

  • May, P.J.C., & Tiitinen, H. (2010). Mismatch negativity (MMN), the deviance-elicited auditory deflection, explained. Psychophysiology, 47, 66–122.

  • Mithen, S. (2007). The singing Neanderthal: the origins of music, language, mind, and body. Cambridge: Harvard University Press.

  • Molnár, C., Kaplan, F., Roy, P., Pachet, F., Pongrácz, P., Dóka, A., Miklósi, Á. (2008). Classification of dog barks: a machine learning approach. Animal Cognition, 11(3), 389–400.

  • Niedenthal, P.M. (2007). Embodying emotion. Science, 316(5827), 1002–1005.

  • Pachet, F., & Roy, P. (2009). Analytical features: a knowledge-based approach to audio feature generation. EURASIP Journal on Audio, Speech, and Music Processing (1). doi:10.1155/2009/153017.

  • Patil, K., Pressnitzer, D., Shamma, S., Elhilali, M. (2012). Music in our ears: the biological bases of musical timbre perception. PLoS Computational Biology, 8(11), e1002759. doi:10.1371/journal.pcbi.1002759.

  • Peeters, G., McAdams, S., Herrera, P. (2000). Instrument sound description in the context of MPEG-7. In Proceedings of the international computer music conference. Berlin, Germany.

  • Peeters, G., Urbano, J., Jones, G.J.F. (2012). Notes from the ISMIR 2012 late-breaking session on evaluation in music information retrieval. In Proc. 13th international conference on music information retrieval. Porto, Portugal.

  • Platt, J.R. (1964). Strong inference. Science, 146(3642), 347–353.

  • Pollack, I. (1978). Decoupling of auditory pitch and stimulus frequency: the Shepard demonstration revisited. Journal of the Acoustical Society of America, 63, 202–206.

  • Poulin-Charronnat, B., Bigand, E., Koelsch, S. (2006). Processing of musical syntax tonic versus subdominant: an event-related potential study. Journal of Cognitive Neuroscience, 18(9), 1545–1554.

  • Rabiner, L.R., & Juang, B.H. (1993). Fundamentals of speech recognition. Prentice-Hall.

  • Sacks, O. (2008). Musicophilia: tales of music and the brain. New York: Knopf.

  • Salganik, M.J., Dodds, P., Watts, D.J. (2006). Experimental study of inequality and unpredictability in an artificial cultural market. Science, 311(5762), 854–856.

  • Schedl, M., & Flexer, A. (2012). Putting the user in the center of music information retrieval. In Proc. 13th international conference on music information retrieval. Porto, Portugal.

  • Schirmer, A., & Kotz, S. (2006). Beyond the right hemisphere: brain mechanisms mediating vocal emotional processing. Trends in Cognitive Sciences, 10, 24–30.

  • Serra, J. (2000). Is pattern recognition a physical science? In 15th international conference on pattern recognition. Barcelona, Spain.

  • Serra, J., Corral, A., Boguna, M., Haro, M., Arcos, J.L. (2012). Measuring the evolution of contemporary western popular music. Scientific Reports, 2. doi:10.1038/srep00521. Accessed 26 July 2012.

  • Serre, T., Wolf, L., Bileschi, S., Riesenhuber, M., Poggio, T. (2007). Object recognition with cortex-like mechanisms. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(3), 411–426.

  • Sturm, B. (2013). Classification accuracy is not enough: on the analysis of music genre recognition systems. Journal of Intelligent Information Systems (accepted).

  • Szeliski, R. (2011). Computer vision: algorithms and applications. Springer.

  • Teglas, E., Vul, E., Girotto, V., Gonzalez, M., Tenenbaum, J.B., Bonatti, L.L. (2011). Pure reasoning in 12-month-old infants as probabilistic inference. Science, 332, 1054–1059.

  • Terasawa, H., Slaney, M., Berger, J. (2005). The thirteen colors of timbre. In Proc. IEEE workshop on applications of signal processing to audio and acoustics. New Paltz, NY, USA.

  • Toiviainen, P., Tervaniemi, M., Louhivuori, J., Saher, M., Huotilainen, M., Näätänen, R. (1998). Timbre similarity: convergence of neural, behavioral and computational approaches. Music Perception, 16, 223–241.

  • Urbano, J., Downie, J.S., McFee, B., Schedl, M. (2012). How significant is statistically significant? The case of audio music similarity and retrieval. In Proceedings of the 13th international conference on music information retrieval. Porto, Portugal.

  • Volk, A., & Honingh, A. (2012). Mathematical and computational approaches to music: challenges in an interdisciplinary enterprise. Journal of Mathematics and Music, 6(2), 73–81.

  • Vuoskoski, J.K., & Eerola, T. (2011). The role of mood and personality in the perception of emotions represented by music. Cortex, 47(9), 1099.

  • Zatorre, R., & Belin, P. (2001). Spectral and temporal processing in human auditory cortex. Cerebral Cortex, 11, 946–953.

  • Zwicker, E. (1977). Procedure for calculating loudness of temporally variable sounds. Journal of the Acoustical Society of America, 62, 675.

Acknowledgements

We wish to credit Gert Lanckriet (UCSD), Juan Bello (NYU) and Geoffroy Peeters (IRCAM) for an animated discussion at ISMIR 2012 leading to the idea of a moratorium on all non-essential MIR tasks, from which we borrowed some of the thinking in the present Section 3.

Corresponding author

Correspondence to Jean-Julien Aucouturier.

Cite this article

Aucouturier, JJ., Bigand, E. Seven problems that keep MIR from attracting the interest of cognition and neuroscience. J Intell Inf Syst 41, 483–497 (2013). https://doi.org/10.1007/s10844-013-0251-x

