Journal of Intelligent Information Systems, Volume 41, Issue 3, pp 483–497

Seven problems that keep MIR from attracting the interest of cognition and neuroscience

  • Jean-Julien Aucouturier
  • Emmanuel Bigand


Despite one and a half decades of research and an impressive body of knowledge on how to represent and process musical audio signals, the discipline of Music Information Retrieval (MIR) still does not enjoy broad recognition outside of computer science. In music cognition and neuroscience in particular, where MIR's contribution may be most needed, MIR technologies are scarcely ever used, when they are not simply dismissed as irrelevant. This, we contend, is the result of a series of misunderstandings between the two fields, rooted in deeply different methodologies and assumptions that are rarely made explicit. Written as a collaboration between a MIR researcher and a music psychologist, this article attempts to clarify some of these assumptions, and offers suggestions on how to adapt some of MIR's most emblematic signal processing paradigms, evaluation procedures and application scenarios to the new challenges brought forth by the natural sciences of music.


Keywords: MIR, Music cognition, Interdisciplinarity



We wish to credit Gert Lanckriet (UCSD), Juan Bello (NYU) and Geoffroy Peeters (IRCAM) for an animated discussion at ISMIR 2012 leading to the idea of a moratorium on all non-essential MIR tasks, from which we borrowed some of the thinking in Section 3 of the present article.



Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  1. IRCAM/UPMC/CNRS STMS UMR 9912, Paris, France
  2. LEAD/CNRS UMR 5022, Dijon, France
