Multimedia Tools and Applications, Volume 48, Issue 1, pp 161–184

Indexing music by mood: design and integration of an automatic content-based annotator

  • Cyril Laurier
  • Owen Meyers
  • Joan Serrà
  • Martin Blech
  • Perfecto Herrera
  • Xavier Serra

Abstract

In the context of content analysis for indexing and retrieval, a method for automatic music mood annotation is presented. The method is based on results from psychological studies and framed as a supervised learning approach using musical features automatically extracted from the raw audio signal. We present some of the audio features most relevant to this problem. A ground truth, used for training, is created using both social network information systems (wisdom of crowds) and individual experts (wisdom of the few). At the experimental level, we evaluate our approach on a database of 1,000 songs. Tests of different classification methods, configurations and optimizations have been conducted, showing that Support Vector Machines perform best for the task at hand. Moreover, we evaluate the algorithm's robustness against different audio compression schemes. This aspect, often neglected, is fundamental to building a system that is usable in real conditions. In addition, the integration of a fast and scalable version of this technique with the European Project PHAROS is discussed. This real-world application demonstrates the usability of this tool for annotating large-scale databases. We also report on a user evaluation in the context of the PHAROS search engine, asking people about the utility, interest and innovation of this technology in real-world use cases.
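
The following is a minimal, self-contained sketch (not the authors' implementation) of the supervised-learning setup described above: a Support Vector Machine trained on per-song audio feature vectors with mood labels, evaluated by cross-validation. The feature matrix, labels and parameter values are placeholders assumed only for illustration.

    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    # Placeholder data: 1,000 songs, each summarized by 50 audio descriptors
    # (e.g. timbre, rhythm and tonal statistics), with a binary mood label.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 50))
    y = rng.integers(0, 2, size=1000)

    # Standardize the descriptors, then fit an RBF-kernel SVM (an assumed,
    # commonly used configuration; C and gamma would normally be tuned).
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))

    # 10-fold cross-validation accuracy as a rough performance estimate.
    scores = cross_val_score(model, X, y, cv=10)
    print(f"Mean cross-validated accuracy: {scores.mean():.3f}")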

Keywords

Music information retrieval · Mood annotation · Content-based audio · Social networks · User evaluation


Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  • Cyril Laurier (1)
  • Owen Meyers (1)
  • Joan Serrà (1)
  • Martin Blech (1)
  • Perfecto Herrera (1)
  • Xavier Serra (1)

  1. Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain
