Indexing music by mood: design and integration of an automatic content-based annotator

Published in Multimedia Tools and Applications

Abstract

In the context of content analysis for indexing and retrieval, we present a method for automatic music mood annotation. The method builds on results from psychological studies and is framed as a supervised learning problem over musical features extracted automatically from the raw audio signal. We describe the audio features most relevant to this task. A ground truth for training is created using both social network information systems (the wisdom of crowds) and individual experts (the wisdom of the few). At the experimental level, we evaluate our approach on a database of 1,000 songs. Tests of different classification methods, configurations and optimizations show that Support Vector Machines perform best for the task at hand. Moreover, we evaluate the algorithm's robustness to different audio compression schemes; this aspect, often neglected, is fundamental to building a system that is usable in real-world conditions. In addition, we discuss the integration of a fast and scalable version of this technique into the European project PHAROS. This real-world application demonstrates the tool's usability for annotating large-scale databases. We also report on a user evaluation in the context of the PHAROS search engine, asking people about the perceived utility, interest and novelty of this technology in real-world use cases.
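
The approach summarized above (framewise audio descriptors summarized per song, then a Support Vector Machine trained on crowd- and expert-derived mood labels) can be approximated with off-the-shelf tools. The following Python sketch is illustrative only, not the authors' implementation: it assumes librosa for feature extraction (MFCCs standing in for the paper's larger feature set), scikit-learn for the SVM, and a hypothetical labels.csv ground-truth file with path,mood rows.

    # Minimal sketch of a content-based mood annotator in the spirit of the
    # paper: per-song audio descriptors, then an SVM classifier.
    # Assumptions (not from the paper): librosa features, scikit-learn SVC,
    # and a hypothetical labels.csv listing "path,mood" pairs.
    import csv
    import numpy as np
    import librosa
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    def song_features(path):
        """Summarize framewise descriptors (MFCCs here) as per-song statistics."""
        audio, sr = librosa.load(path, sr=22050, mono=True)
        mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=13)
        # Mean and standard deviation over frames -> fixed-length song vector.
        return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

    with open("labels.csv") as f:  # hypothetical ground-truth file
        rows = list(csv.DictReader(f))

    X = np.array([song_features(r["path"]) for r in rows])
    y = np.array([r["mood"] for r in rows])

    # RBF-kernel SVM with feature standardization, evaluated by
    # cross-validation, mirroring the paper's finding that SVMs
    # perform best for this task.
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
    print("10-fold CV accuracy:", cross_val_score(clf, X, y, cv=10).mean())

In a deployment such as the PHAROS integration discussed in the paper, annotating a new track then reduces, roughly speaking, to one feature extraction plus one SVM prediction, which is what makes the technique scalable to large databases.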

Notes

  1. In psychology, the term valence describes the attractiveness or aversiveness of an event, object or situation. For instance, happiness and joy have positive valence, while anger and fear have negative valence (the sketch after these notes illustrates how valence pairs with arousal to yield basic mood categories).

  2. http://www.music-ir.org/mirex2007/index.php/Audio_Music_Mood_Classification

  3. http://trec.nist.gov/

  4. http://www-nlpir.nist.gov/projects/trecvid/

  5. http://www.last.fm

  6. WordNet is a large lexical database of English in which words are grouped into sets of synonyms: http://wordnet.princeton.edu/

  7. http://www.pharos-audiovisual-search.eu

  8. http://www.webratio.com
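
The valence dimension defined in note 1 is commonly paired with a second dimension, arousal (energy), as in Russell's circumplex model of affect; basic mood categories then correspond to quadrants of the valence-arousal plane. The following is a minimal, hypothetical illustration; the zero thresholds and quadrant labels are one common reading of that model, not the mood taxonomy used in the paper.

    def mood_quadrant(valence, arousal):
        """Map a (valence, arousal) point, each in [-1, 1], to a quadrant label.

        Illustrative only: the labels are a common reading of the circumplex
        model of affect, not the paper's taxonomy.
        """
        if valence >= 0:
            return "happy/exuberant" if arousal >= 0 else "relaxed/peaceful"
        return "angry/anxious" if arousal >= 0 else "sad/depressed"

    # Positive valence with low arousal reads as calm contentment:
    print(mood_quadrant(0.8, -0.3))  # -> relaxed/peaceful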

Acknowledgments

We are very grateful to all the human annotators who helped to create our ground truth dataset. We also want to thank everyone contributing to the technologies of the Music Technology Group (Universitat Pompeu Fabra, Barcelona), in particular Nicolas Wack, Eduard Aylon and Robert Toscano. We are also grateful to the entire MIREX team, specifically Stephen Downie and Xiao. Finally, we want to thank Michel Plu and Valérie Botherel from Orange Labs for the user evaluation data, and Piero Fraternali, Alessandro Bozzon and Marco Brambilla from WebModels for the user interface. This research has been partially funded by the EU project PHAROS IST-2006-045035.

Author information

Corresponding author

Correspondence to Cyril Laurier.

About this article

Cite this article

Laurier, C., Meyers, O., Serrà, J. et al. Indexing music by mood: design and integration of an automatic content-based annotator. Multimed Tools Appl 48, 161–184 (2010). https://doi.org/10.1007/s11042-009-0360-2
