Multimodal Aspects of Music Retrieval: Audio, Song Lyrics – and Beyond?

Advances in Music Information Retrieval

Part of the book series: Studies in Computational Intelligence (SCI, volume 274)

Abstract

Music retrieval is predominantly seen as a problem to be tackled in the acoustic domain. With the exception of symbolic music retrieval and score-based systems, which form rather separate sub-disciplines of their own, most approaches to retrieving music recordings by content rely on features extracted from the audio signal. Music is then retrieved by similarity matching, or classified into genre, instrumentation, artist or other categories. Yet music is an inherently multimodal type of data. Apart from purely instrumental pieces, the lyrics associated with the music are as essential to the reception and the message of a song as the audio is. Album covers are carefully designed by artists to convey a message that is consistent with the message sent by the music on the album as well as with the image of the band in general. Music videos, fan sites and other sources of information add to that in a usually coherent manner. This chapter takes a look at recent developments in the multimodal analysis of music. It discusses the different types of information sources available, stressing the multimodal character of music. It then reviews some features that may be extracted from those sources, focussing particularly on audio and lyrics as sources of information. Experimental results on different collections and categorisation tasks round off the chapter, showing the merits of the multimodal approach and the open issues to be addressed in order to fully benefit from the rich and complex information space that music creates.

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Mayer, R., Rauber, A. (2010). Multimodal Aspects of Music Retrieval: Audio, Song Lyrics – and Beyond? In: Raś, Z.W., Wieczorkowska, A.A. (eds) Advances in Music Information Retrieval. Studies in Computational Intelligence, vol 274. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-11674-2_15

  • DOI: https://doi.org/10.1007/978-3-642-11674-2_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-11673-5

  • Online ISBN: 978-3-642-11674-2

  • eBook Packages: Engineering, Engineering (R0)
