Multimodal Aspects of Music Retrieval: Audio, Song Lyrics – and Beyond?

Mayer, Rudolf; Rauber, Andreas

doi:10.1007/978-3-642-11674-2_15

Part of the book series: Studies in Computational Intelligence (SCI, volume 274)

Abstract

Music retrieval is predominantly seen as a problem to be tackled in the acoustic domain. With the exception of symbolic music retrieval and score-based systems, which form rather separate sub-disciplines of their own, most approaches to retrieving recordings of music by content rely on features extracted from the audio signal. Music is subsequently retrieved by similarity matching, or classified into genre, instrumentation, artist or other categories. Yet music is an inherently multimodal type of data. Apart from purely instrumental pieces, the lyrics associated with the music are as essential to the reception and the message of a song as the audio is. Album covers are carefully designed by artists to convey a message that is consistent with the message sent by the music on the album as well as with the image of the band in general. Music videos, fan sites and other sources of information add to that in a usually coherent manner. This chapter takes a look at recent developments in the multimodal analysis of music. It discusses the different types of information sources available, stressing the multimodal character of music. It then reviews some features that may be extracted from those sources, focussing particularly on audio and lyrics as sources of information. Experimental results on different collections and categorisation tasks round off the chapter, showing the merits of multimodal analysis and the open issues that must be addressed to fully benefit from the rich and complex information space that music creates.

Author information

Authors and Affiliations

Institute of Software Technology and Interactive Systems, Vienna University of Technology,
Rudolf Mayer & Andreas Rauber

Editor information

Editors and Affiliations

University of North Carolina, Charlotte, NC, USA
Zbigniew W. Raś
Polish-Japanese Institute of IT, Warsaw, Poland
Alicja A. Wieczorkowska

About this chapter

Cite this chapter

Mayer, R., Rauber, A. (2010). Multimodal Aspects of Music Retrieval: Audio, Song Lyrics – and Beyond? In: Raś, Z.W., Wieczorkowska, A.A. (eds) Advances in Music Information Retrieval. Studies in Computational Intelligence, vol 274. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-11674-2_15

DOI: https://doi.org/10.1007/978-3-642-11674-2_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-11673-5
Online ISBN: 978-3-642-11674-2
eBook Packages: Engineering, Engineering (R0)
