Abstract
Music retrieval is predominantly seen as a problem to be tackled in the acoustic domain. With the exception of symbolic music retrieval and score-based systems, which form rather separate sub-disciplines of their own, most approaches to retrieving recordings of music by content rely on features extracted from the audio signal. Music is subsequently retrieved by similarity matching, or classified into genre, instrumentation, artist or other categories. Yet music is an inherently multimodal type of data. Apart from purely instrumental pieces, the lyrics associated with the music are as essential to the reception and the message of a song as the audio is. Album covers are carefully designed by artists to convey a message that is consistent with the message sent by the music on the album as well as with the image of the band in general. Music videos, fan sites and other sources of information add to that in a usually coherent manner. This chapter takes a look at recent developments in multimodal analysis of music. It discusses the different types of information sources available, stressing the multimodal character of music. It then reviews some features that may be extracted from those sources, focussing particularly on audio and lyrics as sources of information. Experimental results on different collections and categorisation tasks round off the chapter. They show the merits of multimodal analysis as well as the open issues that must be addressed to fully benefit from the rich and complex information space that music creates.
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Mayer, R., Rauber, A. (2010). Multimodal Aspects of Music Retrieval: Audio, Song Lyrics – and Beyond?. In: Raś, Z.W., Wieczorkowska, A.A. (eds) Advances in Music Information Retrieval. Studies in Computational Intelligence, vol 274. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-11674-2_15
DOI: https://doi.org/10.1007/978-3-642-11674-2_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-11673-5
Online ISBN: 978-3-642-11674-2
eBook Packages: Engineering, Engineering (R0)