Abstract
In order to train and test Intelligent Audio Analysis systems, audio data is needed. In fact, data collection is often considered one of the main bottlenecks, and the common opinion is that there is "no data like more data". In this light, the requirements for collecting and providing audio databases are outlined, including in particular the establishment of a reliable gold standard. Explanatory examples are given for the three types of audio, namely speech, music, and general sound, by means of the TUM AVIC corpus, the NTWICM corpus, and the FindSounds database.
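Establishing a reliable gold standard typically means having several annotators label the same material and quantifying how well they agree beyond chance. As a minimal illustration of this idea (the function and the toy interest labels below are hypothetical, not taken from the chapter), Cohen's kappa corrects the raw agreement of two raters for the agreement expected by chance:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters over the same items."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed agreement: fraction of items both raters labelled identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement if the two raters labelled independently,
    # each according to their own label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum((freq_a[l] / n) * (freq_b[l] / n)
              for l in set(freq_a) | set(freq_b))
    return (p_o - p_e) / (1.0 - p_e)

# Made-up interest labels from two annotators of six speech segments:
a = ["bored", "neutral", "interested", "interested", "neutral", "bored"]
b = ["bored", "neutral", "interested", "neutral",    "neutral", "bored"]
print(round(cohens_kappa(a, b), 3))  # 0.75: "substantial" agreement
```

Values near 1 indicate near-perfect agreement, values near 0 agreement no better than chance; verbal interpretations such as "substantial" for 0.61–0.80 follow the commonly used scale of Landis and Koch.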
It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts.
—Sir Arthur Conan Doyle.
Notes
- 1.
- 2. The term ground truth indeed originated in the fields of aerial photography and satellite imagery.
- 3. Available at http://www.openaudio.eu.
- 4.
- 5. The complete annotation by the four individuals is available at http://www.openaudio.eu to ensure reproducibility by others.
- 6. LyricsDB (http://lyrics.mirkforce.net)
- 7.
- 8. http://www.findsounds.com/types.html, accessed 25 July 2011.
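Note 5 refers to an annotation produced by four individuals. With more than two raters per item, Fleiss' kappa generalises the chance-corrected agreement idea. A sketch under made-up data (the function and mood counts below are illustrative, not the chapter's actual annotation):

```python
def fleiss_kappa(counts):
    """Chance-corrected agreement for a fixed number of raters per item.
    counts[i][j] = number of raters who assigned item i to category j;
    every row must sum to the same rater count n."""
    N = len(counts)                    # number of items
    n = sum(counts[0])                 # raters per item
    # Mean per-item agreement: agreeing rater pairs out of all pairs.
    p_bar = sum((sum(c * c for c in row) - n) / (n * (n - 1))
                for row in counts) / N
    # Chance agreement from the marginal category proportions.
    totals = [sum(row[j] for row in counts) for j in range(len(counts[0]))]
    p_e = sum((t / (N * n)) ** 2 for t in totals)
    return (p_bar - p_e) / (1 - p_e)

# Made-up example: 5 items, 4 raters, 3 mood categories.
counts = [
    [4, 0, 0],
    [0, 4, 0],
    [2, 2, 0],
    [3, 1, 0],
    [0, 0, 4],
]
print(fleiss_kappa(counts))  # roughly 0.63
```

Publishing both the raw multi-rater annotation and such agreement figures, as note 5 suggests for the NTWICM data, lets others verify how reliable the resulting gold standard is.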
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
Cite this chapter
Schuller, B. (2013). Audio Data. In: Intelligent Audio Analysis. Signals and Communication Technology. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36806-6_5
Print ISBN: 978-3-642-36805-9
Online ISBN: 978-3-642-36806-6