Abstract
The baseline acoustic feature sets and the methods for robust and incremental audio analysis have been evaluated extensively by the author of this thesis. This chapter first introduces a set of 12 affective speech databases and two music style data-sets, which are used for a systematic evaluation of the proposed methods and baseline acoustic feature sets. Next, the effectiveness of the proposed noise-robust affective speech classification approach is evaluated on two of the affective speech databases in Sect. 6.2. Then, recognition results obtained with all baseline acoustic feature sets on a large set of 10 speech and two music databases are presented and discussed in Sect. 6.3. Finally, Sect. 6.4 presents recognition results for continuous, dimensional affect recognition with an incremental recognition method.
Notes
1. The database is publicly available free of charge for scientific research from http://semaine-db.eu.
2. The rater names are as used in the SEMAINE database; the rater IDs are therefore not consecutive.
3. 30 s preview audio tracks taken from https://secure.ballroomdancers.com/music/style.asp, Nov. 2006; a detailed list of these songs can be downloaded from http://www.audeering.com/research-and-open-source/files/brd.txt.
© 2016 Springer International Publishing Switzerland
Cite this chapter
Eyben, F. (2016). Evaluation. In: Real-time Speech and Music Classification by Large Audio Feature Space Extraction. Springer Theses. Springer, Cham. https://doi.org/10.1007/978-3-319-27299-3_6
Print ISBN: 978-3-319-27298-6
Online ISBN: 978-3-319-27299-3