Abstract
The baseline acoustic feature sets and the methods for robust and incremental audio analysis have been evaluated extensively by the author of this thesis. This chapter first introduces a set of 12 affective speech databases and two music style data-sets, which are used for a systematic evaluation of the proposed methods and baseline acoustic feature sets. Next, the effectiveness of the proposed noise-robust affective speech classification approach is evaluated on two of the affective speech databases in Sect. 6.2. Then, recognition results obtained with all baseline acoustic feature sets on a large set of 10 speech and two music databases are presented and discussed in Sect. 6.3. Finally, Sect. 6.4 presents recognition results for continuous, dimensional affect recognition with an incremental recognition method.
Notes
1. The database is publicly available free of charge for scientific research from http://semaine-db.eu.
2. The rater names are as used in the SEMAINE database; the rater IDs are therefore not consecutive.
3. 30 s preview audio tracks taken from https://secure.ballroomdancers.com/music/style.asp, Nov. 2006; a detailed list of these songs can be downloaded from http://www.audeering.com/research-and-open-source/files/brd.txt.
© 2016 Springer International Publishing Switzerland
Cite this chapter
Eyben, F. (2016). Evaluation. In: Real-time Speech and Music Classification by Large Audio Feature Space Extraction. Springer Theses. Springer, Cham. https://doi.org/10.1007/978-3-319-27299-3_6
Print ISBN: 978-3-319-27298-6
Online ISBN: 978-3-319-27299-3