
Part of the book series: Springer Theses

Abstract

The baseline acoustic feature sets and the methods for robust and incremental audio analysis have been evaluated extensively by the author of this thesis. This chapter first introduces a set of 12 affective speech databases and two music style datasets, which are used for a systematic evaluation of the proposed methods and baseline acoustic feature sets. Next, the effectiveness of the proposed noise-robust affective speech classification approach is evaluated on two of the affective speech databases in Sect. 6.2. Then, recognition results obtained with all baseline acoustic feature sets on a large set of 10 speech and two music databases are presented and discussed in Sect. 6.3. Finally, results for continuous, dimensional affect recognition with an incremental recognition method are shown in Sect. 6.4.


Notes

  1. The database is publicly available for scientific research, free of charge, from http://semaine-db.eu.

  2. The rater names are as used in the SEMAINE database; therefore, the rater IDs are not consecutive.

  3. 30 s preview audio tracks taken from https://secure.ballroomdancers.com/music/style.asp, Nov. 2006; a detailed list of these songs can be downloaded from http://www.audeering.com/research-and-open-source/files/brd.txt.


Author information

Correspondence to Florian Eyben.


Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Eyben, F. (2016). Evaluation. In: Real-time Speech and Music Classification by Large Audio Feature Space Extraction. Springer Theses. Springer, Cham. https://doi.org/10.1007/978-3-319-27299-3_6


  • DOI: https://doi.org/10.1007/978-3-319-27299-3_6


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-27298-6

  • Online ISBN: 978-3-319-27299-3

  • eBook Packages: Engineering (R0)
