Cognitive Computation

Volume 6, Issue 4, pp 892–913

Investigation of Speaker Group-Dependent Modelling for Recognition of Affective States from Speech

  • Ingo Siegert
  • David Philippou-Hübner
  • Kim Hartmann
  • Ronald Böck
  • Andreas Wendemuth

Abstract

For successful human–machine interaction (HMI), not only the pure textual information but also the individual skills, preferences, and affective states of the user must be known. Therefore, as a starting point, the user's current affective state has to be recognized. In this work we investigated how additional knowledge, for example the age and gender of the user, can be used to improve the recognition of affective states. Two methods from automatic speech recognition are used to incorporate age and gender differences into affect recognition: speaker group-dependent (SGD) modelling and vocal tract length normalisation (VTLN). The investigations were performed on four corpora containing acted as well as naturalistic affective speech. Different feature sets and two classification methods, Gaussian mixture models (GMMs) and multi-layer perceptrons (MLPs), were used. In addition, the effects of channel compensation and contextual characteristics were analysed. The results are compared with our own baseline results and with results reported in the literature. Two hypotheses were tested: first, that incorporating age information further improves speaker group-dependent modelling; second, that acoustic normalisation does not achieve the same improvement as speaker group-dependent modelling, because the age and gender of a speaker affect the way emotions are expressed.
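
To make the speaker group-dependent modelling idea concrete, the following minimal sketch trains one GMM per affect class and per speaker group and, at test time, scores an utterance only against the models of the group matching the speaker's age and gender. It is an illustration only, not the article's implementation: scikit-learn's GaussianMixture, the four example groups, and the assumption of pre-extracted per-frame acoustic features (e.g. MFCCs) are ours.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    # Hypothetical speaker groups; the article's actual grouping may differ.
    GROUPS = ("young_female", "young_male", "elderly_female", "elderly_male")

    def train_sgd_models(train_data, n_components=8):
        """train_data[group][affect] is a list of (frames x dims) feature arrays."""
        models = {}
        for group in GROUPS:
            models[group] = {}
            for affect, utterances in train_data[group].items():
                frames = np.vstack(utterances)  # pool all frames of this group/class
                gmm = GaussianMixture(n_components=n_components,
                                      covariance_type="diag", random_state=0)
                models[group][affect] = gmm.fit(frames)  # one GMM per group and class
        return models

    def classify(models, group, utterance):
        """Score an utterance only against the GMMs of the matching speaker group."""
        scores = {affect: gmm.score(utterance)  # mean log-likelihood per frame
                  for affect, gmm in models[group].items()}
        return max(scores, key=scores.get)

Vocal tract length normalisation, by contrast, keeps a single model set and instead warps the frequency axis of each speaker's spectra before feature extraction, compensating for anatomical differences rather than modelling the groups separately.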

Keywords

Affect recognition · Companion systems · Vocal tract length normalization · Speaker group-dependent classifiers

Acknowledgments

The work presented in this article was conducted within the Transregional Collaborative Research Centre SFB/TRR 62 “Companion-Technology for Cognitive Technical Systems” funded by the German Research Foundation (DFG). We also acknowledge the DFG for financing our computing cluster. Portions of the research in this article use the LAST MINUTE Corpus generated under the supervision of Professor Jörg Frommer and Professor Dietmar Rösner.

Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Ingo Siegert (1)
  • David Philippou-Hübner (1)
  • Kim Hartmann (1)
  • Ronald Böck (1)
  • Andreas Wendemuth (1)

  1. IIKT and CBBS, Otto von Guericke University Magdeburg, Magdeburg, Germany