International Journal of Speech Technology, Volume 10, Issue 2–3, pp 109–119

Challenges in speech-based human–computer interfaces

  • Wolfgang Minker
  • Johannes Pittermann
  • Angela Pittermann
  • Petra-Maria Strauß
  • Dirk Bühler


In this article we present an overview of our current research on developing advanced spoken language dialogue systems. These systems need to react flexibly and adaptively depending on the current status of the user and the situation of use. In particular, they require emotion recognition and adaptive dialogue management techniques. Advanced dialogue systems also need proactive capabilities to act as intelligent assistants to their users.


Emotion recognition, Reasoning, Dialogue management, Intelligence, Spoken language dialogue systems





Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  • Wolfgang Minker (1)
  • Johannes Pittermann (1)
  • Angela Pittermann (1)
  • Petra-Maria Strauß (1)
  • Dirk Bühler (1)

  1. Institute of Information Technology, Ulm University, Ulm, Germany
