Attention and Emotion Based Adaption of Dialog Systems

  • Sebastian Hommel
  • Ahmad Rabie
  • Uwe Handmann
Part of the Topics in Intelligent Engineering and Informatics book series (TIEI, volume 3)


This work describes methods for the individual adaptation of a dialog system. First, an automatic, real-time-capable visual estimation of user attention for face-to-face human-machine interaction is described. In addition, an emotion estimation is presented that combines a visual and an acoustic method. Both the attention estimation and the visual emotion estimation are based on Active Appearance Models (AAMs). For the attention estimation, Multilayer Perceptrons (MLPs) map the AAM parameters onto the current head pose; the temporal sequence of head poses is then classified as attention or inattention. In the visual emotion estimation, the AAM parameters are classified by a Support Vector Machine (SVM). The acoustic emotion estimation likewise uses an SVM to classify emotion-related audio signal features into the five basic emotions (neutral, happy, sad, anger, surprise). A Bayes network then combines the results of the visual and the acoustic estimation at the decision level. Both the visual attention estimation and the emotion estimation are applied in service robotics to enable a more natural, human-like dialog. Furthermore, head nodding and shaking are recognized very efficiently from the head-pose sequence using adaptive statistical moments. Because the head movement of many people with dementia is restricted, they often use only their eyes to look around; for that reason, this work also examines a simple gaze estimation with an ordinary webcam. Finally, a full-body user re-identification method is described that allows an individual state estimation for several people in highly dynamic situations. The appearance-based method enables fast re-identification of people over short time spans, so that individual parameters can be used.
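The decision-level fusion step described above can be illustrated with a minimal sketch: two per-modality class posteriors (from the visual and acoustic SVMs) are combined under a conditional-independence assumption, which is the simplest Bayes-network form of such a fusion. The function name, the uniform prior, and the example probabilities are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch of decision-level fusion of visual and acoustic
# emotion classifier outputs over the five basic emotions.
# Assumes the two modalities are conditionally independent given the
# emotion (naive Bayes fusion); all names here are hypothetical.

EMOTIONS = ["neutral", "happy", "sad", "anger", "surprise"]

def fuse_decision_level(p_visual, p_acoustic, prior=None):
    """Combine per-modality posteriors P(emotion | modality) into a
    single normalized distribution over the five basic emotions."""
    if prior is None:
        # Uniform prior over the five emotions (an assumption).
        prior = {e: 1.0 / len(EMOTIONS) for e in EMOTIONS}
    # P(e | v, a) proportional to P(e|v) * P(e|a) / P(e)
    scores = {e: p_visual[e] * p_acoustic[e] / prior[e] for e in EMOTIONS}
    total = sum(scores.values())
    return {e: s / total for e, s in scores.items()}

# Example: both modalities lean toward "happy".
p_v = {"neutral": 0.1, "happy": 0.6, "sad": 0.1, "anger": 0.1, "surprise": 0.1}
p_a = {"neutral": 0.2, "happy": 0.5, "sad": 0.1, "anger": 0.1, "surprise": 0.1}
fused = fuse_decision_level(p_v, p_a)
best = max(fused, key=fused.get)  # "happy"
```

In practice the chapter's Bayes network can encode richer dependencies than this independence assumption, but the normalization and per-class product capture the core of combining two modality-specific estimates at the decision level.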


Keywords: Multilayer Perceptron, Bayes network, Active Appearance Model, Support Vector Machine





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  1. Computer Science Institute, University of Applied Sciences Ruhr West, Bottrop, Germany
