Simulating Human-Robot Interactions for Dialogue Strategy Learning

  • Grégoire Milliez
  • Emmanuel Ferreira
  • Michelangelo Fiore
  • Rachid Alami
  • Fabrice Lefèvre
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8810)


Many robotics projects use simulation as a faster and easier way to develop, evaluate and validate software components than on-board, real-world settings. In the human-robot interaction field, some recent works have attempted to integrate humans into the simulation loop. In this paper we investigate how this kind of robotic simulation software can provide a dynamic and interactive environment both to collect a multimodal situated dialogue corpus and to perform efficient reinforcement learning-based optimisation of dialogue management. Our proposal is illustrated by a preliminary experiment involving real users in a Pick-Place-Carry task, for which encouraging results are obtained.
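
The abstract does not detail the learning procedure, so the following is only a minimal sketch of what reinforcement learning-based dialogue management optimisation against a simulated interaction can look like. It assumes tabular Q-learning on a fully observable toy problem, which is a deliberate simplification and not the authors' method. The states, actions, rewards, success probabilities and function names (`step`, `train`, `greedy_policy`) are all hypothetical and chosen for illustration.

```python
import random

# Hypothetical dialogue states: how sure the robot is about the user's request.
STATES = ("unknown", "heard", "confirmed")
# Hypothetical robot dialogue acts.
ACTIONS = ("ask", "confirm", "execute")
# Assumed chance that executing the task matches what the user wanted.
SUCCESS_PROB = {"unknown": 0.1, "heard": 0.5, "confirmed": 1.0}


def step(state, action, rng):
    """Toy simulated user/environment: return (next_state, reward, done)."""
    if action == "execute":
        success = rng.random() < SUCCESS_PROB[state]
        return state, (20.0 if success else -20.0), True
    if action == "ask" and state == "unknown":
        return "heard", -1.0, False
    if action == "confirm" and state == "heard":
        return "confirmed", -1.0, False
    return state, -1.0, False  # a wasted turn still costs one time step


def train(episodes=10000, alpha=0.1, gamma=0.95, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(episodes):
        state = "unknown"
        for _ in range(20):  # cap on the number of turns per dialogue
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            nxt, reward, done = step(state, action, rng)
            best_next = 0.0 if done else max(q[(nxt, a)] for a in ACTIONS)
            target = reward + gamma * best_next
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = nxt
            if done:
                break
    return q


def greedy_policy(q):
    """Pick the highest-valued action in each state."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in STATES}


if __name__ == "__main__":
    print(greedy_policy(train()))
```

With these made-up numbers, the learned greedy policy should settle on asking first, then confirming, then executing, because a wrong execution costs far more than one extra turn. A simulator with a human in the loop would replace the hand-written `step` function with real or recorded user behaviour.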


Automatic Speech Recognition, Belief State, Dialogue System, Dialogue Policy, Partially Observable Markov Decision Process
These keywords were added by machine and not by the authors.




Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Grégoire Milliez (1, 2)
  • Emmanuel Ferreira (3)
  • Michelangelo Fiore (1, 2)
  • Rachid Alami (1, 2)
  • Fabrice Lefèvre (3)

  1. CNRS, LAAS, Toulouse, France
  2. Université de Toulouse, UPS, INSA, INP, ISAE, LAAS, Toulouse, France
  3. LIA, Université d’Avignon, Avignon Cedex 9, France
