Autonomous Robots, Volume 42, Issue 5, pp 1103–1115

Efficient behavior learning in human–robot collaboration

  • Thibaut Munzer
  • Marc Toussaint
  • Manuel Lopes
Part of the following topical collections:
  1. Special Issue: Learning for Human-Robot Collaboration


Abstract

We present a novel method for a robot to interactively learn a joint human–robot task while executing it. We consider collaborative tasks carried out by a team of a human operator and a robot helper that adapts to the human's task-execution preferences. Different human operators have different abilities, experiences, and personal preferences, so that one allocation of activities within the team is preferred over another. Our main goal is for the robot to learn both the task and the preferences of the user, yielding a more efficient and acceptable joint task execution. We cast concurrent multi-agent collaboration as a semi-Markov decision process and show how to model the team behavior and learn the expected robot behavior. We further propose an interactive learning framework and evaluate it both in simulation and on a real robotic setup, showing that the system can effectively learn and adapt to human expectations.
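The interactive setting described in the abstract — a robot that acts autonomously once it is confident about the user's preferred activity allocation, and defers to the human otherwise — can be illustrated with a toy confidence-based policy. This is a minimal sketch under our own assumptions (state and action names, the count-based confidence estimate, and the threshold are all illustrative), not the paper's actual algorithm or API:

```python
class InteractivePolicy:
    """Toy confidence-based interactive learner: the robot asks the human
    for the next action when its confidence in a state is low, and acts
    autonomously once one action clearly dominates. Illustrative only."""

    def __init__(self, actions, confidence_threshold=0.8):
        self.actions = list(actions)
        self.threshold = confidence_threshold
        self.counts = {}  # state -> {action: times the human chose it}

    def confidence(self, state):
        """Fraction of past human choices going to the modal action."""
        counts = self.counts.get(state)
        if not counts:
            return 0.0
        return max(counts.values()) / sum(counts.values())

    def act(self, state, ask_human):
        """Pick an action; query the human demonstrator when unsure."""
        if self.confidence(state) < self.threshold:
            action = ask_human(state)  # interactive correction/demonstration
            state_counts = self.counts.setdefault(state, {})
            state_counts[action] = state_counts.get(action, 0) + 1
        else:
            counts = self.counts[state]
            action = max(counts, key=counts.get)  # act autonomously
        return action


# Example: the human always prefers a hand-over when a screw is ready.
policy = InteractivePolicy(["hand_over", "wait"], confidence_threshold=0.6)
human = lambda state: "hand_over"
for _ in range(3):
    policy.act("screw_ready", human)  # first call queries the human
```

After the first human query the modal-action fraction reaches the threshold, so subsequent calls in `"screw_ready"` run autonomously — mirroring, in miniature, how interactive corrections reduce the human's supervision burden over time.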


Keywords: Interactive learning · Human–robot collaboration · Relational learning



This work was supported by national funds through Fundação para a Ciência e a Tecnologia (FCT) with reference UID/CEC/50021/2013 and by the EU FP7-ICT project 3rdHand under Grant Agreement No. 610878.



Copyright information

© Springer Science+Business Media, LLC 2017

Authors and Affiliations

  1. Inria, Bordeaux, France
  2. Machine Learning and Robotics Lab, University of Stuttgart, Stuttgart, Germany
  3. INESC-ID, Lisboa, Portugal
  4. Instituto Superior Técnico, Lisboa, Portugal
