Efficient behavior learning in human–robot collaboration
We present a novel method that allows a robot to interactively learn a joint human–robot task while executing it. We consider collaborative tasks carried out by a team consisting of a human operator and a robot helper that adapts to the human's task-execution preferences. Different human operators have different abilities, experience, and personal preferences, so that one allocation of activities within the team is preferred over another. Our main goal is to have the robot learn the task and the preferences of the user in order to provide a more efficient and acceptable joint task execution. We cast concurrent multi-agent collaboration as a semi-Markov decision process and show how to model the team behavior and learn the expected robot behavior. We further propose an interactive learning framework and evaluate it both in simulation and on a real robotic setup, showing that the system can effectively learn and adapt to human expectations.
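To make the modeling idea concrete, here is a minimal, purely illustrative sketch (not the paper's actual model or code) of how a collaborative task with durative actions can be cast as a toy semi-Markov decision process and solved with tabular Q-learning, discounting each backup by the action's duration and encoding a hypothetical user preference over which agent performs each activity. All states, actions, durations, and preference values below are assumptions made up for this example.

```python
# Illustrative sketch only: a toy semi-Markov decision process (SMDP) for a
# human-robot team. Robot actions have durations, and the reward encodes a
# hypothetical user preference for who performs each activity. All numbers
# and action names are made up for this example.
import random
from collections import defaultdict

# Toy task: two activities must be completed; each can be done by the robot
# or left to the human. Assumed preference: this user likes the robot to
# fetch parts but prefers to do the final assembly personally.
ACTIONS = {
    # action: (duration in time steps, preference reward if the robot does it)
    "robot_fetch_part": (2, +1.0),
    "robot_assemble":   (4, -1.0),
    "wait_for_human":   (3,  0.0),
}
GAMMA = 0.95

def step(state, action):
    """Hypothetical transition: each action completes one pending activity."""
    duration, pref_reward = ACTIONS[action]
    remaining = state - 1 if state > 0 else 0
    reward = pref_reward + (5.0 if remaining == 0 else 0.0)  # bonus when task is done
    return remaining, reward, duration

def q_learning(episodes=2000, alpha=0.1, epsilon=0.2):
    """Tabular Q-learning with SMDP-style discounting gamma ** duration."""
    Q = defaultdict(float)
    for _ in range(episodes):
        state = 2  # two activities remaining
        while state > 0:
            if random.random() < epsilon:
                action = random.choice(list(ACTIONS))
            else:
                action = max(ACTIONS, key=lambda a: Q[(state, a)])
            next_state, reward, duration = step(state, action)
            best_next = max(Q[(next_state, a)] for a in ACTIONS) if next_state > 0 else 0.0
            # SMDP Bellman backup: discount future value by the action's duration
            target = reward + (GAMMA ** duration) * best_next
            Q[(state, action)] += alpha * (target - Q[(state, action)])
            state = next_state
    return Q

if __name__ == "__main__":
    Q = q_learning()
    for state in (2, 1):
        best = max(ACTIONS, key=lambda a: Q[(state, a)])
        print(f"{state} activities left -> preferred robot action: {best}")
```

In the actual system the task and team behavior are represented relationally and the robot learns online from interaction with the user; this tabular toy only illustrates how durative, concurrent activities fit the semi-Markov discounting scheme.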
Keywords: Interactive learning · Human–robot collaboration · Relational learning
This work was supported by national funds through Fundação para a Ciência e a Tecnologia (FCT) with reference UID/CEC/50021/2013 and by the EU FP7-ICT project 3rdHand under Grant Agreement No. 610878.