Interactively shaping robot behaviour with unlabeled human instructions

Autonomous Agents and Multi-Agent Systems

Abstract

In this paper, we propose a framework that enables a human teacher to shape a robot's behaviour by interactively providing it with unlabeled instructions. We ground the meaning of instruction signals in the task-learning process and simultaneously use them to guide that process. We implement our framework as a modular architecture, named TICS (Task-Instruction-Contingency-Shaping), that combines different information sources: a predefined reward function, human evaluative feedback and unlabeled instructions. This approach offers a novel perspective on robotic task learning that lies between the Reinforcement Learning and Supervised Learning paradigms. We evaluate our framework both in simulation and with a real robot. The experimental results demonstrate its effectiveness in accelerating the task-learning process and in reducing the number of required teaching signals.
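
To make the combination of these information sources concrete, here is a minimal Python sketch of how a reward-driven learner, human evaluative feedback and unlabeled instruction signals could be brought together to shape action selection. All names (TaskModel, InstructionModel, shaped_action, feedback_bias) are illustrative assumptions, not the TICS implementation described in the paper.

```python
import random
from collections import defaultdict

class TaskModel:
    """Q-learning on the predefined reward function."""
    def __init__(self, actions, alpha=0.1, gamma=0.9):
        self.actions = actions
        self.alpha, self.gamma = alpha, gamma
        self.q = defaultdict(float)

    def update(self, s, a, r, s_next):
        best_next = max(self.q[(s_next, b)] for b in self.actions)
        self.q[(s, a)] += self.alpha * (r + self.gamma * best_next - self.q[(s, a)])

    def preferred(self, s):
        return max(self.actions, key=lambda a: self.q[(s, a)])

class InstructionModel:
    """Grounds unlabeled instruction signals by counting how often each signal
    co-occurs with the action the task model currently prefers."""
    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))

    def ground(self, signal, preferred_action):
        self.counts[signal][preferred_action] += 1

    def interpret(self, signal):
        votes = self.counts[signal]
        return max(votes, key=votes.get) if votes else None

def shaped_action(s, task, instructions, signal, feedback_bias, epsilon=0.1):
    """Follow an interpreted instruction when one is available; otherwise act
    (epsilon-)greedily on Q-values biased by accumulated evaluative feedback."""
    if signal is not None:
        guided = instructions.interpret(signal)
        if guided is not None:
            return guided
    if random.random() < epsilon:
        return random.choice(task.actions)
    return max(task.actions, key=lambda a: task.q[(s, a)] + feedback_bias[(s, a)])
```

The point mirrored here is that instruction signals carry no predefined meaning: they are grounded online by associating each signal with the action the task model currently prefers, and once grounded they can in turn guide exploration.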

Notes

  1. The TM, IM, and CM are called Models to indicate that these are learning components, in which a model (of the task, instructions and contingency) is learned. By contrast, in the shaping component SC, there is no learning; the shaping method is determined in advance.

  2. We also consider this task for the experiment with the real robot (cf. Sect. 7).

  3. With myopic discounting (\(\gamma = 0\)) [19], the Q-values play the same role as the policy parameters in Actor-Critic, so this method remains compatible with our view of evaluative feedback as information about the policy (see the sketch after these notes).

  4. https://dev.windows.com/en-us/kinect, last accessed 20-12-2014. We use a modified version of the Kinect V2 ROS client/server provided by the Personal Robotics Laboratory of Carnegie Mellon University: https://github.com/personalrobotics/, last accessed 20-12-2014.

  5. The first author of this paper.

  6. https://youtu.be/TK9SwFedtUc.
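
As footnote 3 observes, with myopic discounting (gamma = 0) the bootstrapped term of the Q update vanishes, so the learned values simply track the expected immediate human feedback and can be read as policy preferences, much as the actor's parameters in Actor-Critic. Below is a minimal sketch of that reading; the function names and the softmax action selection are illustrative assumptions, not taken from the paper.

```python
import math
import random
from collections import defaultdict

def myopic_update(q, state, action, human_feedback, alpha=0.2):
    # With gamma = 0 the target is just the immediate human feedback, so the
    # update keeps a running estimate of that feedback per state-action pair.
    q[(state, action)] += alpha * (human_feedback - q[(state, action)])

def softmax_policy(q, state, actions, temperature=1.0):
    # Treat the myopic Q-values as policy preferences, as an Actor-Critic actor
    # would: a softmax over them defines the action-selection probabilities.
    prefs = [q[(state, a)] / temperature for a in actions]
    m = max(prefs)
    weights = [math.exp(p - m) for p in prefs]
    return random.choices(actions, weights=weights, k=1)[0]

q = defaultdict(float)
myopic_update(q, state="s0", action="left", human_feedback=+1.0)
next_action = softmax_policy(q, "s0", actions=["left", "right"])
```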

References

  1. Argall, B. D., Chernova, S., Veloso, M., & Browning, B. (2009). A survey of robot learning from demonstration. Robotics and Autonomous Systems, 57(5), 469–483.

  2. Barto, A. G., Sutton, R. S., & Anderson, C. W. (1983). Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics, SMC–13(5), 834–846.

  3. Bay, H., Ess, A., Tuytelaars, T., & Van Gool, L. (2008). Speeded-up robust features (SURF). Computer Vision and Image Understanding, 110(3), 346–359.

  4. Branavan, S. R. K., Chen, H., Zettlemoyer, L. S., & Barzilay, R. (2009). Reinforcement learning for mapping instructions to actions. In Proceedings of the joint conference of the 47th annual meeting of the ACL and the 4th international joint conference on natural language processing of the AFNLP: Volume 1, ACL ’09 (pp. 82–90). Stroudsburg, PA, USA. Association for Computational Linguistics.

  5. Branavan, S. R. K., Zettlemoyer, L. S., & Barzilay, R. (2010). Reading between the lines: Learning to map high-level instructions to commands. In Proceedings of the 48th annual meeting of the association for computational linguistics, ACL ’10 (pp. 1268–1277). Stroudsburg, PA, USA. Association for Computational Linguistics.

  6. Chernova, S., & Thomaz, A. L. (2014). Robot learning from human teachers. Synthesis Lectures on Artificial Intelligence and Machine Learning, 8(3), 1–121.

  7. Clouse, J. A., & Utgoff, P. E. (1992). A teaching method for reinforcement learning. In Proceedings of the ninth international workshop on machine learning, ML ’92 (pp. 92–110). San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.

  8. Cruz, F., Twiefel, J., Magg, S., Weber, C., & Wermter, S. (2015). Interactive reinforcement learning through speech guidance in a domestic scenario. In 2015 international joint conference on neural networks (IJCNN), (pp. 1–8).

  9. Doncieux, S., Bredeche, N., Mouret, J.-B., & Eiben, A. E. G. (2015). Evolutionary robotics: What, why, and where to. Frontiers in Robotics and AI, 2, 4.

  10. Feng, S., Whitman, E., Xinjilefu, X., & Atkeson, C. G. (2014). Optimization based full body control for the atlas robot. In 2014 14th IEEE-RAS international conference on humanoid robots (Humanoids) (pp. 120–127). IEEE.

  11. García, J., & Fernández, F. (2015). A comprehensive survey on safe reinforcement learning. Journal of Machine Learning Research, 16(1), 1437–1480.

  12. Griffith, S., Subramanian, K., Scholz, J., Isbell, C. L., & Thomaz, A. (2013). Policy shaping: Integrating human feedback with reinforcement learning. In Proceedings of the 26th international conference on neural information processing systems, NIPS’13 (pp. 2625–2633). USA: Curran Associates Inc.

  13. Grizou, J., Lopes, M., & Oudeyer, P. Y. (2013). Robot learning simultaneously a task and how to interpret human instructions. In 2013 IEEE third joint international conference on development and learning and epigenetic robotics (ICDL) (pp. 1–8).

  14. Grześ, M., & Kudenko, D. (2010). Online learning of shaping rewards in reinforcement learning. Neural Networks, 23(4), 541–550.

  15. Ho, M. K., Littman, M. L., Cushman, F., & Austerweil, J. L. (2015). Teaching with rewards and punishments: Reinforcement or communication? In Proceedings of the 37th annual meeting of the cognitive science society.

  16. Isbell, C., Shelton, C. R., Kearns, M., Singh, S., & Stone, P. (2001). A social reinforcement learning agent. In Proceedings of the fifth international conference on autonomous agents, AGENTS ’01 (pp. 377–384). New York, NY, USA: ACM.

  17. Knox, W. B., Breazeal, C., & Stone, P. (2012). Learning from feedback on actions past and intended. In Proceedings of the 7th ACM/IEEE international conference on human–robot interaction, late-breaking reports session (HRI 2012).

  18. Knox, W. B., & Stone, P. (2009). Interactively shaping agents via human reinforcement: The TAMER framework. In Proceedings of the fifth international conference on knowledge capture, K-CAP ’09 (pp. 9–16). New York, NY, USA: ACM.

  19. Knox, W. B., & Stone, P. (2012). Reinforcement learning from human reward: Discounting in episodic tasks. In 2012 IEEE RO-MAN: The 21st IEEE international symposium on robot and human interactive communication (pp. 878–885).

  20. Knox, W. B., & Stone, P. (2012). Reinforcement learning from simultaneous human and MDP reward. In Proceedings of the 11th international conference on autonomous agents and multiagent systems, AAMAS ’12 (Vol. 1, pp. 475–482). Richland, SC: International Foundation for Autonomous Agents and Multiagent Systems.

  21. Knox, W. B., Stone, P., & Breazeal, C. (2013). Training a robot via human feedback: A case study. In Proceedings of the 5th International Conference on Social Robotics, ICSR 2013 (Vol. 8239, pp. 460–470). New York, NY, USA: Springer.

  22. Kober, J., Bagnell, J. A., & Peters, J. (2013). Reinforcement learning in robotics: A survey. The International Journal of Robotics Research, 32(11), 1238–1274.

  23. Konidaris, G., & Barto, A. (2006). Autonomous shaping: Knowledge transfer in reinforcement learning. In Proceedings of the 23rd international conference on machine learning, ICML ’06 (pp. 489–496). New York, NY, USA. ACM.

  24. Konidaris, G., & Hayes, G. (2004). Estimating future reward in reinforcement learning animats using associative learning. In From animals to animats 8: Proceedings of the eighth international conference on the simulation of adaptive behavior (pp. 297–304). MIT Press.

  25. Loftin, R., MacGlashan, J., Peng, B., Taylor, M. E., Littman, M. L., Huang, J., & Roberts, D. L. (2014). A strategy-aware technique for learning behaviors from discrete human feedback. In Proceedings of the twenty-eighth AAAI conference on artificial intelligence, AAAI’14 (pp. 937–943). Québec City, Québec, Canada. AAAI Press.

  26. Loftin, R., Peng, B., Macglashan, J., Littman, M. L., Taylor, M. E., Huang, J., et al. (2016). Learning behaviors via human-delivered discrete feedback: Modeling implicit feedback strategies to speed up learning. Autonomous Agents and Multi-Agent Systems, 30(1), 30–59.

  27. MacGlashan, J., Babes-Vroman, M., desJardins, M., Littman, M., Muresan, S., Squire, S., Tellex, S., Arumugam, D., & Yang, L. (2015). Grounding English commands to reward functions. In Proceedings of robotics: Science and systems.

  28. MacGlashan, J., Ho, M. K., Loftin, R., Peng, B., Wang, G., Roberts, D. L., Taylor, M. E., & Littman, M. L. (2017). Interactive learning from policy-dependent human feedback. In ICML.

  29. Marthi, B. (2007). Automatic shaping and decomposition of reward functions. In Proceedings of the 24th international conference on machine learning, ICML ’07 (pp. 601–608). New York, NY, USA: ACM.

  30. Mathewson, K. W., & Pilarski, P. M. (2016). Simultaneous control and human feedback in the training of a robotic agent with actor-critic reinforcement learning. arXiv preprint arXiv:1606.06979.

  31. Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., & Riedmiller, M. A. (2013). Playing Atari with deep reinforcement learning. CoRR, arXiv:1312.5602.

  32. Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., et al. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529–533.

  33. Nicolescu, M. N., & Mataric, M. J. (2003). Natural methods for robot task learning: Instructive demonstrations, generalization and practice. In Proceedings of the second international joint conference on autonomous agents and multiagent systems, AAMAS ’03 (pp. 241–248). New York, NY, USA: ACM.

  34. Pradyot, K. V. N., Manimaran, S. S., & Ravindran, B. (2012). Instructing a reinforcement learner. In Proceedings of the twenty-fifth international Florida artificial intelligence research society conference (pp. 23–25). Marco Island, Florida.

  35. Pradyot, K. V. N., Manimaran, S. S., Ravindran, B., & Natarajan, S. (2012). Integrating human instructions and reinforcement learners: An SRL approach. In Proceedings of the UAI workshop on statistical relational AI.

  36. Quigley, M., Conley, K., Gerkey, B. P., Faust, J., Foote, T., Leibs, J., Wheeler, R., & Ng, A. Y. (2009). ROS: An open-source robot operating system. In ICRA workshop on open source software.

  37. Rosenstein, M. T., & Barto, A. G. (2004). Supervised actor-critic reinforcement learning. In J. Si, A. G. Barto, W. B. Powell, & D. Wunsch (Eds.), Handbook of learning and approximate dynamic programming (pp. 359–380). Wiley. https://doi.org/10.1002/9780470544785.ch14.

  38. Rybski, P. E., Yoon, K., Stolarz, J., & Veloso, M. M. (2007). Interactive robot task training through dialog and demonstration. In 2007 2nd ACM/IEEE international conference on human–robot interaction (HRI) (pp. 49–56).

  39. Sigaud, O., & Buffet, O. (2010). Markov decision processes in artificial intelligence. New York: Wiley.

  40. Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Van Den Driessche, G., et al. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), 484–489.

  41. Suay, H. B., & Chernova, S. (2011). Effect of human guidance and state space size on Interactive Reinforcement Learning. In 2011 RO-MAN (pp. 1–6).

  42. Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An introduction. Cambridge: MIT Press.

  43. Tenorio-Gonzalez, A. C., Morales, E. F., & Villaseñor-Pineda, L. (2010). Dynamic reward shaping: Training a robot by voice. In A. Kuri-Morales & G. R. Simari (Eds.), Advances in artificial intelligence - IBERAMIA 2010: 12th Ibero-American conference on AI, Bahía Blanca, Argentina, November 1–5, 2010, Proceedings (pp. 483–492). Berlin: Springer. https://doi.org/10.1007/978-3-642-16952-6_49.

  44. Thomaz, A. L., & Breazeal, C. (2006). Reinforcement learning with human teachers: Evidence of feedback and guidance with implications for learning performance. In Proceedings of the 21st national conference on artificial intelligence, AAAI’06. (Vol. 1, pp. 1000–1005). Boston, Massachusetts. AAAI Press.

  45. Thomaz, A. L., & Breazeal, C. (2006). Transparency and socially guided machine learning. In Proceedings of the 5th international conference on development and learning (ICDL).

  46. Thomaz, A. L., & Breazeal, C. (2007). Robot learning via socially guided exploration. In 2007 IEEE 6th international conference on development and learning (pp. 82–87).

  47. Thomaz, A. L., Hoffman, G., & Breazeal, C. (2006). Reinforcement learning with human teachers: Understanding how people want to teach robots. In ROMAN 2006—The 15th IEEE international symposium on robot and human interactive communication (pp. 352–357).

  48. Utgoff, P. E., & Clouse, J. A. (1991). Two kinds of training information for evaluation function learning. In Proceedings of the ninth annual conference on artificial intelligence (pp. 596–600). Morgan Kaufmann.

  49. Vogel, A., & Jurafsky, D. (2010). Learning to follow navigational directions. In Proceedings of the 48th annual meeting of the association for computational linguistics, ACL ’10 (pp. 806–814). Stroudsburg, PA, USA: Association for Computational Linguistics.

  50. Vollmer, A.-L., Wrede, B., Rohlfing, K. J., & Oudeyer, P.-Y. (2016). Pragmatic frames for teaching and learning in human–robot interaction: Review and challenges. Frontiers in Neurorobotics, 10, 10.

  51. Watkins, C. J., & Dayan, P. (1992). Q-learning. Machine Learning, 8(3–4), 279–292.

Author information

Corresponding author

Correspondence to Anis Najar.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was supported by the Romeo2 project.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 1393 KB)

About this article

Cite this article

Najar, A., Sigaud, O. & Chetouani, M. Interactively shaping robot behaviour with unlabeled human instructions. Auton Agent Multi-Agent Syst 34, 35 (2020). https://doi.org/10.1007/s10458-020-09459-6
