Simultaneously learning actions and goals from demonstration

Abstract

Our research aim is to develop interactions and algorithms for learning from naïve human teachers through demonstration. We introduce a novel approach that leverages the goal-oriented nature of human teachers by learning an action model and a goal model simultaneously from the same set of demonstrations. We use robot motion data to learn an action model for executing the skill. We use a generic set of perceptual features to learn a goal model, which we then use to monitor the executed action model. We evaluate our approach with data from 8 naïve teachers demonstrating two skills to the robot. We show that the goal models in the perceptual feature space are consistent across users and correctly recognize demonstrations in cross-validation tests. We additionally observe that a subset of users were not able to teach a successful action model, whereas all of them were able to teach a mostly successful goal model. When the learned action models were executed on the robot, the success rate was 66.25 % on average, whereas the goal models were on average 90 % correct at deciding on the success or failure of the executed action, which we call monitoring.

Notes

  1. We only assume that a segmentation for objects is available.

  2. http://pointclouds.org/.

  3. Integrating more robust object segmentation into the perception pipeline is left for future work. Since our experiments in this article are performed on a batch of demonstrations offline, robust online tracking is not our current focus. This will become important in our future work when we want learning to be an incremental online process, and we believe solutions exist for obtaining a robust segmentation for this purpose.

  4. This object-based representation is fairly common in robotics.

  5. These details are given for completeness but another model selection procedure can be used as well.

  6. Although tractable approximate methods exist for DBNs.

  7. They are mirrored since the participant is standing across the table in one and standing next to the robot in the other.

  8. Participant 1 provided only 2 or 3 keyframes per demonstration, whereas other participants provided 4–6. As a result, participant 1's goal model was not able to recognize the demonstrations of other users.

  9. The cross-validation tests how similar the demonstrations are but not how the action itself is modelled.

  10. In an interactive scenario, the teacher might realize this and fix it with their follow-up demonstrations.

  11. Fast enough to have a fluid interaction with the user.

  12. Expert in the sense of demonstrations, not necessarily the underlying algorithms.

  13. We can represent cyclic behaviors with the current action model but currently have no means to decide on when to stop the cycle, see Sect. 4.6.

Acknowledgments

This work has been supported by US National Science Foundation CAREER award #0953181, and by US Office of Naval Research grant #N000141410120.

Author information

Corresponding author

Correspondence to Baris Akgun.

About this article

Cite this article

Akgun, B., Thomaz, A. Simultaneously learning actions and goals from demonstration. Auton Robot 40, 211–227 (2016). https://doi.org/10.1007/s10514-015-9448-x

Keywords

  • Learning from demonstration
  • Goal learning
  • Human–robot interaction