Creating Brain-Like Intelligence pp 103-138

Part of the Lecture Notes in Computer Science book series (LNCS, volume 5436)

Learning Actions through Imitation and Exploration: Towards Humanoid Robots That Learn from Humans

  • David B. Grimes
  • Rajesh P. N. Rao

Abstract

A prerequisite for achieving brain-like intelligence is the ability to rapidly learn new behaviors and actions. A fundamental mechanism for rapid learning in humans is imitation: children routinely learn new skills (e.g., opening a door or tying a shoe lace) by imitating their parents; adults continue to learn by imitating skilled instructors (e.g., in tennis). In this chapter, we propose a probabilistic framework for imitation learning in robots that is inspired by how humans learn from imitation and exploration. Rather than relying on complex (and often brittle) physics-based models, the robot learns a dynamic Bayesian network that captures its dynamics directly in terms of sensor measurements and actions during an imitation-guided exploration phase. After learning, actions are selected based on probabilistic inference in the learned Bayesian network. We present results demonstrating that a 25-degree-of-freedom humanoid robot can learn dynamically stable, full-body imitative motions simply by observing a human demonstrator.

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • David B. Grimes (1)
  • Rajesh P. N. Rao (1)
  1. University of Washington, Seattle, USA
