Adaptive Optimal Feedback Control with Learned Internal Dynamics Models

  • Djordje Mitrovic
  • Stefan Klanke
  • Sethu Vijayakumar
Part of the Studies in Computational Intelligence book series (SCI, volume 264)


Optimal Feedback Control (OFC) has been proposed as an attractive movement generation strategy in goal-reaching tasks for anthropomorphic manipulator systems. Recent developments, such as the Iterative Linear Quadratic Gaussian (ILQG) algorithm, have focused on the case of non-linear, but still analytically available, dynamics. For realistic control systems, however, the dynamics may often be unknown, difficult to estimate, or subject to frequent systematic changes. In this chapter, we combine the ILQG framework with learning the forward dynamics for simulated arms that exhibit large redundancies, both in kinematics and in actuation. We demonstrate how our approach can compensate for complex dynamic perturbations in an online fashion. The specific adaptive framework introduced also lends itself to a computationally more efficient implementation of the ILQG optimisation without sacrificing control accuracy, allowing the method to scale to systems with many degrees of freedom (DoF).
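The idea of pairing an optimal feedback controller with a forward model that is learned online can be illustrated with a deliberately small sketch. It is not the chapter's method, and it rests on several simplifying assumptions: the plant is a scalar linear system, recursive least squares stands in for the learned dynamics model, and a finite-horizon LQR recursion stands in for ILQG (which would instead linearise non-linear dynamics repeatedly around a trajectory). All function names, plant parameters, and cost weights below are hypothetical.

```python
import random


def rls_update(theta, P, phi, y, lam=1.0):
    """One recursive-least-squares step for the model y ~ theta . phi.

    theta: 2 parameters [a, b]; P: 2x2 symmetric covariance; lam: forgetting factor.
    """
    Pphi = [P[0][0] * phi[0] + P[0][1] * phi[1],
            P[1][0] * phi[0] + P[1][1] * phi[1]]
    denom = lam + phi[0] * Pphi[0] + phi[1] * Pphi[1]
    gain = [Pphi[0] / denom, Pphi[1] / denom]
    err = y - (theta[0] * phi[0] + theta[1] * phi[1])
    theta = [theta[0] + gain[0] * err, theta[1] + gain[1] * err]
    # P is symmetric, so phi^T P equals (P phi)^T.
    P = [[(P[i][j] - gain[i] * Pphi[j]) / lam for j in range(2)] for i in range(2)]
    return theta, P


def lqr_gains(a, b, q=1.0, r=0.1, q_final=1.0, horizon=20):
    """Backward Riccati recursion for x' = a*x + b*u with quadratic cost.

    Returns time-indexed gains k_t for the feedback law u_t = -k_t * x_t.
    """
    s = q_final
    gains = []
    for _ in range(horizon):
        k = (b * s * a) / (r + b * s * b)
        s = q + a * s * (a - b * k)
        gains.append(k)
    gains.reverse()
    return gains


def learn(theta, P, a_true, b_true, n, lam, rng):
    """Update the forward model from n randomly sampled (x, u, x') transitions."""
    for _ in range(n):
        x, u = rng.uniform(-1, 1), rng.uniform(-1, 1)
        x_next = a_true * x + b_true * u  # noise-free hypothetical plant
        theta, P = rls_update(theta, P, [x, u], x_next, lam)
    return theta, P


def rollout(a_true, b_true, gains, x0=1.0):
    """Run the feedback law on the true plant and return the final state."""
    x = x0
    for k in gains:
        x = a_true * x + b_true * (-k * x)
    return x


def demo(seed=0):
    rng = random.Random(seed)
    theta, P = [0.0, 0.0], [[1e3, 0.0], [0.0, 1e3]]
    # Phase 1: learn the nominal dynamics, then plan on the learned model.
    theta, P = learn(theta, P, 0.9, 0.5, 200, 1.0, rng)
    x_nominal = rollout(0.9, 0.5, lqr_gains(*theta))
    # Phase 2: the plant changes (weaker actuation); keep learning with
    # forgetting so old data fades, then re-plan on the updated model.
    theta2, P = learn(theta, P, 0.9, 0.25, 200, 0.95, rng)
    x_adapted = rollout(0.9, 0.25, lqr_gains(*theta2))
    return theta, x_nominal, theta2, x_adapted
```

Calling `demo()` returns the model estimates before and after the simulated change together with the final state of each closed-loop rollout. The point of the design is that the controller is always recomputed from the current learned model, so a systematic change in the plant is absorbed by the model update rather than by redesigning the controller.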


Joint Angle, Humanoid Robot, Joint Torque, Optimal Feedback Control, Forward Dynamic
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Djordje Mitrovic (1)
  • Stefan Klanke (1)
  • Sethu Vijayakumar (1)
  1. School of Informatics, University of Edinburgh, Edinburgh, United Kingdom
