Abstract
We informally review our approach to using trajectory optimization to accelerate dynamic programming. Dynamic programming provides a way to design globally optimal control laws for nonlinear systems. However, the curse of dimensionality (the exponential growth of required memory and computation with the dimensionality of the state and control) limits the application of dynamic programming in practice. We explore trajectory-based dynamic programming, which combines many local optimizations to accelerate the global optimization of dynamic programming. We are able to solve problems with fewer resources than grid-based approaches require, and to solve problems that tabular and global function approximation approaches could not.
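To make the curse of dimensionality concrete, here is a minimal sketch (not the authors' code) of tabular value iteration for a 1-D double integrator. The grid sizes, cost weights, and nearest-neighbor interpolation are illustrative assumptions; the point is that a table with n points per dimension needs n**d entries, which is what trajectory-based methods aim to avoid.

```python
import numpy as np

def value_iteration(n=21, n_u=11, dt=0.1, iters=50, gamma=0.99):
    # State: (position, velocity) on a uniform grid over [-1, 1]^2.
    # Control: acceleration on a uniform grid over [-1, 1].
    xs = np.linspace(-1.0, 1.0, n)
    vs = np.linspace(-1.0, 1.0, n)
    us = np.linspace(-1.0, 1.0, n_u)
    V = np.zeros((n, n))  # value table: n**d entries for d = 2 dimensions
    for _ in range(iters):
        V_new = np.full_like(V, np.inf)
        for i, x in enumerate(xs):
            for j, v in enumerate(vs):
                for u in us:
                    # Euler-integrate the double-integrator dynamics.
                    x2, v2 = x + v * dt, v + u * dt
                    # Nearest-grid-point lookup of the successor value.
                    i2 = int(np.clip(round((x2 + 1) / 2 * (n - 1)), 0, n - 1))
                    j2 = int(np.clip(round((v2 + 1) / 2 * (n - 1)), 0, n - 1))
                    # Quadratic one-step cost on state and control.
                    cost = (x * x + v * v + 0.1 * u * u) * dt
                    V_new[i, j] = min(V_new[i, j], cost + gamma * V[i2, j2])
        V = V_new
    return V

V = value_iteration()
```

Doubling the resolution of this 2-D table quadruples its size; for a humanoid-scale state space the same table is infeasible, which motivates representing the value function only along optimized trajectories.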
© 2013 Springer-Verlag Berlin Heidelberg
Atkeson, C.G., Liu, C. (2013). Trajectory-Based Dynamic Programming. In: Mombaur, K., Berns, K. (eds) Modeling, Simulation and Optimization of Bipedal Walking. Cognitive Systems Monographs, vol 18. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36368-9_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-36367-2
Online ISBN: 978-3-642-36368-9