Abstract
We informally review our approach to using trajectory optimization to accelerate dynamic programming. Dynamic programming provides a way to design globally optimal control laws for nonlinear systems. However, the curse of dimensionality (the exponential growth of required memory and computation with the dimensionality of the state and control) limits the application of dynamic programming in practice. We explore trajectory-based dynamic programming, which combines many local optimizations to accelerate the global optimization of dynamic programming. We are able to solve problems with fewer resources than grid-based approaches require, and to solve problems that tabular and global function approximation approaches could not.
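To make the curse of dimensionality concrete, here is a minimal sketch (not the authors' code) of tabular value iteration for a 1-D double integrator. The grid sizes, cost weights, and nearest-neighbor interpolation are illustrative assumptions; the point is that a table with n points per dimension needs n**d entries, which is what trajectory-based methods aim to avoid.

```python
import numpy as np

def value_iteration(n=21, n_u=11, dt=0.1, iters=50, gamma=0.99):
    # State: (position, velocity) on a uniform grid over [-1, 1]^2.
    # Control: acceleration on a uniform grid over [-1, 1].
    xs = np.linspace(-1.0, 1.0, n)
    vs = np.linspace(-1.0, 1.0, n)
    us = np.linspace(-1.0, 1.0, n_u)
    V = np.zeros((n, n))  # value table: n**d entries for d = 2 dimensions
    for _ in range(iters):
        V_new = np.full_like(V, np.inf)
        for i, x in enumerate(xs):
            for j, v in enumerate(vs):
                for u in us:
                    # Euler-integrate the double-integrator dynamics.
                    x2, v2 = x + v * dt, v + u * dt
                    # Nearest-grid-point lookup of the successor value.
                    i2 = int(np.clip(round((x2 + 1) / 2 * (n - 1)), 0, n - 1))
                    j2 = int(np.clip(round((v2 + 1) / 2 * (n - 1)), 0, n - 1))
                    # Quadratic one-step cost on state and control.
                    cost = (x * x + v * v + 0.1 * u * u) * dt
                    V_new[i, j] = min(V_new[i, j], cost + gamma * V[i2, j2])
        V = V_new
    return V

V = value_iteration()
```

Doubling the resolution of this 2-D table quadruples its size; for a humanoid-scale state space the same table is infeasible, which motivates representing the value function only along optimized trajectories.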
© 2013 Springer-Verlag Berlin Heidelberg
Atkeson, C.G., Liu, C. (2013). Trajectory-Based Dynamic Programming. In: Mombaur, K., Berns, K. (eds) Modeling, Simulation and Optimization of Bipedal Walking. Cognitive Systems Monographs, vol 18. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36368-9_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-36367-2
Online ISBN: 978-3-642-36368-9