
Trajectory-Based Dynamic Programming


Part of the book series: Cognitive Systems Monographs (COSMOS, volume 18)

Abstract

We informally review our approach to using trajectory optimization to accelerate dynamic programming. Dynamic programming provides a way to design globally optimal control laws for nonlinear systems. However, the curse of dimensionality (the exponential growth in required memory and computation with the dimensionality of the state and control) limits the application of dynamic programming in practice. We explore trajectory-based dynamic programming, which combines many local optimizations to accelerate the global optimization of dynamic programming. This approach solves problems with fewer resources than grid-based approaches, and solves problems we could not previously solve using tabular or global function approximation approaches.
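To make the curse of dimensionality concrete, here is a minimal sketch (our own illustration, not code from the chapter) of the grid-based tabular value iteration that trajectory-based methods aim to improve on, applied to a toy two-dimensional double integrator. The grid resolution, cost weights, and dynamics are assumptions chosen for illustration; the point is that the value table needs N**d entries for N grid points per dimension, which is what becomes intractable as d grows.

```python
import numpy as np

# Toy grid-based value iteration for a double integrator (position, velocity).
# Memory cost is N**d: here d = 2, so the table has N**2 entries.
N = 21                       # grid points per state dimension (assumed)
xs = np.linspace(-1, 1, N)   # position grid
vs = np.linspace(-1, 1, N)   # velocity grid
us = np.linspace(-1, 1, 5)   # discretized control (acceleration)
dt = 0.1                     # Euler integration step
gamma = 0.95                 # discount factor

V = np.zeros((N, N))         # value table over the state grid

def nearest(grid, val):
    # Index of the grid point closest to val (nearest-neighbor interpolation).
    return int(np.argmin(np.abs(grid - val)))

for _ in range(200):         # value-iteration sweeps until convergence
    V_new = np.empty_like(V)
    for i, x in enumerate(xs):
        for j, v in enumerate(vs):
            best = np.inf
            for u in us:
                # Euler-integrated dynamics, clipped to stay on the grid.
                x2 = np.clip(x + v * dt, -1, 1)
                v2 = np.clip(v + u * dt, -1, 1)
                cost = x * x + 0.1 * u * u   # quadratic state/control cost
                best = min(best, cost + gamma * V[nearest(xs, x2), nearest(vs, v2)])
            V_new[i, j] = best
    if np.max(np.abs(V_new - V)) < 1e-6:
        V = V_new
        break
    V = V_new
```

Every sweep touches every grid cell, and refining the grid or adding a state dimension multiplies the work; trajectory-based dynamic programming instead concentrates effort along locally optimized trajectories.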




Author information


Correspondence to Christopher G. Atkeson.


Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Atkeson, C.G., Liu, C. (2013). Trajectory-Based Dynamic Programming. In: Mombaur, K., Berns, K. (eds.) Modeling, Simulation and Optimization of Bipedal Walking. Cognitive Systems Monographs, vol. 18. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36368-9_1


  • DOI: https://doi.org/10.1007/978-3-642-36368-9_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-36367-2

  • Online ISBN: 978-3-642-36368-9

  • eBook Packages: Engineering (R0)
