A Note on Liao’s Recurrent Neural-Network Learning for Discrete Multi-stage Optimal Control Problems

Abstract

The roots of neural-network backpropagation (BP) can be traced back to classical optimal-control gradient procedures developed in the early 1960s. Hence, BP applies directly to a general discrete N-stage optimal control problem that consists of N stage costs plus a terminal state cost. In this journal (Liao in Neural Process Lett 10:195–200, 1999), given such a multi-stage optimal control problem, Liao turned it into a problem involving a terminal state cost only (via a classical transformation), and then claimed that BP on the transformed problem leads to new recurrent neural-network learning. The purpose of this paper is to make three points. First, the classical terminal-cost transformation yields no particular benefit for BP. Second, the two simulation examples (with and without time lag) presented by Liao can be regarded naturally as deep feed-forward neural-network learning, rather than as recurrent neural-network learning, from the perspective of classical optimal-control gradient methods. Third, BP can readily deal with a general history-dependent optimal control problem (e.g., one involving time-lagged state and control variables) owing to Dreyfus’s 1973 extension of BP. Throughout the paper, we highlight systematic BP derivations that employ the recurrence relation of nominal cost-to-go action-value functions based on the stage-wise concept of dynamic programming.
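
As a rough illustration of the first point, the following Python sketch (not taken from the paper) applies the backward gradient recurrence to a small scalar N-stage problem in two ways: directly on the stage costs plus terminal cost, and on the terminal-cost-only form obtained by augmenting the state with a running-cost variable y. The dynamics `f`, the quadratic costs, and all function names are hypothetical choices made only for this example; the time-lagged case is not covered.

```python
import math

# Hypothetical scalar dynamics x_{s+1} = f(x_s, u_s) and its partial derivatives.
def f(x, u):
    return 0.9 * x + math.tanh(u)

def f_x(x, u):
    return 0.9

def f_u(x, u):
    return 1.0 - math.tanh(u) ** 2

# Hypothetical quadratic stage cost L(x, u) and terminal cost phi(x).
def stage_cost(x, u):
    return 0.5 * (x * x + u * u)

def L_x(x, u):
    return x

def L_u(x, u):
    return u

def terminal_cost(x):
    return 0.5 * x * x

def phi_x(x):
    return x

def rollout(x0, controls):
    """Forward pass: return the state trajectory x_0, ..., x_N."""
    xs = [x0]
    for u in controls:
        xs.append(f(xs[-1], u))
    return xs

def total_cost(x0, controls):
    """Sum of the N stage costs plus the terminal state cost."""
    xs = rollout(x0, controls)
    return sum(stage_cost(x, u) for x, u in zip(xs, controls)) + terminal_cost(xs[-1])

def bp_gradient(x0, controls):
    """Backward pass on the original problem (stage costs + terminal cost)."""
    xs = rollout(x0, controls)
    lam = phi_x(xs[-1])                       # dJ/dx_N
    grad = [0.0] * len(controls)
    for s in reversed(range(len(controls))):
        x, u = xs[s], controls[s]
        grad[s] = L_u(x, u) + f_u(x, u) * lam  # dJ/du_s
        lam = L_x(x, u) + f_x(x, u) * lam      # dJ/dx_s
    return grad

def bp_gradient_terminal_only(x0, controls):
    """Backward pass on the transformed problem with terminal cost phi(x_N) + y_N,
    where y_{s+1} = y_s + L(x_s, u_s) and y_0 = 0."""
    xs = rollout(x0, controls)
    lam_x, lam_y = phi_x(xs[-1]), 1.0          # derivatives w.r.t. x_N and y_N
    grad = [0.0] * len(controls)
    for s in reversed(range(len(controls))):
        x, u = xs[s], controls[s]
        grad[s] = f_u(x, u) * lam_x + L_u(x, u) * lam_y
        lam_x = f_x(x, u) * lam_x + L_x(x, u) * lam_y
        # lam_y is unchanged because dy_{s+1}/dy_s = 1.
    return grad

def fd_gradient(x0, controls, eps=1e-6):
    """Central finite differences, used only as a numerical check."""
    grad = []
    for s in range(len(controls)):
        up, dn = list(controls), list(controls)
        up[s] += eps
        dn[s] -= eps
        grad.append((total_cost(x0, up) - total_cost(x0, dn)) / (2.0 * eps))
    return grad

if __name__ == "__main__":
    u = [0.3, -0.2, 0.5, 0.1]
    print(bp_gradient(1.0, u))
    print(bp_gradient_terminal_only(1.0, u))
    print(fd_gradient(1.0, u))
```

In this sketch the multiplier attached to the running-cost variable stays at 1 throughout the backward pass, so the two recurrences reduce to the same arithmetic and return the same gradient, which is in line with the abstract's remark that the transformation brings no particular benefit for BP.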

References

  1. Bellman RE, Dreyfus SE (1962) Applied dynamic programming. Princeton University Press, Princeton, pp 348–353
  2. Bliss GA (1946) Lectures on the calculus of variations, Chapter VII. The University of Chicago Press, Chicago
  3. Bryson AE (1961) A gradient method for optimizing multi-stage allocation processes. In: Proceedings of Harvard University symposium on digital computers and their applications, pp 125–135
  4. Demmel JW (1997) Applied numerical linear algebra. SIAM, Philadelphia
  5. Dreyfus SE (1962) The numerical solution of variational problems. J Math Anal Appl 5(1):30–45
  6. Dreyfus SE (1966) The numerical solution of non-linear optimal control problems. In: Greenspan D (ed) Numerical solutions of nonlinear differential equations: proceedings of an advanced symposium. Wiley, London, pp 97–113
  7. Dreyfus SE (1973) The computational solution of optimal control problems with time lag. IEEE Trans Automat Control 18(4):383–385
  8. Dreyfus S, Law A (1977) The art and theory of dynamic programming. Academic Press, London, pp 103–105
  9. Dreyfus SE (1990) Artificial neural networks, back propagation, and the Kelley–Bryson gradient procedure. J Guid Control Dyn 13(5):926–928
  10. Kelley HJ (1960) Gradient theory of optimal flight paths. Am Rocket Soc J 30(10):941–954
  11. Liao LZ (1999) A recurrent neural network for N-stage optimal control problems. Neural Process Lett 10:195–200
  12. Liao LZ, Shoemaker CA (1991) Convergence in unconstrained discrete-time differential dynamic programming. IEEE Trans Autom Control 36:692–706
  13. Mizutani E, Dreyfus S, Nishio K (2000) On derivation of MLP backpropagation from the Kelley–Bryson optimal-control gradient formula and its application. In: Proceedings of the IEEE international conference on neural networks, Como, Italy (vol 2), pp 167–172 (see also http://queue.ieor.berkeley.edu/People/Faculty/dreyfus-pubs/hidteach.m)
  14. Mizutani E, Dreyfus SE (2006) On derivation of stage-wise second-order backpropagation by invariant imbedding for multi-stage neural-network learning. In: Proceedings of the IEEE World congress on computational intelligence, Vancouver, Canada, pp 4762–4769
  15. Mizutani E (2015) On Pantoja’s problem allegedly showing a distinction between differential dynamic programming and stage-wise Newton methods. Int J Control 88(9):1702–1722
  16. Mizutani E, Dreyfus SE (2017) Totally model-free actor-critic recurrent neural-network reinforcement learning in non-Markovian domains. Ann Oper Res 258(1):107–131
  17. Nocedal J, Wright SJ (2006) Numerical optimization, 2nd edn. Springer, London
  18. Parisini T, Zoppoli R (1991) Neural networks for the solution of N-stage optimal control problems. In: Kohonen T, Makisara K, Simula O, Kangas J (eds) Artificial neural networks. Elsevier Science Publishers B.V., North-Holland, pp 333–338
  19. Rumelhart DE, Hinton GE, Williams RJ (1986) Learning internal representations by error propagation. In: Rumelhart DE, McClelland JL (eds) Parallel distributed processing, vol 1. MIT Press, Cambridge, pp 318–362
  20. Schmidhuber J (2015) Deep learning in neural networks: an overview. Neural Netw 61:85–117
  21. Sabouri KJ, Effati S, Pakdaman M (2017) A neural network approach for solving a class of fractional optimal control problems. Neural Process Lett 45:59–74
  22. Sutton RS, Barto AG (2018) Reinforcement learning: an introduction, 2nd edn. MIT Press, Cambridge
  23. Werbos PJ (1988) Generalization of backpropagation with application to a recurrent gas market model. Neural Netw 1(4):339–356
  24. Wilamowski BM, Yu H (2010) Neural network learning without backpropagation. IEEE Trans Neural Netw 21(11):1793–1803

Acknowledgements

Eiji Mizutani would like to thank Stuart Dreyfus (UC Berkeley) for numerous invaluable discussions on neural-network learning and dynamic programming over more than two decades. This work was partially supported by the Ministry of Science and Technology, Taiwan (Grant: 106-2221-E-011-146-MY2).

Author information

Corresponding author

Correspondence to Eiji Mizutani.

About this article

Cite this article

Mizutani, E. A Note on Liao’s Recurrent Neural-Network Learning for Discrete Multi-stage Optimal Control Problems. Neural Process Lett 50, 3009–3018 (2019). https://doi.org/10.1007/s11063-019-09986-8

Keywords

  • Backpropagation
  • Optimal control gradient methods
  • Deep feed-forward neural-network learning