The roots of neural-network backpropagation (BP) may be traced back to classical optimal-control gradient procedures developed in the early 1960s. Hence, BP applies directly to a general discrete N-stage optimal control problem that consists of N stage costs plus a terminal state cost. In this journal (Liao in Neural Process Lett 10:195–200, 1999), given such a multi-stage optimal control problem, Liao turned it into a problem involving a terminal state cost only (via a classical transformation), and then claimed that BP on the transformed problem leads to new recurrent neural-network learning. This paper makes three points. First, the classical terminal-cost transformation yields no particular benefit for BP. Second, the two simulation examples (with and without time lag) presented by Liao can be regarded naturally as deep feed-forward neural-network learning rather than as recurrent neural-network learning from the perspective of classical optimal-control gradient methods. Third, BP can readily deal with a general history-dependent optimal control problem (e.g., involving time-lagged state and control variables) owing to Dreyfus’s 1973 extension of BP. Throughout the paper, we highlight systematic BP derivations by employing the recurrence relation of nominal cost-to-go action-value functions based on the stage-wise concept of dynamic programming.
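As a rough illustration of the first point, the following minimal sketch computes the control gradient of a small N-stage problem by a backward recursion, once on the original problem (stage costs plus terminal cost) and once after the terminal-cost transformation, in which an extra state y accumulates the stage costs. Everything specific here is an assumption made for illustration and is not taken from the paper or from Liao's examples: the scalar linear dynamics x[s+1] = a*x[s] + b*u[s], the quadratic costs, the parameter values, and all function names.

```python
def rollout(x0, u, a=0.9, b=0.5):
    """Simulate the hypothetical dynamics x[s+1] = a*x[s] + b*u[s]."""
    x = [x0]
    for us in u:
        x.append(a * x[-1] + b * us)
    return x

def total_cost(x0, u, a=0.9, b=0.5):
    """N stage costs 0.5*(x_s^2 + u_s^2) plus terminal cost 0.5*x_N^2."""
    x = rollout(x0, u, a, b)
    stage = sum(0.5 * (xs * xs + us * us) for xs, us in zip(x, u))
    return stage + 0.5 * x[-1] ** 2

def bp_gradient(x0, u, a=0.9, b=0.5):
    """Backward recursion on the original problem (stage + terminal costs)."""
    x = rollout(x0, u, a, b)
    n = len(u)
    grad = [0.0] * n
    lam = x[n]                       # dJ/dx_N from the terminal cost
    for s in reversed(range(n)):
        grad[s] = u[s] + b * lam     # direct stage-cost term + effect via x[s+1]
        lam = x[s] + a * lam         # dJ/dx_s, passed back to stage s-1
    return grad

def bp_gradient_terminal_only(x0, u, a=0.9, b=0.5):
    """Same recursion after the terminal-cost transformation.

    Augmented state (x, y) with y[s+1] = y[s] + 0.5*(x_s^2 + u_s^2), y[0] = 0,
    and terminal-only cost y_N + 0.5*x_N^2.
    """
    x = rollout(x0, u, a, b)
    n = len(u)
    grad = [0.0] * n
    lam_x, lam_y = x[n], 1.0         # terminal-cost derivatives w.r.t. x_N, y_N
    for s in reversed(range(n)):
        grad[s] = b * lam_x + lam_y * u[s]
        lam_x = a * lam_x + lam_y * x[s]
        # lam_y stays 1.0 because y[s+1] depends on y[s] with unit slope
    return grad

def fd_gradient(x0, u, eps=1e-6):
    """Central finite differences, used only as an independent check."""
    grad = []
    for s in range(len(u)):
        up = list(u); up[s] += eps
        dn = list(u); dn[s] -= eps
        grad.append((total_cost(x0, up) - total_cost(x0, dn)) / (2 * eps))
    return grad

if __name__ == "__main__":
    u = [0.3, -0.2, 0.1, 0.4]
    print(bp_gradient(1.0, u))
    print(bp_gradient_terminal_only(1.0, u))
    print(fd_gradient(1.0, u))
```

In this toy setting the costate of the accumulated-cost variable is identically one, so the transformed recursion reduces term by term to the original one. That is consistent with the claim that the transformation offers no particular benefit for BP, though a single scalar example is of course not a proof.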
Bellman RE, Dreyfus SE (1962) Applied dynamic programming. Princeton University Press, Princeton, pp 348–353
Bliss GA (1946) Lectures on the calculus of variations, Chapter VII. The University of Chicago Press, Chicago
Bryson AE (1961) A gradient method for optimizing multi-stage allocation processes. In: Proceedings of Harvard University symposium on digital computers and their applications, pp 125–135
Demmel JW (1997) Applied numerical linear algebra. SIAM, Philadelphia
Dreyfus SE (1962) The numerical solution of variational problems. J Math Anal Appl 5(1):30–45
Dreyfus SE (1966) The numerical solution of non-linear optimal control problems. In: Greenspan D (ed) Numerical solutions of nonlinear differential equations: proceedings of an advanced symposium. Wiley, London, pp 97–113
Dreyfus SE (1973) The computational solution of optimal control problems with time lag. IEEE Trans Autom Control 18(4):383–385
Dreyfus S, Law A (1977) The art and theory of dynamic programming. Academic Press, London, pp 103–105
Dreyfus SE (1990) Artificial neural networks, back propagation, and the Kelley–Bryson gradient procedure. J Guid Control Dyn 13(5):926–928
Kelley HJ (1960) Gradient theory of optimal flight paths. Am Rocket Soc J 30(10):941–954
Liao LZ (1999) A recurrent neural network for \(N\)-stage optimal control problems. Neural Process Lett 10:195–200
Liao LZ, Shoemaker CA (1999) Convergence in unconstrained discrete-time differential dynamic programming. IEEE Trans Autom Control 36:692–706
Mizutani E, Dreyfus S, Nishio K (2000) On derivation of MLP backpropagation from the Kelley–Bryson optimal-control gradient formula and its application. In: Proceedings of the IEEE international conference on neural networks, Como, Italy (vol 2), pp 167–172 (see also http://queue.ieor.berkeley.edu/People/Faculty/dreyfus-pubs/hidteach.m)
Mizutani E, Dreyfus SE (2006) On derivation of stage-wise second-order backpropagation by invariant imbedding for multi-stage neural-network learning. In: Proceedings of the IEEE World congress on computational intelligence, Vancouver, Canada, pp 4762–4769
Mizutani E (2015) On Pantoja’s problem allegedly showing a distinction between differential dynamic programming and stage-wise Newton methods. Int J Control 88(9):1702–1722
Mizutani E, Dreyfus SE (2017) Totally model-free actor-critic recurrent neural-network reinforcement learning in non-Markovian domains. Ann Oper Res 258(1):107–131
Nocedal J, Wright SJ (2006) Numerical optimization, 2nd edn. Springer, London
Parisini T, Zoppoli R (1991) Neural networks for the solution of \(N\)-stage optimal control problems. In: Kohonen T, Makisara K, Simula O, Kangas J (eds) Artificial neural networks. Elsevier Science Publishers B.V., North-Holland, pp 333–338
Rumelhart DE, Hinton GE, Williams RJ (1986) Learning internal representations by error propagation. In: Rumelhart DE, McClelland JL (eds) Parallel distributed processing, vol 1. MIT Press, Cambridge, pp 318–362
Schmidhuber J (2015) Deep learning in neural networks: an overview. Neural Netw 61:85–117
Sabouri KJ, Effati S, Pakdaman M (2017) A neural network approach for solving a class of fractional optimal control problems. Neural Process Lett 45:59–74
Sutton RS, Barto AG (2018) Reinforcement learning: an introduction, 2nd edn. MIT Press, Cambridge
Werbos PJ (1988) Generalization of backpropagation with application to a recurrent gas market model. Neural Netw 1(4):339–356
Wilamowski BM, Yu H (2010) Neural network learning without backpropagation. IEEE Trans Neural Netw 21(11):1793–1803
Eiji Mizutani would like to thank Stuart Dreyfus (UC Berkeley) for numerous invaluable discussions on neural network learning and dynamic programming for more than two decades. The work is partially supported by the Ministry of Science and Technology, Taiwan (Grant: 106-2221-E-011-146-MY2).
Mizutani, E. A Note on Liao’s Recurrent Neural-Network Learning for Discrete Multi-stage Optimal Control Problems. Neural Process Lett 50, 3009–3018 (2019). https://doi.org/10.1007/s11063-019-09986-8
- Optimal control gradient methods
- Deep feed-forward neural-network learning