Summary
Control system theory rests on well-understood and widely accepted techniques such as transfer-function methods, adaptive control, robust control, nonlinear systems theory, and state-space methods. Alongside these classical techniques, recent decades have seen many successful results obtained by incorporating artificial neural networks into classical control structures. Owing to their universal approximation property, neural network structures are natural candidates for designing controllers for complex nonlinear systems. These successes have led a number of control engineers to turn to the results and algorithms of the machine learning and computational intelligence community and, at the same time, to seek inspiration in the biological neural structures of living organisms in their most evolved and complex form: the human brain. In this chapter we discuss two algorithms, developed from a biologically inspired structure, that learn the optimal state feedback controller for a linear system while simultaneously performing continuous-time online control of that system. Moreover, because the algorithms are rooted in reinforcement learning, in which an agent tries to maximize the total reward received while interacting with an unknown environment, the optimal controller is obtained using only the input-to-state system dynamics. Mathematically speaking, the solution of the algebraic Riccati equation underlying the optimal control problem is obtained without any knowledge of the system's internal dynamics. Both algorithms iterate between policy evaluation and policy update steps until updating the control policy no longer improves the system performance.
Both algorithms can be characterized as direct adaptive optimal control methods, since the optimal control solution is determined without using an explicit, a priori obtained model of the system's internal dynamics. The effectiveness of the algorithms is demonstrated, and their performance compared, in finding the optimal state feedback gains for an F-16 autopilot.
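For orientation, the policy-evaluation/policy-update cycle described above can be sketched in its classical, model-based form: Kleinman's iteration, in which policy evaluation solves a Lyapunov equation for the cost matrix of the current gain and policy update improves the gain from that cost matrix. Note that this sketch deliberately uses the system matrix A, which the chapter's model-free algorithms avoid; it only illustrates the iteration structure they share. The 2-state system below is a hypothetical toy example, not the F-16 model from the chapter.

```python
import numpy as np

def lyap(Acl, W):
    """Solve Acl^T P + P Acl + W = 0 via the Kronecker/vec identity."""
    n = Acl.shape[0]
    M = np.kron(np.eye(n), Acl.T) + np.kron(Acl.T, np.eye(n))
    p = np.linalg.solve(M, -W.flatten(order='F'))  # vec(P), column-stacked
    return p.reshape((n, n), order='F')

def kleinman(A, B, Q, R, K0, iters=10):
    """Model-based policy iteration (Kleinman 1968) for the LQR problem.
    Each pass evaluates the current policy u = -Kx, then updates the gain.
    Converges to the solution P of the algebraic Riccati equation,
    provided the initial gain K0 is stabilizing."""
    K = K0
    for _ in range(iters):
        # Policy evaluation: cost matrix of the closed loop A - B K
        P = lyap(A - B @ K, Q + K.T @ R @ K)
        # Policy update: improved gain from the evaluated cost
        K = np.linalg.solve(R, B.T @ P)
    return P, K

# Hypothetical open-loop-stable 2-state system, so K0 = 0 is stabilizing
A = np.array([[-1.0, 1.0], [0.0, -2.0]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)
P, K = kleinman(A, B, Q, R, np.zeros((1, 2)))

# Residual of the algebraic Riccati equation (should be near zero)
res = A.T @ P + P @ A + Q - P @ B @ np.linalg.solve(R, B.T @ P)
```

The chapter's contribution is to carry out the same two-step iteration online, measuring state and input along system trajectories so that the Lyapunov-equation step never requires A.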
Acknowledgments
The authors wish to acknowledge Dr. Murad Abu-Khalaf and Prof. Octavian Pastravanu for their ideas, criticism and support during the research that led to the results presented here. This work was supported by the National Science Foundation ECS-0501451 and the Army Research Office W91NF-05-1-0314.
© 2008 Birkhäuser Boston, a part of Springer Science+Business Media, LLC
Vrabie, D., Lewis, F. (2008). Direct Adaptive Optimal Control: Biologically Inspired Feedback Control. In: Won, CH., Schrader, C., Michel, A. (eds) Advances in Statistical Control, Algebraic Systems Theory, and Dynamic Systems Characteristics. Systems & Control: Foundations & Applications. Birkhäuser, Boston, MA. https://doi.org/10.1007/978-0-8176-4795-7_10
Print ISBN: 978-0-8176-4794-0
Online ISBN: 978-0-8176-4795-7
eBook Packages: Mathematics and Statistics (R0)