
Part of the book series: Systems & Control: Foundations & Applications (SCFA)

Summary

Control system theory has been built on well-understood and widely accepted techniques such as transfer-function methods, adaptive control, robust control, nonlinear systems theory, and state-space methods. Alongside these classical techniques, many successful results have been obtained in recent decades by incorporating artificial neural networks into classical control structures. Owing to their universal approximation property, neural network structures are natural candidates for designing controllers for complex nonlinear systems. These successes have led a number of control engineers to focus on the results and algorithms of the machine learning and computational intelligence communities and, at the same time, to find new inspiration in the biological neural structures of living organisms in their most evolved and complex form: the human brain. In this chapter we discuss two algorithms, developed from a biologically inspired structure, that learn the optimal state feedback controller for a linear system while simultaneously performing continuous-time online control of the system at hand. Moreover, since the algorithms are related to reinforcement learning techniques, in which an agent tries to maximize the total reward received while interacting with an unknown environment, the optimal controller is obtained using only the input-to-state system dynamics. Mathematically speaking, the solution of the algebraic Riccati equation underlying the optimal control problem is obtained without any knowledge of the system's internal dynamics. Both algorithms iterate between policy evaluation and policy update steps until updating the control policy no longer improves the system performance. Both can be characterized as direct adaptive optimal control methods, since the optimal control solution is determined without using an explicit, a priori obtained model of the system's internal dynamics. The effectiveness of the algorithms is demonstrated, and their performance compared, in finding the optimal state feedback controller for an F-16 autopilot.
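
The policy evaluation/policy update iteration described above can be illustrated with a short model-based sketch. The code below is not the chapter's algorithm: it is a minimal Kleinman-type policy iteration for a placeholder second-order system (the matrices A, B, Q, R and all numeric values are illustrative assumptions), showing how alternating policy evaluation (a Lyapunov equation) and policy improvement converges to the stabilizing solution of the algebraic Riccati equation. The chapter's algorithms reach the same solution without using the internal dynamics matrix A, by performing the evaluation step from data measured online.

```python
# Minimal sketch (not the chapter's data-driven algorithm): Kleinman-style policy
# iteration for continuous-time LQR, alternating policy evaluation and policy update.
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, solve_continuous_are

# Illustrative, stable second-order system (placeholder values, not the F-16 model)
A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)          # state weighting
R = np.array([[1.0]])  # input weighting

K = np.zeros((1, 2))   # initial stabilizing gain (A itself is stable here)
for _ in range(20):
    # Policy evaluation: cost of the current policy u = -K x solves the Lyapunov eq.
    #   (A - B K)^T P + P (A - B K) + Q + K^T R K = 0
    Ac = A - B @ K
    P = solve_continuous_lyapunov(Ac.T, -(Q + K.T @ R @ K))
    # Policy update: improved gain from the evaluated cost matrix
    K_new = np.linalg.solve(R, B.T @ P)
    if np.linalg.norm(K_new - K) < 1e-10:   # stop when the policy no longer improves
        K = K_new
        break
    K = K_new

# The iteration converges to the stabilizing solution of the algebraic Riccati equation
P_are = solve_continuous_are(A, B, Q, R)
print("||P - P_ARE|| =", np.linalg.norm(P - P_are))
```

In the model-free setting discussed in the chapter, the Lyapunov-equation step is replaced by evaluating the cost of the current policy along measured state and input trajectories, so that only the input-to-state dynamics are needed.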


Acknowledgments

The authors wish to acknowledge Dr. Murad Abu-Khalaf and Prof. Octavian Pastravanu for their ideas, criticism, and support during the research that led to the results presented here. This work was supported by National Science Foundation grant ECS-0501451 and Army Research Office grant W91NF-05-1-0314.

Author information

Corresponding author

Correspondence to Draguna Vrabie.


Copyright information

© 2008 Birkhäuser Boston, a part of Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Vrabie, D., Lewis, F. (2008). Direct Adaptive Optimal Control: Biologically Inspired Feedback Control. In: Won, CH., Schrader, C., Michel, A. (eds) Advances in Statistical Control, Algebraic Systems Theory, and Dynamic Systems Characteristics. Systems & Control: Foundations & Applications. Birkhäuser, Boston, MA. https://doi.org/10.1007/978-0-8176-4795-7_10
