
Part of the book series: Systems & Control: Foundations & Applications (SCFA)

Summary

Control system theory has been built on well-understood and widely accepted techniques such as transfer-function methods, adaptive control, robust control, nonlinear systems theory, and state-space methods. Alongside these classical techniques, many successful results have been obtained in recent decades by incorporating artificial neural networks into classical control structures. Owing to their universal approximation property, neural network structures are natural candidates for designing controllers for complex nonlinear systems. These successes have led a number of control engineers to focus on the results and algorithms of the machine learning and computational intelligence communities and, at the same time, to find new inspiration in the biological neural structures of living organisms in their most evolved and complex form: the human brain. In this chapter we discuss two algorithms, developed from a biologically inspired structure, that learn the optimal state feedback controller for a linear system while simultaneously performing continuous-time online control of the system at hand. Moreover, since the algorithms are related to reinforcement learning techniques, in which an agent tries to maximize the total reward received while interacting with an unknown environment, the optimal controller is obtained using only the input-to-state system dynamics. Mathematically speaking, the solution of the algebraic Riccati equation underlying the optimal control problem is obtained without any knowledge of the system's internal dynamics. Both algorithms iterate between policy evaluation and policy update steps until updating the control policy no longer improves the system performance. Both can be characterized as direct adaptive optimal control methods, since the optimal control solution is determined without using an explicit, a priori obtained model of the system's internal dynamics. The effectiveness of the algorithms is demonstrated, and their performance compared, in finding the optimal state feedback controller for an F-16 autopilot.
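
The policy evaluation/policy update iteration described above can be illustrated with a short model-based sketch. The code below is not the chapter's algorithm: it is a minimal Kleinman-type policy iteration for a placeholder second-order system (the matrices A, B, Q, R and all numeric values are illustrative assumptions), showing how alternating policy evaluation (a Lyapunov equation) and policy improvement converges to the stabilizing solution of the algebraic Riccati equation. The chapter's algorithms reach the same solution without using the internal dynamics matrix A, by performing the evaluation step from data measured online.

```python
# Minimal sketch (not the chapter's data-driven algorithm): Kleinman-style policy
# iteration for continuous-time LQR, alternating policy evaluation and policy update.
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, solve_continuous_are

# Illustrative, stable second-order system (placeholder values, not the F-16 model)
A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)          # state weighting
R = np.array([[1.0]])  # input weighting

K = np.zeros((1, 2))   # initial stabilizing gain (A itself is stable here)
for _ in range(20):
    # Policy evaluation: cost of the current policy u = -K x solves the Lyapunov eq.
    #   (A - B K)^T P + P (A - B K) + Q + K^T R K = 0
    Ac = A - B @ K
    P = solve_continuous_lyapunov(Ac.T, -(Q + K.T @ R @ K))
    # Policy update: improved gain from the evaluated cost matrix
    K_new = np.linalg.solve(R, B.T @ P)
    if np.linalg.norm(K_new - K) < 1e-10:   # stop when the policy no longer improves
        K = K_new
        break
    K = K_new

# The iteration converges to the stabilizing solution of the algebraic Riccati equation
P_are = solve_continuous_are(A, B, Q, R)
print("||P - P_ARE|| =", np.linalg.norm(P - P_are))
```

In the model-free setting discussed in the chapter, the Lyapunov-equation step is replaced by evaluating the cost of the current policy along measured state and input trajectories, so that only the input-to-state dynamics are needed.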


Acknowledgments

The authors wish to acknowledge Dr. Murad Abu-Khalaf and Prof. Octavian Pastravanu for their ideas, criticism, and support during the research that led to the results presented here. This work was supported by National Science Foundation grant ECS-0501451 and Army Research Office grant W91NF-05-1-0314.

Author information

Corresponding author

Correspondence to Draguna Vrabie.


Copyright information

© 2008 Birkhäuser Boston, a part of Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Vrabie, D., Lewis, F. (2008). Direct Adaptive Optimal Control: Biologically Inspired Feedback Control. In: Won, CH., Schrader, C., Michel, A. (eds) Advances in Statistical Control, Algebraic Systems Theory, and Dynamic Systems Characteristics. Systems & Control: Foundations & Applications. Birkhäuser, Boston, MA. https://doi.org/10.1007/978-0-8176-4795-7_10
