Reinforcement learning in feedback control

Challenges and benchmarks from technical process control

Abstract

Technical process control is a highly interesting area of application with great practical impact. Since classical controller design is, in general, a demanding task, this area constitutes a highly attractive domain for the application of learning approaches, in particular reinforcement learning (RL) methods. RL provides concepts for learning controllers that, by cleverly exploiting information from interactions with the process, can acquire high-quality control behaviour from scratch.

This article focuses on the presentation of four typical benchmark problems whilst highlighting important and challenging aspects of technical process control: nonlinear dynamics; varying set-points; long-term dynamic effects; influence of external variables; and the primacy of precision. We propose performance measures for controller quality that apply both to classical control design and learning controllers, measuring precision, speed, and stability of the controller. A second set of key figures describes the performance from the perspective of a learning approach, providing information about the efficiency of the method with respect to the learning effort needed. For all four benchmark problems, extensive and detailed information is provided with which to carry out the evaluations outlined in this article.
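
To make such controller-quality figures concrete, the sketch below computes plausible instances of the three aspects (mean absolute tracking error for precision, settling time for speed, overshoot as a stability proxy) from a recorded closed-loop step response. The function name, the 2% tolerance band, and the exact definitions are illustrative assumptions, not the measures defined in the article.

```python
# Illustrative only: plausible controller-quality key figures computed from a
# recorded closed-loop trajectory; the article's concrete measures may differ.
import numpy as np

def controller_key_figures(t, y, setpoint, tol=0.02):
    """Return precision, speed and stability figures for one step response.

    t        : array of time stamps [s]
    y        : array of measured process outputs
    setpoint : constant reference value for this episode
    tol      : relative tolerance band around the set-point (assumed 2%)
    """
    err = y - setpoint
    band = tol * abs(setpoint) if setpoint != 0 else tol

    # Precision: mean absolute deviation from the set-point over the episode.
    precision = float(np.mean(np.abs(err)))

    # Speed: first time after which the output stays inside the tolerance band.
    settling_time = None
    inside = np.abs(err) <= band
    for i in range(len(t)):
        if inside[i:].all():
            settling_time = float(t[i])
            break

    # Stability proxy: relative overshoot beyond the set-point.
    overshoot = float(max(0.0, (np.max(y) - setpoint) / abs(setpoint))) if setpoint else 0.0

    return {"precision": precision, "settling_time": settling_time, "overshoot": overshoot}
```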

A close evaluation of our own RL learning scheme, NFQCA (Neural Fitted Q Iteration with Continuous Actions), in accordance with the proposed scheme on all four benchmarks, thereby provides performance figures on both control quality and learning behaviour.
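
For orientation, the following is a highly simplified sketch of the batch actor-critic idea underlying NFQCA: a neural critic Q(s, a) is fitted on a stored set of transitions, and a neural actor is then trained to output actions that minimise the critic's predicted cost. Network sizes, optimisers, and the update schedule are illustrative assumptions and do not reproduce the original implementation.

```python
# Simplified NFQCA-style batch actor-critic sketch (assumed setup, not the
# authors' implementation): fitted critic regression plus actor improvement.
import torch
import torch.nn as nn

state_dim, action_dim, gamma = 3, 1, 0.98  # toy dimensions, assumed

critic = nn.Sequential(nn.Linear(state_dim + action_dim, 20), nn.Tanh(),
                       nn.Linear(20, 20), nn.Tanh(), nn.Linear(20, 1))
actor = nn.Sequential(nn.Linear(state_dim, 20), nn.Tanh(),
                      nn.Linear(20, action_dim), nn.Tanh())  # actions bounded in [-1, 1]

critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)

def nfqca_style_iteration(s, a, c, s_next, epochs=100):
    """One batch iteration on stored transitions (s, a, c, s'),
    where c is the immediate cost to be minimised."""
    # Fixed regression targets, as in fitted Q iteration.
    with torch.no_grad():
        target = c + gamma * critic(torch.cat([s_next, actor(s_next)], dim=1))
    # Critic: batch supervised fit of Q(s, a) towards the targets.
    for _ in range(epochs):
        critic_opt.zero_grad()
        q = critic(torch.cat([s, a], dim=1))
        ((q - target) ** 2).mean().backward()
        critic_opt.step()
    # Actor: adjust the policy to minimise the critic's predicted cost.
    for _ in range(epochs):
        actor_opt.zero_grad()
        critic(torch.cat([s, actor(s)], dim=1)).mean().backward()
        actor_opt.step()
```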

Author information

Correspondence to Roland Hafner.

Additional information

Editors: S. Whiteson and M. Littman.

About this article

Cite this article

Hafner, R., Riedmiller, M. Reinforcement learning in feedback control. Mach Learn 84, 137–169 (2011). https://doi.org/10.1007/s10994-011-5235-x

Keywords

  • Reinforcement learning
  • Feedback control
  • Benchmarks
  • Nonlinear control