Technical process control is an application area of high practical impact. Because classical controller design is generally a demanding task, it is an attractive domain for learning approaches, in particular reinforcement learning (RL) methods. RL provides concepts for learning controllers that, by exploiting information gathered from interactions with the process, can acquire high-quality control behaviour from scratch.
This article presents four typical benchmark problems that highlight important and challenging aspects of technical process control: nonlinear dynamics, varying set-points, long-term dynamic effects, the influence of external variables, and the primacy of precision. We propose performance measures for controller quality that apply to both classical control design and learning controllers and that measure the precision, speed, and stability of the controller. A second set of key figures describes performance from the perspective of a learning approach and provides information about the efficiency of the method with respect to the learning effort needed. For all four benchmark problems, detailed information is provided for carrying out the evaluations outlined in this article.
We evaluate our own RL scheme, NFQCA (Neural Fitted Q Iteration with Continuous Actions), in accordance with the proposed scheme on all four benchmarks, providing performance figures on both control quality and learning behaviour.
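The abstract names precision, speed, and stability as the aspects of controller quality but does not give formulas. The following is a minimal sketch of how such key figures might be computed from one recorded closed-loop trajectory. The function name `control_quality`, its parameters, and the three definitions used here are assumptions made for illustration only, not the article's formal measures.

```python
from typing import Optional, Sequence


def control_quality(outputs: Sequence[float], set_point: float,
                    tolerance: float, tail: int) -> dict:
    """Illustrative key figures for one closed-loop trajectory.

    All definitions below are assumptions for this sketch:
    - speed:     first step from which the output stays inside the
                 tolerance band around the set-point until the end
    - precision: mean absolute error over the last `tail` steps
    - stability: number of steps spent outside the band after the
                 output first entered it (0 means it never left again)
    """
    if not outputs or tail < 1:
        raise ValueError("need a non-empty trajectory and tail >= 1")

    errors = [abs(y - set_point) for y in outputs]
    inside = [e <= tolerance for e in errors]

    # Speed: earliest index k such that every later step is inside the band.
    settle_step: Optional[int] = None
    for k in range(len(inside)):
        if all(inside[k:]):
            settle_step = k
            break

    # Precision: average absolute deviation at the end of the trajectory.
    final = errors[-tail:]
    precision = sum(final) / len(final)

    # Stability: excursions out of the band after first entering it.
    first_inside = next((k for k, ok in enumerate(inside) if ok), None)
    if first_inside is None:
        excursions = None  # the band was never reached
    else:
        excursions = sum(1 for ok in inside[first_inside:] if not ok)

    return {
        "settle_step": settle_step,
        "precision": precision,
        "excursions": excursions,
    }


if __name__ == "__main__":
    # Hypothetical trajectory: approaches the set-point, overshoots once, settles.
    trajectory = [0.0, 0.5, 0.95, 1.2, 1.0, 1.0]
    print(control_quality(trajectory, set_point=1.0, tolerance=0.1, tail=2))
```

Because the figures depend only on the recorded output and the set-point, the same function could in principle be applied to a classically designed controller and to a learned one, which is the kind of comparison the abstract describes.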
Editors: S. Whiteson and M. Littman.
Hafner, R., Riedmiller, M. Reinforcement learning in feedback control. Mach Learn 84, 137–169 (2011). https://doi.org/10.1007/s10994-011-5235-x
- Reinforcement learning
- Feedback control
- Nonlinear control