Machine Learning, Volume 84, Issue 1–2, pp 51–80

Empirical evaluation methods for multiobjective reinforcement learning algorithms

  • Peter Vamplew
  • Richard Dazeley
  • Adam Berry
  • Rustam Issabekov
  • Evan Dekker


Abstract

While a number of algorithms for multiobjective reinforcement learning have been proposed, and a small number of applications developed, there has been very little rigorous empirical evaluation of the performance and limitations of these algorithms. This paper proposes standard methods for such empirical evaluation, to act as a foundation for future comparative studies. Two classes of multiobjective reinforcement learning algorithms are identified, and appropriate evaluation metrics and methodologies are proposed for each class. A suite of benchmark problems with known Pareto fronts is described, and future extensions and implementations of this benchmark suite are discussed. The utility of the proposed evaluation methods is demonstrated via an empirical comparison of two example learning algorithms.
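To make the notion of evaluating a learned set of policies against a known Pareto front concrete, the following sketch (not taken from the paper itself; names and the example data are illustrative) identifies the non-dominated subset of a set of policy value vectors and scores a two-objective front with the hypervolume indicator, the metric introduced by Zitzler and Thiele (1999) and widely used in multiobjective evaluation. It assumes both objectives are maximised and that the reference point is dominated by every front member.

```python
def dominates(a, b):
    """True if value vector a Pareto-dominates b (>= everywhere, > somewhere)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(points):
    """Return the non-dominated subset of a list of value vectors."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

def hypervolume_2d(front, ref):
    """Area dominated by a 2-objective (maximisation) front relative to ref."""
    # Sweep points in decreasing order of the first objective,
    # accumulating the rectangle each point adds above the previous best y.
    front = sorted(set(map(tuple, front)), key=lambda p: -p[0])
    area, prev_y = 0.0, ref[1]
    for x, y in front:
        if y > prev_y:
            area += (x - ref[0]) * (y - prev_y)
            prev_y = y
    return area

# Hypothetical policy values on two objectives: (2, 2) is dominated.
points = [(1, 5), (2, 4), (3, 1), (2, 2), (4, 0)]
front = pareto_front(points)
print(sorted(front))                   # -> [(1, 5), (2, 4), (3, 1), (4, 0)]
print(hypervolume_2d(front, (0, 0)))   # -> 10.0
```

A larger hypervolume indicates a front that is closer to, and covers more of, the true Pareto front, which is why metrics of this kind are a natural fit for comparing the algorithm classes discussed in the paper.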


Keywords: Multiobjective reinforcement learning · Multiple objectives · Empirical methods · Pareto fronts · Pareto optimal policies


References

  1. Aissani, N., Beldjilali, B., & Trentesaux, D. (2008). Efficient and effective reactive scheduling of manufacturing system using sarsa-multi-objective agents. In MOSIM’08: 7th conference internationale de modelisation and simulation, Paris, April 2008.
  2. Barrett, L., & Narayanan, S. (2008). Learning all optimal policies with multiple criteria. In Proceedings of the international conference on machine learning.
  3. Berry, A. (2008). Escaping the bounds of generality—unbounded bi-objective optimisation. Ph.D. thesis, School of Computing, University of Tasmania.
  4. Berry, D. A., & Fristedt, B. (1985). Bandit problems: sequential allocation of experiments. London: Chapman and Hall.
  5. Boyan, J. A., & Moore, A. W. (1995). Generalization in reinforcement learning: safely approximating the value function, NIPS-7.
  6. Castelletti, A., Corani, G., Rizzolli, A., Soncinie-Sessa, R., & Weber, E. (2002). Reinforcement learning in the operational management of a water system. In IFAC workshop on modeling and control in environmental issues, Keio University, Yokohama, Japan (pp. 325–330).
  7. Chaterjee, K., Majumdar, R., & Henzinger, T. (2006). Markov decision processes with multiple objectives. In Lecture notes in computer science: Vol. 3884. Proceedings of the 23rd international conference on theoretical aspects of computer science (STACS) (pp. 325–336). Berlin: Springer.
  8. Coello, C. A. C., Veldhuizen, D. A. V., & Lamont, G. B. (2002). Evolutionary algorithms for solving multi-objective problems. Dordrecht: Kluwer Academic.
  9. Crabbe, F. L. (2001). Multiple goal Q-learning: Issues and functions. In Proceedings of the international conference on computational intelligence for modelling control and automation (CIMCA). San Mateo: Morgan Kaufmann.
  10. Dutech, A., Edmunds, T., Kok, J., Lagoudakis, M., Littman, M., Riedmiller, M., Russell, B., Scherrer, B., Sutton, R., Timmer, S., Vlassis, N., White, A., & Whiteson, S. (2005). Reinforcement learning benchmarks and bake-offs II. In Workshop at advances in neural information processing systems conference.
  11. Frank, A., & Asuncion, A. (2010). UCI machine learning repository. Irvine, CA: University of California, School of Information and Computer Science.
  12. Gabor, Z., Kalmar, Z., & Szepesvari, C. (1998). Multi-criteria reinforcement learning. In The fifteenth international conference on machine learning (pp. 197–205).
  13. Geibel, P. (2006). Reinforcement learning for MDPs with constraints. In ECML 2006: European conference on machine learning (pp. 646–653).
  14. Handa, H. (2009). Solving multi-objective reinforcement learning problems by EDA-RL—acquisition of various strategies. In Proceedings of the 2009 ninth international conference on intelligent systems design and applications (pp. 426–431).
  15. Horn, J., Nafpliotis, N., & Goldberg, D. E. (1994). A niched Pareto genetic algorithm for multiobjective optimisation. In Proceedings of the first IEEE conference on evolutionary computation, IEEE world congress on computational intelligence.
  16. Kaelbling, L. P., Littman, M. L., & Moore, A. W. (1996). Reinforcement learning: a survey. Journal of Artificial Intelligence Research, 4, 237–285.
  17. Knowles, J. D., Thiele, L., & Zitzler, E. (2006). A tutorial on the performance assessment of stochastic multiobjective optimizers (TIK-Report No. 214). Computer engineering and networks laboratory, ETH Zurich, February 2006.
  18. Mannor, S., & Shimkin, N. (2001). The steering approach for multi-criteria reinforcement learning. In Neural information processing systems, Vancouver, Canada (pp. 1563–1570).
  19. Mannor, S., & Shimkin, N. (2004). A geometric approach to multi-criterion reinforcement learning. Journal of Machine Learning Research, 5, 325–360.
  20. Natarajan, S., & Tadepalli, P. (2005). Dynamic preferences in multi-criteria reinforcement learning. In International conference on machine learning, Bonn, Germany (pp. 601–608).
  21. Pareto, V. (1896). Manuel d’economie politique. Paris: Giard.
  22. Perez, J., Germain-Renaud, C., Kegl, B., & Loomis, C. (2009). Responsive elastic computing. In International conference on autonomic computing, Barcelona (pp. 55–64).
  23. Shelton, C. R. (2001). Importance sampling for reinforcement learning with multiple objectives (Tech. Report No. 2001-003). Massachusetts Institute of Technology, AI Laboratory.
  24. Srinivas, N., & Deb, K. (1994). Multiobjective optimisation using nondominated sorting in genetic algorithms. Evolutionary Computation, 2(3), 221–248.
  25. Sutton, R. S. (1996). Generalisation in reinforcement learning: successful examples using sparse coarse coding. In D. S. Touretzky, M. C. Mozer, & M. E. Hasselmo (Eds.), Advances in neural information processing systems: proceedings of the 1995 conference (pp. 1038–1044). Cambridge: MIT Press.
  26. Tanner, B., & White, A. (2009). RL-glue: language-independent software for reinforcement-learning experiments. Journal of Machine Learning Research, 10, 2133–2136.
  27. Tesauro, G., Das, R., Chan, H., Kephart, J. O., Lefurgy, C., Levine, D. W., & Rawson, F. (2007). Managing power consumption and performance of computing systems using reinforcement learning. Neural information processing systems.
  28. UMass (2010). University of Massachusetts reinforcement learning repository.
  29. Vamplew, P., Dazeley, R., Barker, E., & Kelarev, A. (2009). Constructing stochastic mixture policies for episodic multiobjective reinforcement learning tasks. In Lecture notes in artificial intelligence. Proceedings of AI09: the 22nd Australasian conference on artificial intelligence, Melbourne, Australia, December 2009. Berlin: Springer.
  30. Vamplew, P., Yearwood, J., Dazeley, R., & Berry, A. (2008). On the limitations of scalarisation for multiobjective learning of Pareto fronts. In W. Wobcke & M. Zhang (Eds.), Lecture notes in artificial intelligence: Vol. 5360. Proceedings of AI08: the 21st Australasian conference on artificial intelligence, Auckland, New Zealand, December 2008 (pp. 372–378). Berlin: Springer.
  31. White, A. (2006). A standard system for benchmarking in reinforcement learning. Master’s thesis, University of Alberta, Alberta, Canada.
  32. Whiteson, S., Tanner, B., Taylor, M. E., & Stone, P. (2009). Generalized domains for empirical evaluations in reinforcement learning. In Proceedings of the 4th workshop on evaluation methods for machine learning at ICML-09, Montreal, Canada.
  33. Wiering, M. A., & de Jong, E. D. (2007). Computing optimal stationary policies for multi-objective Markov decision processes. In Proceedings of the IEEE international symposium on approximate dynamic programming and reinforcement learning (ADPRL) (pp. 158–165).
  34. Zitzler, E., & Thiele, L. (1999). Multiobjective evolutionary algorithms: a comparative case study and the strength Pareto approach. IEEE Transactions on Evolutionary Computation, 3(4), 257–271.
  35. Zitzler, E., Thiele, L., Laumanns, M., Fonseca, C. M., & Grunert da Fonseca, V. (2003). Performance assessment of multiobjective optimizers: an analysis and review. IEEE Transactions on Evolutionary Computation, 7(2), 117–132.

Copyright information

© The Author(s) 2010

Authors and Affiliations

  • Peter Vamplew (1)
  • Richard Dazeley (1)
  • Adam Berry (2)
  • Rustam Issabekov (1)
  • Evan Dekker (1)

  1. Graduate School of Information Technology and Mathematical Sciences, University of Ballarat, Ballarat, Australia
  2. CSIRO Energy Centre, Mayfield West, Australia
