Multi-objective safe reinforcement learning

The relationship between multi-objective reinforcement learning and safe reinforcement learning
  • Naoto Horie
  • Tohgoroh Matsui
  • Koichi Moriyama
  • Atsuko Mutoh
  • Nobuhiro Inuzuka
Original Article


Reinforcement learning (RL) is a learning method that acquires actions through trial and error. Recently, multi-objective reinforcement learning (MORL) and safe reinforcement learning (SafeRL) have been studied. Conventional RL maximizes the expected reward; however, because safety is not considered, the agent may enter a fatal state. Therefore, RL methods that take safety into account during or after learning have been proposed. SafeRL resembles MORL in that it considers two objectives: maximizing the expected reward and satisfying safety constraints. However, to the best of our knowledge, no study has investigated the relationship between MORL and SafeRL or demonstrated that SafeRL methods can be applied to MORL tasks. This paper combines MORL with SafeRL and proposes a multi-objective safe RL (MOSafeRL) method. We applied the proposed method to the Resource Gathering task, a standard benchmark used in MORL test cases.
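The paper itself gives no code, but the construction the abstract describes — learning a vector of action values, one component per objective (profit and safety), and choosing actions via a scalarization — can be illustrated with a minimal sketch. This is not the authors' MOSafeRL algorithm: it is a generic scalarized multi-objective Q-learning toy on a one-dimensional stand-in for the Resource Gathering task, and the state layout, the 0.3 risk probability, the reward vectors, and the weights `W` are all illustrative assumptions.

```python
import random

# Hypothetical 1-D stand-in for a resource-gathering task (not the paper's
# exact benchmark): states 0..4, the resource sits at state 4, and state 2
# is risky (the episode may end there with a safety penalty). Each step
# yields a two-component reward vector (profit, safety); a Q-vector is
# learned per component and combined by linear scalarization.

N_STATES, RESOURCE, RISKY = 5, 4, 2
ACTIONS = (-1, +1)                  # move left / right
ALPHA, GAMMA = 0.2, 0.95
W = (0.5, 0.5)                      # scalarization weights (assumed)

Q = {(s, a): [0.0, 0.0] for s in range(N_STATES) for a in ACTIONS}

def scalarize(q):
    return W[0] * q[0] + W[1] * q[1]

def greedy(s):
    # Greedy action with respect to the scalarized Q-vector.
    return max(ACTIONS, key=lambda a: scalarize(Q[(s, a)]))

def step(s, a, rng):
    s2 = min(max(s + a, 0), N_STATES - 1)
    if s2 == RESOURCE:
        return s2, (1.0, 0.0), True     # profit component for the resource
    if s2 == RISKY and rng.random() < 0.3:
        return s2, (0.0, -1.0), True    # safety violation ends the episode
    return s2, (0.0, 0.0), False

rng = random.Random(0)
for _ in range(2000):
    s, done = 0, False
    while not done:
        a = rng.choice(ACTIONS)         # uniform behaviour policy (off-policy)
        s2, r, done = step(s, a, rng)
        b = greedy(s2)                  # bootstrap with the scalarized-greedy action
        for i in range(2):              # update each objective's component
            target = r[i] + (0.0 if done else GAMMA * Q[(s2, b)][i])
            Q[(s, a)][i] += ALPHA * (target - Q[(s, a)][i])
        s = s2
```

After training, the scalarized policy heads right toward the resource from the safe states, while the negative safety component learned around the risky state lowers the value of paths through it — the trade-off between expected reward and safety that SafeRL and MORL both model.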


Keywords: Reinforcement learning · Risk control · Multi-objective · Success probability




Copyright information

© International Society of Artificial Life and Robotics (ISAROB) 2019

Authors and Affiliations

  • Naoto Horie (1, 3)
  • Tohgoroh Matsui (2)
  • Koichi Moriyama (1)
  • Atsuko Mutoh (1)
  • Nobuhiro Inuzuka (1)
  1. Department of Computer Science, Nagoya Institute of Technology, Nagoya, Japan
  2. Department of Clinical Engineering, College of Life and Health Sciences, Chubu University, Kasugai, Japan
  3. Meitetsucom Co. Ltd., Nagoya, Japan
