Multi-objective safe reinforcement learning: the relationship between multi-objective reinforcement learning and safe reinforcement learning

  • Naoto Horie
  • Tohgoroh Matsui
  • Koichi Moriyama
  • Atsuko Mutoh
  • Nobuhiro Inuzuka
Original Article


Reinforcement learning (RL) is a learning method that learns actions through trial and error. Recently, multi-objective reinforcement learning (MORL) and safe reinforcement learning (SafeRL) have been studied. The objective of conventional RL is to maximize the expected reward; however, because safety is not considered, this may lead the agent into a fatal state. Therefore, RL methods that consider safety during or after learning have been proposed. SafeRL is similar to MORL because it considers two objectives, i.e., maximizing expected rewards and satisfying safety constraints. However, to the best of our knowledge, no study has investigated the relationship between MORL and SafeRL to demonstrate that SafeRL methods can be applied to MORL tasks. This paper combines MORL with SafeRL and proposes a method for Multi-Objective SafeRL (MOSafeRL). We applied the proposed method to the resource gathering task, which is a standard task used to test MORL methods.


Keywords: Reinforcement learning, Risk control, Multi-objective, Success probability



Copyright information

© International Society of Artificial Life and Robotics (ISAROB) 2019

Authors and Affiliations

  • Naoto Horie (1, 3)
  • Tohgoroh Matsui (2, corresponding author)
  • Koichi Moriyama (1)
  • Atsuko Mutoh (1)
  • Nobuhiro Inuzuka (1)

  1. Department of Computer Science, Nagoya Institute of Technology, Nagoya, Japan
  2. Department of Clinical Engineering, College of Life and Health Sciences, Chubu University, Kasugai, Japan
  3. Meitetsucom Co. Ltd., Nagoya, Japan
