Multi-objective safe reinforcement learning: the relationship between multi-objective reinforcement learning and safe reinforcement learning

  • Original Article
  • Artificial Life and Robotics

Abstract

Reinforcement learning (RL) is a method that learns actions through trial and error. Multi-objective reinforcement learning (MORL) and safe reinforcement learning (SafeRL) have both been studied recently. The objective of conventional RL is to maximize the expected reward; however, because safety is not considered, the agent may enter a fatal state. RL methods that consider safety during or after learning have therefore been proposed. SafeRL resembles MORL in that it considers two objectives: maximizing expected rewards and satisfying safety constraints. However, to the best of our knowledge, no study has investigated the relationship between MORL and SafeRL to demonstrate that a SafeRL method can be applied to MORL tasks. This paper combines MORL with SafeRL and proposes a method for multi-objective SafeRL (MOSafeRL). We applied the proposed method to the resource gathering task, a standard test task in MORL.
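The analogy between the two settings can be illustrated with a small sketch. It is not the method proposed in the article; it only contrasts two common ways of combining a reward estimate with a safety estimate when choosing an action. The action names, the numeric estimates, and the functions `scalarized_choice` and `constrained_choice` are all hypothetical and introduced here purely for illustration.

```python
from typing import Dict, Tuple

# Hypothetical per-action estimates: (expected reward, estimated probability of staying safe).
Estimates = Dict[str, Tuple[float, float]]


def scalarized_choice(estimates: Estimates, w_reward: float, w_safety: float) -> str:
    """MORL-style view: treat safety as a second objective and pick the
    action with the best weighted sum of the two objective values."""
    return max(
        estimates,
        key=lambda a: w_reward * estimates[a][0] + w_safety * estimates[a][1],
    )


def constrained_choice(estimates: Estimates, min_safety: float) -> str:
    """SafeRL-style view: maximize reward only among actions whose safety
    estimate meets the threshold; if none qualifies, fall back to the safest."""
    safe = [a for a in estimates if estimates[a][1] >= min_safety]
    if not safe:
        return max(estimates, key=lambda a: estimates[a][1])
    return max(safe, key=lambda a: estimates[a][0])


# Made-up numbers, chosen only to show how the two views can disagree.
estimates: Estimates = {
    "shortcut": (10.0, 0.60),
    "detour": (6.0, 0.95),
    "wait": (0.0, 1.00),
}

print(scalarized_choice(estimates, w_reward=1.0, w_safety=0.0))
print(constrained_choice(estimates, min_safety=0.9))
```

With a zero safety weight the scalarized rule ignores safety entirely, which mirrors the conventional RL objective described above, while the constrained rule rejects any action below the threshold regardless of its reward.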

Author information

Corresponding author

Correspondence to Tohgoroh Matsui.

Additional information

This work was presented in part at the 23rd International Symposium on Artificial Life and Robotics, Beppu, Oita, January 18–20, 2018.

About this article

Cite this article

Horie, N., Matsui, T., Moriyama, K. et al. Multi-objective safe reinforcement learning: the relationship between multi-objective reinforcement learning and safe reinforcement learning. Artif Life Robotics 24, 352–359 (2019). https://doi.org/10.1007/s10015-019-00523-3

  • DOI: https://doi.org/10.1007/s10015-019-00523-3
