
Multi-objective deep inverse reinforcement learning for weight estimation of objectives

  • Original Article
  • Artificial Life and Robotics

Abstract

In multi-objective reinforcement learning, a weight expresses the priority of each objective when the reward vector is linearly scalarized. These weights must be set in advance, yet most real-world problems involve many objectives, so tuning them by hand requires extensive trial and error by the designer. A method that estimates the weights automatically is therefore needed to reduce this burden. In this paper, we propose a novel method that estimates the weights from the per-objective reward vector and expert trajectories within the framework of inverse reinforcement learning (IRL). Specifically, we combine deep IRL based on deep reinforcement learning with multiplicative weights apprenticeship learning to estimate weights quickly in a continuous state space. Through experiments in a benchmark environment for multi-objective sequential decision-making in a continuous state space, we verified that the proposed weight estimation method outperforms the projection method and Bayesian optimization.
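The abstract names two technical ingredients: linear scalarization of the per-objective reward vector by a weight vector, and a multiplicative weights apprenticeship-learning update that adjusts those weights using expert behavior. The Python sketch below illustrates both under stated assumptions; it is not the paper's implementation. The helpers solve_policy and feature_expectations are hypothetical placeholders for an inner deep-RL solver (e.g. DQN or PPO) and an estimator of per-objective discounted returns, and the iteration count, discount factor, and reward-scaling assumption are illustrative only.

    import numpy as np

    def scalarize(reward_vector, weights):
        """Linear scalarization: collapse a per-objective reward vector into one scalar."""
        return float(np.dot(weights, reward_vector))

    def estimate_weights(mu_expert, solve_policy, feature_expectations,
                         n_objectives, n_iters=50, gamma=0.99):
        """Estimate objective weights from expert behavior with a multiplicative
        weights update, in the spirit of MWAL apprenticeship learning.

        mu_expert            -- per-objective discounted returns of the expert
        solve_policy(w)      -- hypothetical RL subroutine returning a policy that is
                                (approximately) optimal for the scalarized reward w . r
        feature_expectations -- hypothetical estimator of a policy's per-objective
                                discounted returns (e.g. by Monte-Carlo rollouts)
        """
        # Step size from the MWAL analysis; assumes per-step rewards scaled to [-1, 1].
        beta = 1.0 / (1.0 + np.sqrt(2.0 * np.log(n_objectives) / n_iters))
        W = np.ones(n_objectives)
        for _ in range(n_iters):
            w = W / W.sum()                    # current weight estimate on the simplex
            policy = solve_policy(w)           # inner RL step (deep RL in the paper's setting)
            mu = feature_expectations(policy)  # learner's per-objective returns under w
            # Normalized game value in [0, 1]: large where the learner already matches
            # or exceeds the expert, small where it still falls short.
            G = ((1.0 - gamma) * (np.asarray(mu) - np.asarray(mu_expert)) + 2.0) / 4.0
            # Shrink weight on objectives that are already satisfied and concentrate
            # it on objectives where the learner trails the expert.
            W = W * beta ** G
        return W / W.sum()

Given expert feature expectations estimated from demonstration trajectories, estimate_weights returns a point on the weight simplex that could then be compared against baselines such as the projection method or Bayesian optimization mentioned in the abstract.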



Author information


Corresponding author

Correspondence to Naoya Takayama.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was presented in part at the joint symposium of the 27th International Symposium on Artificial Life and Robotics, the 7th International Symposium on BioComplexity, and the 5th International Symposium on Swarm Behavior and Bio-Inspired Robotics (Online, January 25–27, 2022).

About this article


Cite this article

Takayama, N., Arai, S. Multi-objective deep inverse reinforcement learning for weight estimation of objectives. Artif Life Robotics 27, 594–602 (2022). https://doi.org/10.1007/s10015-022-00773-8

