
Multi-objective deep inverse reinforcement learning for weight estimation of objectives

  • Original Article
  • Artificial Life and Robotics

Abstract

In multi-objective reinforcement learning, a weight expresses the priority of each objective when the reward vector is linearly scalarized. These weights must be set in advance, yet most real-world problems involve many objectives, so tuning them by hand requires extensive trial and error by the designer. A method that estimates the weights automatically is therefore needed to reduce this burden. In this paper, we propose a novel method that estimates the weights from the per-objective reward vector and expert trajectories within the framework of inverse reinforcement learning (IRL). Specifically, we combine deep IRL based on deep reinforcement learning with multiplicative weights apprenticeship learning to estimate weights quickly in a continuous state space. Through experiments in a benchmark environment for multi-objective sequential decision-making in a continuous state space, we verified that the proposed weight estimation method outperforms the projection method and Bayesian optimization.
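The abstract names two technical ingredients: linear scalarization of the per-objective reward vector by a weight vector, and a multiplicative weights apprenticeship-learning update that adjusts those weights using expert behavior. The Python sketch below illustrates both under stated assumptions; it is not the paper's implementation. The helpers solve_policy and feature_expectations are hypothetical placeholders for an inner deep-RL solver (e.g. DQN or PPO) and an estimator of per-objective discounted returns, and the iteration count, discount factor, and reward-scaling assumption are illustrative only.

    import numpy as np

    def scalarize(reward_vector, weights):
        """Linear scalarization: collapse a per-objective reward vector into one scalar."""
        return float(np.dot(weights, reward_vector))

    def estimate_weights(mu_expert, solve_policy, feature_expectations,
                         n_objectives, n_iters=50, gamma=0.99):
        """Estimate objective weights from expert behavior with a multiplicative
        weights update, in the spirit of MWAL apprenticeship learning.

        mu_expert            -- per-objective discounted returns of the expert
        solve_policy(w)      -- hypothetical RL subroutine returning a policy that is
                                (approximately) optimal for the scalarized reward w . r
        feature_expectations -- hypothetical estimator of a policy's per-objective
                                discounted returns (e.g. by Monte-Carlo rollouts)
        """
        # Step size from the MWAL analysis; assumes per-step rewards scaled to [-1, 1].
        beta = 1.0 / (1.0 + np.sqrt(2.0 * np.log(n_objectives) / n_iters))
        W = np.ones(n_objectives)
        for _ in range(n_iters):
            w = W / W.sum()                    # current weight estimate on the simplex
            policy = solve_policy(w)           # inner RL step (deep RL in the paper's setting)
            mu = feature_expectations(policy)  # learner's per-objective returns under w
            # Normalized game value in [0, 1]: large where the learner already matches
            # or exceeds the expert, small where it still falls short.
            G = ((1.0 - gamma) * (np.asarray(mu) - np.asarray(mu_expert)) + 2.0) / 4.0
            # Shrink weight on objectives that are already satisfied and concentrate
            # it on objectives where the learner trails the expert.
            W = W * beta ** G
        return W / W.sum()

Given expert feature expectations estimated from demonstration trajectories, estimate_weights returns a point on the weight simplex that could then be compared against baselines such as the projection method or Bayesian optimization mentioned in the abstract.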



Author information


Corresponding author

Correspondence to Naoya Takayama.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was presented in part at the joint symposium of the 27th International Symposium on Artificial Life and Robotics, the 7th International Symposium on BioComplexity, and the 5th International Symposium on Swarm Behavior and Bio-Inspired Robotics (Online, January 25–27, 2022).

About this article


Cite this article

Takayama, N., Arai, S. Multi-objective deep inverse reinforcement learning for weight estimation of objectives. Artif Life Robotics 27, 594–602 (2022). https://doi.org/10.1007/s10015-022-00773-8

