Reinforcement learning approach to multi-stage decision making problems with changes in action sets

Etoh, Takuya; Takano, Hirotaka; Murata, Junichi

doi:10.1007/s10015-012-0058-9

Original Article
Published: 06 November 2012

Artificial Life and Robotics, Volume 17, pages 293–299 (2012)

Abstract

Multi-stage decision making (MSDM) problems often involve conditions that change in practice. For example, in shortest-route selection problems on road networks, the travelling times of road sections vary with traffic conditions. These changes introduce risk into adopting any particular solution to an MSDM problem. This paper therefore proposes a method for solving MSDM problems that takes these risks into account. Reinforcement learning (RL) is adopted as the solution method, and stochastic changes in action sets are treated. Because risk evaluation is subjective by nature and depends on the decision maker (DM), risks must be evaluated according to the DM’s subjective view. The proposed RL approach therefore uses a new risk evaluation method that can easily incorporate the DM’s subjective view and can be readily embedded in reinforcement learning algorithms. The effectiveness of the method is illustrated with a road network path selection problem.
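
To make the setting concrete, the following is a minimal sketch of tabular Q-learning on a toy road network in which the set of available actions is sampled at random on each visit. It is not the authors' algorithm and does not include their risk evaluation method, which the abstract does not specify. The graph, the closure probability, the fallback rule for an empty action set, and all function names are assumptions made for illustration only.

```python
import random

# Hypothetical toy road network: node -> {next node: travel time}.
GRAPH = {
    "A": {"B": 1.0, "C": 4.0},
    "B": {"D": 1.0},
    "C": {"D": 1.0},
    "D": {},
}
GOAL = "D"
# Assumed probability that a road section is closed when a decision is made.
CLOSE_PROB = {("A", "B"): 0.3}


def available_actions(state, rng):
    """Sample the action set for this visit to `state`."""
    open_roads = [nxt for nxt in GRAPH[state]
                  if rng.random() >= CLOSE_PROB.get((state, nxt), 0.0)]
    # Assumption: if every road happens to be closed, fall back to the full set.
    return open_roads or list(GRAPH[state])


def train(episodes=2000, alpha=0.1, seed=0):
    """Learn cost-to-go values with a uniformly random behaviour policy."""
    rng = random.Random(seed)
    q = {s: {a: 0.0 for a in GRAPH[s]} for s in GRAPH}
    for _ in range(episodes):
        state = "A"
        actions = available_actions(state, rng)
        while state != GOAL:
            action = rng.choice(actions)   # explore among the open roads only
            nxt = action                   # an action is the next node
            cost = GRAPH[state][action]
            if nxt == GOAL:
                next_actions, future = [], 0.0
            else:
                next_actions = available_actions(nxt, rng)
                # Bootstrap only from actions available at the next state.
                future = min(q[nxt][a] for a in next_actions)
            q[state][action] += alpha * (cost + future - q[state][action])
            state, actions = nxt, next_actions
    return q


def greedy_action(q, state, actions):
    """Pick the lowest-cost action among those currently available."""
    return min(actions, key=lambda a: q[state][a])
```

In this sketch, `greedy_action(q, "A", ["B", "C"])` selects the short route through B when it is open, while `greedy_action(q, "A", ["C"])` falls back to the longer route through C when the road to B is closed. A risk-aware method such as the one the abstract describes would additionally weigh how costly that forced fallback is to the decision maker.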

Acknowledgments

This work has been partly supported by JSPS KAKENHI Grant Number 24560499.

Author information

Authors and Affiliations

Kyushu University, 744 Motooka, Nishi-ku, Fukuoka, Japan
Takuya Etoh, Hirotaka Takano & Junichi Murata

Corresponding author

Correspondence to Takuya Etoh.

About this article

Cite this article

Etoh, T., Takano, H. & Murata, J. Reinforcement learning approach to multi-stage decision making problems with changes in action sets. Artif Life Robotics 17, 293–299 (2012). https://doi.org/10.1007/s10015-012-0058-9

Received: 14 March 2012
Accepted: 27 August 2012
Published: 06 November 2012
Issue Date: December 2012
DOI: https://doi.org/10.1007/s10015-012-0058-9
