Reinforcement learning approach to multi-stage decision making problems with changes in action sets

  • Original Article
  • Published in: Artificial Life and Robotics

Abstract

In practical settings, multi-stage decision making (MSDM) problems often involve changing conditions. For example, in shortest-route selection problems on road networks, the travel times of road sections vary with traffic conditions. Such changes make it risky to commit to a particular solution of an MSDM problem. This paper therefore proposes a method for solving MSDM problems that takes these risks into account. Reinforcement learning (RL) is adopted as the solution method, and stochastic changes in action sets are treated. Because risk evaluation is by nature subjective and depends on the decision maker (DM), risks must be evaluated from the DM’s subjective viewpoint. An RL approach is therefore proposed that uses a new method for evaluating the risks of these changes; the method can easily incorporate the DM’s subjective view and can be readily embedded in RL algorithms. The effectiveness of the method is illustrated with a path selection problem on a road network.
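
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the problem setting, not the authors' method: plain tabular Q-learning on a toy road network in which the set of usable road sections (the action set) changes stochastically at each visit to a node. The network `ROADS`, its travel costs, the availability probabilities, and the function names are all hypothetical, and the sketch simply minimises expected travel cost; it does not implement the paper's subjective risk evaluation.

```python
import random

# Hypothetical toy network: node -> {next_node: (travel_cost, availability_probability)}
ROADS = {
    "A": {"B": (1.0, 1.0), "C": (2.0, 1.0)},
    "B": {"D": (1.0, 0.3), "C": (3.0, 1.0)},  # the short section B->D is often closed
    "C": {"D": (2.0, 1.0)},
}
START, GOAL = "A", "D"


def sample_actions(node, rng):
    """Sample which road sections are usable at this visit to `node`."""
    return [nxt for nxt, (_, p) in ROADS[node].items() if rng.random() < p]


def q_learning(episodes=20000, alpha=0.02, epsilon=0.2, seed=0):
    """Undiscounted tabular Q-learning over costs (lower is better)."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in ROADS for a in ROADS[s]}
    for _ in range(episodes):
        node = START
        actions = sample_actions(node, rng)
        while node != GOAL:
            # Epsilon-greedy choice restricted to the currently available actions.
            if rng.random() < epsilon:
                nxt = rng.choice(actions)
            else:
                nxt = min(actions, key=lambda a: q[(node, a)])
            cost = ROADS[node][nxt][0]
            if nxt == GOAL:
                next_actions, future = [], 0.0
            else:
                # Bootstrap only from actions available on arrival at the next node.
                next_actions = sample_actions(nxt, rng)
                future = min(q[(nxt, a)] for a in next_actions)
            q[(node, nxt)] += alpha * (cost + future - q[(node, nxt)])
            node, actions = nxt, next_actions
    return q


if __name__ == "__main__":
    for (state, action), value in sorted(q_learning().items()):
        print(f"Q({state}->{action}) = {value:.2f}")
```

Under these assumed numbers, the route through B would cost 2 if section B->D were always open, but its expected cost is 1 + 0.3·1 + 0.7·(3 + 2) = 4.8 once closures are taken into account, compared with a fixed cost of 4 through C. The learned values should therefore favour A->C, which shows how changes in action sets can alter the preferred route even before any risk attitude of the decision maker is considered.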

Acknowledgments

This work has been partly supported by JSPS KAKENHI Grant Number 24560499.

Author information

Corresponding author

Correspondence to Takuya Etoh.

About this article

Cite this article

Etoh, T., Takano, H. & Murata, J. Reinforcement learning approach to multi-stage decision making problems with changes in action sets. Artif Life Robotics 17, 293–299 (2012). https://doi.org/10.1007/s10015-012-0058-9

  • DOI: https://doi.org/10.1007/s10015-012-0058-9
