Compound Reinforcement Learning: Theory and an Application to Finance

  • Tohgoroh Matsui
  • Takashi Goto
  • Kiyoshi Izumi
  • Yu Chen
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7188)


This paper describes compound reinforcement learning (RL), an extension of RL based on the compound return. Compound RL maximizes the logarithm of the expected double-exponentially discounted compound return in return-based Markov decision processes (MDPs). The contributions of this paper are (1) a theoretical description of compound RL as an extended RL framework for maximizing the compound return in a return-based MDP, and (2) experimental results on an illustrative example and an application to finance.
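
To make the objective concrete, here is a minimal sketch of one plausible reading of "double-exponentially discounted compound return". It assumes the compound return is the product of (1 + r_k) raised to the power gamma^k, so that its logarithm is an ordinary discounted sum of log(1 + r_k). The abstract does not give the formula, so this form, the function names, and the sample returns are illustrative assumptions, not the authors' definitions.

```python
import math


def compound_return(returns, gamma):
    """Assumed form: product over k of (1 + r_k) ** (gamma ** k).

    `returns` are per-step rates of return (each must be > -1),
    `gamma` is a discount factor in [0, 1].
    """
    result = 1.0
    for k, r in enumerate(returns):
        result *= (1.0 + r) ** (gamma ** k)
    return result


def log_compound_return(returns, gamma):
    """Logarithm of the assumed compound return.

    Taking the log turns the discounted product into a standard
    discounted sum of log(1 + r_k).
    """
    return sum((gamma ** k) * math.log(1.0 + r) for k, r in enumerate(returns))


# Hypothetical sample data: +10%, -5%, +20% over three steps.
sample_returns = [0.10, -0.05, 0.20]
sample_gamma = 0.9

rho = compound_return(sample_returns, sample_gamma)
# The two formulations agree up to floating-point error.
assert math.isclose(math.log(rho), log_compound_return(sample_returns, sample_gamma))
```

Under this assumption, setting gamma to 1 recovers the plain undiscounted product of (1 + r_k), which is the familiar compounded growth of an investment.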


Keywords: reinforcement learning, compound return, value functions, finance





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Tohgoroh Matsui (1)
  • Takashi Goto (2)
  • Kiyoshi Izumi (3, 4)
  • Yu Chen (3)

  1. Chubu University, Kasugai, Japan
  2. Bank of Tokyo-Mitsubishi UFJ, Ltd., Tokyo, Japan
  3. The University of Tokyo, Tokyo, Japan
  4. JST PRESTO, Tokyo, Japan
