Computational Management Science

, Volume 9, Issue 1, pp 89–107 | Cite as

Regime-switching recurrent reinforcement learning for investment decision making

Original Paper


This paper presents the regime-switching recurrent reinforcement learning (RSRRL) model and describes its application to investment problems. The RSRRL is a regime-switching extension of the recurrent reinforcement learning (RRL) algorithm. The basic RRL model was proposed by Moody and Wu (Proceedings of the IEEE/IAFE 1997 on Computational Intelligence for Financial Engineering (CIFEr). IEEE, New York, pp 300–307 1997) and presented as a methodology to solve stochastic control problems in finance. We argue that the RRL is unable to capture all the intricacies of financial time series, and propose the RSRRL as a more suitable algorithm for such type of data. This paper gives a description of two variants of the RSRRL, namely a threshold version and a smooth transition version, and compares their performance to the basic RRL model in automated trading and portfolio management applications. We use volatility as an indicator/transition variable for switching between regimes. The out-of-sample results are generally in favour of the RSRRL models, thereby supporting the regime-switching approach, but some doubts exist regarding the robustness of the proposed models, especially in the presence of transaction costs.


  1. Bertoluzzo F, Corazza M (2007) Making financial trading by recurrent reinforcement learning. In: Knowledge-Based Intelligent Information and Engineering Systems and the XVII Italian Workshop on Neural Networks on Proceedings of the 11th International Conference. Springer-Verlag, USA, pp 619–626Google Scholar
  2. Dempster M, Leemans V (2006) An automated FX trading system using adaptive reinforcement learning. Expert Syst Appl 30(3): 543–552CrossRefGoogle Scholar
  3. Franses P, van Dijk D (2000) Nonlinear time series models in empirical finance. Cambridge University Press, CambridgeCrossRefGoogle Scholar
  4. Gold C (2003) FX trading via recurrent reinforcement learning. In: Proceedings. 2003 IEEE International Conference on Computational Intelligence for Financial Engineering, 2003. IEEE, pp 363–370Google Scholar
  5. Hamilton JD (1989) A new approach to the economic analysis of nonstationary time series and the business cycle. Econometrica 57(2): 357–384CrossRefGoogle Scholar
  6. Hamilton JD (2008) Regime-switching models. In: The New Palgrave Dictionary of Economics. Palgrave Macmillan, EnglandGoogle Scholar
  7. Kaelbling L, Littman M, Moore A (1996) Reinforcement learning: A survey. J Artif Intell Res 4(1): 237–285Google Scholar
  8. Koutmos G (1997) Feedback trading and the autocorrelation pattern of stock returns: further empirical evidence. J Int Money Financ 16(4): 625–636CrossRefGoogle Scholar
  9. LeBaron B (1992) Some relations between volatility and serial correlations in stock market returns. J Bus 65(2): 199–219CrossRefGoogle Scholar
  10. McKenzie MD, Faff RW (2003) The determinants of conditional autocorrelation in stock returns. J Financ Res 26(2): 259–274CrossRefGoogle Scholar
  11. Moody J, Wu L (1997) Optimization of trading systems and portfolios. In: Proceedings of the IEEE/IAFE 1997 on Computational Intelligence for Financial Engineering (CIFEr). IEEE, New York, pp 300–307Google Scholar
  12. Moody J, Wu L, Liao Y, Saffell M (1998) Performance functions and reinforcement learning for trading systems and portfolios. J Forecast 17(56): 441–470CrossRefGoogle Scholar
  13. Moody J, Saffell M (2001) Learning to trade via direct reinforcement. IEEE Trans Neural Netw 12(4): 875–889CrossRefGoogle Scholar
  14. Sentana E, Wadhwani S (1992) Feedback traders and stock return autocorrelations: evidence from a century of daily data. Econ J 102(411): 415–425CrossRefGoogle Scholar
  15. Sharpe W (1966) Mutual fund performance. J Bus 39(1): 119–138CrossRefGoogle Scholar
  16. Storn R, Price K (1997) Differential evolution–a simple and efficient heuristic for global optimization over continuous spaces. J Glob optim 11(4): 341–359CrossRefGoogle Scholar
  17. Sutton R, Barto A (1998) Introduction to reinforcement learning. MIT Press, CambridgeGoogle Scholar
  18. Teräsvirta T (1994) Specification, estimation, and evaluation of smooth transition autoregressive models. J Am Stat Assoc 89(425): 208–218CrossRefGoogle Scholar
  19. Tong H (1978) On a threshold model. In: Chen C (eds) Pattern recognition and signal processing. Sijthoff & Noordhoff, The Netherlands, pp 101–141Google Scholar
  20. Watkins C (1989) Learning from delayed rewards. Ph.D. thesis, University of Cambridge, EnglandGoogle Scholar
  21. Werbos P (1990) Backpropagation through time: what it does and how to do it. Proc IEEE 78(10): 1550–1560CrossRefGoogle Scholar
  22. White H (1989) Some asymptotic results for learning in single hidden-layer feedforward network models. J Am Stat Assoc 84(408): 1003–1013CrossRefGoogle Scholar

Copyright information

© Springer-Verlag 2011

Authors and Affiliations

  1. 1.Universität BaselBaselSwitzerland

Personalised recommendations