Computational Management Science, Volume 9, Issue 1, pp 89–107

Regime-switching recurrent reinforcement learning for investment decision making

  • Dietmar Maringer
  • Tikesh Ramtohul
Original Paper


This paper presents the regime-switching recurrent reinforcement learning (RSRRL) model and describes its application to investment problems. The RSRRL is a regime-switching extension of the recurrent reinforcement learning (RRL) algorithm. The basic RRL model was proposed by Moody and Wu (Proceedings of the IEEE/IAFE 1997 on Computational Intelligence for Financial Engineering (CIFEr). IEEE, New York, pp 300–307, 1997) and presented as a methodology for solving stochastic control problems in finance. We argue that the RRL is unable to capture all the intricacies of financial time series, and propose the RSRRL as a more suitable algorithm for this type of data. This paper describes two variants of the RSRRL, namely a threshold version and a smooth transition version, and compares their performance to that of the basic RRL model in automated trading and portfolio management applications. We use volatility as the indicator (transition) variable for switching between regimes. The out-of-sample results are generally in favour of the RSRRL models, thereby supporting the regime-switching approach, but some doubts remain regarding the robustness of the proposed models, especially in the presence of transaction costs.
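As a rough illustration of the two variants named in the abstract, the sketch below mixes two RRL-style trading units according to a volatility-based regime weight. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the tanh units, the logistic transition function, and the parameters `c` (threshold) and `gamma` (smoothness) are all hypothetical choices made here for illustration, since the abstract does not give the model equations.

```python
import math

def regime_weight(q, c, gamma=None):
    """Weight of the high-volatility regime for transition variable q.

    Assumption: gamma=None means a hard threshold at c (threshold variant);
    otherwise a logistic function of q is used (smooth transition variant).
    """
    if gamma is None:
        return 1.0 if q > c else 0.0
    return 1.0 / (1.0 + math.exp(-gamma * (q - c)))

def trader_output(x, f_prev, w_low, w_high, q, c, gamma=None):
    """Position in [-1, 1] as a regime-weighted mix of two recurrent units.

    x       : list of lagged returns (inputs)
    f_prev  : previous position, fed back as a recurrent input
    w_low   : weights [bias, one per input, recurrent] for the low regime
    w_high  : same layout, for the high regime
    """
    def unit(w):
        s = w[0] + sum(wi * xi for wi, xi in zip(w[1:-1], x)) + w[-1] * f_prev
        return math.tanh(s)

    g = regime_weight(q, c, gamma)
    return g * unit(w_high) + (1.0 - g) * unit(w_low)

# Hypothetical inputs: two lagged returns, flat previous position.
x = [0.01, -0.02]
w_low = [0.0, 5.0, 5.0, 0.1]     # illustrative trend-following weights
w_high = [0.0, -5.0, -5.0, 0.1]  # illustrative contrarian weights

hard = trader_output(x, 0.0, w_low, w_high, q=0.03, c=0.02)
soft = trader_output(x, 0.0, w_low, w_high, q=0.03, c=0.02, gamma=100.0)
```

With the hard threshold only one unit is active at a time, whereas the logistic weight blends the two units near the threshold, which is the qualitative difference between the threshold and smooth transition variants.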


Keywords: Transaction Cost; Serial Correlation; Trading Cost; Trading System; Sharpe Ratio
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




  1. Bertoluzzo F, Corazza M (2007) Making financial trading by recurrent reinforcement learning. In: Knowledge-Based Intelligent Information and Engineering Systems and the XVII Italian Workshop on Neural Networks on Proceedings of the 11th International Conference. Springer-Verlag, USA, pp 619–626
  2. Dempster M, Leemans V (2006) An automated FX trading system using adaptive reinforcement learning. Expert Syst Appl 30(3): 543–552
  3. Franses P, van Dijk D (2000) Nonlinear time series models in empirical finance. Cambridge University Press, Cambridge
  4. Gold C (2003) FX trading via recurrent reinforcement learning. In: Proceedings. 2003 IEEE International Conference on Computational Intelligence for Financial Engineering, 2003. IEEE, pp 363–370
  5. Hamilton JD (1989) A new approach to the economic analysis of nonstationary time series and the business cycle. Econometrica 57(2): 357–384
  6. Hamilton JD (2008) Regime-switching models. In: The New Palgrave Dictionary of Economics. Palgrave Macmillan, England
  7. Kaelbling L, Littman M, Moore A (1996) Reinforcement learning: A survey. J Artif Intell Res 4(1): 237–285
  8. Koutmos G (1997) Feedback trading and the autocorrelation pattern of stock returns: further empirical evidence. J Int Money Financ 16(4): 625–636
  9. LeBaron B (1992) Some relations between volatility and serial correlations in stock market returns. J Bus 65(2): 199–219
  10. McKenzie MD, Faff RW (2003) The determinants of conditional autocorrelation in stock returns. J Financ Res 26(2): 259–274
  11. Moody J, Wu L (1997) Optimization of trading systems and portfolios. In: Proceedings of the IEEE/IAFE 1997 on Computational Intelligence for Financial Engineering (CIFEr). IEEE, New York, pp 300–307
  12. Moody J, Wu L, Liao Y, Saffell M (1998) Performance functions and reinforcement learning for trading systems and portfolios. J Forecast 17(5–6): 441–470
  13. Moody J, Saffell M (2001) Learning to trade via direct reinforcement. IEEE Trans Neural Netw 12(4): 875–889
  14. Sentana E, Wadhwani S (1992) Feedback traders and stock return autocorrelations: evidence from a century of daily data. Econ J 102(411): 415–425
  15. Sharpe W (1966) Mutual fund performance. J Bus 39(1): 119–138
  16. Storn R, Price K (1997) Differential evolution–a simple and efficient heuristic for global optimization over continuous spaces. J Glob Optim 11(4): 341–359
  17. Sutton R, Barto A (1998) Introduction to reinforcement learning. MIT Press, Cambridge
  18. Teräsvirta T (1994) Specification, estimation, and evaluation of smooth transition autoregressive models. J Am Stat Assoc 89(425): 208–218
  19. Tong H (1978) On a threshold model. In: Chen C (ed) Pattern recognition and signal processing. Sijthoff & Noordhoff, The Netherlands, pp 101–141
  20. Watkins C (1989) Learning from delayed rewards. Ph.D. thesis, University of Cambridge, England
  21. Werbos P (1990) Backpropagation through time: what it does and how to do it. Proc IEEE 78(10): 1550–1560
  22. White H (1989) Some asymptotic results for learning in single hidden-layer feedforward network models. J Am Stat Assoc 84(408): 1003–1013

Copyright information

© Springer-Verlag 2011

Authors and Affiliations

  1. Universität Basel, Basel, Switzerland
