Abstract
This paper presents the regime-switching recurrent reinforcement learning (RSRRL) model and describes its application to investment problems. The RSRRL is a regime-switching extension of the recurrent reinforcement learning (RRL) algorithm. The basic RRL model was proposed by Moody and Wu (1997) as a methodology for solving stochastic control problems in finance. We argue that the RRL is unable to capture all the intricacies of financial time series, and propose the RSRRL as a more suitable algorithm for this type of data. This paper describes two variants of the RSRRL, namely a threshold version and a smooth transition version, and compares their performance with that of the basic RRL model in automated trading and portfolio management applications. We use volatility as the indicator/transition variable for switching between regimes. The out-of-sample results generally favour the RSRRL models, which supports the regime-switching approach, although doubts remain about the robustness of the proposed models, especially in the presence of transaction costs.
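The difference between the threshold and smooth transition variants can be illustrated with a minimal sketch. It assumes one plausible form only: two RRL-style traders (a tanh of weighted past returns plus the previous signal) whose outputs are blended by a regime weight computed from volatility. The function names, the parameter layout, the logistic transition function and the blending rule are all assumptions made for illustration and are not taken from the paper.

```python
import math


def rrl_signal(weights, bias, recurrent_weight, returns, prev_signal):
    """Hypothetical single-regime RRL-style trading signal in [-1, 1]."""
    activation = bias + recurrent_weight * prev_signal
    activation += sum(w * r for w, r in zip(weights, returns))
    return math.tanh(activation)


def threshold_weight(volatility, threshold):
    """Threshold variant (assumed form): hard switch at the threshold."""
    return 1.0 if volatility > threshold else 0.0


def smooth_weight(volatility, threshold, gamma):
    """Smooth transition variant (assumed form): logistic weight in (0, 1).

    gamma controls how sharply the weight moves from 0 to 1 around the
    threshold; a large gamma approaches the hard threshold switch.
    """
    return 1.0 / (1.0 + math.exp(-gamma * (volatility - threshold)))


def regime_switching_signal(low_params, high_params, returns, prev_signal,
                            regime_weight):
    """Blend the low- and high-volatility traders by the regime weight.

    Each *_params tuple is (weights, bias, recurrent_weight), an
    illustrative layout rather than the paper's parameterisation.
    """
    low = rrl_signal(*low_params, returns, prev_signal)
    high = rrl_signal(*high_params, returns, prev_signal)
    return (1.0 - regime_weight) * low + regime_weight * high


# Usage with made-up numbers: two lagged returns, volatility above the threshold.
low_params = ([0.5, -0.2], 0.0, 0.1)
high_params = ([-0.3, 0.4], 0.0, 0.2)
lagged_returns = [0.01, -0.02]
g_hard = threshold_weight(0.03, 0.02)
g_soft = smooth_weight(0.03, 0.02, 100.0)
signal_hard = regime_switching_signal(low_params, high_params, lagged_returns, 0.0, g_hard)
signal_soft = regime_switching_signal(low_params, high_params, lagged_returns, 0.0, g_soft)
```

With the hard switch, only one trader is active at any time; with the logistic weight, both contribute near the threshold, so the position changes gradually as volatility moves through it.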
References
Bertoluzzo F, Corazza M (2007) Making financial trading by recurrent reinforcement learning. In: Knowledge-Based Intelligent Information and Engineering Systems and the XVII Italian Workshop on Neural Networks on Proceedings of the 11th International Conference. Springer-Verlag, USA, pp 619–626
Dempster M, Leemans V (2006) An automated FX trading system using adaptive reinforcement learning. Expert Syst Appl 30(3): 543–552
Franses P, van Dijk D (2000) Nonlinear time series models in empirical finance. Cambridge University Press, Cambridge
Gold C (2003) FX trading via recurrent reinforcement learning. In: Proceedings of the 2003 IEEE International Conference on Computational Intelligence for Financial Engineering. IEEE, pp 363–370
Hamilton JD (1989) A new approach to the economic analysis of nonstationary time series and the business cycle. Econometrica 57(2): 357–384
Hamilton JD (2008) Regime-switching models. In: The New Palgrave Dictionary of Economics. Palgrave Macmillan, England
Kaelbling L, Littman M, Moore A (1996) Reinforcement learning: A survey. J Artif Intell Res 4(1): 237–285
Koutmos G (1997) Feedback trading and the autocorrelation pattern of stock returns: further empirical evidence. J Int Money Financ 16(4): 625–636
LeBaron B (1992) Some relations between volatility and serial correlations in stock market returns. J Bus 65(2): 199–219
McKenzie MD, Faff RW (2003) The determinants of conditional autocorrelation in stock returns. J Financ Res 26(2): 259–274
Moody J, Wu L (1997) Optimization of trading systems and portfolios. In: Proceedings of the IEEE/IAFE 1997 on Computational Intelligence for Financial Engineering (CIFEr). IEEE, New York, pp 300–307
Moody J, Wu L, Liao Y, Saffell M (1998) Performance functions and reinforcement learning for trading systems and portfolios. J Forecast 17(5–6): 441–470
Moody J, Saffell M (2001) Learning to trade via direct reinforcement. IEEE Trans Neural Netw 12(4): 875–889
Sentana E, Wadhwani S (1992) Feedback traders and stock return autocorrelations: evidence from a century of daily data. Econ J 102(411): 415–425
Sharpe W (1966) Mutual fund performance. J Bus 39(1): 119–138
Storn R, Price K (1997) Differential evolution: a simple and efficient heuristic for global optimization over continuous spaces. J Glob Optim 11(4): 341–359
Sutton R, Barto A (1998) Introduction to reinforcement learning. MIT Press, Cambridge
Teräsvirta T (1994) Specification, estimation, and evaluation of smooth transition autoregressive models. J Am Stat Assoc 89(425): 208–218
Tong H (1978) On a threshold model. In: Chen C (ed) Pattern recognition and signal processing. Sijthoff & Noordhoff, The Netherlands, pp 101–141
Watkins C (1989) Learning from delayed rewards. Ph.D. thesis, University of Cambridge, England
Werbos P (1990) Backpropagation through time: what it does and how to do it. Proc IEEE 78(10): 1550–1560
White H (1989) Some asymptotic results for learning in single hidden-layer feedforward network models. J Am Stat Assoc 84(408): 1003–1013
Cite this article
Maringer, D., Ramtohul, T. Regime-switching recurrent reinforcement learning for investment decision making. Comput Manag Sci 9, 89–107 (2012). https://doi.org/10.1007/s10287-011-0131-1
DOI: https://doi.org/10.1007/s10287-011-0131-1