Abstract
We propose to train trading systems and portfolios by optimizing financial objective functions via reinforcement learning. The performance functions that we consider as value functions are profit or wealth, the Sharpe ratio, and our recently proposed differential Sharpe ratio for online learning. In Moody & Wu (1997), we presented empirical results from controlled experiments that demonstrated the efficacy of some of our methods for optimizing trading systems. Here we extend our previous work to Q-learning, a reinforcement learning technique that uses approximated future rewards to choose actions, and compare its performance to that of our previous systems, which are trained to maximize immediate reward. We also provide new simulation results that demonstrate the presence of predictability in the monthly S&P 500 Stock Index over the 25-year period 1970 through 1994.
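The differential Sharpe ratio named in the abstract admits a simple online update. The sketch below follows the formulation published in the companion paper Moody, Wu, Liao & Saffell (1998): A and B are exponential moving averages of the returns and squared returns, and D_t measures the marginal effect of the latest trading return on the Sharpe ratio. The class name, the adaptation rate eta = 0.01, and the initial moments are illustrative choices, not values taken from this chapter.

class DifferentialSharpe:
    """Online differential Sharpe ratio (Moody et al., 1998).

    A is an exponential moving average of trading returns, B of
    squared returns; eta is the adaptation rate of both averages.
    """

    def __init__(self, eta=0.01, a0=0.0, b0=1.0):
        self.eta = eta
        self.A = a0
        self.B = b0

    def update(self, r):
        # Deviations of the new return from the running moments.
        dA = r - self.A
        dB = r * r - self.B
        # Differential Sharpe ratio; the variance term is guarded
        # against the degenerate zero-variance case.
        var = max(self.B - self.A ** 2, 1e-12)
        D = (self.B * dA - 0.5 * self.A * dB) / var ** 1.5
        # Update the moving averages only after computing D.
        self.A += self.eta * dA
        self.B += self.eta * dB
        return D

Using D_t as the per-period reward lets a trading system be trained by gradient ascent on immediate reward, which is the baseline against which Q-learning is compared in the chapter.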
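Q-learning, by contrast, bootstraps on approximated future rewards rather than maximizing the immediate one. The following is a minimal generic tabular sketch over a discretized market state with three positions (short, neutral, long); the chapter's actual state representation, transaction costs, and any function approximation are not reproduced here, so every parameter should be read as an assumption.

import numpy as np

def q_learning_trader(states, returns, n_states, actions=(-1, 0, 1),
                      alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Q-learning on a discretized market-state series.

    states[t]  : integer market state observed at time t
    returns[t] : asset return realized over (t, t+1]
    The per-step reward is the position times the realized return;
    transaction costs are omitted for brevity.
    """
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, len(actions)))
    for t in range(len(returns) - 1):
        s = states[t]
        # Epsilon-greedy action selection.
        if rng.random() < epsilon:
            a = int(rng.integers(len(actions)))
        else:
            a = int(np.argmax(Q[s]))
        # Immediate reward: position times realized return.
        r = actions[a] * returns[t]
        s_next = states[t + 1]
        # Q-learning update: bootstrap on the best next action,
        # i.e. the "approximated future reward".
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
    return Q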
References
Crites R. H. & Barto A. G. (1996), Improving elevator performance using reinforcement learning, in D. S. Touretzky, M. C. Mozer & M. E. Hasselmo, eds, ‘Advances in NIPS’, Vol. 8, pp. 1017–1023.
Moody J. & Wu L. (1997), Optimization of trading systems and portfolios, in Y. Abu-Mostafa, A. N. Refenes & A. S. Weigend, eds, ‘Neural Networks in the Capital Markets’, World Scientific, London.
Moody J., Wu L., Liao Y. & Saffell M. (1998), ‘Performance functions and reinforcement learning for trading systems and portfolios’, Journal of Forecasting 17. To appear.
Neuneier R. (1996), Optimal asset allocation using adaptive dynamic programming, in D. S. Touretzky, M. C. Mozer & M. E. Hasselmo, eds, ‘Advances in NIPS’, Vol. 8, pp. 952–958.
Sharpe W. F. (1966), ‘Mutual fund performance’, Journal of Business 39, 119–138.
Tesauro G. (1989), ‘Neurogammon wins the Computer Olympiad’, Neural Computation 1, 321–323.
Watkins C. J. C. H. (1989), Learning from Delayed Rewards, PhD thesis, Cambridge University, Psychology Department.
Zhang W. & Dietterich T. G. (1996), High-performance job-shop scheduling with a time-delay TD(λ) network, in D. S. Touretzky, M. C. Mozer & M. E. Hasselmo, eds, ‘Advances in NIPS’, Vol. 8, pp. 1024–1030.
Copyright information
© 1998 Springer Science+Business Media Dordrecht
About this chapter
Cite this chapter
Moody, J., Saffell, M., Liao, Y., Wu, L. (1998). Reinforcement Learning for Trading Systems and Portfolios: Immediate vs Future Rewards. In: Refenes, AP.N., Burgess, A.N., Moody, J.E. (eds) Decision Technologies for Computational Finance. Advances in Computational Management Science, vol 2. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-5625-1_10
DOI: https://doi.org/10.1007/978-1-4615-5625-1_10
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-7923-8309-3
Online ISBN: 978-1-4615-5625-1