A Multi-agent Q-learning Framework for Optimizing Stock Trading Systems

  • Jae Won Lee
  • Jangmin O
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2453)


This paper presents a reinforcement learning framework for stock trading systems. Trading system parameters are optimized by the Q-learning algorithm, and neural networks are adopted for value approximation. In this framework, multiple cooperative agents are used to efficiently integrate global trend prediction and local trading strategy to obtain better trading performance. The agents communicate with one another by sharing training episodes and learned policies, while keeping the overall scheme of conventional Q-learning. Experimental results on KOSPI 200 show that a trading system based on the proposed framework outperforms the market average and makes appreciable profits. Furthermore, in terms of risk management, the system is superior to a system trained by supervised learning.
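To illustrate the kind of Q-learning update with value approximation that the abstract refers to, here is a minimal sketch. It is not the authors' implementation: it swaps in a linear approximator for the neural networks, uses a single agent rather than cooperating agents, and the action set, feature vectors, learning rate `alpha`, and discount factor `gamma` are all assumptions chosen for illustration.

```python
import random

# Hypothetical action set; the paper's actual agents and actions are not given here.
ACTIONS = ("buy", "hold", "sell")


def q_value(weights, features, action):
    """Linear estimate of Q(s, a): dot product of the action's weights and the state features."""
    return sum(w * x for w, x in zip(weights[action], features))


def q_learning_step(weights, features, action, reward, next_features,
                    alpha=0.1, gamma=0.9):
    """Apply one semi-gradient Q-learning update in place and return the TD error."""
    # Bootstrapped target uses the greedy value of the next state.
    best_next = max(q_value(weights, next_features, a) for a in ACTIONS)
    td_error = reward + gamma * best_next - q_value(weights, features, action)
    # For a linear approximator the gradient of Q w.r.t. the weights is the feature vector.
    weights[action] = [w + alpha * td_error * x
                       for w, x in zip(weights[action], features)]
    return td_error


def epsilon_greedy(weights, features, epsilon=0.1, rng=random):
    """Pick a random action with probability epsilon, otherwise the greedy action."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q_value(weights, features, a))


# Made-up two-dimensional state features and a made-up reward, for demonstration only.
weights = {a: [0.0, 0.0] for a in ACTIONS}
state, next_state = [1.0, 0.5], [0.8, 0.6]
td = q_learning_step(weights, state, "buy", reward=1.0, next_features=next_state)
```

After this single update only the weights for the chosen action change, so a greedy policy in the same state now prefers that action. A neural network approximator would replace `q_value` and the weight update with a forward pass and a gradient step on the same TD error.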


Stock Market · Optimal Policy · Trading System · Asset Allocation · Signal Agent
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Jae Won Lee (1)
  • Jangmin O (2)
  1. School of Computer Science and Engineering, Sungshin Women’s University, Seoul, Korea
  2. School of Computer Engineering, Seoul National University, Seoul, Korea
