A Multi-agent Q-learning Framework for Optimizing Stock Trading Systems
This paper presents a reinforcement learning framework for stock trading systems. Trading system parameters are optimized by the Q-learning algorithm, and neural networks are adopted for value approximation. In this framework, cooperative multiple agents are used to efficiently integrate global trend prediction and local trading strategy to obtain better trading performance. Agents communicate with each other, sharing training episodes and learned policies, while keeping the overall scheme of conventional Q-learning. Experimental results on KOSPI 200 show that a trading system based on the proposed framework outperforms the market average and makes appreciable profits. Furthermore, in view of risk management, the system is superior to a system trained by supervised learning.
Keywords: Stock Market · Optimal Policy · Trading System · Asset Allocation · Signal Agent
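As a rough illustration of the kind of Q-learning update underlying such a trading system, the sketch below trains a single agent on a synthetic price series. Everything here is a hypothetical toy setup: the discretized return state, the hold/buy action set, the reward definition, and all hyperparameters are illustrative assumptions, not the paper's actual agents, features, or neural-network value approximator (a Q-table stands in for the approximator).

```python
import random

# Toy Q-learning trading sketch (hypothetical setup, not the paper's system).
# States: discretized one-step return (-1 down, 0 flat, 1 up).
# Actions: 0 = hold (stay out), 1 = buy (hold the asset one period).

random.seed(0)
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1  # learning rate, discount, exploration
ACTIONS = (0, 1)
Q = {}  # Q-table standing in for a neural value approximator


def get_q(state, action):
    return Q.get((state, action), 0.0)


def discretize(ret):
    # Map a raw return to a coarse trend state (thresholds are assumptions).
    return -1 if ret < -0.01 else (1 if ret > 0.01 else 0)


def q_update(s, a, r, s_next):
    # Conventional Q-learning backup:
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    target = r + GAMMA * max(get_q(s_next, b) for b in ACTIONS)
    Q[(s, a)] = get_q(s, a) + ALPHA * (target - get_q(s, a))


# Synthetic geometric random-walk price series for training episodes.
prices = [100.0]
for _ in range(500):
    prices.append(prices[-1] * (1 + random.gauss(0.0005, 0.01)))

for t in range(1, len(prices) - 1):
    s = discretize(prices[t] / prices[t - 1] - 1)
    # Epsilon-greedy action selection.
    if random.random() < EPS:
        a = random.choice(ACTIONS)
    else:
        a = max(ACTIONS, key=lambda b: get_q(s, b))
    next_ret = prices[t + 1] / prices[t] - 1
    r = next_ret if a == 1 else 0.0  # reward: next-period return if invested
    q_update(s, a, r, discretize(next_ret))
```

In the paper's multi-agent setting, several such agents (e.g. a global trend-prediction agent and local trading agents) would additionally share training episodes and learned policies rather than each learning in isolation.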