Abstract
Training automated trading agents is a long-standing topic in artificial intelligence for quantitative finance. Reinforcement learning (RL) is designed to solve sequential decision-making tasks such as stock trading. The output of RL is a policy, which can be represented as the probabilities of the possible actions given a state; the policy is optimized through a reward function. However, even when profit is taken as the natural reward function, a trading agent equipped with an RL model faces several serious problems. Specifically, profit is obtained only after executing a sell action, different profits can coexist at the same time step because transactions have varying lengths, and the hold action must cover two opposite states, an empty and a nonempty position. To alleviate these shortcomings, in this paper we introduce a new trading action, called wait, for the empty-position status, and design appropriate rewards for all actions. Based on the new action space and reward functions, we propose a novel approach named Transaction-aware Inverse Reinforcement Learning (TAIRL). TAIRL rewards all trading actions, avoiding reward bias and the hold dilemma. TAIRL is evaluated by backtesting on 12 stocks from the US, UK, and China stock markets, and is compared against other state-of-the-art RL methods and moving-average trading methods. The experimental results show that the TAIRL agent achieves state-of-the-art performance in profitability and risk resistance.
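The abstract's core idea, that every action (buy, sell, hold, wait) should receive a reward rather than only sell, can be sketched as follows. This is a minimal illustration with hypothetical shaping terms, not the paper's exact reward formulas: the function names, signals, and the small invalid-action penalty are all assumptions.

```python
from enum import Enum

class Action(Enum):
    BUY = 0
    SELL = 1
    HOLD = 2   # meaningful only with a nonempty position
    WAIT = 3   # the new action: stay out while the position is empty

def step_reward(action, position, entry_price, price, next_price):
    """Illustrative per-step rewards so every action is graded, not just SELL.

    position: number of shares held (0 = empty position).
    entry_price: price at which the current position was opened (None if empty).
    """
    if action == Action.SELL and position > 0:
        # Realized profit: the only signal a pure profit-based reward provides.
        return (price - entry_price) / entry_price
    if action == Action.BUY and position == 0:
        # Reward a buy by the immediate subsequent price move.
        return (next_price - price) / price
    if action == Action.HOLD and position > 0:
        # Unrealized move while holding a nonempty position.
        return (next_price - price) / price
    if action == Action.WAIT and position == 0:
        # Waiting is rewarded when the price falls, i.e. sitting out was right.
        return (price - next_price) / price
    # Action inconsistent with the position state (e.g. HOLD on an empty
    # position); a small penalty discourages it.
    return -0.01
```

Separating `WAIT` (empty position) from `HOLD` (nonempty position) removes the ambiguity the abstract describes, where a single hold action had to cover two opposite states.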
Graphical abstract
![](http://media.springernature.com/lw685/springer-static/image/art%3A10.1007%2Fs10489-023-04959-w/MediaObjects/10489_2023_4959_Figa_HTML.png)
Data Availability
The datasets generated and/or analysed during the current study are available from the corresponding author on reasonable request.
Acknowledgements
This research was funded by the Research Committee of the University of Macau (File numbers MYRG2022-00162-FST and MYRG2019-00136-FST).
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Sun, Q., Gong, X. & Si, YW. Transaction-aware inverse reinforcement learning for trading in stock markets. Appl Intell 53, 28186–28206 (2023). https://doi.org/10.1007/s10489-023-04959-w