Transaction-aware inverse reinforcement learning for trading in stock markets

Abstract

Training automated trading agents is a long-standing topic that has been widely discussed in artificial intelligence for quantitative finance. Reinforcement learning (RL) is designed to solve sequential decision-making tasks, such as stock trading. The output of RL is a policy, which can be represented as the probabilities of the possible actions in a given state, and the policy is optimized with respect to a reward function. However, even though profit appears to be the natural reward function, a trading agent equipped with an RL model suffers from several serious problems. Specifically, profit is only obtained after executing a sell action; transactions of varying lengths yield different profits at the same time step; and the hold action must cover two opposite position states, empty and nonempty. To alleviate these shortcomings, in this paper we introduce a new trading action called wait for the empty-position status and design appropriate rewards for all actions. Based on the new action space and reward functions, a novel approach named Transaction-aware Inverse Reinforcement Learning (TAIRL) is proposed. TAIRL rewards all trading actions to avoid reward bias and the reward dilemma. TAIRL is evaluated by backtesting on 12 stocks from the US, UK, and China stock markets, and compared against other state-of-the-art RL methods and moving-average trading methods. The experimental results show that the TAIRL agent achieves state-of-the-art performance in profitability and anti-risk ability.
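To make the action-space change concrete, the sketch below illustrates one plausible reading of the design described above: wait is only legal when the position is empty, hold only when it is nonempty, and every action receives a per-step reward rather than sell alone. This is a minimal illustration in Python; the names (Action, valid_actions, step_reward) and the specific reward formulas are hypothetical and are not the paper's actual TAIRL reward functions.

```python
# Hypothetical sketch (not the paper's implementation) of the four-action
# trading space: buy, sell, hold for a nonempty position, and the new
# "wait" action for an empty position. Reward shaping is illustrative only.
from enum import Enum

class Action(Enum):
    BUY = 0
    SELL = 1
    HOLD = 2   # keep an open (nonempty) position
    WAIT = 3   # stay out of the market (empty position)

def valid_actions(position_open: bool) -> list[Action]:
    # "wait" is only legal with an empty position; "hold"/"sell" need an open
    # one, so hold no longer has to cover two opposite position states.
    return [Action.SELL, Action.HOLD] if position_open else [Action.BUY, Action.WAIT]

def step_reward(action: Action, price: float, prev_price: float,
                entry_price: float | None = None) -> float:
    # Give every action a per-step reward instead of rewarding only "sell".
    if action is Action.SELL:
        assert entry_price is not None
        return (price - entry_price) / entry_price  # realized return of the transaction
    if action is Action.HOLD:
        return (price - prev_price) / prev_price    # unrealized mark-to-market change
    if action is Action.WAIT:
        return (prev_price - price) / prev_price    # positive when sitting out a falling market
    return 0.0                                      # BUY opens a position; neutral reward
```

In an RL training loop, valid_actions would mask the policy's action probabilities before sampling, so the ambiguity of a single hold action covering both empty and nonempty positions never arises.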


Data Availability

The datasets generated and/or analysed during the current study are available from the corresponding author on reasonable request.


Acknowledgements

This research was funded by the Research Committee of the University of Macau (File numbers MYRG2022-00162-FST and MYRG2019-00136-FST).

Author information

Corresponding author

Correspondence to Yain-Whar Si.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Sun, Q., Gong, X. & Si, YW. Transaction-aware inverse reinforcement learning for trading in stock markets. Appl Intell 53, 28186–28206 (2023). https://doi.org/10.1007/s10489-023-04959-w
