Abstract
Training automated trading agents is a long-standing topic in artificial intelligence for quantitative finance. Reinforcement learning (RL) is designed to solve sequential decision-making tasks such as stock trading. The output of RL is a policy, which can be represented as the probabilities of the possible actions given a state; the policy is optimized through a reward function. However, even when profit is taken as the natural reward function, a trading agent equipped with an RL model faces several serious problems. Specifically, profit is obtained only after executing a sell action, different profits can coexist at the same time step because transactions have varying lengths, and the hold action must cover two opposite states, an empty and a nonempty position. To alleviate these shortcomings, in this paper we introduce a new trading action, called wait, for the empty-position status, and design appropriate rewards for all actions. Based on the new action space and reward functions, we propose a novel approach named Transaction-aware Inverse Reinforcement Learning (TAIRL). TAIRL rewards all trading actions, avoiding reward bias and the hold dilemma. TAIRL is evaluated by backtesting on 12 stocks from the US, UK, and China stock markets, and is compared against other state-of-the-art RL methods and moving-average trading methods. The experimental results show that the TAIRL agent achieves state-of-the-art performance in profitability and risk resistance.
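The abstract's core idea, that every action (buy, sell, hold, wait) should receive a reward rather than only sell, can be sketched as follows. This is a minimal illustration with hypothetical shaping terms, not the paper's exact reward formulas: the function names, signals, and the small invalid-action penalty are all assumptions.

```python
from enum import Enum

class Action(Enum):
    BUY = 0
    SELL = 1
    HOLD = 2   # meaningful only with a nonempty position
    WAIT = 3   # the new action: stay out while the position is empty

def step_reward(action, position, entry_price, price, next_price):
    """Illustrative per-step rewards so every action is graded, not just SELL.

    position: number of shares held (0 = empty position).
    entry_price: price at which the current position was opened (None if empty).
    """
    if action == Action.SELL and position > 0:
        # Realized profit: the only signal a pure profit-based reward provides.
        return (price - entry_price) / entry_price
    if action == Action.BUY and position == 0:
        # Reward a buy by the immediate subsequent price move.
        return (next_price - price) / price
    if action == Action.HOLD and position > 0:
        # Unrealized move while holding a nonempty position.
        return (next_price - price) / price
    if action == Action.WAIT and position == 0:
        # Waiting is rewarded when the price falls, i.e. sitting out was right.
        return (price - next_price) / price
    # Action inconsistent with the position state (e.g. HOLD on an empty
    # position); a small penalty discourages it.
    return -0.01
```

Separating `WAIT` (empty position) from `HOLD` (nonempty position) removes the ambiguity the abstract describes, where a single hold action had to cover two opposite states.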
Graphical abstract
![](http://media.springernature.com/lw685/springer-static/image/art%3A10.1007%2Fs10489-023-04959-w/MediaObjects/10489_2023_4959_Figa_HTML.png)
Data Availability
The datasets generated and/or analysed during the current study are available from the corresponding author on reasonable request.
Acknowledgements
This research was funded by the Research Committee of the University of Macau (File numbers MYRG2022-00162-FST and MYRG2019-00136-FST).
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Sun, Q., Gong, X. & Si, YW. Transaction-aware inverse reinforcement learning for trading in stock markets. Appl Intell 53, 28186–28206 (2023). https://doi.org/10.1007/s10489-023-04959-w