Dynamic datasets and market environments for financial reinforcement learning

Abstract

The financial market is a particularly challenging playground for deep reinforcement learning due to its unique feature of dynamic datasets. Building high-quality market environments for training financial reinforcement learning (FinRL) agents is difficult due to major factors such as the low signal-to-noise ratio of financial data, survivorship bias of historical data, and model overfitting. In this paper, we present an updated version of FinRL-Meta, a data-centric and openly accessible library that processes dynamic datasets from real-world markets into gym-style market environments and has been actively maintained by the AI4Finance community. First, following a DataOps paradigm, we provide hundreds of market environments through an automatic data curation pipeline. Second, we provide homegrown examples and reproduce popular research papers as stepping stones for users to design new trading strategies. We also deploy the library on cloud platforms so that users can visualize their own results and assess the relative performance via community-wise competitions. Third, we provide dozens of Jupyter/Python demos organized into a curriculum and a documentation website to serve the rapidly growing community. The codes are available at https://github.com/AI4Finance-Foundation/FinRL-Meta.

Availability of data and materials

We do not directly hold any data. We release the data-processing codes at https://github.com/AI4Finance-Foundation/FinRL-Meta

Code availability

FinRL-Meta’s code is open-sourced at https://github.com/AI4Finance-Foundation/FinRL-Meta under the MIT License.

Notes

  1. Find a (technology) edge and position to win.

  2. The Four V’s of Big Data: https://opensistemas.com/en/the-four-vs-of-big-data/

  3. Note that “data-driven” and “data-centric” are two distinct concepts. The former refers to utilizing data to guide policy training, whereas the latter places data quality in the central role of FinRL development. The two approaches complement each other in enhancing overall policy performance.

  4. MLOps is an ML engineering culture and practice that aims at unifying ML system development (Dev) and ML system operation (Ops).

  5. There is information leakage.

  6. Github repo: https://github.com/jealous/stockstats

  7. Github repo: https://github.com/mrjbq7/ta-lib

  8. Github repo: https://github.com/nltk/nltk

  9. https://github.com/AI4Finance-Foundation/FinRL-Tutorials.

  10. Web page of Alpaca: https://alpaca.markets/

  11. Website: https://wandb.ai/site

  12. OpenAI SpinningUp: https://spinningup.openai.com/en/latest/spinningup/rl_intro.html

  13. Timnit Gebru, Jamie Morgenstern, Briana Vecchione, Jennifer Wortman Vaughan, Hanna Wallach, Hal Daumé III, and Kate Crawford. Datasheets for datasets. Communications of the ACM, 64(12), 86–92, 2021.

References

  • Akiba, T., Sano, S., Yanase, T., Ohta, T., & Koyama, M. (2019). Optuna: A next-generation hyperparameter optimization framework. In ACM SIGKDD international conference on knowledge discovery & data mining.

  • Alla, S., & Adari, S. K. (2021). What is MLOps? In: Beginning MLOps with MLFlow (pp. 79–124).

  • Amrouni, S., Moulin, A., Vann, J., Vyetrenko, S., Balch, T., & Veloso, M. (2021). ABIDES-Gym: Gym environments for multi-agent discrete event simulation and application to financial markets. In ACM International conference on AI in finance (ICAIF).

  • Ang, A. (2012). Mean-variance investing. Columbia Business School Research Paper No. 12/49.

  • Araci, D. (2019). Finbert: Financial sentiment analysis with pre-trained language models. arXiv preprint arXiv:1908.10063.

  • Ardon, L., Vadori, N., Spooner, T., Xu, M., Vann, J., & Ganesh, S. (2021). Towards a fully RL-based market simulator. In ACM international conference on AI in finance (ICAIF).

  • Atwal, H. (2019). Practical DataOps: Delivering agile data science at scale.

  • Bao, W., & Liu, X.-Y. (2019). Multi-agent deep reinforcement learning for liquidation strategy analysis. In ICML workshop on applications and infrastructure for multi-agent learning.

  • Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32.

  • Brockman, G., Cheung, V., Pettersson, L., Schneider, J., Schulman, J., Tang, J., & Zaremba, W. (2016). OpenAI Gym. arXiv preprint arXiv:1606.01540.

  • Brown, S. J., Goetzmann, W., Ibbotson, R. G., & Ross, S. A. (1992). Survivorship bias in performance studies. The Review of Financial Studies, 5(4), 553–580.

  • Byrd, D., & Polychroniadou, A. (2020). Differentially private secure multi-party computation for federated learning in financial applications. In Proceedings of the first ACM international conference on AI in finance (pp. 1–9).

  • Chen, Q., & Liu, X.-Y. (2020) Quantifying ESG alpha using scholar big data: An automated machine learning approach. In Proceedings of the first ACM international conference on AI in finance (pp. 1–8).

  • Chen, C.-C., Huang, H.-H., & Chen, H.-H. (2018). Ntusd-fin: A market sentiment dictionary for financial social media data applications. In Proceedings of the 1st financial narrative processing workshop (FNP 2018) (pp. 37–43).

  • Christiano, P. F., Leike, J., Brown, T., Martic, M., Legg, S., & Amodei, D. (2017). Deep reinforcement learning from human preferences. Advances in Neural Information Processing Systems, 30.

  • Coletta, A., Prata, M., Conti, M., Mercanti, E., Bartolini, N., Moulin, A., Vyetrenko, S., & Balch, T. (2021). Towards realistic market simulations: A generative adversarial networks approach. In ACM international conference on AI in finance (ICAIF).

  • De Prado, M. L. (2018). Advances in financial machine learning.

  • Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., & Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. In IEEE conference on computer vision and pattern recognition (pp. 248–255).

  • Dulac-Arnold, G., Mankowitz, D., & Hester, T. (2019). Challenges of real-world reinforcement learning. In ICML workshop on reinforcement learning for real life.

  • Dulac-Arnold, G., Levine, N., Mankowitz, D. J., Li, J., Paduraru, C., Gowal, S., & Hester, T. (2021). Challenges of real-world reinforcement learning: Definitions, benchmarks and analysis. Machine Learning, 110(9), 2419–2468.

  • Ereth, J. (2018). DataOps: Towards a definition. LWDA, 2191, 104–112.

  • Fang, Y., Liu, X.-Y., & Yang, H. (2019). Practical machine learning approach to capture the scholar data driven Alpha in AI industry. In IEEE international conference on big data (big data) (pp. 2230–2239). IEEE.

  • Fu, J., Kumar, A., Nachum, O., Tucker, G., & Levine, S. (2020). D4RL: Datasets for deep data-driven reinforcement learning. arXiv preprint arXiv:2004.07219.

  • Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., & Bengio, Y. (2014). Generative adversarial nets. Advances in Neural Information Processing Systems, 27.

  • Gort, B., Liu, X.-Y., Sun, X., Gao, J., Chen, S., & Wang, C. D. (2023). Deep reinforcement learning for cryptocurrency trading: Practical approach to address backtest overfitting. AAAI: AI in Finance Bridge.

  • Guan, M., & Liu, X.-Y. (2021). Explainable deep reinforcement learning for portfolio management: An empirical approach. In ACM international conference on AI in finance (ICAIF).

  • Gupta, A., Savarese, S., Ganguli, S., & Fei-Fei, L. (2021). Embodied intelligence via learning and evolution. Nature Communications.

  • Hambly, B., Xu, R., & Yang, H. (2023). Recent advances in reinforcement learning in finance. Mathematical Finance.

  • Hamilton, W. L., Clark, K., Leskovec, J., & Jurafsky, D. (2016). Inducing domain-specific sentiment lexicons from unlabeled corpora. In Proceedings of the conference on empirical methods in natural language processing. conference on empirical methods in natural language processing, vol. 2016 (p. 595). NIH Public Access.

  • Han, J., Xia, Z., Liu, X.-Y., Zhang, C., Wang, Z., & Guo, J. (2023). Massively parallel market simulator for financial reinforcement learning. AI in Finance Bridge, AAAI.

  • Hein, D., Depeweg, S., Tokic, M., Udluft, S., Hentschel, A., Runkler, T.A., & Sterzing, V. (2017). A benchmark environment motivated by industrial control problems. In IEEE symposium series on computational intelligence (SSCI) (pp. 1–8). IEEE.

  • Hutto, C., & Gilbert, E. (2014). Vader: A parsimonious rule-based model for sentiment analysis of social media text. In Proceedings of the international AAAI conference on web and social media, vol. 8 (pp. 216–225).

  • Kairouz, P., McMahan, H. B., Avent, B., Bellet, A., Bennis, M., Bhagoji, A. N., Bonawitz, K., Charles, Z., Cormode, G., & Cummings R. (2021). Advances and open problems in federated learning. Foundations and trends® in machine learning 14(1–2), 1–210.

  • Kritzman, M., & Li, Y. (2010). Skulls, financial turbulence, and risk management. Financial Analysts Journal, 66(5), 30–41.

  • Levine, S., Kumar, A., Tucker, G., & Fu, J. (2020). Offline reinforcement learning: Tutorial, review, and perspectives on open problems. arXiv preprint arXiv:2005.01643.

  • Li, X., Li, Y., Yang, H., Yang, L., & Liu, X.-Y. (2019). DP-LSTM: Differential privacy-inspired LSTM for stock prediction using financial news. In 33rd conference on neural information processing systems workshop on robust AI in financial services: Data, fairness, explainability, trustworthiness, and privacy, December 2019.

  • Li, Z., Liu, X.-Y., Zheng, J., Wang, Z., Walid, A., & Guo, J. (2021). FinRL-Podracer: High-performance and scalable deep reinforcement learning for quantitative finance. In ACM international conference on AI in finance (ICAIF).

  • Liang, E., Liaw, R., Nishihara, R., Moritz, P., Fox, R., Goldberg, K., Gonzalez, J., Jordan, M., & Stoica, I. (2018). RLlib: Abstractions for distributed reinforcement learning. In International conference on machine learning (pp. 3053–3062). PMLR.

  • Liaw, R., Liang, E., Nishihara, R., Moritz, P., Gonzalez, J. E., & Stoica, I. (2018). Tune: A research platform for distributed model selection and training. In ICML AutoML workshop.

  • Lillicrap, T., Hunt, J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., & Wierstra, D. (2016). Continuous control with deep reinforcement learning. In International conference on learning representations (ICLR).

  • Liu, X.-Y., Li, Z., Wang, Z., & Zheng, J. (2021). ElegantRL: A lightweight and stable deep reinforcement learning library. GitHub.

  • Liu, X.-Y., Li, Z., Yang, Z., Zheng, J., Wang, Z., Walid, A., Guo, J., & Jordan, M. (2021). ElegantRL-Podracer: Scalable and elastic library for cloud-native deep reinforcement learning. In Deep reinforcement learning workshop at NeurIPS.

  • Liu, Y., Liu, Q., Zhao, H., Pan, Z., & Liu, C. (2020). Adaptive quantitative trading: An imitative deep reinforcement learning approach. In Proceedings of the AAAI conference on artificial intelligence, vol. 34 (pp. 2128–2135).

  • Liu, X.-Y., Rui, J., Gao, J., Yang, L., Yang, H., Wang, Z., Wang, C. D., & Jian, G. (2021). FinRL-Meta: Data-driven deep reinforcement learning in quantitative finance. NeurIPS: Data-Centric AI Workshop.

  • Liu, X.-Y., Xia, Z., Rui, J., Gao, J., Yang, H., Zhu, M., Wang, C. D., Wang, Z., & Guo, J. (2022). FinRL-Meta: Market environments and benchmarks for data-driven financial reinforcement learning. In Thirty-sixth conference on neural information processing systems, datasets and benchmarks track.

  • Liu, X.-Y., Xiong, Z., Zhong, S., Yang, H., & Walid, A. (2018). Practical deep reinforcement learning approach for stock trading. NeurIPS: Workshop on Challenges and Opportunities for AI in Financial Services.

  • Liu, X.-Y., Yang, H., Chen, Q., Zhang, R., Yang, L., Xiao, B., & Wang, C. D. (2020). FinRL: A deep reinforcement learning library for automated stock trading in quantitative finance. NeurIPS: Deep RL Workshop.

  • Liu, X.-Y., Yang, H., Gao, J., & Wang, C. D. (2021). FinRL: Deep reinforcement learning framework to automate trading in quantitative finance. In ACM international conference on AI in finance (ICAIF)

  • Liu, Y., Fan, T., Chen, T., Xu, Q., & Yang, Q. (2021). Fate: An industrial grade platform for collaborative learning with data protection. Journal of Machine Learning Research, 22(226), 1–6.

  • Loria, S. (2018). TextBlob documentation. Release 0.15, 2(8).

  • Loughran, T., & McDonald, B. (2011). When is a liability not a liability? Textual analysis, dictionaries, and 10-ks. The Journal of Finance, 66(1), 35–65.

  • Lussange, J., Lazarevich, I., Bourgeois-Gironde, S., Palminteri, S., & Gutkin, B. (2021). Modelling stock markets by multi-agent reinforcement learning. Computational Economics, 57(1), 113–147.

  • Mahfouz, M., Gopalakrishnan, S., Suau, M., Patra, S., Mandic, P. D., Magazzeni, D., & Veloso, M. (2023). Towards asset allocation using behavioural cloning and reinforcement learning. AAAI AI for Financial Services Bridge.

  • Makoviychuk, V., Wawrzyniak, L., Guo, Y., Lu, M., Storey, K., Macklin, M., Hoeller, D., Rudin, N., Allshire, A., Handa, A., & State, G. (2021). Isaac Gym: High performance GPU-based physics simulation for robot learning. NeurIPS: Datasets and Benchmarks Track.

  • Malkiel, B. G. (2003). Passive investment strategies and efficient markets. European Financial Management, 9(1), 1–10.

  • Mamon, R. S., & Elliott, R. J. (2007). Hidden Markov models in finance vol. 4.

  • Mazumder, M., Banbury, C., Yao, X., Karlaš, B., Rojas, W. G., Diamos, S., Diamos, G., He, L., Kiela, D., & Jurado, D. et al. (2022). Dataperf: Benchmarks for data-centric AI development. arXiv preprint arXiv:2207.10062.

  • Miller, G. A. (1998). WordNet: An electronic lexical database.

  • Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A., Veness, J., Bellemare, M., Graves, A., Riedmiller, M., Fidjeland, A., Ostrovski, G., Petersen, S., Beattie, C., Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D., Legg, S., & Hassabis, D. (2015). Human-level control through deep reinforcement learning. Nature, 518, 529–33.

  • Nargesian, F., Samulowitz, H., Khurana, U., Khalil, E. B., & Turaga, D. S. (2017). Learning feature engineering for classification. In IJCAI, vol. 17 (pp. 2529–2535).

  • Nuti, G., Mirghaemi, M., Treleaven, P., & Yingsaeree, C. (2011). Algorithmic trading. Computer, 44, 61–69.

  • OpenAI (2023). GPT-4 technical report. arXiv preprint arXiv:2303.08774.

  • Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A., et al. (2022). Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35, 27730–27744.

  • Polyzotis, N., & Zaharia, M. (2021). What can data-centric AI learn from data and ML engineering? arXiv preprint arXiv:2112.06439.

  • Pricope, T.-V. (2021). Deep reinforcement learning in quantitative algorithmic trading: A review. arXiv preprint arXiv:2106.00123.

  • Qin, R., Gao, S., Zhang, X., Xu, Z., Huang, S., Li, Z., Zhang, W., & Yu, Y. (2022). NeoRL: A near real-world benchmark for offline reinforcement learning. NeurIPS Datasets and Benchmarks.

  • Raberto, M., Cincotti, S., Focardi, S. M., & Marchesi, M. (2001). Agent-based simulation of a financial market. Physica A: Statistical Mechanics and its Applications, 299(1–2), 319–327.

  • Raffin, A., Hill, A., Gleave, A., Kanervisto, A., Ernestus, M., & Dormann, N. (2021). Stable-baselines3: Reliable reinforcement learning implementations. Journal of Machine Learning Research.

  • Rundo, F. (2019). Deep LSTM with reinforcement learning layer for financial trend prediction in fx high frequency trading systems. Applied Sciences, 9(20), 4460.

  • Sambasivan, N., Kapania, S., Highfill, H., Akrong, D., Paritosh, P., & Aroyo, L. M. (2021). “Everyone wants to do the model work, not the data work”: Data cascades in high-stakes AI. In Proceedings of the 2021 CHI conference on human factors in computing systems (pp. 1–15).

  • Scholl, M. P., Calinescu, A., & Farmer, J. D. (2021). How market ecology explains market malfunction. Proceedings of the National Academy of Sciences, 118(26).

  • Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017). Proximal policy optimization algorithms. arXiv:1707.06347.

  • Sharpe, W. F. (1994). The Sharpe ratio. The Journal of Portfolio Management.

  • Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Van Den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., et al. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), 484–489.

  • Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Guez, A., Hubert, T., Baker, L., Lai, M., Bolton, A., et al. (2017). Mastering the game of Go without human knowledge. Nature, 550(7676), 354–359.

  • Strapparava, C., & Mihalcea, R. (2007). Semeval-2007 task 14: Affective text. In Proceedings of the fourth international workshop on semantic evaluations (SemEval-2007) (pp. 70–74).

  • Sutton, R. S. (2022). The quest for a common model of the intelligent decision maker. arXiv preprint arXiv:2202.13252.

  • Sutton, R. S., & Barto, A. G. (2018). Reinforcement learning: An introduction.

  • Tai, Y.-J., & Kao, H.-Y. (2013). Automatic domain-specific sentiment lexicon generation with label propagation. In Proceedings of international conference on information integration and web-based applications & services (pp. 53–62).

  • Team, O. E. L., Stooke, A., Mahajan, A., Barros, C., Deck, C., Bauer, J., Sygnowski, J., Trebacz, M., Jaderberg, M., & Mathieu, M. et al. (2021). Open-ended learning leads to generally capable agents. arXiv preprint arXiv:2107.12808.

  • Todorov, E., Erez, T., & Tassa, Y. (2012). Mujoco: A physics engine for model-based control. In IEEE/RSJ international conference on intelligent robots and systems (pp. 5026–5033). IEEE.

  • Treleaven, P., Galas, M., & Lalchand, V. (2013). Algorithmic trading review. Communications of the ACM, 56, 76–85.

  • Vázquez-Canteli, J. R., Kämpf, J., Henze, G., & Nagy, Z. (2019). CityLearn v1.0: An OpenAI gym environment for demand response with deep reinforcement learning. In ACM international conference on systems for energy-efficient buildings, cities, and transportation.

  • Vinyals, O., Babuschkin, I., Czarnecki, W. M., Mathieu, M., Dudzik, A., Chung, J., Choi, D. H., Powell, R., Ewalds, T., Georgiev, P., et al. (2019). Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature, 575(7782), 350–354.

  • Whaley, R. E. (2009). Understanding the VIX. The Journal of Portfolio Management, 35(3), 98–105.

  • Whang, S. E., Roh, Y., Song, H., & Lee, J.-G. (2023). Data collection and quality challenges in deep learning: A data-centric AI perspective. The VLDB Journal 1–23.

  • Wilkman, M. (2020). Feasibility of a reinforcement learning based stock trader. Aaltodoc.

  • Xiao, G., Li, J., Chen, Y., & Li, K. (2020). Malfcs: An effective malware classification framework with automated feature extraction based on deep convolutional neural networks. Elsevier Journal of Parallel and Distributed Computing, 141, 49–58.

  • Xing, F. Z., Cambria, E., & Welsch, R. E. (2018). Natural language based financial forecasting: A survey. Artificial Intelligence Review, 50(1), 49–73.

  • Yang, H., Liu, X.-Y., Zhong, S., & Walid, A. (2020). Deep reinforcement learning for automated stock trading: An ensemble strategy. In ACM International Conference on AI in Finance.

  • Zha, D., Bhat, Z. P., Lai, K.-H., Yang, F., & Hu, X. (2023). Data-centric AI: Perspectives and challenges. arXiv preprint arXiv:2301.04819.

  • Zha, D., Bhat, Z. P., Lai, K.-H., Yang, F., Jiang, Z., Zhong, S., & Hu, X. (2023). Data-centric artificial intelligence: A survey. arXiv preprint arXiv:2303.10158.

  • Zhang, Z., Zohren, S., & Roberts, S. (2020). Deep reinforcement learning for trading. The Journal of Financial Data Science, 2(2), 25–40.

Acknowledgements

We thank Jingyang Rui for participating in the early design and development of FinRL-Meta and for his contribution to our previous conference version at NeurIPS 2022; he did not participate in this version because of employment constraints. We thank Mr. Tao Liu (IDEA Research, International Digital Economy Academy) for technical support of the computing platform for this research project. Ming Zhu was supported by the National Natural Science Foundation of China (Grant No. 61902387). Christina Dan Wang is an Assistant Professor at the Shanghai Frontiers Science Center of Artificial Intelligence and Deep Learning and the Business Division, NYU Shanghai, Shanghai, China 200122, and was supported in part by National Natural Science Foundation of China (NNSFC) Grant 12271363. Disclaimer: Nothing in this paper or the FinRL-Meta repository is financial advice, and nothing is a recommendation to trade real money. Please use common sense and always consult a professional before trading or investing.

Funding

None.

Author information

Authors and Affiliations

Authors

Contributions

X-Y: Designs and leads the development of the whole FinRL-Meta framework. ZX: Contributes to the development of FinRL-Meta, MDP modeling, stock trading, real-time trading, and tutorials. HY: Contributes to related work, stock trading, the ensemble strategy, and financial sentiment analysis. Jiechao Gao: Contributes to FinRL-Meta’s mathematical modeling and revises the whole paper. DZ: Contributes to the data curation pipeline and the data-centric idea. MZ: Maintains the open-source FinRL-Meta repository on GitHub and contributes to proofreading. CDW: Supervises applications in FinRL-Meta and financial sentiment analysis. ZW: Supervises reinforcement learning, MDP modeling, and DRL algorithms. JG: Provides computing resources and helps with proofreading.

Corresponding author

Correspondence to Xiao-Yang Liu.

Ethics declarations

Conflict of interest

The authors are from universities and research labs and have no competing interests.

Ethics approval

This work does not raise ethics issues.

Consent to participate

FinRL-Meta is released under the MIT License. We, all authors, welcome anyone to participate in our project and join our open-source community.

Consent for publication

We, all authors, consent to the publication of everything mentioned in the paper.

Additional information

Editors: Emma Brunskill, Minmin Chen, Omer Gottesman, Lihong Li, Yuxi Li, Yao Liu, Zongqing Lu, Niranjani Prasad, Zhiwei Qin, Csaba Szepesvari, Matthew Taylor

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1: Terminology

Table 6 List of key terms for reinforcement learning
Table 7 List of key terms for finance

We provide a list of key terms for reinforcement learning and finance in Table 6 and Table 7, respectively. For terminology of reinforcement learning, interested users can refer to Sutton (2022) or the classic textbook (Sutton & Barto, 2018); the OpenAI Spinning Up webpage (Note 12) also explains key concepts of RL. For terminology of finance, interested users can refer to De Prado (2018).

Appendix 2: Dataset documentation and usages

We organize the dataset documentation according to the suggested template of datasheets for datasets (Note 13).

2.1 Motivation

  • For what purpose was the dataset created? As data refreshes on minute-to-millisecond timescales, finance is a particularly difficult playground for deep reinforcement learning. In academia, scholars use financial big data to obtain a more complex and precise understanding of markets and economics, while industry practitioners use financial big data to refine their analytical strategies and strengthen their prediction models. To serve the rapidly growing AI4Finance community, we create FinRL-Meta, which provides data access from different sources, pre-processes the raw data with different features, and builds the data into RL environments. We aim to provide dynamic RL environments that are manageable by users, and to build a financial metaverse, a universe of near real-market environments, as a playground for data-driven financial machine learning.

  • Who created the dataset? FinRL-Meta is an open-source project created by the AI4Finance community. Contents of FinRL-Meta are contributed by the authors of this paper and will be maintained by members of the AI4Finance community.

  • Who funded the creation of the dataset? AI4Finance Foundation, a non-profit open-source community that shares AI tools for finance, funded our project.

2.2 Composition

  • What do the instances that comprise the dataset represent? Instances of FinRL-Meta are volume-price data (stocks, securities, cryptocurrencies, etc.) and sentiment data (social media, ESG, Google Trends, etc.). FinRL-Meta provides hundreds of market environments through an automatic pipeline that collects dynamic datasets from real-world markets and processes them into standard gym-style market environments. FinRL-Meta also benchmarks popular papers as stepping stones for users to design new trading strategies.

  • How many instances are there in total? FinRL-Meta does not store data directly. Instead, we provide codes for a pipeline of data access, data cleaning, feature engineering, and building RL environments. Table 2 lists the supported data sources of FinRL-Meta. At the moment, there are hundreds of market environments, dozens of tutorials and demos, and several benchmarks.

  • Does the dataset contain all possible instances or is it a sample of instances from a larger set? With our provided codes, users could fetch data from the data source by properly specifying the starting date, ending date, time granularity, asset set, attributes, etc.

  • What data does each instance consist of? There are several types of financial data, as shown in Table 2.

  • Is there a label or target associated with each instance? No. There is no label or preset target for each instance, but users can use our benchmarks as baselines.

  • Is any information missing from individual instances? Yes. Several data sources contain missing values, and we provide standard preprocessing methods.

  • Are relationships between individual instances made explicit? Yes. An instance is a sample set of the market of interest.

  • Are there recommended data splits? We recommend that users follow our training-testing-trading pipeline, as shown in Fig. 4. Users can flexibly choose their preferred settings; e.g., in the stock trading task, our demo accesses the Yahoo! Finance database and uses data from 01/01/2009 to 06/30/2020 for training and data from 07/01/2020 to 05/31/2022 for backtesting (see the sketch after this list).

  • Are there any errors, sources of noise, or redundancies in the dataset? Yes. The raw data fetched from different sources contain noise and outliers. We provide codes to process the data and build them into standard gym-style RL environments.

  • Is the dataset self-contained, or does it link to or otherwise rely on external resources? It links to external resources. As shown in Table 2, FinRL-Meta fetches data from data sources to build gym environments.

  • Does the dataset contain data that might be considered confidential? No. All our data are from publicly available data sources.

  • Does the dataset contain data that, if viewed directly, might be offensive, insulting, threatening, or might otherwise cause anxiety? No. All our data are numerical.
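
To make the recommended date split concrete, here is a minimal, self-contained sketch. It uses the public yfinance package as a stand-in for FinRL-Meta’s own data processors, and the ticker list is a hypothetical example rather than a library default:

```python
# Fetch daily volume-price data and split it into the training and
# backtesting windows recommended above. yfinance is used here only as an
# illustrative public data source; the tickers are hypothetical.
import yfinance as yf

TRAIN_START, TRAIN_END = "2009-01-01", "2020-06-30"  # training window
TEST_START, TEST_END = "2020-07-01", "2022-05-31"    # backtesting window

tickers = ["AAPL", "MSFT", "JPM"]  # hypothetical asset set

train = yf.download(tickers, start=TRAIN_START, end=TRAIN_END, interval="1d")
test = yf.download(tickers, start=TEST_START, end=TEST_END, interval="1d")

# Basic cleaning: forward-fill gaps, then drop rows with remaining NaNs.
train = train.ffill().dropna()
test = test.ffill().dropna()

print(f"train: {len(train)} rows, test: {len(test)} rows")
```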

2.3 Collection process

  • How was the data associated with each instance acquired? FinRL-Meta fetches data from data sources, as shown in Table 2.

  • What mechanisms or procedures were used to collect the data? FinRL-Meta provides dynamic market environments that are built according to users’ settings. To achieve this, we provide software APIs to fetch data from different data sources. Note that some data sources require accounts and passwords or limit the number or frequency of requests (a rate-limiting sketch follows this list).

  • If the dataset is a sample from a larger set, what was the sampling strategy? It is dynamic, depending on users’ settings, such as the starting date, ending date, time granularity, asset set, attributes, etc.

  • Who was involved in the data collection process and how were they compensated? Our codes collect publicly available market data, which is free.

  • Over what timeframe was the data collected? It is not applicable because the environments are created dynamically by running the codes to fetch data in real-time.

  • Were any ethical review processes conducted? No.
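
Because some data sources limit the number or frequency of requests, a fetching script can pace itself on the client side. The sketch below is illustrative only: the per-minute budget is an assumption, not a documented limit of any particular provider.

```python
# Pace requests to stay under a (hypothetical) per-minute request limit
# while downloading many tickers one by one.
import time
import yfinance as yf

REQUESTS_PER_MINUTE = 30  # assumed provider limit, for illustration
MIN_INTERVAL = 60.0 / REQUESTS_PER_MINUTE

def fetch_all(tickers, start, end):
    frames = {}
    for ticker in tickers:
        t0 = time.monotonic()
        frames[ticker] = yf.download(ticker, start=start, end=end)
        # Sleep off the remainder of this request's time budget, if any.
        elapsed = time.monotonic() - t0
        if elapsed < MIN_INTERVAL:
            time.sleep(MIN_INTERVAL - elapsed)
    return frames

data = fetch_all(["AAPL", "MSFT"], "2020-07-01", "2022-05-31")
```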

2.4 Preprocessing/cleaning/labeling

  • Was any preprocessing/cleaning/labeling of the data done? Yes. The raw data fetched from different sources contain noise and outliers. We provide codes to process the data and build them into standard gym-style RL environments.

  • Was the “raw” data saved in addition to the preprocessed/cleaned/labeled data? The raw data are held by the different data sources (data providers).

  • Is the software that was used to preprocess/clean/label the data available? Yes. We use our own codes for cleaning and preprocessing (a simplified sketch follows below).
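
As a rough illustration of this cleaning step, the sketch below forward-fills gaps and winsorizes extreme return outliers. The exact steps and thresholds in FinRL-Meta’s processors may differ; treat this as a simplified stand-in:

```python
# Simplified stand-in for the cleaning/preprocessing step: sort by time,
# forward-fill missing values, and clip extreme return outliers.
import pandas as pd

def clean_prices(df: pd.DataFrame) -> pd.DataFrame:
    """Sort by timestamp, forward-fill gaps, drop rows before the first valid price."""
    return df.sort_index().ffill().dropna()

def winsorize_returns(returns: pd.Series,
                      lower: float = 0.001, upper: float = 0.999) -> pd.Series:
    """Clip return outliers to the given quantiles (thresholds are illustrative)."""
    lo, hi = returns.quantile(lower), returns.quantile(upper)
    return returns.clip(lo, hi)

# Usage with a toy series containing a gap.
prices = pd.Series([150.0, None, 151.5, 152.0],
                   index=pd.date_range("2022-01-03", periods=4))
cleaned = clean_prices(prices.to_frame("close"))
returns = winsorize_returns(cleaned["close"].pct_change().dropna())
```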

2.5 Uses

  • Has the dataset been used for any tasks already? Yes. Thousands of AI4Finance community members use FinRL-Meta for learning and research purposes. College courses also use FinRL-Meta as teaching material for financial reinforcement learning. Demos and tutorials are mentioned in Sect. 5.

  • Is there a repository that links to any or all papers or systems that use the dataset? 1. Research papers that use FinRL-Meta are listed here: https://github.com/AI4Finance-Foundation/FinRL-Tutorials/blob/master/FinRL_papers.md. Our conference version of FinRL-Meta (Liu et al., 2022) appeared in the NeurIPS 2022 Datasets and Benchmarks Track, and our workshop version (Liu et al., 2021) appeared in the NeurIPS 2021 Workshop on Data-Centric AI. 2. The following three repositories have incorporated FinRL-Meta:

  • What (other) tasks could the dataset be used for? Besides the current tasks (tutorials, demos, and benchmarks), FinRL-Meta will be useful for the following tasks:

    • Curriculum learning for agents: Based on FinRL-Meta (a universe of market environments, say \(\ge 100\)), one is able to construct an environment by sampling from multiple market datasets, similar to XLand (Team et al., 2021). In this way, one can apply the curriculum learning method (Team et al., 2021) to train a generally capable agent for several financial tasks.

    • To improve performance in large-scale markets, we are exploring GPU-based massively parallel simulation such as Isaac Gym (Makoviychuk et al., 2021).

    • It will be interesting to explore the evolutionary perspectives (Gupta et al., 2021; Scholl et al., 2021; Li et al., 2021; Liu et al., 2021) to simulate the markets. We believe that FinRL-Meta will provide insights into complex market phenomena and offer guidance for financial regulations.

  • Is there anything about the composition of the dataset or the way it was collected and preprocessed/cleaned/labeled that might impact future uses? We believe that FinRL-Meta will not encounter usage limits. Our data are fetched from different sources in real time when running the codes. However, one or two of the \(\ge 30\) data sources (in Table 2) may change their data access rules, which could impact future use; please check the rules and accessibility of each data source before using it.

  • Are there tasks for which the dataset should not be used? No. Since there are no ethical problems with FinRL-Meta, users may use it for any task that does not violate laws. Disclaimer: Nothing herein is financial advice or a recommendation to trade real money. Please use common sense and always consult a professional before trading or investing.

2.6 Distribution

  • Will the dataset be distributed to third parties outside of the entity (e.g., company, institution, organization) on behalf of which the dataset was created? No. It will always be held on GitHub under the MIT License, for educational and research purposes.

  • How will the dataset be distributed? Our codes and existing environments are available on GitHub FinRL-Meta repository https://github.com/AI4Finance-Foundation/FinRL-Meta.

  • When will the dataset be distributed? FinRL-Meta has been publicly available since February 14, 2021.

  • Will the dataset be distributed under a copyright or other intellectual property (IP) license, and/or under applicable terms of use (ToU)? FinRL-Meta is distributed under MIT License, for educational and research purposes.

  • Have any third parties imposed IP-based or other restrictions on the data associated with the instances? No.

  • Do any export controls or other regulatory restrictions apply to the dataset or to individual instances? No. Our data are fetched from different sources in real time. However, one or two of the \(\ge 20\) data sources (in Table 2) may change their data access rules, which could impact future use; please check the rules and accessibility of each data source before using it.

2.7 Maintenance

  • Who will be supporting/hosting/maintaining the dataset? FinRL-Meta has been actively maintained by the AI4Finance Foundation (including the authors of this paper), which has over 12K members at the moment (March 2023). We are actively updating market environments to serve the rapidly growing open-source AI4Finance community.

  • How can the owner/curator/manager of the dataset be contacted? We encourage users to join our Slack channel: https://join.slack.com/t/ai4financeworkspace/shared_invite/zt-v670l1jm-dzTgIT9fHZIjjrqprrY0kg or our mailing list: https://groups.google.com/u/1/g/ai4finance.

  • Is there an erratum? Users can use GitHub to report issues/bugs and use the Slack channel, Discord channel, or mailing list (AI4Finance_FinRL at https://groups.google.com/u/2/g/ai4finance) to discuss solutions. The AI4Finance community is actively improving the codes, say extracting technical indicators, evaluating feature importance, and quantifying the probability of model overfitting.

  • Will the dataset be updated? Yes, we are actively updating codes and adding more data sources. Users could get information and the newly updated version through our GitHub repository, or join the mailing list: https://groups.google.com/u/1/g/ai4finance.

  • If the dataset relates to people, are there applicable limits on the retention of the data associated with the instances? The data of FinRL-Meta do not relate to people.

  • Will older versions of the dataset continue to be supported/hosted/maintained? Yes. All versions can be found on our GitHub repository.

  • If others want to extend/augment/build on/contribute to the dataset, is there a mechanism for them to do so? We maintain FinRL-Meta on GitHub. Users can use GitHub to report issues/bugs and use Slack channel or mailing list to discuss solutions. We welcome community members to submit pull requests through GitHub.

  • How does the platform handle ticker name changes due to corporate actions? To our knowledge, ticker name changes are very rare. Therefore, periodically conducting manual checks and adjustments to the data API may be satisfactory. Nevertheless, there is potential for automating the handling of such changes, such as using web crawling techniques to retrieve updated ticker names from sources like https://stockanalysis.com/stocks/. We plan to explore this possibility in future investigations.

  • What about missing data? We can incorporate a backup data source as a precautionary measure. Oftentimes, data for specific variables, such as stock prices, are available from multiple sources. In the event of missing data from the primary source, we can resort to the backup source (a fallback sketch follows this list). We will investigate other possible solutions in our future work.

  • What if data server is down? Given that we have open-sourced our codebase, users have the option to directly retrieve the data using their own server or PC. Additionally, we plan to progressively introduce supplementary data servers to facilitate data downloading.
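
The backup-source idea above can be sketched as a simple fallback loop. The helper functions here are hypothetical placeholders rather than FinRL-Meta APIs; Yahoo! Finance stands in as the primary source:

```python
# Try the primary data source first and fall back to backups on failure.
import logging
import yfinance as yf

def fetch_from_yahoo(ticker, start, end):
    df = yf.download(ticker, start=start, end=end)
    if df.empty:
        raise ValueError(f"no data for {ticker} from Yahoo! Finance")
    return df

def fetch_with_fallback(ticker, start, end, sources):
    """Try each (name, fetch) source in order; return the first non-empty result."""
    for name, fetch in sources:
        try:
            return fetch(ticker, start, end)
        except Exception as err:  # network errors, empty data, rate limits, ...
            logging.warning("source %s failed for %s: %s", name, ticker, err)
    raise RuntimeError(f"all data sources failed for {ticker}")

# Usage: Yahoo! Finance as primary; a backup provider would be appended here.
df = fetch_with_fallback("AAPL", "2020-07-01", "2022-05-31",
                         [("yahoo", fetch_from_yahoo)])
```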

Appendix 3: MDP setup for market environments

3.1 Order execution

The order execution task has the following MDP:

  • State \(\mathbf {s_t} = [\mathbf {h_t}, (\mathbf {p_t}, \mathbf {o_t})] \in \mathbb {R}^{1+2\times 9}\), where \(\mathbf {h_t} \in \mathbb {R}_+\) denotes the remaining holdings that have not yet been executed as orders, and \((\mathbf {p_t}, \mathbf {o_t}) \in \mathbb {R}^{2\times 9}_+\) denotes the current limit order book (prices and volumes at nine levels) at time t.

  • Action \(\mathbf {a_t}=[\mathbf {ap_t}, \mathbf {ah_t}] \in \mathbb {R}^2_+\), which denotes that the agent places an order of \(\mathbf {ah_t}\) shares at price \(\mathbf {ap_t}\).

  • Reward \(r(\mathbf {s_t}, \mathbf {a_t}, \mathbf {s_{t+1}}) \in \mathbb {R}\). In this order execution task, the reward function is set to be the excess return of the agent compared to the time-weighted average price (TWAP) benchmark (see the sketch below).
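
Assuming per-step mid prices as the TWAP benchmark and a list of the agent's fills (names such as executions and side are our own illustrative choices, not the paper's notation), a minimal sketch of this reward is:

```python
# Excess return of the agent's average fill price versus the TWAP benchmark.
import numpy as np

def execution_reward(executions, mid_prices, side="sell"):
    """executions: iterable of (price, shares) fills; mid_prices: per-step mids."""
    total_shares = sum(q for _, q in executions)
    if total_shares == 0:
        return 0.0
    avg_fill = sum(p * q for p, q in executions) / total_shares
    twap = float(np.mean(mid_prices))  # time-weighted average price benchmark
    # Selling above TWAP (or buying below it) yields a positive reward.
    sign = 1.0 if side == "sell" else -1.0
    return sign * (avg_fill - twap) / twap

# Example: two fills against a benchmark of 100.
print(execution_reward([(101.0, 50), (100.5, 50)], [100.0, 100.0, 100.0]))
```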

3.2 Paper trading

The paper trading task is a variant of the stock trading task that trades in real time. It has a similar MDP setup (a minimal environment sketch follows the list):

  • State \({\varvec{s_t}}=[b_t,{\varvec{p_t}},{\varvec{f_t}},{\varvec{h_t}}] \in \mathbb {R}^{30(I+2)+1}\), where scalar \(b_t\in \mathbb {R}_+\) is the remaining balance in the account, \({\varvec{p_t}}\in \mathbb {R}_+^{30}\) is the prices of 30 stocks, \({\varvec{f_t}}\in \mathbb {R}^{30\cdot I}\) is a feature vector and each stock has I technical indicators, and \({\varvec{h_t}}\in \mathbb {R}_+^{30}\) denotes the share holdings, where \(\mathbb {R}_+\) is the set of non-negative real numbers.

  • Action \({\varvec{a_t}} \in \mathbb {R}^{30}\) denotes the trading operations on the 30 stocks, i.e., \({\varvec{h_{t+1}}}={\varvec{h_t}} + {\varvec{a_t}}\). A positive entry \({\varvec{a}}_t^i > 0, i=1,\ldots, 30\), means buying \({\varvec{a}}_t^i\) shares of the i-th stock, a negative entry \({\varvec{a}}_t^i < 0\) means selling, and a zero entry \({\varvec{a}}_t^i = 0\) keeps \({\varvec{h}}_t^i\) unchanged.

  • Reward function \(R({\varvec{s_t}}, {\varvec{a_t}}, {\varvec{s_{t+1}}})\in \mathbb {R}\): In this paper trading task, the reward function is set to be the change of total asset values, i.e., \(R({\varvec{s_t}}, {\varvec{a_t}}, {\varvec{s_{t+1}}})=v_{t+1}-v_t\), where \(v_t\) and \(v_{t+1}\) are the total asset values at state \({\varvec{s_t}}\) and \({\varvec{s_{t+1}}}\), respectively, i.e., \(v_t={\varvec{p_t}}^{\intercal }{\varvec{h_t}}+b_t\in \mathbb {R}\).
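
To make the MDP concrete, below is a minimal, self-contained gym-style environment implementing this state/action/reward design on offline price and feature arrays. It is a sketch, not FinRL-Meta's actual trading environment: it omits transaction costs and risk controls, replaces the live brokerage feed with precomputed arrays, and resolves budget violations by crudely rescaling the action.

```python
# Gym-style sketch of the paper-trading MDP above, driven by precomputed
# arrays instead of a live broker feed. Dimensions follow the text:
# prices has shape (T, 30) and features has shape (T, 30 * I).
import numpy as np

class PaperTradingEnv:
    """State [b_t, p_t, f_t, h_t]; action a_t = shares traded; reward v_{t+1} - v_t."""

    def __init__(self, prices, features, initial_balance=1e6):
        self.prices = np.asarray(prices, dtype=float)
        self.features = np.asarray(features, dtype=float)
        self.initial_balance = initial_balance
        self.reset()

    def _state(self):
        return np.concatenate(([self.balance],
                               self.prices[self.t],
                               self.features[self.t],
                               self.holdings))

    def _total_asset(self):
        return float(self.prices[self.t] @ self.holdings + self.balance)  # v_t

    def reset(self):
        self.t = 0
        self.balance = self.initial_balance
        self.holdings = np.zeros(self.prices.shape[1])
        return self._state()

    def step(self, action):
        v_t = self._total_asset()
        action = np.asarray(action, dtype=float)
        # No short selling: cannot sell more shares than currently held.
        action = np.maximum(action, -self.holdings)
        cost = float(self.prices[self.t] @ action)
        if cost > self.balance:
            # Crude simplification: scale the whole action down to the budget.
            action *= self.balance / cost
            cost = float(self.prices[self.t] @ action)
        self.holdings += action              # h_{t+1} = h_t + a_t
        self.balance -= cost
        self.t += 1
        reward = self._total_asset() - v_t   # R(s_t, a_t, s_{t+1}) = v_{t+1} - v_t
        done = self.t >= len(self.prices) - 1
        return self._state(), reward, done, {}
```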

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Liu, XY., Xia, Z., Yang, H. et al. Dynamic datasets and market environments for financial reinforcement learning. Mach Learn 113, 2795–2839 (2024). https://doi.org/10.1007/s10994-023-06511-w
