Modelling Stock Markets by Multi-agent Reinforcement Learning

Abstract

Quantitative finance has a long tradition of bottom-up approaches to complex-systems inference via multi-agent systems (MAS). These statistical tools model agents trading through a centralised order book in order to emulate complex and diverse market phenomena. Past financial models of this kind, however, have all relied on so-called zero-intelligence agents, so the crucial issues of agent information and learning, which are central to price formation and hence to all market activity, could not be properly assessed. To address this, we designed a next-generation MAS stock market simulator in which each agent learns to trade autonomously via reinforcement learning. We calibrate the model to real market data from the London Stock Exchange over the years 2007 to 2018 and show that it faithfully reproduces key market microstructure metrics, such as various price autocorrelation scalars over multiple time intervals. Agent learning thus enables accurate emulation of the market microstructure as an emergent property of the MAS.


Figs. 1–15

Notes

  1. Computations were performed on a Mac Pro with a 3.5 GHz 6-core Intel Xeon E5 processor and 16 GB of 1866 MHz DDR memory.

  2. We used the time-series feature extraction functions implemented in the tsfresh Python package (Christ et al. 2018).

  3. We used the implementation from the scikit-learn Python package (Pedregosa et al. 2011), with 200 estimators, a maximal tree depth of 5, and default values for the other hyperparameters.
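As a sketch of the classifier configuration described in note 3 (assuming scikit-learn is available; the toy feature matrix below is purely illustrative, not the paper's data):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Illustrative stand-in for the extracted time-series feature matrix.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# Hyperparameters as stated in note 3: 200 estimators, maximal tree depth 5,
# all other hyperparameters left at their scikit-learn defaults.
clf = RandomForestClassifier(n_estimators=200, max_depth=5, random_state=0)
clf.fit(X, y)
```

The depth limit of 5 acts as the regularizer discussed in the appendix; everything else follows scikit-learn's defaults.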

References

  1. Abbeel, P., Coates, A., & Ng, A. Y. (2010). Autonomous helicopter aerobatics through apprenticeship learning. The International Journal of Robotics Research, 29, 1608–1639.
  2. Aloud, M. (2014). Agent-based simulation in finance: Design and choices. In Proceedings in finance and risk perspectives ‘14.
  3. Andreas, J., Klein, D., & Levine, S. (2017). Modular multitask reinforcement learning with policy sketches.
  4. Bak, P., Norrelykke, S., & Shubik, M. (1999). Dynamics of money. Physical Review E, 60, 2528–2532.
  5. Bak, P., Norrelykke, S., & Shubik, M. (2001). Money and Goldstone modes. Quantitative Finance, 1, 186–190.
  6. Barde, S. (2015). A practical, universal, information criterion over nth order Markov processes (p. 04). School of Economics Discussion Papers, University of Kent.
  7. Bavard, S., Lebreton, M., Khamassi, M., Coricelli, G., & Palminteri, S. (2018). Reference-point centering and range-adaptation enhance human reinforcement learning at the cost of irrational preferences. Nature Communications, 9(1), 4503. https://doi.org/10.1038/s41467-018-06781-2.
  8. Benzaquen, M., & Bouchaud, J. (2018). A fractional reaction–diffusion description of supply and demand. The European Physical Journal B, 91, 23. https://doi.org/10.1140/epjb/e2017-80246-9.
  9. Bera, A. K., Ivliev, S., & Lillo, F. (2015). Financial econometrics and empirical market microstructure. Berlin: Springer.
  10. Bhatnagara, S., & Panigrahi, J. R. (2006). Actor-critic algorithms for hierarchical decision processes. Automatica, 42, 637–644.
  11. Biondo, A. E. (2018a). Learning to forecast, risk aversion, and microstructural aspects of financial stability. Economics, 12(2018–20), 1–21.
  12. Biondo, A. E. (2018b). Order book microstructure and policies for financial stability. Studies in Economics and Finance, 35(1), 196–218.
  13. Biondo, A. E. (2018c). Order book modeling and financial stability. Journal of Economic Interaction and Coordination, 14(3), 469–489.
  14. Boero, R., Morini, M., Sonnessa, M., & Terna, P. (2015). Agent-based models of the economy, from theories to applications. New York: Palgrave Macmillan.
  15. Bouchaud, J., Cont, R., & Potters, M. (1997). Scale invariance and beyond. In Proceedings of the CNRS Workshop on Scale Invariance, Les Houches. Springer.
  16. Bouchaud, J. P. (2018). Handbook of computational economics (Vol. 4). Amsterdam: Elsevier.
  17. Chiarella, C., Iori, G., & Perelló, J. (2007). The impact of heterogeneous trading rules on the limit order book and order flows. arXiv:0711.3581.
  18. Christ, M., Braun, N., Neuffer, J., & Kempa-Liehr, A. W. (2018). Time series feature extraction on basis of scalable hypothesis tests (tsfresh – a Python package). Neurocomputing, 307, 72–77.
  19. Cont, R. (2001). Empirical properties of asset returns: Stylized facts and statistical issues. Quantitative Finance, 1, 223–236.
  20. Cont, R. (2005). Chapter 7: Agent-based models for market impact and volatility. In A. Kirman & G. Teyssiere (Eds.), Long memory in economics. Berlin: Springer.
  21. Cont, R., & Bouchaud, J. P. (2000). Herd behavior and aggregate fluctuations in financial markets. Macroeconomic Dynamics, 4, 170–196.
  22. Cristelli, M. (2014). Complexity in financial markets. Berlin: Springer.
  23. Current dividend impacts of FTSE-250 stocks. Retrieved May 19, 2020 from https://www.dividenddata.co.uk.
  24. Delbaen, F., & Schachermayer, W. (2004). What is a free lunch? Notices of the AMS, 51(5), 526–528.
  25. Deng, Y., Bao, F., Kong, Y., Ren, Z., & Dai, Q. (2017). Deep direct reinforcement learning for financial signal representation and trading. IEEE Transactions on Neural Networks and Learning Systems, 28(3), 653–664.
  26. de Vries, C., & Leuven, K. (1994). Stylized facts of nominal exchange rate returns. Working papers, Purdue University, Krannert School of Management, Center for International Business Education and Research (CIBER).
  27. Ding, Z., Engle, R., & Granger, C. (1993). A long memory property of stock market returns and a new model. Journal of Empirical Finance, 1, 83–106.
  28. Dodonova, A., & Khoroshilov, Y. (2018). Private information in futures markets: An experimental study. Managerial and Decision Economics, 39, 65–70.
  29. Donangelo, R., Hansen, A., Sneppen, K., & Souza, S. R. (2000). Modelling an imperfect market. Physica A, 283, 469–478.
  30. Donangelo, R., & Sneppen, K. (2000). Self-organization of value and demand. Physica A, 276, 572–580.
  31. Duan, Y., Schulman, J., Chen, X., Bartlett, P. L., Sutskever, I., & Abbeel, P. (2016). RL²: Fast reinforcement learning via slow reinforcement learning. arXiv:1611.02779.
  32. Duncan, K., Doll, B. B., Daw, N. D., & Shohamy, D. (2018). More than the sum of its parts: A role for the hippocampus in configural reinforcement learning. Neuron, 98, 645–657.
  33. Eickhoff, S. B., Yeo, B. T. T., & Genon, S. (2018). Imaging-based parcellations of the human brain. Nature Reviews Neuroscience, 19, 672–686.
  34. Eisler, Z., & Kertesz, J. (2006). Size matters: Some stylized facts of the stock market revisited. European Physical Journal B, 51, 145–154.
  35. Engle, R. F. (1982). Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. Econometrica, 50(4), 987–1007.
  36. Erev, I., & Roth, A. E. (2014). Maximization, learning and economic behaviour. PNAS, 111, 10818–10825.
  37. Fama, E. (1970). Efficient capital markets: A review of theory and empirical work. Journal of Finance, 25, 383–417.
  38. Franke, R., & Westerhoff, F. (2011). Structural stochastic volatility in asset pricing dynamics: Estimation and model contest. BERG working paper series on government and growth (Vol. 78).
  39. Fulcher, B. D., & Jones, N. S. (2014). Highly comparative feature-based time-series classification. IEEE Transactions on Knowledge and Data Engineering, 26, 3026–3037.
  40. Ganesh, S., Vadori, N., Xu, M., Zheng, H., Reddy, P., & Veloso, M. (2019). Reinforcement learning for market making in a multi-agent dealer market. arXiv:1911.05892.
  41. Gode, D., & Sunder, S. (1993). Allocative efficiency of markets with zero-intelligence traders: Market as a partial substitute for individual rationality. Journal of Political Economy, 101(1), 119–137.
  42. Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., et al. (2014). Generative adversarial nets. In Advances in neural information processing systems (pp. 2672–2680).
  43. Green, E., & Heffernan, D. M. (2019). An agent-based model to explain the emergence of stylised facts in log returns. arXiv:1901.05053.
  44. Greene, W. H. (2017). Econometric analysis (8th ed.). London: Pearson.
  45. Grondman, I., Busoniu, L., Lopes, G., & Babuska, R. (2012). A survey of actor-critic reinforcement learning: Standard and natural policy gradients. IEEE Transactions on Systems, Man, and Cybernetics, 42, 1291–1307.
  46. Gualdi, S., Tarzia, M., Zamponi, F., & Bouchaud, J. P. (2015). Tipping points in macroeconomic agent-based models. Journal of Economic Dynamics and Control, 50, 29–61.
  47. Heinrich, J. (2017). Deep RL from self-play in imperfect-information games. Ph.D. thesis, University College London.
  48. Hu, Y. J., & Lin, S. J. (2019). Deep reinforcement learning for optimizing portfolio management. In 2019 Amity international conference on artificial intelligence.
  49. Huang, W., Lehalle, C. A., & Rosenbaum, M. (2015). Simulating and analyzing order book data: The queue-reactive model. Journal of the American Statistical Association, 110, 509.
  50. Huang, Z. F., & Solomon, S. (2000). Power, Lévy, exponential and Gaussian-like regimes in autocatalytic financial systems. European Physical Journal B, 20, 601–607.
  51. IG fees of Contracts For Difference. Retrieved May 19, 2020 from https://www.ig.com.
  52. Katt, S., Oliehoek, F. A., & Amato, C. (2017). Learning in POMDPs with Monte Carlo tree search. In Proceedings of the 34th international conference on machine learning.
  53. Keramati, M., & Gutkin, B. (2011). A reinforcement learning theory for homeostatic regulation. NIPS.
  54. Keramati, M., & Gutkin, B. (2014). Homeostatic reinforcement learning for integrating reward collection and physiological stability. eLife, 3, e04811.
  55. Kim, G., & Markowitz, H. M. (1989). Investment rules, margin and market volatility. Journal of Portfolio Management, 16, 45–52.
  56. Konovalov, A., & Krajbich, I. (2016). Gaze data reveal distinct choice processes underlying model-based and model-free reinforcement learning. Nature Communications, 7, 12438.
  57. Lefebvre, G., Lebreton, M., Meyniel, F., Bourgeois-Gironde, S., & Palminteri, S. (2017). Behavioural and neural characterization of optimistic reinforcement learning. Nature Human Behaviour, 1(4), 1–19.
  58. Levy, M., Levy, H., & Solomon, S. (1994). A microscopic model of the stock market: Cycles, booms, and crashes. Economics Letters, 45, 103–111.
  59. Levy, M., Levy, H., & Solomon, S. (1995). Microscopic simulation of the stock market: The effect of microscopic diversity. Journal de Physique I, 5, 1087–1107.
  60. Levy, M., Levy, H., & Solomon, S. (1997). New evidence for the power-law distribution of wealth. Physica A, 242, 90–94.
  61. Levy, M., Levy, H., & Solomon, S. (2000). Microscopic simulation of financial markets: From investor behavior to market phenomena. New York: Academic Press.
  62. Levy, M., Persky, N., & Solomon, S. (1996). The complex dynamics of a simple stock market model. International Journal of High Speed Computing, 8, 93–113.
  63. Levy, M., & Solomon, S. (1996a). Dynamical explanation for the emergence of power law in a stock market model. International Journal of Modern Physics C, 7, 65–72.
  64. Levy, M., & Solomon, S. (1996b). Power laws are logarithmic Boltzmann laws. International Journal of Modern Physics C, 7, 595–601.
  65. Liang, H., Yang, L., Tu, H. C. W., & Xu, M. (2017). Human-in-the-loop reinforcement learning. In 2017 Chinese automation congress.
  66. Lipski, J., & Kutner, R. (2013). Agent-based stock market model with endogenous agents' impact. arXiv:1310.0762.
  67. Lobato, I. N., & Savin, N. E. (1998). Real and spurious long-memory properties of stock-market data. Journal of Business and Economic Statistics, 16, 261–283.
  68. Lux, T., & Marchesi, M. (1999). Scaling and criticality in a stochastic multi-agent model of a financial market. Nature, 397, 498–500.
  69. Lux, T., & Marchesi, M. (2000). Volatility clustering in financial markets: A microsimulation of interacting agents. International Journal of Theoretical and Applied Finance, 3, 67–70.
  70. Mandelbrot, B. (1963). The variation of certain speculative prices. The Journal of Business, 39, 394–419.
  71. Mandelbrot, B., Fisher, A., & Calvet, L. (1997). A multifractal model of asset returns. Cowles Foundation for Research in Economics.
  72. Martino, A. D., & Marsili, M. (2006). Statistical mechanics of socio-economic systems with heterogeneous agents. Journal of Physics A, 39, 465–540.
  73. McInnes, L., Healy, J., & Melville, J. (2018). UMAP: Uniform manifold approximation and projection for dimension reduction. arXiv:1802.03426.
  74. Mnih, V., Badia, A. P., Mirza, M., Graves, A., Lillicrap, T. P., Harley, T., et al. (2016). Asynchronous methods for deep reinforcement learning. arXiv:1602.01783.
  75. Momennejad, I., Russek, E., Cheong, J., Botvinick, M., Daw, N. D., & Gershman, S. J. (2017). The successor representation in human reinforcement learning. Nature Human Behaviour, 1, 680–692.
  76. Murray, M. P. (1994). A drunk and her dog: An illustration of cointegration and error correction. The American Statistician, 48(1), 37–39.
  77. Mota Navarro, R., & Larralde, H. (2016). A detailed heterogeneous agent model for a single asset financial market with trading via an order book. arXiv:1601.00229.
  78. Naik, P. K., Gupta, R., & Padhi, P. (2018). The relationship between stock market volatility and trading volume: Evidence from South Africa. The Journal of Developing Areas, 52(1), 99–114.
  79. Neuneier, R. (1997). Enhancing Q-learning for optimal asset allocation. In Proceedings of the 10th international conference on neural information processing systems.
  80. Ng, A. Y., Harada, D., & Russell, S. (1999). Policy invariance under reward transformations: Theory and application to reward shaping. In Proceedings of the 16th international conference on machine learning.
  81. Pagan, A. (1996). The econometrics of financial markets. Journal of Empirical Finance, 3, 15–102.
  82. Palminteri, S., Khamassi, M., Joffily, M., & Coricelli, G. (2015). Contextual modulation of value signals in reward and punishment learning. Nature Communications, 6, 1–14.
  83. Palminteri, S., Lefebvre, G., Kilford, E., & Blakemore, S. (2017). Confirmation bias in human reinforcement learning: Evidence from counterfactual feedback processing. PLoS Computational Biology, 13(8), e1005684.
  84. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., et al. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825–2830.
  85. Pinto, L., Davidson, J., Sukthankar, R., & Gupta, A. (2017). Robust adversarial reinforcement learning. arXiv:1703.02702.
  86. Plerou, V., Gopikrishnan, P., Amaral, L. A., Meyer, M., & Stanley, H. E. (1999). Scaling of the distribution of fluctuations of financial market indices. Physical Review E, 60(6), 6519.
  87. Potters, M., & Bouchaud, J. P. (2001). More stylized facts of financial markets: Leverage effect and downside correlations. Physica A, 299, 60–70.
  88. Preis, T., Golke, S., Paul, W., & Schneider, J. J. (2006). Multi-agent-based order book model of financial markets. Europhysics Letters, 75(3), 510–516.
  89. Ross, S., Pineau, J., Chaib-draa, B., & Kreitmann, P. (2011). A Bayesian approach for learning and planning in partially observable Markov decision processes. Journal of Machine Learning Research, 12, 1729–1770.
  90. Sbordone, A. M., Tambalotti, A., Rao, K., & Walsh, K. J. (2010). Policy analysis using DSGE models: An introduction. Economic Policy Review, 16(2), 23–43.
  91. Schreiber, T., & Schmitz, A. (1997). Discrimination power of measures for nonlinearity in a time series. Physical Review E, 55(5), 5443.
  92. Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., van den Driessche, G., et al. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529, 484–489.
  93. Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., et al. (2018a). A general reinforcement learning algorithm that masters chess, shogi and Go through self-play. Science, 362(6419), 1140–1144.
  94. Silver, D., Lever, G., Heess, N., Degris, T., Wierstra, D., & Riedmiller, M. (2014). Deterministic policy gradient algorithms. In Proceedings of the 31st international conference on machine learning (Vol. 32).
  95. Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Guez, A., et al. (2018b). Mastering the game of Go without human knowledge. Nature, 550, 354–359.
  96. Sirignano, J., & Cont, R. (2019). Universal features of price formation in financial markets: Perspectives from deep learning. Quantitative Finance, 19(9), 1449–1459.
  97. Solomon, S., Weisbuch, G., de Arcangelis, L., Jan, N., & Stauffer, D. (2000). Social percolation models. Physica A, 277(1), 239–247.
  98. Spooner, T., Fearnley, J., Savani, R., & Koukorinis, A. (2018). Market making via reinforcement learning. In Proceedings of the 17th AAMAS.
  99. Sutton, R., & Barto, A. (1998). Reinforcement learning: An introduction. Cambridge, MA: MIT Press.
  100. Sutton, R. S., McAllester, D., Singh, S., & Mansour, Y. (2000). Policy gradient methods for reinforcement learning with function approximation. Advances in Neural Information Processing Systems, 12, 1057–1063.
  101. Szepesvari, C. (2010). Algorithms for reinforcement learning. San Rafael: Morgan and Claypool Publishers.
  102. Tessler, C., Givony, S., Zahavy, T., Mankowitz, D. J., & Mannor, S. (2016). A deep hierarchical approach to lifelong learning in Minecraft. arXiv:1604.07255.
  103. UK one-year gilt reference prices. Retrieved May 19, 2020 from https://www.dmo.gov.uk.
  104. Vandewalle, N., & Ausloos, M. (1997). Coherent and random sequences in financial fluctuations. Physica A, 246, 454–459.
  105. Vernimmen, P., Quiry, P., Dallocchio, M., Fur, Y. L., & Salvi, A. (2014). Corporate finance: Theory and practice (4th ed.). New York: Wiley.
  106. Wang, J. X., Kurth-Nelson, Z., Kumaran, D., Tirumala, D., Soyer, H., Leibo, J. Z., et al. (2018). Prefrontal cortex as a meta-reinforcement learning system. Nature Neuroscience, 21, 860–868.
  107. Watkins, C. J. C. H., & Dayan, P. (1992). Q-learning. Machine Learning, 8(3–4), 279–292.
  108. Wah, E., & Wellman, M. P. (2013). Latency arbitrage, market fragmentation, and efficiency: A two-market model. In Proceedings of the fourteenth ACM conference on electronic commerce (pp. 855–872).
  109. Wellman, M. P., & Wah, E. (2017). Strategic agent-based modeling of financial markets. The Russell Sage Foundation Journal of the Social Sciences, 3(1), 104–119.
  110. Weron, R. (2001). Levy-stable distributions revisited: Tail index \(> 2\) does not exclude the Levy-stable regime. International Journal of Modern Physics C, 12, 209–223.
  111. Wiering, M., & van Otterlo, M. (2012). Reinforcement learning: State-of-the-art. Berlin: Springer.

Acknowledgements

We gratefully acknowledge that this work was supported by RFFI Grant No. 16-51-150007 and CNRS PRC No. 151199, and received support from FrontCog ANR-17-EURE-0017. I. L.'s work was supported by the Russian Science Foundation, Grant No. 18-11-00294. S. B.-G. received funding within the framework of the HSE University Basic Research Program, funded by the Russian Academic Excellence Project No. 5-100.

Author information

Affiliations

Authors

Corresponding author

Correspondence to Johann Lussange.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Appendix

Classification Accuracy

Figure 16 shows the accuracy on the testing (left) and training (right) sets as a function of time-series sample size, for samples containing larger numbers of timestamps than in Fig. 13. Saturating accuracy dynamics can be observed for both the testing and training sets: for the value-distribution feature set, the former does not exceed \(70\%\) and the latter \(75\%\), while for the full time-series feature set, the former saturates above \(90\%\) and the latter above \(95\%\). The accuracy values on the training set are generally higher than on the testing set and do not show such a pronounced saturation dynamic. The training-set accuracy is not excessively high because the trees in the random forest were regularized (with maximal depth equal to 5), which we found necessary for good generalization on the testing set.
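The train/test gap discussed above can be illustrated with a minimal sketch (assuming scikit-learn; the synthetic data and scores are illustrative, not the paper's results):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic two-class feature matrix standing in for the real-vs-simulated
# time-series features (illustrative only).
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

# Depth-limited forest: the maximal depth of 5 regularizes the trees,
# keeping the gap between training and testing accuracy in check.
clf = RandomForestClassifier(n_estimators=200, max_depth=5, random_state=0)
clf.fit(X_tr, y_tr)

acc_train = clf.score(X_tr, y_tr)
acc_test = clf.score(X_te, y_te)
print(round(acc_train, 2), round(acc_test, 2))
```

Removing the depth limit typically pushes the training accuracy to 1.0 while degrading generalization, which is the behaviour the regularization is meant to avoid.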

Fig. 16

Accuracy of the testing set (left) and training set (right), as a function of time-series sample size. The value distribution features include only features that do not depend on the order of values in the time-series (e.g. mean, median, variance, kurtosis, skewness of the value distribution, etc.), whereas all time-series features correspond to the total set of time-series features including those that depend on the temporal structure of the series (e.g. autocorrelation, entropy, FFT coefficients, etc.). Both testing and training subsets are balanced in terms of class distribution, and their respective accuracy is achieved with samples containing up to 200 timestamps. The simulations are generated with parameters \(I=500\), \(J=1\), \(T=2875\), and \(S=20\)
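The caption's distinction between value-distribution features and temporal-structure features can be checked directly: permuting a series leaves the former unchanged but destroys the latter. A sketch (not the paper's code; series and seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)
x = np.cumsum(rng.normal(size=300))  # random-walk-like series

def lag1_autocorr(s):
    """Order-dependent feature: Pearson correlation between s_t and s_{t+1}."""
    return np.corrcoef(s[:-1], s[1:])[0, 1]

shuffled = rng.permutation(x)

# Value-distribution features (mean, variance, percentiles, ...) are
# invariant under a shuffle of the timestamps.
print(np.isclose(x.mean(), shuffled.mean()), np.isclose(x.var(), shuffled.var()))

# Temporal-structure features change drastically: a random walk is strongly
# autocorrelated at lag 1, its shuffled version is not.
print(lag1_autocorr(x), lag1_autocorr(shuffled))
```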

Top Statistical Features

We provide a general grouping and examples of the top statistical features used in the dimensionality reduction performed in Sect. 4.2. The exact ranking of the particular features found in our experiments, together with their importance metric \(\varTheta\), is as follows. The importance metric \(\varTheta\) is summed over 30 random forest models trained on different random splits of the training/testing sets.

  1. Partial autocorrelation value of lag 1, \(\varTheta =1.2240\).
  2. First coefficient of the fitted AR(10) process, \(\varTheta =1.0777\).
  3. Kurtosis of the FFT coefficient distribution, \(\varTheta =1.0214\).
  4. Skewness of the FFT coefficient distribution, \(\varTheta =1.0001\).
  5. Autocorrelation value of lag 1, \(\varTheta =0.9861\).
  6. 60th percentile of the value distribution, \(\varTheta =0.9044\).
  7. Kurtosis of the FFT coefficient distribution, \(\varTheta =0.7347\).
  8. Mean of consecutive changes in the series for values between the 0th and the 80th percentiles of the value distribution, \(\varTheta =0.6349\).
  9. Variance of consecutive changes in the series for values between the 0th and the 20th percentiles of the value distribution, \(\varTheta =0.5948\).
  10. Approximate entropy value (length of compared run of data is 2, filtering level is 0.1), \(\varTheta =0.5878\).
  11. 70th percentile of the value distribution, \(\varTheta =0.5589\).
  12. Variance of absolute consecutive changes in the series for values between the 0th and the 20th percentiles of the value distribution, \(\varTheta =0.5584\).
  13. Mean of consecutive changes in the series for values between the 40th and the 100th percentiles of the value distribution, \(\varTheta =0.4755\).
  14. Ratio of values that are more than 1 standard deviation away from the mean value, \(\varTheta =0.3282\).
  15. Median of the value distribution, \(\varTheta =0.2957\).
  16. Skewness of the value distribution, \(\varTheta =0.2894\).
  17. Measure of time-series nonlinearity from Schreiber and Schmitz (1997) of lag 1, \(\varTheta =0.2867\).
  18. Second coefficient of the fitted AR(10) process, \(\varTheta =0.2726\).
  19. Partial autocorrelation value of lag 1, \(\varTheta =0.2575\).
  20. Time reversal symmetry statistic from Fulcher and Jones (2014) of lag 1, \(\varTheta =0.2418\).
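For concreteness, several of the features in this ranking can be computed with plain NumPy/SciPy (a hedged sketch; tsfresh's exact definitions may differ in normalization details, and the series here is a stand-in):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(size=500)  # stand-in time series

# Autocorrelation at lag 1: Pearson correlation between the series and its
# one-step shift (at lag 1 this coincides with the partial autocorrelation).
autocorr_lag1 = np.corrcoef(x[:-1], x[1:])[0, 1]

# Kurtosis and skewness of the FFT coefficient (magnitude) distribution.
fft_mag = np.abs(np.fft.rfft(x))
fft_kurtosis = stats.kurtosis(fft_mag)
fft_skewness = stats.skew(fft_mag)

# 60th percentile of the value distribution.
p60 = np.percentile(x, 60)

# Ratio of values more than 1 standard deviation away from the mean.
ratio_beyond_1std = np.mean(np.abs(x - x.mean()) > x.std())
```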

The top-10 features referenced in Sect. 4.2 are the first 10 features taken from the list above. The PCA and UMAP mappings of the top-10 features onto a two-dimensional space demonstrated some separability between the two classes (real vs. simulated data), as measured by training a linear classifier on these two-dimensional data representations (see Sect. 4.2 for details), as well as by calculating the Kolmogorov–Smirnov (KS) statistic for each embedding component. The KS statistic value between the two classes is 0.24 and 0.11 for PCA (for the first and second component, respectively) and 0.30 and 0.25 for UMAP.
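The per-component KS statistic can be reproduced in form (not in value, since the embeddings themselves are not given here) with `scipy.stats.ks_2samp`; the samples below are hypothetical stand-ins for one embedding component of each class:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical stand-ins for one PCA/UMAP embedding component of the two
# classes (real vs. simulated data).
comp_real = rng.normal(loc=0.0, scale=1.0, size=200)
comp_sim = rng.normal(loc=0.5, scale=1.0, size=200)

# Two-sample Kolmogorov–Smirnov statistic: maximal distance between the
# empirical CDFs of the two samples (0 = identical, 1 = fully separated).
ks = stats.ks_2samp(comp_real, comp_sim).statistic
print(0.0 <= ks <= 1.0)
```

Larger KS values for the UMAP components than for the PCA components (0.30/0.25 vs. 0.24/0.11) indicate that the nonlinear embedding separates the two classes somewhat better along each axis.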


About this article

Cite this article

Lussange, J., Lazarevich, I., Bourgeois-Gironde, S. et al. Modelling Stock Markets by Multi-agent Reinforcement Learning. Comput Econ 57, 113–147 (2021). https://doi.org/10.1007/s10614-020-10038-w


Keywords

  • Agent-based
  • Reinforcement learning
  • Multi-agent system
  • Stock markets