Stock Price Formation: Precepts from a Multi-Agent Reinforcement Learning Model


In the past, the bottom-up study of financial stock markets relied on first-generation multi-agent systems (MAS), which employed zero-intelligence agents and often required the additional implementation of so-called noise traders to emulate price formation processes. Nowadays, thanks to tools developed in cognitive science and machine learning, MAS can quantitatively gauge agent learning, a pivotal element for information and stock price estimation in finance. In our previous work, we therefore devised a new-generation MAS stock market simulator, which implements two key features: first, each agent autonomously learns to perform price forecasting and stock trading via model-free reinforcement learning; second, all agents' trading decisions feed a centralised double-auction limit order book, emulating price and volume microstructures. Here, we study which trading strategies (represented as reinforcement learning policies) the agents learn, and how the heterogeneity of these strategies evolves over time. Our central result is that there are more ways to succeed in trading than to fail. More specifically, we find that (i) better-performing agents learn, over time, more diverse trading strategies than worse-performing ones; (ii) they tend to employ a fundamentalist, rather than chartist, approach to asset price valuation; and (iii) their transaction orders are less stringent (i.e. larger bids or lower asks).
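To make the notion of a centralised double-auction limit order book concrete, the following is a minimal Python sketch. It is not the authors' simulator: the class name `OrderBook`, the method `submit`, the price-then-arrival priority rule, and the choice to settle each trade at the mid-price of the two crossing quotes are all assumptions made for illustration. It shows how limit orders from several agents are queued and matched whenever the best bid reaches the best ask.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class OrderBook:
    """Hypothetical minimal double-auction limit order book (illustrative only)."""
    # Each entry is [sort_key, arrival_seq, agent_id, remaining_qty].
    bids: list = field(default_factory=list)  # sort_key = -price (highest bid first)
    asks: list = field(default_factory=list)  # sort_key = +price (lowest ask first)
    seq: int = 0                              # arrival counter, breaks price ties

    def submit(self, agent_id, side, price, qty):
        """Queue a limit order, then return the trades it triggers.

        Each trade is a tuple (buyer_id, seller_id, price, qty).
        """
        self.seq += 1
        if side == "buy":
            heapq.heappush(self.bids, [-price, self.seq, agent_id, qty])
        else:
            heapq.heappush(self.asks, [price, self.seq, agent_id, qty])
        return self._match()

    def _match(self):
        trades = []
        # Orders cross while the best bid is at or above the best ask.
        while self.bids and self.asks and -self.bids[0][0] >= self.asks[0][0]:
            bid, ask = self.bids[0], self.asks[0]
            qty = min(bid[3], ask[3])
            # Assumption: settle at the mid-price of the two crossing quotes.
            price = (-bid[0] + ask[0]) / 2
            trades.append((bid[2], ask[2], price, qty))
            # Only the remaining quantity changes, so heap order is unaffected.
            bid[3] -= qty
            ask[3] -= qty
            if bid[3] == 0:
                heapq.heappop(self.bids)
            if ask[3] == 0:
                heapq.heappop(self.asks)
        return trades


book = OrderBook()
book.submit("a", "sell", 101.0, 5)          # rests in the book, no trade
book.submit("b", "buy", 100.0, 5)           # below the best ask, no trade
trades = book.submit("c", "buy", 102.0, 3)  # crosses the ask of agent "a"
print(trades)
```

In this toy setting, a less stringent order in the sense of result (iii), i.e. a larger bid or a lower ask, is simply one that is more likely to cross the opposite side of the book and be executed.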

Availability of data and material

Not applicable.

Code availability

Not applicable.




We gratefully acknowledge that this work was supported by the HSE Basic Research Program, the Russian Academic Excellence Project “5-100” and CNRS PRC nr. 151199, and that it received support from FrontCog ANR-17-EURE-0017. S.P. is also supported by an ATIP-Avenir grant (R16069JS), the Programme Emergence(s) de la Ville de Paris, the Fondation Fyssen, the Fondation Schlumberger pour l’Education et la Recherche and the IRESP (project EPELNOR).


Author information

Corresponding author

Correspondence to Johann Lussange.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (pdf 259 KB)

About this article

Cite this article

Lussange, J., Vrizzi, S., Bourgeois-Gironde, S. et al. Stock Price Formation: Precepts from a Multi-Agent Reinforcement Learning Model. Comput Econ (2022).


  • Agent-based
  • Reinforcement learning
  • Multi-agent system
  • Stock markets