Reinforcement Learning in Economics and Finance


Abstract

Reinforcement learning algorithms describe how an agent can learn an optimal action policy in a sequential decision process, through repeated experience. In a given environment, the agent's policy provides him with running and terminal rewards. As in online learning, the agent learns sequentially. As in multi-armed bandit problems, when the agent picks an action, he cannot infer ex post the rewards that other action choices would have induced. In reinforcement learning, his actions have consequences: they influence not only rewards, but also future states of the world. The goal of reinforcement learning is to find an optimal policy – a mapping from the states of the world to the set of actions – that maximizes cumulative reward, a long-term objective. Exploring may be sub-optimal over a short horizon but can lead to optimal long-term behavior. Many problems of optimal control, popular in economics for more than forty years, can be expressed in the reinforcement learning framework, and recent advances in computational science, provided in particular by deep learning algorithms, can be used by economists to solve complex behavioral problems. In this article, we present the state of the art in reinforcement learning techniques, together with applications in economics, game theory, operations research and finance.
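For concreteness, the long-term objective can be written in the standard discounted form (a conventional formulation, with notation chosen here for illustration rather than taken from the article's body): an agent observing state \(s_t\) and choosing action \(a_t = \pi(s_t)\) collects reward \(r(s_t, a_t)\), and an optimal policy solves

\[
\pi^{\star} \in \arg\max_{\pi} \; \mathbb{E}\left[ \sum_{t=0}^{\infty} \gamma^{t}\, r\big(s_t, \pi(s_t)\big) \right], \qquad 0 < \gamma < 1,
\]

where the expectation is taken over the state transitions induced by \(\pi\), and the discount factor \(\gamma\) encodes the trade-off between short-term and long-term rewards.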



Funding

Arthur Charpentier acknowledges the financial support of the AXA Research Fund through the joint research initiative Use and value of unusual data in actuarial science, as well as NSERC grant 2019-07077.

Author information


Corresponding author

Correspondence to Arthur Charpentier.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.



About this article


Cite this article

Charpentier, A., Élie, R. & Remlinger, C. Reinforcement Learning in Economics and Finance. Comput Econ 62, 425–462 (2023). https://doi.org/10.1007/s10614-021-10119-4

