Abstract
This paper analyzes the learning behavior of firms in a repeated Cournot oligopoly game. The literature shows that the degree of information and the cognitive capacity of learning firms are key factors that determine the long-run outcome of an oligopoly market. In particular, when firms know the market demand and are capable of computing the optimal production quantity given the output of other firms, the resulting market outcome is the Nash equilibrium. By contrast, imitation, which assumes low behavioral sophistication of firms, generally favors higher output and converges to the Walrasian equilibrium. In this paper, a reinforcement learning algorithm with low cognitive requirements is adopted to model firms' learning behavior. Reinforcement learning firms observe past production choices and fine-tune them to improve profits. Analytical results show that the Nash equilibrium is the only fixed point of the reinforcement learning process, and convergence to the Nash equilibrium is observed in computational simulations. When firms are allowed to imitate the most profitable competitor, all states between the Nash equilibrium and the Walrasian equilibrium can be reached. Furthermore, the long-run outcome shifts towards the Nash equilibrium as the length of firms' memory increases.
Notes
We have considered other probability distributions, including the Cauchy and Logistic distributions. We show in the Appendix that the results obtained in this paper are not sensitive to the choice of probability distribution.
Readers may be interested to see a variant of treatment III with imitation. Recall that firms in treatment III learn a production rule. Imitation therefore rests on the sharp assumption that firms observe the coefficients of the production rule adopted by the most profitable competitor. This is clearly a drawback, since the assumption gives learning firms unreasonably strong cognitive abilities. To address this issue, one can introduce noise, as an information diffusion mechanism, into the production rule that imitating firms follow. However, Granato et al. (2008) show that the noise itself can lead to the possibility of multiple equilibria. Since the primary interest of this paper is to study the outcome of the oligopoly economy under learning rather than information diffusion, we do not consider the case of treatment III with imitation in this paper and leave it to future work.
The probability density function of the Cauchy distribution is symmetric, but its mean and variance are not well-defined. To simulate firms' production choices through random experimentation, we assume that firms treat \(\mu (s,\theta )\) as the mean and \(\gamma (s,\theta )\) as the variance.
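The instability of the Cauchy sample mean is easy to see numerically. The sketch below (illustrative code, not the paper's simulation program) draws Cauchy variates by inverse-transform sampling and prints running sample means over larger and larger samples; unlike a Gaussian or Logistic sample, these means fail to converge because the law of large numbers does not apply to the Cauchy distribution.

```python
import math
import random

def cauchy_quantile(u, loc=0.0, scale=1.0):
    # Inverse CDF of the Cauchy distribution: Q(u) = loc + scale * tan(pi * (u - 1/2)).
    return loc + scale * math.tan(math.pi * (u - 0.5))

rng = random.Random(42)
draws = []
for n in (100, 10_000, 1_000_000):
    while len(draws) < n:
        draws.append(cauchy_quantile(rng.random()))
    # Heavy-tailed draws keep arriving, so these running means do not settle.
    print(n, sum(draws) / n)
```

The quantile function does, however, pin down the location and scale: the median of the draws is `loc`, and the quartiles sit at `loc ± scale`.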
References
Alós-Ferrer, C. (2004). Cournot versus Walras in dynamic oligopolies with memory. International Journal of Industrial Organization, 22(2), 193–217.
Alós-Ferrer, C., & Buckenmaier, J. (2017). Cournot vs. Walras: A reappraisal through simulations. Journal of Economic Dynamics and Control, 82, 257–272.
Anufriev, M., & Kopányi, D. (2018). Oligopoly game: Price makers meet price takers. Journal of Economic Dynamics and Control, 91, 84–103.
Apesteguia, J., Huck, S., & Oechssler, J. (2007). Imitation theory and experimental evidence. Journal of Economic Theory, 136(1), 217–235.
Arifovic, J. (1994). Genetic algorithm learning and the cobweb model. Journal of Economic Dynamics and Control, 18(1), 3–28.
Arifovic, J., & Maschek, M. K. (2006). Revisiting individual evolutionary learning in the cobweb model—An illustration of the virtual spite-effect. Computational Economics, 28(4), 333–354.
Bergin, J., & Bernhardt, D. (2004). Comparative learning dynamics. International Economic Review, 45(2), 431–465.
Bischi, G. I., Naimzada, A. K., & Sbragia, L. (2007). Oligopoly games with local monopolistic approximation. Journal of Economic Behavior & Organization, 62(3), 371–388.
Bischi, G. I., Lamantia, F., & Radi, D. (2015). An evolutionary Cournot model with limited market knowledge. Journal of Economic Behavior & Organization, 116, 219–238.
Dawid, H. (2011). Adaptive learning by genetic algorithms: Analytical results and applications to economic models. Berlin: Springer.
Erev, I., & Roth, A. E. (1998). Predicting how people play games: Reinforcement learning in experimental games with unique, mixed strategy equilibria. American Economic Review, 88(4), 848–881.
Evans, G. W., & Honkapohja, S. (2001). Learning and Expectations in Macroeconomics. Princeton: Princeton University Press.
Franke, R. (1998). Coevolution and stable adjustments in the cobweb model. Journal of Evolutionary Economics, 8(4), 383–406.
Granato, J., Guse, E. A., & Sunny Wong, M. C. (2008). Learning from the expectations of others. Macroeconomic Dynamics, 12(3), 345–377.
Huck, S., Normann, H.-T., & Oechssler, J. (1999). Learning in Cournot oligopoly: An experiment. The Economic Journal, 109(454), 80–95.
Kutschinski, E., Uthmann, T., & Polani, D. (2003). Learning competitive pricing strategies by multi-agent reinforcement learning. Journal of Economic Dynamics and Control, 27(11–12), 2207–2218.
Offerman, T., Potters, J., & Sonnemans, J. (2002). Imitation and belief learning in an oligopoly experiment. The Review of Economic Studies, 69(4), 973–997.
Pape, A., & Xiao, W. (2014). Case-based learning in the cobweb model. Working paper.
Radi, D. (2017). Walrasian versus Cournot behavior in an oligopoly of boundedly rational firms. Journal of Evolutionary Economics, 27(5), 933–961.
Riechmann, T. (2006). Cournot or Walras? Long-run results in oligopoly games. Journal of Institutional and Theoretical Economics (JITE), 162(4), 702–720.
Sutton, R. S., & Barto, A. G. (2018). Reinforcement learning: An introduction. Cambridge: MIT Press.
Tharakunnel, K., & Bhattacharyya, S. (2009). Single-leader-multiple-follower games with boundedly rational agents. Journal of Economic Dynamics and Control, 33(8), 1593–1603.
Vallée, T., & Yıldızoğlu, M. (2009). Convergence in the finite Cournot oligopoly with social and individual learning. Journal of Economic Behavior & Organization, 72(2), 670–690.
Vallée, T., & Yıldızoğlu, M. (2013). Can they beat the Cournot equilibrium? Learning with memory and convergence to equilibria in a cournot oligopoly. Computational Economics, 41(4), 493–516.
Vega-Redondo, F. (1997). The evolution of Walrasian behavior. Econometrica, 65(2), 375–384.
Vriend, N. J. (2000). An illustration of the essential difference between individual and social learning, and its consequences for computational analyses. Journal of Economic Dynamics and Control, 24(1), 1–19.
Waltman, L., & Kaymak, U. (2008). Q-learning agents in a Cournot oligopoly model. Journal of Economic Dynamics and Control, 32(10), 3275–3293.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Appendix
In this section, we change the probability distribution adopted in the reinforcement learning algorithm and perform a robustness check of the results obtained in the main text. In particular, we consider two alternative probability distributions: a Logistic distribution and a Cauchy distribution.
We first investigate the case where firms select production quantities from a Logistic distribution. The probability density function of production choice \(q\) follows

\[ f(q \mid s,\theta ) = \frac{\exp \left( -\frac{q-\mu (s,\theta )}{\sigma (s,\theta )}\right) }{\sigma (s,\theta )\left( 1+\exp \left( -\frac{q-\mu (s,\theta )}{\sigma (s,\theta )}\right) \right) ^{2}}, \]
where \(\mu (s,\theta ) = \theta _{\mu }^{\prime } s\) is the mean and \(\sigma (s,\theta ) = \exp (\theta _{\sigma }^{\prime }s)\) is a scale parameter that determines the variance. \(\theta =(\theta _{\mu }^{\prime },\theta _{\sigma }^{\prime })'\) is a vector of parameters that firms learn through the reinforcement learning process in (10).
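As a concrete sketch of how such a draw could be implemented (the function and variable names below are illustrative, and the learning update in (10) is not shown), a firm with state vector \(s\) and parameter vectors \(\theta _{\mu }\), \(\theta _{\sigma }\) can sample a quantity by inverse-transform sampling from the Logistic distribution:

```python
import math

def logistic_policy_draw(s, theta_mu, theta_sigma, u):
    """Draw a production quantity from the state-dependent Logistic policy.

    mu(s, theta)    = theta_mu' s          (mean)
    sigma(s, theta) = exp(theta_sigma' s)  (scale, kept positive by exp)
    u is a uniform(0, 1) random number; the Logistic quantile function
    Q(u) = mu + sigma * log(u / (1 - u)) converts it into a quantity.
    """
    mu = sum(t * x for t, x in zip(theta_mu, s))
    sigma = math.exp(sum(t * x for t, x in zip(theta_sigma, s)))
    return mu + sigma * math.log(u / (1.0 - u))

# At u = 0.5 the draw equals the mean mu(s, theta).
q = logistic_policy_draw(s=[1.0, 2.0], theta_mu=[0.5, 0.25],
                         theta_sigma=[0.0, 0.0], u=0.5)
print(q)  # 1.0
```

The `exp` in the scale mirrors the parameterization \(\sigma (s,\theta ) = \exp (\theta _{\sigma }^{\prime }s)\), which guarantees a strictly positive scale for any learned \(\theta _{\sigma }\).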
As an example, we look at the oligopoly economy calibrated above and choose market configurations with \(n=2\). When imitation is present, we let \(k=50\) be the length of firms' memory. Figure 8 is representative of the common outcome in our simulations. The top panel illustrates the learning dynamics of aggregate production quantity in treatment I; the middle and bottom panels present results for treatments II and III, respectively. These results are similar to those from the benchmark simulations based on the Gaussian distribution: firms are able to learn the Nash quantity, and reinforcement learning converges to the Nash equilibrium (treatments I and III); moreover, the long-run market output deviates from the Nash value when firms imitate (treatment II).
We next consider the learning process with a Cauchy distribution. The probability density function of the Cauchy distribution is

\[ f(q \mid s,\theta ) = \frac{1}{\pi \gamma (s,\theta )\left[ 1+\left( \frac{q-\mu (s,\theta )}{\gamma (s,\theta )}\right) ^{2}\right] }. \]
Here \(\mu (s,\theta )\) is the location parameter that determines the mode of the Cauchy distribution, and \(\gamma (s,\theta )\) is the scale parameter that specifies the half-width at half-maximum (see footnote 3). We run the same simulations with this distribution and re-examine the results above. In Fig. 9, we plot the dynamics of aggregate market output for the three treatments with \(n=2\) and \(k=50\). We find that our conclusions still hold: convergence to the Nash equilibrium takes place under reinforcement learning, and imitation drives the market outcome towards the Walrasian equilibrium. We therefore conclude that the reinforcement learning process considered in this paper produces robust outcomes that are not sensitive to the choice of probability distribution.
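The half-width-at-half-maximum property of \(\gamma\) can be checked directly from the density. The sketch below (illustrative names, not the paper's code) evaluates the Cauchy density with location \(\mu\) and scale \(\gamma\) and verifies that the density at \(\mu \pm \gamma\) is half of the peak density at the mode:

```python
import math

def cauchy_pdf(q, mu, gamma):
    # f(q) = 1 / (pi * gamma * (1 + ((q - mu) / gamma)**2))
    z = (q - mu) / gamma
    return 1.0 / (math.pi * gamma * (1.0 + z * z))

# gamma is the half-width at half-maximum: at q = mu +/- gamma the
# density drops to half of its peak value at the mode q = mu.
peak = cauchy_pdf(0.0, mu=0.0, gamma=2.0)
half = cauchy_pdf(2.0, mu=0.0, gamma=2.0)
print(half / peak)  # approximately 0.5
```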
Xu, J. Reinforcement Learning in a Cournot Oligopoly Model. Comput Econ 58, 1001–1024 (2021). https://doi.org/10.1007/s10614-020-09982-4