
Reinforcement Learning in a Cournot Oligopoly Model


Abstract

This paper analyzes the learning behavior of firms in a repeated Cournot oligopoly game. The literature shows that the degree of information and the cognitive capacity of learning firms are key factors determining the long-run outcome of an oligopoly market. In particular, when firms know the market demand and can compute the optimal production quantity given the output of other firms, the resulting market outcome is the Nash equilibrium. On the other hand, imitation, which assumes low behavioral sophistication of firms, generally favors higher output and converges to the Walrasian equilibrium. In this paper, a reinforcement learning algorithm with low cognitive requirements is adopted to model firms’ learning behavior. Reinforcement learning firms observe past production choices and fine-tune them to improve profits. Analytical results show that the Nash equilibrium is the only fixed point of the reinforcement learning process, and convergence to the Nash equilibrium is observed in computational simulations. When firms are allowed to imitate the most profitable competitor, all states between the Nash equilibrium and the Walrasian equilibrium can be reached. Furthermore, the long-run outcome shifts towards the Nash equilibrium as the length of firms’ memory increases.


Notes

  1. We have considered other probability distributions, including the Cauchy and Logistic distributions. We show in the Appendix that the results obtained in this paper are not sensitive to the choice of probability distribution.

  2. Readers may be interested to see a variant of treatment III with imitation. Recall that firms in treatment III learn a production rule. Imitation therefore rests on the sharp assumption that firms observe the coefficients of the production rule adopted by the most profitable competitor. This is clearly a drawback, since the assumption grants learning firms unreasonably strong cognitive abilities. To address this issue, one could introduce noise, as an information diffusion mechanism, into the production rule that imitating firms follow. However, Granato et al. (2008) show that the noise itself can lead to the possibility of multiple equilibria. Since the primary interest of this paper is to study the outcome of an oligopoly economy under learning rather than information diffusion, we do not consider treatment III with imitation here and leave it to future work.

  3. The probability density function of the Cauchy distribution is symmetric, but its mean and variance are not well-defined. To simulate firms’ production choices through random experimentation, we assume that firms use \(\mu (s,\theta )\) as the mean and \(\gamma (s,\theta )\) as the variance.

References

  • Alós-Ferrer, C. (2004). Cournot versus Walras in dynamic oligopolies with memory. International Journal of Industrial Organization, 22(2), 193–217.

  • Alós-Ferrer, C., & Buckenmaier, J. (2017). Cournot vs. Walras: A reappraisal through simulations. Journal of Economic Dynamics and Control, 82, 257–272.

  • Anufriev, M., & Kopányi, D. (2018). Oligopoly game: Price makers meet price takers. Journal of Economic Dynamics and Control, 91, 84–103.

  • Apesteguia, J., Huck, S., & Oechssler, J. (2007). Imitation: Theory and experimental evidence. Journal of Economic Theory, 136(1), 217–235.

  • Arifovic, J. (1994). Genetic algorithm learning and the cobweb model. Journal of Economic Dynamics and Control, 18(1), 3–28.

  • Arifovic, J., & Maschek, M. K. (2006). Revisiting individual evolutionary learning in the cobweb model: An illustration of the virtual spite-effect. Computational Economics, 28(4), 333–354.

  • Bergin, J., & Bernhardt, D. (2004). Comparative learning dynamics. International Economic Review, 45(2), 431–465.

  • Bischi, G. I., Naimzada, A. K., & Sbragia, L. (2007). Oligopoly games with local monopolistic approximation. Journal of Economic Behavior & Organization, 62(3), 371–388.

  • Bischi, G. I., Lamantia, F., & Radi, D. (2015). An evolutionary Cournot model with limited market knowledge. Journal of Economic Behavior & Organization, 116, 219–238.

  • Dawid, H. (2011). Adaptive learning by genetic algorithms: Analytical results and applications to economic models. Berlin: Springer.

  • Erev, I., & Roth, A. E. (1998). Predicting how people play games: Reinforcement learning in experimental games with unique, mixed strategy equilibria. American Economic Review, 88(4), 848–881.

  • Evans, G. W., & Honkapohja, S. (2001). Learning and expectations in macroeconomics. Princeton: Princeton University Press.

  • Franke, R. (1998). Coevolution and stable adjustments in the cobweb model. Journal of Evolutionary Economics, 8(4), 383–406.

  • Granato, J., Guse, E. A., & Wong, M. C. S. (2008). Learning from the expectations of others. Macroeconomic Dynamics, 12(3), 345–377.

  • Huck, S., Normann, H.-T., & Oechssler, J. (1999). Learning in Cournot oligopoly: An experiment. The Economic Journal, 109(454), 80–95.

  • Kutschinski, E., Uthmann, T., & Polani, D. (2003). Learning competitive pricing strategies by multi-agent reinforcement learning. Journal of Economic Dynamics and Control, 27(11–12), 2207–2218.

  • Offerman, T., Potters, J., & Sonnemans, J. (2002). Imitation and belief learning in an oligopoly experiment. The Review of Economic Studies, 69(4), 973–997.

  • Pape, A., & Xiao, W. (2014). Case-based learning in the cobweb model. Working paper.

  • Radi, D. (2017). Walrasian versus Cournot behavior in an oligopoly of boundedly rational firms. Journal of Evolutionary Economics, 27(5), 933–961.

  • Riechmann, T. (2006). Cournot or Walras? Long-run results in oligopoly games. Journal of Institutional and Theoretical Economics (JITE), 162(4), 702–720.

  • Sutton, R. S., & Barto, A. G. (2018). Reinforcement learning: An introduction. Cambridge: MIT Press.

  • Tharakunnel, K., & Bhattacharyya, S. (2009). Single-leader-multiple-follower games with boundedly rational agents. Journal of Economic Dynamics and Control, 33(8), 1593–1603.

  • Vallée, T., & Yıldızoğlu, M. (2009). Convergence in the finite Cournot oligopoly with social and individual learning. Journal of Economic Behavior & Organization, 72(2), 670–690.

  • Vallée, T., & Yıldızoğlu, M. (2013). Can they beat the Cournot equilibrium? Learning with memory and convergence to equilibria in a Cournot oligopoly. Computational Economics, 41(4), 493–516.

  • Vega-Redondo, F. (1997). The evolution of Walrasian behavior. Econometrica, 65(2), 375–384.

  • Vriend, N. J. (2000). An illustration of the essential difference between individual and social learning, and its consequences for computational analyses. Journal of Economic Dynamics and Control, 24(1), 1–19.

  • Waltman, L., & Kaymak, U. (2008). Q-learning agents in a Cournot oligopoly model. Journal of Economic Dynamics and Control, 32(10), 3275–3293.

Author information

Corresponding author

Correspondence to Junyi Xu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

In this section, we change the probability distribution adopted in the reinforcement learning algorithm and perform a robustness check on the results obtained in the main text. In particular, we consider two alternative probability distributions: a Logistic distribution and a Cauchy distribution.

We first investigate the case where firms select production quantities from a Logistic distribution. The probability density function of the production choice q is

$$\begin{aligned} f(q|s,\theta ) = \frac{\exp \left( -\frac{q-\mu (s,\theta )}{\sigma (s,\theta )}\right) }{\sigma (s,\theta )\left( 1+\exp \left( -\frac{q-\mu (s,\theta )}{\sigma (s,\theta )}\right) \right) ^2}, \end{aligned}$$
(30)

where \(\mu (s,\theta ) = \theta _{\mu }^{\prime } s\) is the mean and \(\sigma (s,\theta ) = \exp (\theta _{\sigma }^{\prime }s)\) is a scale parameter that determines the variance. \(\theta =(\theta _{\mu }^{\prime },\theta _{\sigma }^{\prime })'\) is a vector of parameters that firms learn through the reinforcement learning process in (10).
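To make the learning step concrete, the sketch below draws a production choice from the logistic policy in (30) and adjusts \(\theta\) with a score-function (REINFORCE-style) gradient step. This is a minimal illustration under stated assumptions, not the paper's exact rule: the update in (10) is not reproduced here, and the state vector s, the learning rate alpha, and the use of raw profit as the reinforcement signal are placeholders of ours. The gradients follow from differentiating the log-density in (30), which gives \(\partial \log f/\partial \mu = \tanh (z/2)/\sigma\) and \(\partial \log f/\partial \sigma = (z\tanh (z/2)-1)/\sigma\) with \(z=(q-\mu )/\sigma\).

```python
import numpy as np

def sample_quantity(s, theta_mu, theta_sigma, rng):
    """Draw a production choice q from the logistic policy of Eq. (30)."""
    mu = theta_mu @ s                 # location mu(s, theta) = theta_mu' s
    sigma = np.exp(theta_sigma @ s)   # scale sigma(s, theta) = exp(theta_sigma' s) > 0
    return rng.logistic(loc=mu, scale=sigma)

def reinforce_update(q, s, profit, theta_mu, theta_sigma, alpha=1e-5):
    """One score-function step: theta <- theta + alpha * profit * grad log f(q|s, theta)."""
    mu = theta_mu @ s
    sigma = np.exp(theta_sigma @ s)
    z = (q - mu) / sigma
    grad_mu = np.tanh(z / 2.0) / sigma * s         # chain rule through mu = theta_mu' s
    grad_sigma = (z * np.tanh(z / 2.0) - 1.0) * s  # chain rule through sigma = exp(theta_sigma' s)
    theta_mu = theta_mu + alpha * profit * grad_mu
    theta_sigma = theta_sigma + alpha * profit * grad_sigma
    return theta_mu, theta_sigma
```

Writing the scale as an exponential keeps \(\sigma\) positive without explicit constraints; the factor of \(\sigma\) from the chain rule cancels in the gradient with respect to \(\theta _{\sigma }\), which is why that expression is free of divisions.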

As an example, we look at the oligopoly economy calibrated above and choose a market configuration with \(n=2\). When imitation is present, we set the length of firms’ memory to \(k=50\). Figure 8 is representative of the common outcome in our simulations. The top panel illustrates the learning dynamics of aggregate production quantity in treatment I; the middle and bottom panels present results for treatments II and III, respectively. These results are similar to those from the benchmark simulations based on the Gaussian distribution: firms are able to learn the Nash quantity and reinforcement learning converges to the Nash equilibrium (treatments I and III); moreover, long-run market output deviates from the Nash value when firms perform imitation (treatment II).
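A rough sketch of how one such run could be wired together, using the helpers above, is given below. The linear inverse demand P(Q) = a - bQ, the constant marginal cost c, and all numerical values are placeholder calibrations of ours, not the paper's; with these numbers the Nash aggregate output is 2(a - c)/(3b) = 60 and the Walrasian output is (a - c)/b = 90.

```python
import numpy as np

def run_duopoly(T=5000, a=100.0, b=1.0, c=10.0, alpha=1e-5, seed=0):
    """Two logistic-policy firms in a linear Cournot market (placeholder calibration)."""
    rng = np.random.default_rng(seed)
    s = np.array([1.0])                               # one-dimensional state: an intercept only
    theta_mu = [np.array([20.0]), np.array([20.0])]   # initial location parameters
    theta_sigma = [np.array([1.0]), np.array([1.0])]  # initial log-scale parameters
    history = []
    for _ in range(T):
        q = [max(sample_quantity(s, theta_mu[i], theta_sigma[i], rng), 0.0)
             for i in range(2)]                       # rule out negative output
        price = max(a - b * sum(q), 0.0)              # linear inverse demand, floored at zero
        for i in range(2):
            profit = (price - c) * q[i]
            theta_mu[i], theta_sigma[i] = reinforce_update(
                q[i], s, profit, theta_mu[i], theta_sigma[i], alpha)
        history.append(sum(q))
    return np.array(history)                          # path of aggregate output
```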

Fig. 8 Learning dynamics of joint production based on the Logistic distribution with n = 2 and k = 50 for three treatments. Top: Treatment I. Middle: Treatment II. Bottom: Treatment III

We next consider the learning process with a Cauchy distribution. The probability density function of the Cauchy distribution is

$$\begin{aligned} f(q|s,\theta ) = \frac{1}{\pi \gamma (s,\theta )\left[ 1+(\frac{q-\mu (s,\theta )}{\gamma (s,\theta )})^2\right] }. \end{aligned}$$
(31)

Here \(\mu (s,\theta )\) is the location parameter that determines the mode of the Cauchy distribution, and \(\gamma (s,\theta )\) is the scale parameter that specifies the half-width at half-maximum (see footnote 3). We run the same simulations with this distribution and re-examine the results above. In Fig. 9, we plot the dynamics of aggregate market output for the three treatments with \(n=2\) and \(k=50\). We find that our conclusions still hold: convergence to the Nash equilibrium takes place under reinforcement learning, and imitation drives the market outcome towards the Walrasian equilibrium. We therefore conclude that the reinforcement learning process considered in this paper produces robust outcomes that are not sensitive to the choice of probability distribution.
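For completeness, a draw from the Cauchy policy in (31) can be sketched the same way. The location-scale parameterization mirroring the logistic case (\(\mu = \theta _{\mu }^{\prime } s\), \(\gamma = \exp (\theta _{\gamma }^{\prime } s)\)) and the nonnegativity clipping are our assumptions; the heavy Cauchy tails make such clipping advisable in practice.

```python
import numpy as np

def sample_quantity_cauchy(s, theta_mu, theta_gamma, rng):
    """Draw a production choice q from the Cauchy policy of Eq. (31)."""
    mu = theta_mu @ s                        # location mu(s, theta)
    gamma = np.exp(theta_gamma @ s)          # scale gamma(s, theta): half-width at half-maximum
    q = mu + gamma * rng.standard_cauchy()   # location-scale transform of a standard Cauchy draw
    return max(q, 0.0)                       # clip to a nonnegative quantity (our assumption)
```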

Fig. 9 Learning dynamics of joint production based on the Cauchy distribution with n = 2 and k = 50 for three treatments. Top: Treatment I. Middle: Treatment II. Bottom: Treatment III


About this article

Cite this article

Xu, J. Reinforcement Learning in a Cournot Oligopoly Model. Comput Econ 58, 1001–1024 (2021). https://doi.org/10.1007/s10614-020-09982-4

