Abstract
This paper analyzes the learning behavior of firms in a repeated Cournot oligopoly game. The literature shows that the degree of information and the cognitive capacity of learning firms are key factors that determine the long-run outcome of an oligopoly market. In particular, when firms know the market demand and are capable of computing the optimal production quantity given the output of other firms, the resulting market outcome is the Nash equilibrium. By contrast, imitation, which assumes low behavioral sophistication of firms, generally favors higher output and converges to the Walrasian equilibrium. In this paper, a reinforcement learning algorithm with low cognitive requirements is adopted to model firms' learning behavior. Reinforcement learning firms observe past production choices and fine-tune them to improve profits. Analytical results show that the Nash equilibrium is the only fixed point of the reinforcement learning process, and convergence to the Nash equilibrium is observed in computational simulations. When firms are allowed to imitate the most profitable competitor, all states between the Nash equilibrium and the Walrasian equilibrium can be reached. Furthermore, the long-run outcome shifts towards the Nash equilibrium as the length of firms' memory increases.
Notes
We have considered other probability distributions, including the Cauchy and Logistic distributions. We show in the Appendix that the results obtained in this paper are not sensitive to the choice of probability distribution.
Readers may be interested to see a variant of treatment III with imitation. Recall that firms in treatment III learn a production rule. Imitation therefore rests on the sharp assumption that firms observe the coefficients of the production rule adopted by the most profitable competitor. This is clearly a drawback, since the assumption gives learning firms unreasonably strong cognitive abilities. To address this issue, one can introduce noise, as an information diffusion mechanism, into the production rule that imitating firms follow. However, Granato et al. (2008) show that the noise itself can lead to the possibility of multiple equilibria. Since the primary interest of this paper is to study the outcome of the oligopoly economy under learning rather than information diffusion, we do not consider the case of treatment III with imitation in this paper and leave it to future work.
The probability density function of the Cauchy distribution is symmetric, but its mean and variance are not well-defined. To simulate firms' production choices through random experimentation, we assume that firms treat \(\mu (s,\theta )\) as the mean and \(\gamma (s,\theta )\) as the variance.
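The instability of the Cauchy sample mean is easy to see numerically. The sketch below (illustrative code, not the paper's simulation program) draws Cauchy variates by inverse-transform sampling and prints running sample means over larger and larger samples; unlike a Gaussian or Logistic sample, these means fail to converge because the law of large numbers does not apply to the Cauchy distribution.

```python
import math
import random

def cauchy_quantile(u, loc=0.0, scale=1.0):
    # Inverse CDF of the Cauchy distribution: Q(u) = loc + scale * tan(pi * (u - 1/2)).
    return loc + scale * math.tan(math.pi * (u - 0.5))

rng = random.Random(42)
draws = []
for n in (100, 10_000, 1_000_000):
    while len(draws) < n:
        draws.append(cauchy_quantile(rng.random()))
    # Heavy-tailed draws keep arriving, so these running means do not settle.
    print(n, sum(draws) / n)
```

The quantile function does, however, pin down the location and scale: the median of the draws is `loc`, and the quartiles sit at `loc ± scale`.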
References
Alós-Ferrer, C. (2004). Cournot versus Walras in dynamic oligopolies with memory. International Journal of Industrial Organization, 22(2), 193–217.
Alós-Ferrer, C., & Buckenmaier, J. (2017). Cournot vs. Walras: A reappraisal through simulations. Journal of Economic Dynamics and Control, 82, 257–272.
Anufriev, M., & Kopányi, D. (2018). Oligopoly game: Price makers meet price takers. Journal of Economic Dynamics and Control, 91, 84–103.
Apesteguia, J., Huck, S., & Oechssler, J. (2007). Imitation theory and experimental evidence. Journal of Economic Theory, 136(1), 217–235.
Arifovic, J. (1994). Genetic algorithm learning and the cobweb model. Journal of Economic Dynamics and Control, 18(1), 3–28.
Arifovic, J., & Maschek, M. K. (2006). Revisiting individual evolutionary learning in the cobweb model—An illustration of the virtual spite-effect. Computational Economics, 28(4), 333–354.
Bergin, J., & Bernhardt, D. (2004). Comparative learning dynamics. International Economic Review, 45(2), 431–465.
Bischi, G. I., Naimzada, A. K., & Sbragia, L. (2007). Oligopoly games with local monopolistic approximation. Journal of Economic Behavior & Organization, 62(3), 371–388.
Bischi, G. I., Lamantia, F., & Radi, D. (2015). An evolutionary Cournot model with limited market knowledge. Journal of Economic Behavior & Organization, 116, 219–238.
Dawid, H. (2011). Adaptive learning by genetic algorithms: Analytical results and applications to economic models. Berlin: Springer.
Erev, I., & Roth, A. E. (1998). Predicting how people play games: Reinforcement learning in experimental games with unique, mixed strategy equilibria. American Economic Review, 88(4), 848–881.
Evans, G. W., & Honkapohja, S. (2001). Learning and Expectations in Macroeconomics. Princeton: Princeton University Press.
Franke, R. (1998). Coevolution and stable adjustments in the cobweb model. Journal of Evolutionary Economics, 8(4), 383–406.
Granato, J., Guse, E. A., & Sunny Wong, M. C. (2008). Learning from the expectations of others. Macroeconomic Dynamics, 12(3), 345–377.
Huck, S., Normann, H.-T., & Oechssler, J. (1999). Learning in Cournot oligopoly: An experiment. The Economic Journal, 109(454), 80–95.
Kutschinski, E., Uthmann, T., & Polani, D. (2003). Learning competitive pricing strategies by multi-agent reinforcement learning. Journal of Economic Dynamics and Control, 27(11–12), 2207–2218.
Offerman, T., Potters, J., & Sonnemans, J. (2002). Imitation and belief learning in an oligopoly experiment. The Review of Economic Studies, 69(4), 973–997.
Pape, A., & Xiao, W. (2014). Case-based learning in the cobweb model. Working paper.
Radi, D. (2017). Walrasian versus Cournot behavior in an oligopoly of boundedly rational firms. Journal of Evolutionary Economics, 27(5), 933–961.
Riechmann, T. (2006). Cournot or Walras? Long-run results in oligopoly games. Journal of Institutional and Theoretical Economics (JITE), 162(4), 702–720.
Sutton, R. S., & Barto, A. G. (2018). Reinforcement learning: An introduction. Cambridge: MIT Press.
Tharakunnel, K., & Bhattacharyya, S. (2009). Single-leader-multiple-follower games with boundedly rational agents. Journal of Economic Dynamics and Control, 33(8), 1593–1603.
Vallée, T., & Yıldızoğlu, M. (2009). Convergence in the finite Cournot oligopoly with social and individual learning. Journal of Economic Behavior & Organization, 72(2), 670–690.
Vallée, T., & Yıldızoğlu, M. (2013). Can they beat the Cournot equilibrium? Learning with memory and convergence to equilibria in a cournot oligopoly. Computational Economics, 41(4), 493–516.
Vega-Redondo, F. (1997). The evolution of Walrasian behavior. Econometrica, 65(2), 375–384.
Vriend, N. J. (2000). An illustration of the essential difference between individual and social learning, and its consequences for computational analyses. Journal of Economic Dynamics and Control, 24(1), 1–19.
Waltman, L., & Kaymak, U. (2008). Q-learning agents in a Cournot oligopoly model. Journal of Economic Dynamics and Control, 32(10), 3275–3293.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Appendix
In this section, we change the probability distribution adopted in the reinforcement learning algorithm and perform a robustness check of the results obtained in the main text. In particular, we consider two alternative probability distributions: a Logistic distribution and a Cauchy distribution.
We first investigate the case where firms select production quantities from a Logistic distribution. The probability density function of production choice \(q\) follows

\[ f(q \mid s,\theta ) = \frac{\exp \left( -\frac{q-\mu (s,\theta )}{\sigma (s,\theta )}\right) }{\sigma (s,\theta )\left( 1+\exp \left( -\frac{q-\mu (s,\theta )}{\sigma (s,\theta )}\right) \right) ^{2}}, \]
where \(\mu (s,\theta ) = \theta _{\mu }^{\prime } s\) is the mean and \(\sigma (s,\theta ) = \exp (\theta _{\sigma }^{\prime }s)\) is a scale parameter that determines the variance. \(\theta =(\theta _{\mu }^{\prime },\theta _{\sigma }^{\prime })'\) is a vector of parameters that firms learn through the reinforcement learning process in (10).
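As a concrete sketch of how such a draw could be implemented (the function and variable names below are illustrative, and the learning update in (10) is not shown), a firm with state vector \(s\) and parameter vectors \(\theta _{\mu }\), \(\theta _{\sigma }\) can sample a quantity by inverse-transform sampling from the Logistic distribution:

```python
import math

def logistic_policy_draw(s, theta_mu, theta_sigma, u):
    """Draw a production quantity from the state-dependent Logistic policy.

    mu(s, theta)    = theta_mu' s          (mean)
    sigma(s, theta) = exp(theta_sigma' s)  (scale, kept positive by exp)
    u is a uniform(0, 1) random number; the Logistic quantile function
    Q(u) = mu + sigma * log(u / (1 - u)) converts it into a quantity.
    """
    mu = sum(t * x for t, x in zip(theta_mu, s))
    sigma = math.exp(sum(t * x for t, x in zip(theta_sigma, s)))
    return mu + sigma * math.log(u / (1.0 - u))

# At u = 0.5 the draw equals the mean mu(s, theta).
q = logistic_policy_draw(s=[1.0, 2.0], theta_mu=[0.5, 0.25],
                         theta_sigma=[0.0, 0.0], u=0.5)
print(q)  # 1.0
```

The `exp` in the scale mirrors the parameterization \(\sigma (s,\theta ) = \exp (\theta _{\sigma }^{\prime }s)\), which guarantees a strictly positive scale for any learned \(\theta _{\sigma }\).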
As an example, we look at the oligopoly economy calibrated above and choose market configurations with \(n=2\). When imitation is present, we let \(k=50\) be the length of firms' memory. Figure 8 is representative of the common outcome in our simulations. The top panel illustrates the learning dynamics of aggregate production quantity in treatment I; the middle and bottom panels present results for treatments II and III, respectively. These results are similar to those from the benchmark simulations based on the Gaussian distribution: firms are able to learn the Nash quantity, and reinforcement learning converges to the Nash equilibrium (treatments I and III); moreover, the long-run market output deviates from the Nash value when firms imitate (treatment II).
We next consider the learning process with a Cauchy distribution. The probability density function of the Cauchy distribution is

\[ f(q \mid s,\theta ) = \frac{1}{\pi \gamma (s,\theta )\left[ 1+\left( \frac{q-\mu (s,\theta )}{\gamma (s,\theta )}\right) ^{2}\right] }. \]
Here \(\mu (s,\theta )\) is the location parameter that determines the mode of the Cauchy distribution, and \(\gamma (s,\theta )\) is the scale parameter that specifies the half-width at half-maximum (see footnote 3). We run the same simulations with this distribution and re-examine the results above. In Fig. 9, we plot the dynamics of aggregate market output for the three treatments with \(n=2\) and \(k=50\). We find that our conclusions still hold: convergence to the Nash equilibrium takes place under reinforcement learning, and imitation drives the market outcome towards the Walrasian equilibrium. We therefore conclude that the reinforcement learning process considered in this paper produces robust outcomes that are not sensitive to the choice of probability distribution.
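The half-width-at-half-maximum property of \(\gamma\) can be checked directly from the density. The sketch below (illustrative names, not the paper's code) evaluates the Cauchy density with location \(\mu\) and scale \(\gamma\) and verifies that the density at \(\mu \pm \gamma\) is half of the peak density at the mode:

```python
import math

def cauchy_pdf(q, mu, gamma):
    # f(q) = 1 / (pi * gamma * (1 + ((q - mu) / gamma)**2))
    z = (q - mu) / gamma
    return 1.0 / (math.pi * gamma * (1.0 + z * z))

# gamma is the half-width at half-maximum: at q = mu +/- gamma the
# density drops to half of its peak value at the mode q = mu.
peak = cauchy_pdf(0.0, mu=0.0, gamma=2.0)
half = cauchy_pdf(2.0, mu=0.0, gamma=2.0)
print(half / peak)  # approximately 0.5
```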
Xu, J. Reinforcement Learning in a Cournot Oligopoly Model. Comput Econ 58, 1001–1024 (2021). https://doi.org/10.1007/s10614-020-09982-4