Abstract
In this paper, we propose a meta-learning model that hierarchically integrates individual learning and social learning schemes. This meta-learning model is incorporated into an agent-based model to show that Herbert Scarf’s famous counterexample on Walrasian stability can become stable in some cases under a non-tâtonnement process when both learning schemes are involved, a result previously obtained by Herbert Gintis. However, we find that the stability of the competitive equilibrium depends on how individuals learn: whether they are innovators (individual learners) or imitators (social learners), and how often they switch between the two (mobility). We show that this endogenous behavior, apart from the initial population of innovators, is mainly determined by the agents’ intensity of choice. This study grounds the Walrasian competitive equilibrium in the view of a balanced allocation of resources between exploitation and exploration. This balance, achieved through a meta-learning model, is shown to be underpinned by a behavioral/psychological characteristic.
Notes
Velupillai (2015) succinctly summarizes the different facets of the problem, viz. the ‘existence of a solution, a method of finding it (if ‘proved’ to exist), the ‘reality’ of the method considered as a dynamic process and its stability’ (ibid., p. 1556).
This is expressed with admirable clarity in Clower (1975, p. 13).
At a systemic level, it has been shown that the computational complexity of a decentralized system of interacting agents is lower than that of the centralized system that is based on a Walrasian auctioneer (Axtell 2005).
In order to examine the role played by learning, we choose a version that is closer to Scarf’s original version and augment it with learning. It is therefore different from Gintis (2007), which has production.
As Gintis (2013, p. 119) notes:
...This is the fact that in a decentralized market economy out of equilibrium, there is no price vector for the economy at all. The assumption that there is a system of prices that are common knowledge to all participants (we may call these public prices) is reasonable in equilibrium, because all agents can, at least in principle, observe the same prices. However, out of equilibrium there is no single set of prices determined by market exchange. Rather, every agent has a subjective prior concerning prices, based on personal experience, that he uses to make and carry out trading plans.
More specifically, t refers to the whole market day t, i.e., the interval \([t-1,t)\).
For a more extended list, see Chen et al. (2016).
It is known that these two approaches can, in general, lead to different results (Grimm and Railsback 2005). We, however, leave this issue to a separate study.
We could certainly consider more general propensity-updating dynamics with three parameters, as proposed by Camerer and Ho (1999), but that would complicate our analysis at this initial stage. Hence, we start with this ‘minimal’ model.
Following Arthur (1993), we also apply a normalization scheme to the propensities \(q_{i,k}(t+1)\):
$$\begin{aligned} q_{i,k}(t+1) \leftarrow \frac{q_{i,k}(t+1)}{q_{i,a_{il}}(t+1)+ q_{i,a_{sl}}(t+1)}. \end{aligned} \tag{18}$$
Results do not vary qualitatively for perturbations of these parameters.
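As a minimal sketch of the normalization in Eq. (18), assuming just two schemes and plain floating-point propensities, the two propensities are divided by their sum. The function name and the labels `q_il` and `q_sl` (standing for \(q_{i,a_{il}}\) and \(q_{i,a_{sl}}\)) are our own and are not taken from the authors' NetLogo program.

```python
def normalize_propensities(q_il: float, q_sl: float) -> tuple[float, float]:
    """Divide the two scheme propensities by their sum, as in Eq. (18)."""
    total = q_il + q_sl
    if total <= 0:
        # Normalization is undefined unless the propensities have a positive sum.
        raise ValueError("propensities must have a positive sum")
    return q_il / total, q_sl / total


# Example: raw propensities 3.0 (individual) and 1.0 (social) -> (0.75, 0.25)
print(normalize_propensities(3.0, 1.0))
```

If both propensities are nonnegative, each normalized value lies in [0, 1], so the propensity gap that enters the choice rule stays bounded however large the raw payoffs become.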
We have examined the system by simulating the same treatments for much longer periods and we find that the results are robust to this extended horizon.
Numéraire-normalized processes and their convergence properties have been widely studied in the tâtonnement literature. We do not analyze the non-normalized case in this paper; see, for example, Kitti (2010) on non-normalized iterative processes and the associated convergence conditions.
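To make the numéraire-normalized benchmark concrete, the sketch below computes excess demand in Scarf's example and takes one Euler step of centralized tâtonnement with good 3 fixed at 1. It assumes the textbook specification (consumer j owns one unit of good j and has Leontief utility min(x_j, x_{j+1}), indices mod 3), which may differ in detail from the version used in the paper; it illustrates the auctioneer-based process the paper contrasts with, not the paper's non-tâtonnement model. The step size `h` is a hypothetical parameter, and goods 1, 2, 3 are indexed 0, 1, 2 in the code.

```python
def scarf_excess_demand(p):
    """Excess demand in a textbook version of Scarf's (1960) example.

    Assumption: consumer j owns one unit of good j and has Leontief utility
    min(x_j, x_{j+1}) with indices taken mod 3.
    """
    n = 3
    z = [-1.0] * n  # total endowment of each good is one unit
    for j in range(n):
        k = (j + 1) % n
        # Leontief demand: income p[j] buys equal amounts of goods j and k.
        x = p[j] / (p[j] + p[k])
        z[j] += x
        z[k] += x
    return z


def tatonnement_step(p, h=0.1):
    """One Euler step of tatonnement with good 3 (index 2) as numeraire."""
    z = scarf_excess_demand(p)
    return [p[0] + h * z[0], p[1] + h * z[1], 1.0]


# Walras' law: the value of excess demand is zero at any positive prices.
p = [0.5, 2.0, 1.0]
z = scarf_excess_demand(p)
print(sum(pi * zi for pi, zi in zip(p, z)))  # zero up to rounding
```

At p = (1, 1, 1) excess demand vanishes, so the step leaves the equilibrium unchanged; away from it, the auctioneer moves each non-numéraire price in the direction of its excess demand.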
This property can also be found in Table 2, which reports the results of Simulation Series 3: for each type of agent, the price expectations of own consumption goods are biased upward, whereas the price expectations of own production goods are biased downward.
The behavior of the price of good 2 is qualitatively similar.
For \(\lambda = 0\), past performance should not influence the current choice, and therefore \(Prob_{i;k}^{t + 1} = 1/2\). Thus, we would expect the market fractions to be 50–50, in contrast to what is observed. However, past performance does exert an indirect influence through the reference point mechanism: agents consider switching only when their utility falls below their current reference point. Since innovators have relatively high payoffs for \(\lambda =0\) (and other low values), their utility falls below the reference point less often, so they reconsider their scheme less frequently than imitators do. The reference point mechanism therefore introduces a bias in favor of the innovators, which explains the deviations we observe in Fig. 8.
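The two ingredients discussed here, a choice probability governed by the intensity of choice \(\lambda\) and a reference point that gates switching, can be sketched as follows. This assumes the standard logit (Brock and Hommes 1997) form for the choice probability, which reproduces \(Prob = 1/2\) at \(\lambda = 0\); the function names, the string labels, and the argument layout are hypothetical, and the paper's reference-point update is not modeled.

```python
import math
import random


def prob_individual(q_il: float, q_sl: float, lam: float) -> float:
    """Logit probability of choosing individual learning (innovation)."""
    # Subtract the larger exponent before exponentiating, for numerical stability.
    m = max(lam * q_il, lam * q_sl)
    e_il = math.exp(lam * q_il - m)
    e_sl = math.exp(lam * q_sl - m)
    return e_il / (e_il + e_sl)


def choose_scheme(current: str, utility: float, reference: float,
                  q_il: float, q_sl: float, lam: float,
                  rng: random.Random) -> str:
    """Keep the current scheme unless utility falls below the reference point."""
    if utility >= reference:
        return current  # satisfied agents do not reconsider their scheme
    p = prob_individual(q_il, q_sl, lam)
    return "innovate" if rng.random() < p else "imitate"


rng = random.Random(0)
print(prob_individual(0.2, 0.9, 0.0))  # 0.5: lambda = 0 ignores propensities
print(choose_scheme("imitate", 5.0, 4.0, 0.9, 0.2, 10.0, rng))  # stays "imitate"
```

With \(\lambda = 0\) a dissatisfied agent picks either scheme with probability 1/2, yet satisfied agents never move; if innovators are satisfied more often, the population drifts toward innovation, as the footnote argues.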
At this point, \(mks_{a_{il}}(0)\) is evenly distributed over two different learning schemes (Table 1).
So far, we have seen few empirical studies directly devoted to examining the payoff distribution among different heuristics, schemes, or strategies in the context of adaptive belief systems or heuristic switching models, either in simulation or in experimental studies. In this regard, the only study close to ours is Bossan et al. (2015), but their adjustment is made at the mesoscopic level (a kind of replicator dynamics) rather than at the microscopic level.
We are not able to identify here the value of \(\lambda \) that would serve as the equalizer. However, since the payoff reversal occurs when \(\lambda \) increases from 3 to 4, we suspect that there may exist some \(\lambda \)s (\(\lambda \in (3,5)\)) that remove the gap.
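One way such an equalizing \(\lambda\) might be located, assuming the mean payoff gap between innovators and imitators varies continuously with \(\lambda\), is bisection on the bracket where the reversal occurs. In the sketch, `gap` is a hypothetical stand-in for a function that would run the simulations at a given \(\lambda\) and return the average payoff difference; the toy gap used below is purely illustrative and carries no information about the model.

```python
from typing import Callable


def find_equalizing_lambda(gap: Callable[[float], float], lo: float, hi: float,
                           tol: float = 1e-3) -> float:
    """Bisection for a lambda at which the payoff gap changes sign.

    Assumes gap is continuous on [lo, hi] and gap(lo), gap(hi) differ in sign.
    """
    g_lo = gap(lo)
    if g_lo * gap(hi) > 0:
        raise ValueError("gap must change sign on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g_mid = gap(mid)
        if g_lo * g_mid <= 0:
            hi = mid  # sign change lies in [lo, mid]
        else:
            lo, g_lo = mid, g_mid  # sign change lies in [mid, hi]
    return 0.5 * (lo + hi)


# Toy stand-in gap with a root at 3.5, bracketed by the reversal between 3 and 4.
print(find_equalizing_lambda(lambda lam: lam - 3.5, 3.0, 4.0))
```

Because each simulated gap would be an average over noisy runs, the sign test can misfire near the root; averaging over many repetitions per \(\lambda\), or scanning a fine grid, may be more robust than pure bisection.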
In the psychology literature, the power law of practice indicates that subjects’ early learning experiences have a dominating effect on their limiting behavior; it is normally characterized by learning curves that are initially steep but then flatten. In the machine learning literature, a related phenomenon is known as premature convergence, a familiar consequence of the path dependence of learning dynamics. In our case, when agents’ memory never decays (\(\phi =0\)) and \(\lambda \) is large, say \(\lambda =10\), the path-dependence effect can become extreme.
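The role of memory decay can be sketched with a reinforcement-style propensity update. The paper's exact updating equation is not reproduced in this excerpt, so the rule below is an assumption: a Roth-Erev-style update with forgetting rate \(\phi\) (only the chosen scheme is reinforced), followed by the Eq. (18) normalization. The function name and dictionary keys are hypothetical.

```python
def update_propensities(q: dict[str, float], chosen: str, payoff: float,
                        phi: float) -> dict[str, float]:
    """Decay all propensities by phi, reinforce the chosen scheme, then normalize.

    Hypothetical rule: q_k <- (1 - phi) * q_k + payoff * [k == chosen],
    followed by the Eq. (18) normalization. phi = 0 means no forgetting.
    """
    new_q = {k: (1.0 - phi) * v for k, v in q.items()}
    new_q[chosen] += payoff  # only the scheme actually used is reinforced
    total = sum(new_q.values())
    if total <= 0:
        raise ValueError("propensities must have a positive sum")
    return {k: v / total for k, v in new_q.items()}


q = {"il": 0.5, "sl": 0.5}
print(update_propensities(q, "il", 1.0, 0.0))  # history kept: il 0.75, sl 0.25
print(update_propensities(q, "il", 1.0, 1.0))  # history erased: il 1.0, sl 0.0
```

Because only the scheme in use is reinforced, an early lead tends to feed itself, and a large \(\lambda\) turns a small lead into a near-certain choice: under a logit rule with \(\lambda = 10\), a normalized propensity lead of 0.5 already gives a choice probability above 0.99.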
As we show in “Appendix C”, the inferior performance is mainly attributable to innovators rather than imitators.
The correlation is based on the pool of 550 pairs of the MAPE (of good 1) and the average number of switches (over the last 500 periods). There are 550 pairs because we have 50 repetitions for each of the 11 \(\lambda \)s.
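The pooled correlation described here can be sketched as follows: each run (one \(\lambda\), one repetition) contributes a single pair, the MAPE of the good-1 price and the run's average number of switches, and all 11 × 50 = 550 pairs enter one Pearson correlation. We assume the MAPE is measured against an equilibrium relative price of 1, as in the symmetric textbook Scarf example; the paper's exact benchmark and averaging window may differ, and all names are hypothetical.

```python
import math
from typing import Sequence


def mape(prices: Sequence[float], p_star: float = 1.0) -> float:
    """Mean absolute percentage error of a price series against p_star (in %)."""
    return 100.0 * sum(abs(p - p_star) / p_star for p in prices) / len(prices)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def pooled_correlation(runs):
    """runs: (good-1 price series, average switches) pairs pooled over all lambdas."""
    xs = [mape(prices) for prices, _ in runs]
    ys = [switches for _, switches in runs]
    return pearson(xs, ys)
```

Pooling across \(\lambda\) means the coefficient mixes within-\(\lambda\) and between-\(\lambda\) variation; computing it separately for each \(\lambda\) would isolate the former.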
See Chen and Venkatachalam (2017) for limits to information aggregation and price discovery in a related context.
References
Albin, P., & Foley, D. (1992). Decentralized, dispersed exchange without an auctioneer: A simulation study. Journal of Economic Behavior and Organization, 18(1), 27–51.
Alós-Ferrer, C., & Schlag, K. H. (2009). Imitation and learning. In P. Anand, P. Pattanaik, & C. Puppe (Eds.), The handbook of rational and social choice. New York: Oxford University Press.
Anderson, C., Plott, C., Shimomura, K., & Granat, S. (2004). Global instability in experimental general equilibrium: The Scarf example. Journal of Economic Theory, 115(2), 209–249.
Anufriev, M., & Hommes, C. (2012). Evolution of market heuristics. Knowledge Engineering Review, 27(2), 255–271.
Apesteguia, J., Huck, S., & Oechssler, J. (2007). Imitation—Theory and experimental evidence. Journal of Economic Theory, 136(1), 217–235.
Arrow, K. (1974). General economic equilibrium: Purpose, analytic techniques, collective choice. American Economic Review, 64(3), 253–272.
Arthur, W. B. (1993). On designing economic agents that behave like human agents. Journal of Evolutionary Economics, 3(1), 1–22.
Axelrod, R. (1997). Advancing the art of simulation in the social sciences. In R. Conte, R. Hegselmann, & P. Terna (Eds.), Simulating social phenomena (pp. 21–40). Berlin: Springer.
Axtell, R. (2005). The complexity of exchange. The Economic Journal, 115, F193–F210.
Benassy, J. P. (1982). The economics of market disequilibrium. Cambridge: Academic Press.
Bossan, B., Jann, O., & Hammerstein, P. (2015). The evolution of social learning and its economic consequences. Journal of Economic Behavior and Organization, 112, 266–288.
Brenner, T. (1998). Can evolutionary algorithms describe learning processes? Journal of Evolutionary Economics, 8(3), 271–283.
Brock, W., & Hommes, C. (1997). A rational route to randomness. Econometrica, 65(5), 1059–1095.
Brock, W., & Hommes, C. (1998). Heterogeneous beliefs and routes to chaos in a simple asset pricing model. Journal of Economic Dynamics and Control, 22(8–9), 1235–1274.
Camerer, C., & Ho, T.-H. (1999). Experience-weighted attraction learning in normal form games. Econometrica, 67(4), 827–874.
Chen, S.-H., Chang, C.-L., & Du, Y.-R. (2012). Agent-based economic models and econometrics. Knowledge Engineering Review, 27(2), 187–219.
Chen, S.-H., Kao, Y.-F., & Venkatachalam, R. (2016). Computational behavioral economics. In R. Frantz, S.-H. Chen, K. Dopfer, F. Heukelom, & S. Mousavi (Eds.), Routledge handbook of behavioral economics (pp. 297–319). London: Routledge.
Chen, S.-H., & Venkatachalam, R. (2017). Information aggregation and computational intelligence. Evolutionary and Institutional Economics Review, 14(1), 231–252.
Clower, R. (1975). Reflections on the Keynesian perplex. Journal of Economics, 35(1), 1–24.
Ellison, G., & Fudenberg, D. (1993). Rules of thumb for social learning. Journal of Political Economy, 101(4), 612–643.
Erev, I., & Rapoport, A. (1998). Coordination, “magic,” and reinforcement learning in a market entry game. Games and Economic Behavior, 23, 146–175.
Erev, I., & Roth, A. (1998). Predicting how people play games: Reinforcement learning in experimental games with unique, mixed strategy equilibria. American Economic Review, 88(4), 848–881.
Fisher, F. M. (1983). Disequilibrium foundations of equilibrium economics. Cambridge: Cambridge University Press.
Gintis, H. (2006). The emergence of a price system from decentralized bilateral exchange. Contributions in Theoretical Economics, 6(1), 1–15.
Gintis, H. (2007). The dynamics of general equilibrium. Economic Journal, 117(523), 1280–1309.
Gintis, H. (2013). Hayek’s contribution to a reconstruction of economic theory. In R. Frantz & R. Leeson (Eds.), Hayek and behavioral economics, chapter 5 (pp. 111–126). New York: Palgrave Macmillan.
Grimm, V., & Railsback, S. (2005). Individual-based modeling and ecology. Princeton: Princeton University Press.
Hahn, F., & Negishi, T. (1962). A theorem on non-tâtonnement stability. Econometrica, 30(3), 463–469.
Hayek, F. A. (1945). The use of knowledge in society. American Economic Review, 35(4), 519–530.
Hommes, C. (2006). Heterogeneous agent models in economics and finance. In L. Tesfatsion & K. L. Judd (Eds.), Handbook of computational economics (Vol. 2, pp. 1109–1186). Amsterdam: Elsevier.
Hommes, C. (2011). The heterogeneous expectations hypothesis: Some evidence from the Lab. Journal of Economic Dynamics and Control, 35(1), 1–24.
Hommes, C., & Zeppini, P. (2014). Innovate or imitate? Behavioural technological change. Journal of Economic Dynamics and Control, 48, 308–324.
Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of decision under risk. Econometrica, 47, 263–291.
Kitti, M. (2010). Convergence of iterative tâtonnement without price normalization. Journal of Economic Dynamics and Control, 34(6), 1077–1091.
Koza, J. (1992). Genetic programming: On the programming of computers by means of natural selection. Cambridge: MIT Press.
Malinvaud, E. (1977). The theory of unemployment reconsidered. London: Blackwell.
Mandel, A. (2012). Agent-based dynamics and the general equilibrium model. Complexity Economics, 1(1), 105–121.
Mandel, A., Landini, S., Gallegati, M., & Gintis, H. (2015). Price dynamics, financial fragility and aggregate volatility. Journal of Economic Dynamics and Control, 51, 257–277.
Rendell, L., Boyd, R., Cownden, D., Enquist, M., Eriksson, K., Feldman, M. W., et al. (2010). Why copy others? Insights from the social learning strategies tournament. Science, 328(5975), 208–213.
Roth, A., & Erev, I. (1995). Learning in extensive-form games: Experimental data and simple dynamic models in the intermediate term. Games and Economic Behavior, 8, 164–212.
Samuelson, L. (1998). Evolutionary games and equilibrium selection. Cambridge: MIT Press.
Scarf, H. (1960). Some examples of global instability of the competitive economy. International Economic Review, 1(3), 157–172.
Sutton, R., & Barto, A. (1998). Reinforcement learning: An introduction. Cambridge, MA: MIT Press.
Tesfatsion, L. (2006). Agent-based computational economics: A constructive approach to economic theory. In L. Tesfatsion & K. Judd (Eds.), Handbook of computational economics: Agent-based computational economics (Vol. 2, pp. 831–880). Amsterdam: North-Holland.
Tversky, A., & Kahneman, D. (1991). Loss aversion in riskless choice: A reference-dependent model. Quarterly Journal of Economics, 106, 1039–1061.
Uzawa, H. (1960). Walras’ tâtonnement in the theory of exchange. The Review of Economic Studies, 27(3), 182–194.
Velupillai, K. (2015). Iteration, tâtonnement, computation and economic dynamics. Cambridge Journal of Economics, 39(6), 1551–1567.
Vriend, N. (2000). An illustration of the essential difference between individual and social learning, and its consequences for computational analyses. Journal of Economic Dynamics and Control, 24(1), 1–19.
Wiering, M., & van Otterlo, M. (2012). Reinforcement learning: State of the art. Heidelberg: Springer.
Acknowledgements
We thank the two anonymous referees for their helpful and constructive comments, which have helped us greatly in improving the quality and clarity of the paper. The first and the last author are grateful for research support in the form of Ministry of Science and Technology (MOST) grants MOST 103-2410-H-004-009-MY3 and MOST 104-2811-H-004-003, respectively. We thank Wolfgang Magerl for able research assistance in the execution of this project.
Appendices
Implementation Details
This appendix provides details of the implementation of the simulations. All simulations and analyses were performed using NetLogo 5.2.0 and Matlab R2015a. The NetLogo interface of our simulations is shown in Fig. 13. In keeping with current community norms, we have made the computer program available at OpenABM. Figure 13 shows the typical NetLogo operating interface, which we divide into two blocks. The first (leftmost) block is a set of control bars through which users supply the values of the control parameters to run the simulation. The parameters are detailed in Table 1 and include N, M, S, T, \(\varphi \), \(\theta _{1}\), \(\theta _{2}\), K, \(\lambda \), \(\textit{POP}_{RE}\) (defined in Sect. 5.3), and \(\textit{POP}_{a_{il}}(0)\). In addition to these variables, other control bars are on-off switches that select the model to run: individual learning only, social learning only, an exogenously given market fraction, or the meta-learning model. For the exogenously given market fraction, \(\textit{POP}_{a_{il}}(0)\) must also be specified.
To the right of the control panel are real-time displays of the economy in operation. The six subfigures in the upper right block show information related to price expectations held during a trading day (Eq. 3) and to excess demand and supply settled at the end of a trading day (Eq. 10), plotted as time series. The leftmost three subfigures show the mean price expectations (by good and by type), and the middle three show the mean excess demand and supply (by good and by type).
The three top-left subfigures summarize the market: prices, quantities, and the market fraction (population of innovators). The first plots the time series of the mean actual trading prices of goods 1 and 2, denoted M1_t and M2_t, alongside the corresponding expectations averaged over all agents, M1 and M2 (good 3 serves as the numéraire). The second plots the time series of aggregate demand, i.e., the sum of all agents’ planned demand (Eq. 4). The third plots the time series of the fraction of agents who adopt the individual learning scheme.
Immediately below these nine subfigures are the snapshot distributions (dispersion) of price expectations, displayed by good. The histogram for good 3 is trivial because it serves as the numéraire. The last two subfigures at the bottom show the relative price of each pair of goods: on the left, the time series of the mean relative price of each pair, and on the right, the time series of the corresponding standard deviation (price dispersion).
Endogenously Determined Market Fractions
This appendix provides the table describing the simulation results concerning endogenously generated market fractions starting from different initial conditions.
Payoff Inequality and Two Types of Errors
In Sect. 5.8, we saw that large populations of immobile agents, associated with large \(\lambda \)s, cause the market mechanism to malfunction owing to the possible presence of both ‘type-I’ and ‘type-II’ errors. In this appendix, we take a closer look at the relative importance of these two types of errors at the individual level by examining the payoffs to the two types of agents who may contribute to them. Figure 14 presents results parallel to those in Fig. 11, except that here we restrict attention to innovators and imitators who are immobile. Since there is only a negligible number of immobile agents when \(\lambda < 5\) (Fig. 12, left panel), we report the payoffs of these two groups only for \(\lambda \ge 5\).
As in Fig. 11, Fig. 14 shows that even among immobile agents, the imitators’ performance is superior to that of the innovators. In fact, the two figures together show that the payoffs to innovators drop substantially as immobile agents become more prevalent, whereas the payoffs to imitators are not substantially affected. Therefore, despite our taxonomy of two types of errors, the ‘type-I error’ contributes most to the ‘malfunction’ of the market mechanism. This result demonstrates that the market mechanism is a joint function of exploration and exploitation: the two help each other, but when one is not working, exploitation alone can do more harm than exploration alone.
This finding resonates with what we observed in Sect. 5.2, where one economy has only exploitation (Sect. 5.2.1) and another has only exploration (Sect. 5.2.2). Exploitation alone probably does more harm than exploration alone because it results in a lower spread of information and hence prevents markets from pooling information effectively. Nevertheless, we have also seen that the performance of innovators (exploitation) can generally be improved when learning from others is possible, i.e., when they are mobile agents. Indeed, when the market is filled with mobile agents (the case with low \(\lambda \)s), innovators on average perform better than imitators, as shown in Fig. 11 (the subfigures with small \(\lambda \)s).
These results also shed light on our earlier finding that the presence of general-equilibrium agents does not automatically ensure that all agents will adopt equilibrium prices in our meta-learning model (Sect. 5.3). Why do these ‘superior’ price expectations fail to spread across the whole economy? The reason is the presence of immobile agents, who not only shut themselves off from adopting the ‘superior’ belief but may also generate considerable ‘noise’ that prevents others from copying it.
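The comparison behind Fig. 14, average payoffs of immobile innovators versus immobile imitators, can be sketched as a simple grouping step over per-agent records. We assume here that an agent counts as immobile if it made no scheme switches in the evaluation window (for instance, the last 500 periods); the paper's exact definition may differ, and the record type and field names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class AgentRecord:
    scheme: str         # "innovator" or "imitator" during the evaluation window
    switches: int       # number of scheme switches in the evaluation window
    mean_payoff: float  # average utility over the evaluation window


def immobile_payoffs(agents):
    """Average payoff of immobile agents (zero switches), grouped by scheme."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for a in agents:
        if a.switches == 0:
            sums[a.scheme] += a.mean_payoff
            counts[a.scheme] += 1
    # Schemes with no immobile agents are simply absent from the result.
    return {s: sums[s] / counts[s] for s in counts}


agents = [
    AgentRecord("innovator", 0, 1.0),
    AgentRecord("innovator", 0, 3.0),
    AgentRecord("imitator", 0, 4.0),
    AgentRecord("imitator", 2, 0.0),  # mobile: excluded
]
print(immobile_payoffs(agents))  # {'innovator': 2.0, 'imitator': 4.0}
```

Repeating this for each \(\lambda \ge 5\) and averaging over repetitions would give the kind of paired series that Fig. 14 compares.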
Cite this article
Chen, SH., Chie, BT., Kao, YF. et al. Agent-Based Modeling of a Non-tâtonnement Process for the Scarf Economy: The Role of Learning. Comput Econ 54, 305–341 (2019). https://doi.org/10.1007/s10614-017-9721-5