# Economies with heterogeneous interacting learning agents


## Abstract

Economic agents differ from physical atoms because of their learning capability and memory, which lead to strategic behaviour. Economic agents learn how to interact and behave by modifying their behaviour when the economic environment changes. We show that business fluctuations are endogenously generated by the interaction of learning agents via the phenomenon of *regenerative-coordination*: agents choose a learning strategy which leads to an output-price pair that feeds back on learning, possibly modifying it. Mathematically, learning is modelled as a chemical reaction among different species of elements, while inferential analysis rests on the combinatorial master equation, a technique which offers an alternative approach to modelling heterogeneous interacting learning agents.

### Keywords

Heterogeneous interacting ABM · Learning · Master equations

### JEL Classification

C5 · C6 · D83 · E1 · E3

## 1 Introduction

The asymmetric information revolution challenged the economic profession to rebuild its macro analytical tools upon sound micro-foundations (Stiglitz 1973, 1975, 1976). Statistical Physics offers tools for analyzing systems with heterogeneous interacting agents through the master equation (ME) approach (Weidlich and Braun 1992; Foley 1994; Aoki 1996, 2002; Aoki and Yoshikawa 2006). Social Sciences have a different status, because they analyze “social atoms” (Buchanan 2007), i.e. agents who act strategically in an intelligent way. Statistical Physics tools such as the ME are certainly suitable but, to account for the learning capabilities of social atoms, the Combinatorial (Chemical) ME (CME) is better suited, as we propose in this paper.

Our model is populated by many heterogeneous interacting agents: their behaviour generates aggregate (emergent) phenomena, from which they learn and to which they adapt. This produces two consequences: (1) because heterogeneity and interaction produce strong non-linearities, aggregation cannot be solved using the Representative Agent framework; (2) the individuals may learn to achieve a state of statistical equilibrium, according to which the market is balanced but the agents can be in disequilibrium.

Individuals follow simple rules, interact and learn. We model the reactions of other agents to an individual’s choice of actions. In the words of Kirman (2012): “we can let the agent learn about the rules that he uses and we can find out if our simple creatures can learn to be the sophisticated optimisers of economic theory.”

To take the first steps in this direction, we use a simplified version of the Greenwald and Stiglitz (1993) model with learning agents. Learning capabilities are represented by a set of rules modelling learning behaviour concerning the output strategy, given the actual net worth of a firm and the market-price level. To cope with it, we introduce a method to obtain analytic solutions for models of heterogeneous, interacting and learning agents by a CME (Nicolis and Prigogine 1977; Gardiner and Chaturvedi 1977; Gardiner 1985) for the distribution of firms over the behavioural state space (the learning rules: see Sect. 3).

Allowing for learning might lead to a phenomenon we call *re-configurative learning*: within a certain state of the market a rule (say \(j\)) becomes dominant (i.e. most of the agents adopt it) when a critical mass of agents uses it because they find it the most profitable one. But that rule is the most profitable according to the previous market conditions: when most of the agents move toward the winning rule they produce, e.g., more, lowering the aggregate price. At the new price, rule \(j\) may no longer be “optimal” and agents start adopting a new rule, say \(i\). If it becomes dominant in its turn, aggregate output and price will be affected, causing another phase transition.^{1}

All in all, we might say that the short term success of a strategy leads to its medium term failure because of the phase transitions produced by agents’ behaviour.

In this model, the firms’ population is financially heterogeneous. This might be modeled by a ME, which describes the dynamics of the probability distribution of the population over states of financial soundness. What results is an analytic model for a dynamic estimator, together with its volatility, of the expected concentration of each heterogeneous class of firms in the system. This is found as the general solution of an ordinary differential equation (*the macroscopic equation*) for the expected value of the state distribution, which depends on the transition rates involved in the ME to model (mean-field) interaction, and on the initial condition.^{2}

Firms are characterized by a second kind of heterogeneity due to the learning rule. This can be modeled by means of a Combinatorial ME (CME). According to a metric which compares the profits of two learning rules at a time, a flow of firms from one rule to another is obtained. The solution the CME provides is a tool distributing a volume of firms over a set of rules. The model provides a differential equation for the probability distribution of agents over a set of behavioural states, each characterised by a given learning rule, and some characteristic levels for the observables involved in the model. The ME provides an analytic model to describe the dynamic behavior of a complex system whose constituents behave non-linearly: it is an inferential methodology which allows finding the estimator of the expected value of any transferable quantity in the system.

Even if one knew the equations of motion of all the observables characterising every single agent in the system, she would not be able to obtain an analytic solution if some non-linearity came into play and if the equations were coupled. The ME approach ends up with a small system of differential equations modelling the drifting dynamics and the spreading fluctuations about it.
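To make the idea of a macroscopic equation concrete, consider a minimal sketch (our own toy example, not the paper's model): in a two-state system where firms flow between SF and NSF with mean-field rates `lam` (SF to NSF) and `mu` (NSF to SF), the expected NSF share \(m(t)\) obeys \(dm/dt = \lambda(1-m)-\mu m\), whose general solution depends on the transition rates and the initial condition, exactly as described above.

```python
# Toy macroscopic equation for a two-state master equation (our example).
# lam: SF -> NSF transition rate; mu: NSF -> SF transition rate.
import math

def macroscopic_share(t, m0, lam, mu):
    """Closed-form solution of dm/dt = lam*(1 - m) - mu*m with m(0) = m0."""
    m_inf = lam / (lam + mu)                      # stationary expected share
    return m_inf + (m0 - m_inf) * math.exp(-(lam + mu) * t)

def euler_share(t, m0, lam, mu, steps=100_000):
    """Euler integration of the same macroscopic equation, as a cross-check."""
    m, dt = m0, t / steps
    for _ in range(steps):
        m += (lam * (1.0 - m) - mu * m) * dt
    return m
```

The closed form and the numerical integration agree, and the drift relaxes to the stationary share \(\lambda/(\lambda+\mu)\); fluctuations around this drift are what the full ME adds on top.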

The paper is organised as follows. Section 2 describes the economic model. Section 3 develops the learning mechanism, the rules agents behave with, and how they choose among them through time (the analytics of profit curves and their maximisation are developed in “Appendix A” and “Appendix B”, respectively). Section 4 deals with the CME to model learning at the mean-field level in order to make inference from the aggregate simulation data and macro-dynamics. Section 5 comments on the ABM simulation results and on the macro-dynamics inferred with the CME set-up. Section 6 concludes.

## 2 The model

Our closed economy without Government is populated by \(I\) heterogeneous firms producing the same perishable good, \(Q\), using labour, \(l\), as the only input (provided by the \(I\) households at the given wage level \(w\)), according to a financially constrained production function (\(A\) is the net worth of the firm: see Delli Gatti et al. 2010), and one bank which supplies the credit the firms demand, at the constant interest rate \(r\), and pays no interest on deposits.

A firm is *self-financed* (*SF*, or *hedge*, in Minsky’s jargon) if it can pay the wage bill fully with its own financial resources, \(A(i,t)\ge W(i,t)\); otherwise the firm is *not self-financed* (*NSF*, or *speculative*, in Minsky’s jargon), \(A(i,t)<W(i,t)\).

- The firm goes bankrupt when \(A(i,t)\le 0\) or when \(Q(i,t)=0\);

- the number of firms is constant: there is a one-to-one entry-exit mechanism, and the net worth \(a\) of each new entrant is randomly assigned within the range 0–20.
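The bookkeeping rules above can be sketched in a few lines (a hedged illustration with names of our own choosing, not the paper's code; the entrant's initial output is an assumption, since only its net worth is specified):

```python
# Minimal sketch of the SF/NSF classification and the 1-to-1 entry-exit rule.
import random

def classify(A, W):
    """SF if net worth A covers the wage bill W, NSF otherwise."""
    return "SF" if A >= W else "NSF"

def entry_exit(firms, rng=random):
    """Replace bankrupt firms (A <= 0 or Q == 0) one-to-one, keeping the
    population constant; an entrant's net worth is drawn uniformly on (0, 20).
    `firms` is a list of (A, Q) pairs."""
    out = []
    for A, Q in firms:
        if A <= 0 or Q == 0:
            out.append((rng.uniform(0, 20), 1.0))  # entrant; output 1.0 is our assumption
        else:
            out.append((A, Q))
    return out
```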

## 3 Learning

This is a model in which agents are supposed to learn. In particular, they set the value of \(\alpha \), the “financial” parameter of Eq. (1).^{3} We allow them to choose freely among a set of 7 rules (of course the list is far from exhaustive and we leave aside the phenomenon of *learning to learn*, but the 4 branches are exhaustive: Sargent 1993; Kirman 2011), whose reinforcement, or not, is given by the firm’s own return on investment (profits) or by imitation, and to move from one rule to another without costs. Once a certain rule is adopted, i.e. the financial parameter is set, a firm produces and brings its output to the market, where aggregate demand meets aggregate supply, determining its price. Once the idiosyncratic shock is taken into account, the rate of change of profit is evaluated, corroborating or not previous decisions. If firms are satisfied with the pace of profits, they keep the rule, otherwise they shift to a new one.^{4} If there exists more than one rule with the same expected equity, the firm chooses the simplest one.

*Non-interactive without learning*

- \(\alpha [1]\): firms set a value which is never updated even though the system changes;
- \(\alpha [2]\): firms set \(\alpha \) as a random variable; it equals the firm’s own previous-period \(\alpha \) plus a \(\pm 30\,\% \) change.

*Non-interactive with learning*

- \(\alpha [3]\): firms set \(\alpha \) following a profit-maximising rule (see “Appendix B”);
- \(\alpha [4]\): the average of the firm’s historical values \(\alpha (i,t-s)\) over the last \(\tau \) periods with positive profit.

*Learning with global interaction*

- \(\alpha [5]\): the average value \(\sum _i {{\alpha (i,t-1)}/I} \) over all the firms in the previous period; eventually firms copy it.

*Learning with local interaction*

- \(\alpha [6]\): the firm randomly chooses \(M\) firms from its neighbourhood, i.e. among those in the same condition (NSF or SF), collects information about past-period values \(\{\alpha (i_m ,t-1)\}\) and sets \(\alpha [6]\) to their average;
- \(\alpha [7]\): the firm looks at its own subgroup (NSF or SF) and uses the ratio of profit to equity to measure other firms’ performance; it then sets \(\alpha [7]\) to the average value of the best performers’ parameter values.
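Three of the rules above are simple enough to sketch directly (a hedged illustration; function names are ours and the tie-breaking fallback in \(\alpha[4]\) when no period had positive profit is our assumption, since the text does not specify it):

```python
# Sketches of rules alpha[2], alpha[4] and alpha[5] from the list above.
import random

def alpha_rule2(alpha_prev, rng=random):
    """alpha[2]: previous own alpha plus a +/-30% random change."""
    return alpha_prev * (1.0 + rng.uniform(-0.30, 0.30))

def alpha_rule4(history):
    """alpha[4]: average of own past alphas over periods with positive profit.
    `history` is a list of (alpha, profit) pairs for the last tau periods;
    falling back to the latest alpha when no profit was positive is our guess."""
    vals = [a for a, profit in history if profit > 0]
    return sum(vals) / len(vals) if vals else history[-1][0]

def alpha_rule5(all_alphas_prev):
    """alpha[5]: cross-sectional average over all firms, previous period."""
    return sum(all_alphas_prev) / len(all_alphas_prev)
```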

Learning may lead to a *mutation* of the system when the interactions reach some critical point. For instance, when a considerable group of firms (*a critical mass*) concentrates on a behavioural strategy, the system undergoes a phase transition. The market price is the driving force (*pilot-quantity*, in Physics) of the aggregate dynamics, because it embeds the behavioural heterogeneity, \(\{\alpha (i,t)\}\), and the endowments heterogeneity, \(\{A(i,t)\}\), of all agents.

This allows us to reach a deeper understanding of the notion of *re-configuration*. If a mass of firms is adopting a given behaviour (*rule*), what is *a* successful strategy becomes *the* winning strategy once a *critical mass* of firms adopts it. The convergence to the new output level changes the price, i.e. it destroys the environment which made the winning rule worth adopting, and a new one may enter the drama: in a way, the success of a rule destroys that very success, and a new “equilibrium” is ready to enter, i.e. it leads to a different configuration.^{5} Those who first experiment with a different strategy improving production efficiency might realize better performances, inducing other firms to do the same over time. Therefore, the system itself changes its “learning-induced” configuration, destroying the regularity it created in order to assume a new configuration.

## 4 The CME and the mean-field learning

Different species of agents live in the system, characterised by heterogeneity in endowments, w.r.t. the state of financial soundness \(\varsigma \in \Sigma =\left\{ {\varsigma _k :k\le S} \right\} \), and in behavioural strategies, \(\lambda \in \Lambda =\{\lambda _h :h\le K\}\). \(\Xi =\Lambda \times \Sigma =\{\xi _{j}=\lambda _{h}\wedge \varsigma _{k}\}\) qualifies the species: \(x_i (\lambda _h \wedge \varsigma _k ;t)\equiv \xi _j \) means the \(i\)-th agent is a firm of the \(j\)-th species, being in the \(k\)-th state of financial soundness while adopting the \(h\)-th rule in scheduling output. The occupation number \(I_j (t)\) evaluates the concentration of the \(j\)-th species, i.e. how many firms belong to the \(j=j(h,k)\)-th state on \(\Xi \). Since the total number of firms is assumed to be constant, the vector \(\mathbf{I}(t)=\left( {I_1 (t),\ldots ,I_J (t)} \right) \), with \(J=KS\), gives the configuration of the system such that the total number of agents is conserved through time. The following sections develop a mean-field approach in a combinatorial (chemical) interaction framework to take care of the learning mechanism and to infer a model for the dynamics of species in the system over \(\Xi \).
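The bookkeeping of species and occupation numbers can be sketched as follows (our own illustration; a 0-based row-major index \(j = hS + k\) is one natural choice for \(j(h,k)\), though the paper does not fix one):

```python
# Species indexing on Xi = Lambda x Sigma and the occupation vector I(t).

def species_index(h, k, S):
    """Map (rule h, financial state k), both 0-based, to species j on Xi."""
    return h * S + k

def occupation(firms, K, S):
    """`firms` is a list of (h, k) pairs, one per firm; returns the
    configuration vector I with J = K*S entries, conserving the total."""
    I = [0] * (K * S)
    for h, k in firms:
        I[species_index(h, k, S)] += 1
    return I
```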

By following an analogy with chemical reactions, a simplified description of combinatorial (chemical) interactions is introduced to model learning at aggregate level.

At a mean-field (i.e. aggregate) level, reactant \(L_h\) represents the species of those firms scheduling output according to the \(h\)-th rule while being in a given state of financial soundness. A ’simple’ interaction between two species \(L_h\) and \(L_k\) is a reaction channel: \(L_h +L_k \equiv \rho _k (L_h )\) where \(L_h\) is here called the ‘effective reactant’ and \(L_k\) is the ‘virtual’ one.

At the social atoms level the learning mechanism is a procedure of \(K\) steps, each of which tests a single output scheduling strategy \(\lambda _k\) along a small ’test-period’ \([t+(k-1)dt,t+kdt)\): the sequence of all such periods is called the ’learning period’ \([t,t+\Delta )\). The firm starts at \(t\) being \(i\in L_p \), i.e. scheduling output according to rule \(\lambda _p \), and at \(t+\Delta \) it ends choosing to be \(i\in L_q \): if \(L_q =L_p \) the firm has learnt that, for the moment, among all the rules it is better to maintain the rule it was behaving with; if \(L_q \ne L_p \) the firm has learnt it is better to change accordingly. The firm makes this decision after the learning mechanism has been completed, passing through \(K\) learning steps along the ‘learning period’: in this paper it coincides with a single time step of the ABM–DGP of Sect. 2.^{6}

To describe the mechanism, assume we are at the \(k\)-th step and that a firm is temporarily found to be \(i\in L_h \). It now interacts with the species of those behaving with \(\lambda _k\) before choosing either to maintain its own rule for the next step or to switch it so that \(i\in L_k \). The same applies to mean-field reactants: interaction is between two species; this is developed here according to a chemical-reaction representation to describe the learning mechanism as a ‘complex reaction’, that is, an ordered sequence (or chain) of simple interactions between two species at a time with different concentrations.

The learning mechanism of Sect. 3 develops along the learning-period partitioned into \(K\) sub-intervals of length \(dt\), \([t+(k-1)dt,t+kdt)\), called the test-period for rule \(\lambda _k \): each effective reactant tests all the \(K\) rules, one after the other. Along the \(k\)-th test-period \(L_h\) (effective) interacts with \(L_k\) (virtual); the event returns an outcome called the ‘product’: it might be \(L_h\) or \(L_k\), and whatever it is, it becomes the effective reactant for the next step along \([t+kdt,t+(k+1)dt)\). Hence, along the interaction chain the effective reactant *may* change while the virtual reactant *must* change: this is because the firm may temporarily switch its rule or not while testing all the behavioural possibilities before making the final decision at the end of the learning period. According to the chemical reaction formalism, \(L_h +L_k\) is therefore a simple interaction or a ‘learning-channel’, \(\rho _k (L_h )\), characterized by the virtual species to interact with at the \(k\)-th step. Therefore, the index \(k\) points to the output scheduling strategy \(\lambda _k \) defining \(L_k\) and it is also called the ‘degree of advancement’ in the interaction chain \(L_p +\{L_k \}\).^{7} The outcome of the learning mechanism is \(L_p +\{L_k \}\rightarrow L_p +L_q \): in case of maintenance it reads as \(L_p +L_q \), in case of switching it reads as \(2L_q \). In any case, before the end of the learning-period neither the observer nor the social atom knows what the outcome will be: while learning, the social atoms live in a sort of superposition of states but, at the end, one and only one outcome will realize. So, the best one can do is to develop a probabilistic model to estimate probabilities for the outcomes: \(w_\varsigma (p,q\ne p;t+\Delta )\) for the final decision to switch the strategy from \(\lambda _{p}\) to a different \(\lambda _q \), and \(w_\varsigma (p,q=p;t+\Delta )\) for maintenance of the previous strategy. These probabilities need to take account of the whole sequence of learning steps before the decision, hence their specification depends on a probabilistic model for simple interactions, the simple fragments of the chain.

As an example, assume \(L_p +\{L_k \}=L_p +L_7 \) is realised according to the path of \(K=7\) steps shown in Fig. 1: (a) \(L_p +L_1 =2L_1 \), (b) \(L_1 +L_2=L_1 +L_2 \), (c) \(L_1 +L_3 =2L_3 \), (d) \(L_3 +L_4 =L_3 +L_4 \), (e) \(L_3 +L_5 =2L_5 \), (f) \(L_5 +L_6 =L_5 +L_6 \) and (g) \(L_6 +L_7 =2L_7 \). The final outcome is known to be reached passing through the sample path made of four temporary switching events (\(a,c,e,g\)) and three maintenance events (\(b,d,f\)). One might think that its probability is \(r_{p1|1} r_{11|2} r_{13|3} r_{33|4} r_{35|5} r_{55|6} r_{57|7} \), but this is not correct: indeed, this is just one of the many possible paths connecting \(L_p \) to \(L_7 \). Therefore, since nobody knows which learning path a social atom is following,^{8} to find \(w_\varsigma (p,7;t+\Delta )\) one should consider all the feasible paths the learning process might follow to become \(L_7\) starting from \(L_p \): the probability is therefore given by (50) of “Appendix C”.

Consider another example: the aim is to estimate the probability \(w_\varsigma (p,2;t+\Delta )\) of becoming \(L_2\) at \(t+\Delta \) being \(L_p\) at \(t\); whatever \(L_p\) is, all the possible paths leading to \(L_2\) are considered.

Following the paths \((L_p +L_1 =2L_1 )-(L_1 +L_2 =L_1 +L_2 )-(L_1 +L_k =\cdots )\) and \((L_p +L_1 =L_p +L_1 )-(L_p +L_2 =L_p +L_2 )-(L_p +L_k =\cdots )\) the product \(L_2\) cannot be reached: along both paths, when \(L_2\) is met it is rejected in order to proceed further, hence these paths do not contribute to the estimation of \(w_\varsigma (p,2;t+\Delta )\). On the other hand, \((L_p +L_1 =2L_1 )-(L_1 +L_2 =2L_2 )-(L_2 +L_k =\cdots )\) and \((L_p +L_1 =L_p +L_1 )-(L_p +L_2 =2L_2 )-(L_2 +L_k =\cdots )\) explain that when \(L_2\) is met it is maintained. Accordingly, the probability is \(w(p,2)=H_{p,2} \left[ {r_{pp|1} r_{p2|2} +r_{p1|1} r_{12|2} } \right] \prod _{k\ge 3} {r_{22|k} } \) as shown in (45) of “Appendix C”, where the constraint excludes those paths not contributing to the estimation.^{9}
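The sum over feasible paths need not be enumerated explicitly: it can be accumulated by a forward recursion over the chain, step by step. The sketch below is our own illustration, where `switch[h][k]` stands for the switching probability \(r_{hk|k}\) of adopting rule \(k\) at the \(k\)-th step while holding rule \(h\), and \(1-\texttt{switch[h][k]}\) for the maintenance probability \(r_{hh|k}\):

```python
# Forward recursion over the learning chain: at step k the effective rule h
# either switches to the virtual rule k (prob. switch[h][k]) or is
# maintained (prob. 1 - switch[h][k]).  Summing over states gives w(p, q)
# over all feasible paths without listing them.

def w_final(p, K, switch):
    """Return a dict q -> probability of holding rule q after K steps,
    starting from rule p.  Rules are 1-based; row/column 0 of `switch`
    is unused padding."""
    prob = {p: 1.0}
    for k in range(1, K + 1):
        nxt = {}
        for h, ph in prob.items():
            s = switch[h][k] if h != k else 0.0    # no switch onto itself
            nxt[k] = nxt.get(k, 0.0) + ph * s           # temporary switch
            nxt[h] = nxt.get(h, 0.0) + ph * (1.0 - s)   # maintenance
        prob = nxt
    return prob
```

For \(q=2\) the recursion reproduces the closed form \([r_{pp|1} r_{p2|2}+r_{p1|1} r_{12|2}]\prod_{k\ge 3} r_{22|k}\), since ending at \(L_2\) requires switching when \(L_2\) is offered at step 2 and maintaining it ever after.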

Mean-field interactions have been described above to represent the learning mechanism in mean-field terms. Interactions^{10} are now developed in terms of what in the literature is known as *Combinatorial Kinetics*.^{11} Learning essentially implies moving on \(\Lambda \), so that concentrations change; therefore transformation probabilities are transition probabilities over \(\Lambda \) due to interactions between reactants.

It is now worth noting that the economic model described so far, as well as the learning mechanism involved, both concern a set of \(C=K\) interaction channels like those in (9). Accordingly, taking care of all the interaction channels, the compact-form combinatorial master equation is

and, as is known, it can be used to evaluate the probability of a given configuration realisation. That is, if \(\mathbf{n}^{h}=\mathbf{n}^{h-1}+\upsilon _k \mathbf{u}^{h}\) then


The equation can be handled by the *Poisson representation* developed by Gardiner and Chaturvedi (1977) and Gardiner (1985): following statistical-mechanics reasoning on the canonical ensemble, the authors show that the technique expands \(P(\mathbf{n},t)\) in Poisson distributions along each coordinate, and a Fokker–Planck equation can be obtained to approximate the combinatorial master equation. In the present framework, the expected value \(m_h =\left\langle {n_h } \right\rangle \) plays the same role the macroscopic equation plays in the system size expansion due to van Kampen (2007) and developed by Aoki (1996) as the macroeconomic equation: the next section provides a model for this expectation.

Estimates of the transition matrices \({\hat{\mathbf{W}}}_\varsigma \) and \({\hat{\mathbf{G}}}_\varsigma \) are obtained as time averages to be used to set up a dynamic model; hence (28) gives
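A generic way to obtain such time-averaged transition estimates from a simulated state sequence is a frequency count of observed moves (a sketch under our own assumptions: the symbols \({\hat{\mathbf{W}}}_\varsigma\), \({\hat{\mathbf{G}}}_\varsigma\) are the paper's, but this row-stochastic estimator is a standard choice, not necessarily the paper's exact one):

```python
# Frequency-count estimate of a row-stochastic transition matrix from a
# sequence of observed states (integers 0..n_states-1).

def estimate_transition_matrix(states, n_states):
    """W[a][b] = share of observed a -> b moves among all moves out of a."""
    W = [[0.0] * n_states for _ in range(n_states)]
    for a, b in zip(states, states[1:]):
        W[a][b] += 1.0
    for row in W:
        tot = sum(row)
        if tot > 0:
            for j in range(n_states):
                row[j] /= tot
    return W
```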

## 5 Dynamics and re-generative learning using ACE and CME

Once one introduces heterogeneous and interacting agents, two different approaches have been developed in the ABM literature. One is ACE (Agent-based Computational Economics), based on computer simulation (see Tesfatsion and Judd 2006; Delli Gatti et al. 2010); the other, ASHIA (Analytic Systems with Heterogeneous Interacting Agents), derives from statistical physics and analyzes economic agents as interacting intelligent atoms (recently Alfarano et al. 2005; Di Guilmi et al. 2011). A new strand goes beyond them by introducing the mechanism of learning: this paper is a first step in that direction.

In the following, we will describe the dynamics of the AB system, the proximate cause-effect relations and the links between fluctuations and learning; moreover, we show, using the analytical tools of Sect. 4, that it is possible to dispense with millions of equations thanks to the meso-foundation provided by the CME.

Figure 3 shows aggregate time series from the ABM–DGP with \(N=1{,}000\) firms along \(T=1{,}000\) periods, estimated with 50 Monte Carlo runs and the following parameter values: \(A(i,0)\mathop {\longrightarrow }\limits ^{iid}U(0,20)\), \(\beta =0.5\), \(\gamma =0.8\), \(\delta =1.4\), \(w = 0.8\) and \(r=5\,\% \).

We believe there are at least three causes for it:

- the NSF perform well in learning activity because they have to be quite aggressive in production, looking for highly performing strategies;
- there exist implicit bankruptcy costs;
- the empirical evidence reports growth rates as Laplace distributed because of the different behaviour of firms of different sizes: smaller firms’ growth (and decline) is fat-tail distributed.

We define a *regime* as a sub-period characterized by a given dominant configuration, and a *phase* as a sub-period along which a quantity \(Z\) is found in a subset of states with specific qualitative meaning (e.g. expansion or recession).

Accordingly, the system can face both regime and phase transitions as emergent phenomena. Moreover, since the individual behaviour is complex (involving heterogeneity, interaction and learning), and due to the external field effects (market price and profitability), it can happen that even those phases synchronized with some regimes along certain sub-periods can be found in different combinations along other sub-periods (Fig. 2).

Aggregate states of the system matter, as well as the effects of the environment, but learning agents change their behaviour through time. Therefore the same subset of conditions assumes different relevance: if a subset of firms found convenient behaving in a certain way when they *were* NSF, the same behavior is not expected to be convenient as well when they *are* SF.

According to Fig. 3, SF firms change frequently but choose between only two configurations (3764215, 7364215), which are almost the same configuration up to a switch of the first two rules, while NSF firms choose among seven configurations (2573416, 2573461, 257416, 2574361, 5273416, 5274316, 5274361): hence, we may say that the NSF sub-system is more active.

Let’s also note that there is a discrepancy between current production and the aggregate profit: there is no guarantee that the “optimal” behavior of the individual agent leads to the welfare of society.

The state of financial soundness (weak heterogeneity) matters, and the difference in firms’ behavioral attitudes (strong heterogeneity), conditioned on the state of financial soundness, is due to different outcomes of the learning activity.

The expected NSF scheduling parameter is three times the SF one: this allows us to conclude that NSF firms are more “aggressive” than SF ones, because the vital impulse of the NSF pushes them mainly to recover from their financial fragility by seeking profit-improving behavioral rules, while SF firms are more prudent.

Figure 3 shows that NSF profit and output are about 65 % of the totals while NSF and SF concentrations are balanced (48 vs. 52 %); therefore, *the social welfare of the system is sustained by the more active and lively firms*. This finding can be read in a different way. The sounder the financial health, the lower the incentive to change: if SF firms were the majority in the system then, due to this rigidity, the system itself would be more exposed to adverse phase transitions because of a low resilience capability; being more prone to change, such resilience pertains to NSF firms.

Table 1 Time-averaged expected concentrations (%) in each behavioural state, given the state of financial soundness

\(\Xi =\Sigma \times \Lambda \) | \(\lambda _1 \) | \(\lambda _2 \) | \(\lambda _3 \) | \(\lambda _4 \) | \(\lambda _5 \) | \(\lambda _6 \) | \(\lambda _7 \) | Tot.
---|---|---|---|---|---|---|---|---
NSF | 11.41 | 11.86 | 15.66 | 19.69 | 20.81 | 6.94 | 13.65 | 100
SF | 9.15 | 8.73 | 19.33 | 20.37 | 19.13 | 8.11 | 15.18 | 100

From Table 1 NSF dominant rules are: \(\lambda _5 \succ \lambda _4 \succ \lambda _3 \succ \lambda _7 \succ \lambda _2 \succ \lambda _1 \succ \lambda _6 \). As regards SF: \(\lambda _4 \succ \lambda _3 \succ \lambda _5 \succ \lambda _7 \succ \lambda _1 \succ \lambda _2 \succ \lambda _6 \).
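These dominance orderings can be recovered mechanically from the table: sort the rules by their time-averaged concentration, in descending order (a small sketch using the Table 1 figures; variable names are ours):

```python
# Recover the dominance orderings from Table 1's time-averaged shares.

def dominance_order(concentrations):
    """`concentrations`: dict rule-label -> time-averaged share (%).
    Returns rule labels sorted from most to least diffused."""
    return sorted(concentrations, key=concentrations.get, reverse=True)

nsf = {1: 11.41, 2: 11.86, 3: 15.66, 4: 19.69, 5: 20.81, 6: 6.94, 7: 13.65}
sf = {1: 9.15, 2: 8.73, 3: 19.33, 4: 20.37, 5: 19.13, 6: 8.11, 7: 15.18}
```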

According to our results, NSF firms are prone to interaction, although in a weak sense, because they behave so as to improve their financial soundness, avoiding the risk of bankruptcy; SF firms are more individualistic and cautious, looking at their own past or maximising profits in order to maintain their status rather than to increase their wealth (which improves through the St. Matthew effect on profits, being subject to a multiplicative shock).

The left and right top panels of Fig. 4 show that in the beginning NSF firms are more concentrated on rule \(\lambda _5 \) but, as time goes by, they become diffused over rules \(\lambda _3 \), \(\lambda _4 \) and \(\lambda _5 \); on the contrary, SF firms start out almost spread over rules \(\lambda _3 \), \(\lambda _4 \), \(\lambda _5 \) and \(\lambda _7 \) while, at the end, they concentrate mostly on \(\lambda _4 \). The NSF propensity to diversification, against the SF propensity to concentration, might be interpreted as the need for NSF firms to try more and more behavioural attitudes to improve their state of financial fragility, while SF firms seem to have reached a satisfactory configuration. Even though the SF firms are the majority (on average about 52 %), the share of SF output is lower than the NSF one (65 %). This shows that NSF firms are more active than SF ones because their aim is to improve their financial soundness as fast as possible: the improvement follows from increasing equity, which is possible only with an increase in profits, but profits increase only if output increases. Therefore, NSF firms do their best to increase output so as to become SF in the short run; nevertheless they do so as reasonably as possible, as shown by the more diversified portfolio of behaviours they adopt. On the other hand, SF firms have been found to be more cautious in preserving their status: they have less interest in increasing output to become richer, preferring to remain self-financing through marginal increments of profits.

Still considering Fig. 4, when NSF output is fairly above the upper confidence band then the density of NSF has a peak and an increase in the concentration of NSF firms on the dominant rules is observed. When output is within or below the bands there are periods with smaller peaks or with peaks toward the minimum, respectively, corresponding to periods in which the distribution of firms spreads more uniformly.

When the NSF density increases, firms concentrate on those rules they find more profitable to improve their financial soundness, and this determines increments in output. These periods are rather short and essentially dominated by trend inversions in the market-price dynamics, which drives the share of NSF firms with a high correlation.

Since the market price is determined by total previous-period demand (3), the learning activity on the output scheduling parameter affects both total demand and output. Since labour demand is a function of the scheduled output, an increase in total demand is itself a consequence of an increase in total output: in the end, the price increases when the learning activity pushes firms to increase their output. There is therefore a cyclical effect of the learning activity on output and demand, of output on price, and of price on the share of NSF firms.

These cyclical effects determine phase transitions of the system from periods of increasing to periods of decreasing concentration of NSF firms on the behavioural rules space. Phases in which the profitability of some dominant rule polarises the volume of firms then realise, inducing a gradual increase in output up to a certain level, beyond which the required increase in total labour demand is lower than the increase in output. When this happens it determines a downturn for the price and, as a consequence, for the density of NSF firms and for their concentration on dominant rules. Accordingly, there is a transition to a period along which firms are less concentrated on dominant rules and spread more uniformly over the behavioural states space. In the case of SF firms there is an essentially similar but opposite mechanism: when the concentration on the dominant rules is higher this corresponds to positive peaks of the SF density (i.e. negative peaks of the NSF density) but, differently from the NSF case, this is associated with downturns in output corresponding to increasing-price periods.

This representation confirms that the state of financial soundness makes a big difference in firms’ behaviour, as was found for their behavioural preferences.

## 6 Conclusive remarks

In a socioeconomic complex system, understood as an ensemble of feedbacks between individual behaviours and emerging regularities, no precise prediction can be made, but only inferential ones, in the light of what has been called *regenerative learning*. Both Monte Carlo simulations of an ABM and ME techniques confirmed that regimes of dominant behavioural configurations grow into regularities that destroy themselves and reconfigure the system into newer ones, inducing system phase transitions. In such a framework, the unpredictability of the system dynamics has been found to be due to the learning capability of economic agents interacting with one another and with their environment, which is driven by force-fields (i.e. market price and profits in our model).

Heterogeneity and interaction have been found to be entangled characteristics of individuals which cannot be reduced to macroscopic inference in the representative agent framework: new tools are needed to manage the aggregation problem.

Learning capability emphasizes the coexistence of multiple equilibria for a system whose equilibrium is not a point in the space, where opposite forces balance, but a probability distribution over a space of behaviours and characteristics.

Allowing agents to learn enhances the ontological perspective in complex systems theory by qualifying an agent as an intelligent one. The paper shows that the learning agent is not an isolated *homo oeconomicus*, since she cares about the others and about the environment she belongs to. Intelligent agents learn, and by learning they modify the system. All in all, she is different from her natural counterpart, the atom, since she behaves the way she wants and not in the only way she must. This is not an irrelevant detail, because it requires the social scientist not to borrow analytic tools from the hard sciences as they are, but compels her to suitably adapt techniques to social phenomena, or to find newer and sounder ones.^{18}

In this respect, the present paper aims to promote, stimulate and, maybe, advance the research stream opened by Masanao Aoki about thirty years ago in the analysis of socioeconomic complex systems.

## Footnotes

- 1.
Note that business fluctuations are due to the idiosyncratic price shocks and the endogenous self-organization of the market.

- 2.
For the sake of simplicity this modeling has not been considered here: the reader is referred to Delli Gatti et al. (2012).

- 3.
\(\alpha \) can be considered a “financial” parameter since it represents the *leverage*, i.e. the ratio between external and internal financial needs.

- 4.
Firms may change their behavior because prices change, i.e. they take into account the Lucas critique.

- 5.
As will be shown, a configuration is a string of rules’ codes ordering the production strategies according to their degree of diffusion, from the most diffused to the least diffused rule.

- 6.
In the ABM–DGP \(\Delta =1\), meaning that two adjacent dates are separated by a time-span of length \(\Delta \). Since time has no specific meaning in the present paper, \(\Delta =1\) is only the simulation’s reference unit of time; it might be a quarter of a year, a year, or whatever.

- 7.
By analogy with chemical reactions it is a “progress variable”, or “degree of advancement”, as described in de Groot and Mazur (1984) p. 199, see also van Kampen (2007) p. 168. Therefore, the complexity degree of rule \(\lambda _k \in \Lambda \) is the index \(k\le K=|\Lambda |\) labelling the \(k\)-th rule the social atom is testing while learning.

- 8.
Having set up an ABM, one can certainly take note of each single step each single agent takes while learning. But this would be very time and memory consuming, even for systems not as large as the one involved here, with 1,000 firms and 1,000 periods. Moreover, in the end, it would be useless, because what is needed is an inferential approach, like that of Statistical Physics: keeping track of all the positions on the learning space would be like integrating the equations of motion for the particles of a complex system, which is an almost impossible task.
- 9.
The formulae presented in “Appendix C” have been analytically obtained through a suitable algebraic method: its development is far beyond the aim of the present paper; notes are available from the authors.

- 10.
The reader might refer to Gardiner (1985), chapter 7, Sects. 5 to 7, for a rigorous development of the following exposition, which aims to recall the main features of the *Poisson representation* technique for many-variable birth-death systems in terms of combinatorial kinetics. See also Gardiner and Chaturvedi (1977) for an early exposition of the technique.

- 11.
This term is due to Gardiner and Chaturvedi (1977), who coined it to extend the field of *Chemical Kinetics*. The reader interested in Chemical Physics and Physical Chemistry, upon which the following development is based, is referred also to McQuarrie (1967) and Gillespie (2007), and the references cited therein. To appreciate the probabilistic and combinatorial nature of these disciplines, and for extensions of the tools to other fields of applicability, an important reference is Nicolis and Prigogine (1977).

- 12.
According to Gardiner (1985), combinatorial transition rates are usually not explicitly time dependent. In the present paper time dependence is maintained to account for the fact that the configuration of the system changes, \(\mathbf{I}_\varsigma (t)=\mathbf{n}\), due to learning, but time is considered as a sequential parameter.

- 13.
Note that \(P_e ({\bullet } ;t):\chi \rightarrow [0,1]\) is the stationary solution, where time is an indexing parameter as in the transition rates \(T_k^\pm ({\bullet } ;t)\). The interest in time indexing is essentially motivated by the fact that the present modelling is grounded on an ABM–DGP where time is an iteration counter.

- 14.
See van Kampen (2007), page 168, for a geometric interpretation of this issue.

- 15.
The ergodic property is here conceived very loosely: basically, estimates of \(\mathbf{W}_\varsigma (t)\) have been found to be stable through time, such that their series can reasonably be substituted with the time average; it has also been found that the standard deviation is very small.

- 16.
For a given subsystem of SF or NSF firms, a *dominant configuration* is a combination of behavioural rules that, at \(t\), concentrates the fractions of a given quantity, say \(Z\), from the highest to the lowest share. As regards the number of firms, \(Z=I\), the *diffusion-dominance* of a certain rules’ configuration allocates the highest shares of firms into behavioural states \(\lambda \in \Lambda \). If \(Z=A,Q,W,\Pi \), the *effects-dominance* of a rules’ configuration identifies what should have been chosen to reach the collectively optimal configuration as regards a given quantity.

- 17.
Regimes concern dominance while phases concern the state levels of aggregate quantities.

- 18.
In order to fully appreciate the consequences of introducing learning in a complex system, let us concentrate on the effect of a policy, say an easing of monetary policy, i.e. a reduction of the rate of interest. The share of SF firms will increase: resilience will be strengthened but the pace of growth could be modest; those effects themselves will depend on the S,s of the system; agents will change their behavior, according to the prescription of the Lucas critique.

- 19.
As regards the profit maximizing rule \(3\), it can also be seen as a function of the control (output scheduling) parameter \(\alpha \).

- 20.
The profit curves and their maximisation w.r.t. equity (the state variable), developed previously, differ from the profit maximisation w.r.t. the scheduling parameter (the control parameter): the former concerns the overall economic interpretation, the latter concerns the specific profit maximisation rule, which aims to set an optimal value for the control parameter.

- 21.
This means that equity conditions profit through the scheduling parameter, that is \(\Pi (\alpha |A)\).

- 22.
For ease of exposition, time and the financial fragility state are suppressed; therefore, from here on, all the quantities must be considered time dependent in every state of financial fragility.
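The birth-death machinery invoked in footnotes 7–12 can be illustrated with a minimal sketch (the rates \(b\) and \(d\) are illustrative assumptions, not the paper’s estimated transition rates). An exact stochastic simulation in the style of Gillespie (2007) of a single one-species birth-death master equation shows why tracking only the occupation number suffices: the long-run state fluctuates around the mean \(b/d\) of the stationary Poisson distribution, the distribution underlying the Poisson representation of Gardiner and Chaturvedi (1977).

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative rates (assumptions, not from the paper): firms enter a
# behavioural state at constant rate b and leave it at rate d per firm,
# so the stationary occupation number is Poisson with mean b / d.
b, d = 30.0, 1.0

def gillespie(n0: int, t_end: float) -> int:
    """Exact stochastic simulation of the one-species birth-death ME."""
    n, t = n0, 0.0
    while True:
        birth, death = b, d * n            # combinatorial transition rates
        total = birth + death
        t += rng.exponential(1.0 / total)  # waiting time to the next event
        if t > t_end:
            return n
        n += 1 if rng.random() * total < birth else -1

# Long-run samples fluctuate around the Poisson mean b / d = 30.
samples = [gillespie(n0=0, t_end=20.0) for _ in range(150)]
mean_n = float(np.mean(samples))
```

Only the occupation number \(n\) is ever stored, mirroring the inferential stance of footnote 8: individual trajectories are discarded, yet the stationary distribution is fully recovered.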

## Notes

### Acknowledgments

The authors thank an anonymous referee for his remarks; Patrick Xihao Li, Corrado di Guilmi and participants to the EEA conference, NY May 2013, PRIN Bologna, June 2013, for suggestions; the support of the Institute for New Economic Thinking Grant INO1200022, and the EFP7, MATHEMACS and NESS, is gratefully acknowledged.

### References

- Alfarano S, Lux T, Wagner F (2005) Estimation of agent-based models: the case of an asymmetric herding model. Comput Econ 26(1):19–49
- Aoki M (1996) New approaches to macroeconomic modelling. Cambridge University Press, Cambridge
- Aoki M (2002) Modelling aggregate behaviour and fluctuations in economics. Cambridge University Press, Cambridge
- Aoki M, Yoshikawa H (2006) Reconstructing macroeconomics. Cambridge University Press, Cambridge
- Buchanan M (2007) The social atom: why the rich get richer, cheaters get caught, and your neighbour usually looks like you. Bloomsbury, London
- Delli Gatti D, Gallegati M, Greenwald B, Russo A, Stiglitz JE (2010) The financial accelerator in an evolving credit network. J Econ Dyn Control 34:1627–1650
- Delli Gatti D, Di Guilmi C, Gallegati M, Landini S (2012) Reconstructing aggregate dynamics in heterogeneous agents models. A Markovian approach. Revue de l’OFCE 124(5):117–146
- Delli Gatti D, Fagiolo G, Richiardi M, Russo A, Gallegati M (2014) Agent based models. A primer (forthcoming)
- de Groot SR, Mazur P (1984) Non-equilibrium thermodynamics. Dover Publications, New York
- Di Guilmi C, Gallegati M, Landini S, Stiglitz JE (2011) Towards an analytic solution for agent based models: an application to a credit network economy. In: Aoki M, Binmore K, Deakin S, Gintis H (eds) Complexity and institutions: markets, norms and corporations. Palgrave Macmillan, London, IEA conference, vol N.150-II
- Feller W (1966) An introduction to probability theory and its applications. Wiley, New Jersey
- Foley DK (1994) A statistical equilibrium theory of markets. J Econ Theory 62:321–345
- Gardiner CW (1985) Handbook of stochastic methods. Springer, Berlin
- Gardiner CW, Chaturvedi S (1977) The Poisson representation I. A new technique for chemical master equations. J Stat Phys 17(6):429–468
- Gillespie DT (2007) Stochastic simulation of chemical kinetics. Annu Rev Phys Chem 58:35–55
- Greenwald B, Stiglitz JE (1993) Financial markets imperfections and business cycles. Q J Econ 108(1):77–114
- Godley W, Lavoie M (2007) Monetary economics. An integrated approach to credit, money, income, production and wealth. Palgrave Macmillan, Basingstoke
- Kirman A (2011) Learning in agent based models. East Econ J 37(1):20–27
- Kirman A (2012) Can artificial economies help us understand real economies. Revue de l’OFCE, Debates and Policies 124
- McQuarrie DA (1967) Stochastic approach to chemical kinetics. J Appl Probab 4:413–478
- Nicolis G, Prigogine I (1977) Self-organization in nonequilibrium systems: from dissipative structures to order through fluctuations. Wiley, New Jersey
- Sargent T (1993) Bounded rationality in macroeconomics. Clarendon Press, Oxford
- Stiglitz JE (1973) Taxation, corporate financial policy and the cost of capital. J Public Econ 2(1):1–34
- Stiglitz JE (1975) The theory of screening, education and the distribution of income. Am Econ Rev 65(3):283–300
- Stiglitz JE (1976) The efficiency wage hypothesis, surplus labour and the distribution of income in L.D.C.s. Oxford Econ Papers 28(2):185–207
- Tesfatsion L, Judd KL (2006) Agent-based computational economics. In: Handbook of computational economics. Handbooks in economics series, vol 2. North Holland, Amsterdam
- van Kampen NG (2007) Stochastic processes in physics and chemistry. North-Holland, Amsterdam
- Weidlich W, Braun M (1992) The master equation approach to nonlinear economics. J Evol Econ 2(3):233–265