Smoluchowski Equation for Networks: Merger Induced Intermittent Giant Node Formation and Degree Gap

The dynamical phase diagram of a network undergoing annihilation, creation, and coagulation of nodes is found to exhibit two regimes controlled by the combined effect of preferential attachment for initiator and target nodes during coagulation and for link assignment to new nodes. The first regime exhibits smooth dynamics and power law degree distributions. In the second regime, giant degree nodes and gaps in the degree distribution are formed intermittently. Data for the Japanese firm network in 1994 and 2014 suggests that this network is moving towards the intermittent switching region.


Introduction
The Smoluchowski equation was developed to describe the distribution of clusters containing specific numbers of particles formed through coagulation processes conserving mass [1]. This has been used widely in many fields such as colloid, aerosol, virus, fish school, and asteroid [2][3][4][5][6][7][8][9][10][11][12][13][14]. The approach was generalized by Miura et al. in [15] to analyze the formation of networks with application to the Japanese firm network. A model (MTT model) is proposed, which encompasses three processes of node creation, annihilation, and coagulation, together with the preferential attachment rule [16,17], that is, a new link is more likely to be attached to a node having a larger number of links, so that a new node will be connected to node i, depending on the connectivity k λ i of that node. The third process is intended to represent mergers and acquisitions amongst firms. The behavior differs from the traditional Smoluchowski method since not only particles (i.e. nodes) but also links must be taken into account. Coagulation is studied from the viewpoint of the number of links (i.e. degree) connected to the nodes. The stationary state of the model exhibits a power law behavior for the degree distribution with exponents that depend on model parameters which allows the model to replicate the observations on the Japanese inter-firm transaction data for which the distribution of the number of transaction partners follow a power law with a cumulative exponent 1.3 ± 0.1 (for probability density, the exponent is 2.3±0.1) [18] (see Appendix A). The previous study was restricted to replicating the distribution of the network by making use of parameter values for both preferential attachment exponents (i.e. for a newcomer at creation process and for an acquirer at coagulation process) as well as the occurrence probabilities of node creation, annihilation, and coagulation. Here, we broaden the scope in order to center on the dynamics of the network formation and the emerging phase transactions arising from such dynamics.
In this study, we generalize the MTT model and discover two new types of long time dynamics and we determine the locus of control parameters separating the different types of asymptotic dynamics. Our generalized MTT model includes a more detailed analysis of the merger process in which we distinguish between a firm (i.e. node) acquiring another firm, that is initiating a merger, and a firm being the target of the acquisition, i.e. being selected for merger and has been verified validation by comparing empirical analysis results with simulation results (see [19] for details of the validation). The first process happens with a probability proportional to k α a , where k a is the degree of the acquiring node. The acquiring node selects the target for acquisition with a probability proportional to the degree k β t where k t is the degree of the target. This represents a generalization of the original MTT model. We study this generalized process numerically and analytically using the Smoluchowski equation representing this process. To do that we have included a nonlinear coagulation effect by introducing a collision kernel proportional to k α a k β t [19]. We plot the phase diagram consisting of two axes, the vertical axis given by the preferential attachment exponent for newcomers (λ), and the horizontal axis given by the sum of coagulation kernel's exponents (α + β). By observing temporal behaviors of the system for each point in this phase diagram through intensive numerical simulations, we are able to quantify the dynamical properties resulting from the maximum value of the difference of the sizes of between the largest and the second largest nodes, denoted as Diff. According to this value, we categorize the phase diagram into Type I and Type II, with marginal critical cases. Type I relates to the case where the value of Diff is small as shown in Fig. 1b and the size distribution is steady as shown in Fig.  1a. In contrast, Type II consists of the case that Diff is very large with intermittent appearance of giant degree node showing saw-teeth oscillations and the corresponding size distribution is not steady as typically shown in Fig. 1e and f. The marginal cases, typically shown in Fig.  1c and d are given by the value of Diff around 10 3 . The regions of Type I and II are separated by the marginal critical cases. We apply each value of model-parameter from the Japanese firm network data analysis [19] and find that the firm network during the last few decades has moved towards the Type II region, the region where a giant degree node will be formed.

Model Analysis
Time evolution of network growth is modeled by the following three processes which are chosen randomly at each time step. Newcomer A new node with n links is added into the system. Each new link is randomly connected to an already existing node with a probability given by the preferential attachment rule [16,17] with the exponent λ. The allocation of links to an existing node of degree k follows a probability that is proportional to k λ . That is, the probability Π that a new node will be connected to node i depends on the connectivity k λ i of that node, so that where N is total number of nodes which fluctuates over time. Annihilation A node, chosen randomly with uniform probability, is removed along with all links connected to this node. Coagulation A pair of nodes, an acquiring node a and a target node t, are chosen randomly taking into account their link numbers to become proportional to the coagulation kernel, Node t is absorbed by the node a. In this process the k t links emerging from the target node are transferred to the acquiring node. The merger (coagulation) process conserves the sum of the link numbers except overlapping links, that is, multiple links between the same nodes is not allowed within the simulation.
First, we study the model by use of Monte Carlo simulations (see Appendix B for the method). The initial state consists of 10, 000 nodes with zero links. We then develop the network by applying the three processes: newcomer, annihilation and coagulation (merger and acquisition) with the rates r n , r a and r c . By comparing with the Japanese firm network in 2014, see [19,20], the rates are chosen as 0.5 : 0.4 : 0.1. The simulations are done as described in previous studies [15,19]. We are particularly interested in the evolution of the degree distribution and the appearance of a gap between the link numbers of the high degree nodes. In panels a, c and e the black squares, pink circles, and light-green triangles indicate the distribution found after 1.5 × 10 5 , 2.5 × 10 5 , and 3.5 × 10 5 time steps, respectively. Panel b, d and f show the corresponding time dependence (in semi-log) of the number of links k for the nodes with with the largest degree (red), second largest degree (green), 10-th largest degree (blue), 100-th largest degree (light blue) and 1,000-th largest degree (gray) and for the median (yellow). We notice that as α and β are increased, the temporal fluctuation becomes larger.
For the smaller values of α and β the nodes of highest degree ranks have link numbers of the same order of magnitude as shown in panel a and b. We call this Type I behavior. A change occurs as the α and β exceed certain threshold values, this is shown in panel e and f. The time dependent degree distribution develops a form that we approximate below by a power law for probability density c k (τ ) ∝ k −γ with an approximate exponent γ = 2.3±0.1, where c k (τ ) is probability density of number of nodes with link k. Concurrently, a significant gap appears intermittently between the number of links of the node with the highest degree and the link number of the other nodes of lower rank, see panel f in Fig. 1. We call this Type II behavior. A noticeable decrease occurs in the link number of the nodes with rank smaller The black squares, pink circles, and light-green triangles indicate the distribution found after 150, 000 step, 250, 000 step, and 350, 000 step, respectively in a, c, and 210, 000 step, 240, 000 step, and 280, 000 step, respectively in e. The dashed-lines indicate slope −1.3 = −γ + 1. The distribution shown by each of the coloured symbols in a, c, and e relate to the times given by the vertical lines in b, d, and f of the same colors. b, d, f Simulated time dependence of the link number k corresponding to the distributions a, c and e, respectively. Each line shows 1st (red), 2nd (green), 10th (blue), 100th (light blue), 1,000th (gray) and median (yellow) link number k than the largest, while the largest rapidly increases its link number. The gap between 1st and 2nd grow quickly, then, stays at a certain level for a while and the 1st disappears suddenly by annihilation in the case of Type II. This categorization of Type I and II is available numerically for any initial condition after a brief transient state. This behavior will be discussed below in relation to Eq. (2).

Smoluchowski Equation
As the dynamics switches from Type I to Type II, the intermittent appearance of a node with giant degree after the appearance of power-law degree distribution becomes evident. This is similar to the formation of the gel in the sol-gel phase transition, which is an aggregation phenomena that has a long history in statistical physics. Within this process, it is possible to observe the existence of a steady state which obeys a robust power law distribution in the aggregation system with continuous injection and removal near the gelation point [4,5,8], so to understand better the nature of the change between the two behaviors we establish a master equation in the spirit of Smoluchowski's equation used to describe sol-gel phase transitions which gives the time evolution of c k (τ ). The following distinct processes and their probabilities describe the dynamics: Increase of the number of nodes with k links ; for connecting one new link to a node with k − 1 links so that the resulting node holds k links.
(ii) δ k,4 ; for directly creating a node of 4 links (δ denotes Kronecker delta). This constant is derived by the ratio between the total number of transaction partners (c. 4.1 million) and the total number of firms (c. 1,1 million) observed within the Japanese firm network real data in 2014 [19]. (iii) (k + 1)c k+1 (τ ); for annihilating a node which is connecting to a node of k + 1 links.
for merging a node of k links as an acquirer and that of k − k links as a target.
Decrease of the number of nodes with k links k λ +1 c k (τ ); for creating a node which is connecting to a node of k links. (vi) kc k (τ ); for annihilating a node which is connecting to a node of k links. (vii) c k (τ ); for directly annihilating a node of k links.
for merging a node of k links as an acquirer and that of k links as a target.
k β +1 c k (τ ); for merging between a node of k links as an acquirer and that of k links as a target.
In order to allow isolated nodes (degree zero nodes) to be able to connect to newcomers and to participate in the merger events, we have added one to the degree of the involved nodes in the numerator and denominator of the kernels describing these processes. The resulting master equation for the time evolution of c k referring to the previous study [15,21] is given by; Note that N represents the mean (i.e. ensemble average) of total number of nodes, which N = L s=1 N s (τ )/L where N s (τ ) the s-th sample's total number of nodes at time τ and L the total number of samples, so that N c k (τ ) is the mean number of nodes with degree k at time τ . Moreover, the mean number of nodes added (or deleted) to the network per time is r n − r a − r c . Therefore, the mean number of nodes with degree k after one step is We now describe the analytic criteria we use to determine the boundary between Type I and Type II dynamics. We follow previous studies [22][23][24]  The gel phase appears when Λ(Q 1 ) > 0 while Λ(Q 2 ) < 0, where Q 1 where Q 1 and Q 2 denote the link sizes of the largest node and the second largest node, respectively [23] (see Appendix C for a strong divergence of a size of a node of maximum degree Q 1 in finite time which is effected by aggregating between large nodes). For theoretical evaluation of Eq. (1), we pay attention to the boundary between Type I and Type II dynamics and we assume that the degree distribution follows a power law c k (τ ) = H (τ )k −γ , based on findings from previous studies [3][4][5]8,[22][23][24]. Further more we assume stationarity for the averaged number of nodes by choosing r n = 1/2 and r a = 1/2 − r c , we have the following evaluation (see Appendix D for detail derivation).
(2) can become negative in the limit of large Q. We note that in Eq. (2)B is a constant and the normalization factor H (τ ) is essentially independent of the number of nodes for γ > 2. So the sign of Λ(Q) for large Q is estimated by the exponents of Q of the terms in the right hand side of Eq. (2). We fix γ at 2.3 ± 0.1, suggested by the simulations and by the real value estimated for the Japanese firm network [18]. Then, Eq. (2) derives the following two conditions for Λ(Q).
We stress that this is the prediction for a node of giant degree to be able to form and we take this as an estimate of the boundary between the two types of dynamics observed in the simulations. We are going to check the validity of the analytic estimations, (A) and (B), by introducing phase diagrams as follows. To detect the onset of Type II dynamics, corresponding to the intermittent formation of a giant node, we measure the maximum difference between the first and second highest number of links, Q 1 and Q 2 during 500, 000 Monte Carlo steps of the processes newcomer, annihilation and coagulation. We pay attention to the following three cases. (a) r n = 0.5, r a = 0.49, r c = 0.01, as a representative case that the effect of coagulation is very weak and the condition (A) should holds. (b) r n = 0.5, r a = 0.01, r c = 0.49, as a representative case that the effect of coagulation is very strong and the condition 1, as a representative case that corresponds to the observed rates for Japanese business firms in the real world [19]. Each figure in Fig.  2 shows the phase diagram of the corresponding case using a colored heatmap showing the difference Q 1 − Q 2 with the theoretical boundary given by the dotted lines.
We notice that when the rate for merger, or coagulation, is very low, Fig. 2 panel a, the phase boundary obtained from the analytic analysis of the time dependence of the moments coincides with the observed change in Q 1 − Q 2 in the simulations. For larger values of the rate r c for merger events, the analytic prediction deviates quantitatively from the simulated boundary between the two types of dynamics. This is natural since the Smoluchowski is a mean field approach and allowing mergers increase significantly the stochastic fluctuations in the dynamics.

Discussion
We have analytically and numerically studied the threshold condition for the formation of a giant component, or gelation, in a network subject to the three processes of creation, annihilation and merger.
Analysis of the Japanese firms network shows that the preferential attachment exponent λ remains constant at about 1. In contrast to the α and β exponent describing the merger, which show a significant change with time. Specifically analysis of 1994 and 2014 leads to the following estimates of the exponents (λ, α, β) = (1.0, 0.8, 0.6) and (1.0, 1.1, 0.7), respectively [19]. This analysis also shows acceleration in the number of firms with large numbers of links over the last twenty years. This corresponds to the growth of the exponents α, β as a result of substantial merger and acquisitions activities. Therefore, our theoretical analysis indicates that the dynamics of the inter-firm network in 1994 and 2014 is likely to be located near and at the threshold for the onset of giant node formation, respectively. Extrapolation of this trend, see Fig. 2, suggests that the ecosystem of Japanese firms is moving into the Type II region and that intermittent formation and disappearance of giant components is likely to upset the economy with increasing intensity in the near future. Such behavior may obviously be consequential for the sustainability [25] and health of the Japanese economy. The results of our work, therefore, provide quantitatively measurable indicators that can aid assessment of the evolution and the stability of an economical ecosystem.
Conceptually, appearance of giant degree nodes in evolving network systems may look similar to the study developed by et al. [26]. They analytically showed appearance of giant nodes by the effect of stronger preferential attachment of newly added nodes in the limit of infinite time. By contrast, we add two more effects, coagulation of nodes and spontaneous annihilation, and find that the effect of coagulation of nodes can be so strong for some parameter region that we can observe tendency that the link number of the largest degree node diverges in a finite time corresponding to the physical phenomenon of gelation, that never occurs by the preferential attachment alone.
Additionally, the appearance of saw-teeth oscillation in Fig. 1f may also have apparent similarities to the study developed by Ball et al. [27] and Matveev et al. [28], where steady oscillations in the aggregation systems with removal of cluster or fragmentation of cluster into monomers is detected. In particular, Matveev et al. [28] obtained an analytical solution for the steady state cluster size distribution with numerical confirmation of the relaxation of the size distribution to this steady-state form. However, both of these studies aim to evaluate the steady oscillatory behavior of cluster size distribution, with focus on the kinetics of irreversible aggregation and clustering phenomena of the Smoluchowski coagulation equation. In contrast, our study focuses on the sol-gel phase transition, and the impact (separately and interactively) of each fundamental process of newcomers, annihilations, and coagulations, since these are the fundamental cornerstones to the understanding of business transaction networks and systems. As a result, our findings are unique and cannot be derived by these earlier studies. In addition, there is an important methodological difference within our work as the oscillation period after gelation is depended on the random event of the gel-node's removal, so that the dynamics are nontrivial. Figure 3 shows the cumulative distributions of firm size in Japan. As for the distribution of the annual sales in Panel a, an approximated power law exponent slightly decreases from 1.1 (1995) to 0. 9 (2015). This indicates that large size of firms grow larger and middle-small size of firms shrink in their sizes; the gap widens. As for the distribution of the number of business transactions in Panel b, an approximated power law exponent keeps 1.3 from 1995 to 2015. However, a difference between the first and second highest number of business transactions grows larger. This also indicates that the firm-ecosystem comes close to the gelation phase.

Appendix B: Monte Carlo Simulation
As for our numerical analysis, we conducted Monte Carlo simulation as follows; Step1 Start with N 0 firms without any business transactions (N 0 = 10, 000 in Figs. 1 and . 2). Step2 Choose one of the following three events stochastically.
Newcomer A new node with n links is added into the system. Each new link is randomly connected to an already existing node with a probability given by the preferential attachment rule [16,17] with exponent λ. We decided that each new node has four links in order to keep the rate between the number of firms and the number of transaction partners [19]. This condition is not essential, and the main results do not change by replacing the numbers of newcomer's links by random numbers with the same average, for example. Annihilation A node, chosen randomly with uniform probability, is removed along with all links connected to this node. This is because age of firms follow exponential distribution; it is roughly consistent with the simple assumption that a firm disappears randomly following a Poisson process [15,19]. Coagulation A pair of nodes, an acquiring node a and a target node t, are chosen randomly taking into account their link numbers to become proportional to the  [19]. The merger (coagulation) process conserves the sum of the link numbers except overlapping links.
Step3 Repeat Step2 for 5, 000, 000 times. Figure 4 shows our algorithm by schematic diagram. The occurrence probabilities of newcomers, annihilations, and coagulations are denoted by r n , r a , and r c , respectively, satisfying r n + r a + r c = 1 (systems are quasi-stationally states when r n = r a + r c in the number of nodes). Each colored area shows the critical point of gelation τ c as given by the criteria d M 1,Q 2 /dτ = 0. The time required to reach the onset of gelation τ c depends on the size of parameters and there would be parameter threshold effects as phase transition in the systems. We note that decrease of the size of the second largest, Q 2 , is caused both by annihilation of the nodes which are already connecting to Q 2 and by less attachment of newcomers to this node as the largest node attracts most of newcomers by the preferential attachment with its dominant size. Namely, the effect of annihilation plays an important role to realize the critical time, τ c , at a finite time, while gelation occurs only asymptotically in the infinite time in the model of Krapivsky et al. [26].

Appendix C: Simulated Time Depedence of Link Number k in Open Systems
By changing the scale factors for time τ and distribution c k as N τ, c k and ( k α +1)( k β + 1)c k , we have the following simpler representation of Eq. (4), which does not change the results.
To study the criteria in Eq. (2), we consider M λ,Q ≡ Q k=0 k λ c k [22,23]. We multiply both sides by k and sum these from 0 to Q in Eq. (5), where Q is maximum size of k. This way we obtain for the creation processes and for the annihilation processes Combining Eqs. (10), (11) and (12), we get Eq. (13).