Exponential random graph models for the Japanese bipartite network of banks and firms
Abstract
We use exponential random graph models to understand the network structure and its generative process for the Japanese bipartite network of banks and firms. One of the simplest and best-known exponential random graph models is the Bernoulli model, which shows that the links in the bank–firm network are not independent of each other. Another popular exponential random graph model, the two-star model, indicates that the bank–firm network is in a state where the macroscopic variables of the system can show large fluctuations. Moreover, the presence of large fluctuations reflects the fragile nature of the bank–firm network.
Keywords
Exponential random graph, Bipartite network, Bernoulli model, Two-star model
Introduction
Models of networks are useful in studying their structural properties as well as their dynamical behaviours. The approaches to constructing models of networks can be classified into two broad categories by analogy with the theories of gases in statistical physics [1]. The two approaches are known as the kinetic theory approach and the ensemble approach. In the kinetic theory approach, one considers possible mechanisms that replicate some structural properties of the real-world network. For example, the well-known Barabási–Albert model [2] uses a preferential attachment mechanism to construct a growing network with a fat-tailed degree distribution. These models are easy to understand and give a qualitative understanding of the network, but they are limited in making quantitatively accurate predictions. Thus, these models do not provide an overall understanding of the network; rather, they mimic only a few of its features.
The other class of models, the ensemble models, are based on rigorous probabilistic arguments with a solid statistical foundation, useful for accurate predictions and quantitative study of the network. These models are based on the concept of statistical ensemble implying a large collection of all possible realizations of the network at particular values of the macroscopic observables. A particular graph in the ensemble of networks appears with a probability \({P(G)} \propto {\exp [H(G)]}\), where H(G) is known as the network Hamiltonian. As the probability is an exponential function of the network Hamiltonian, these models are popularly known as “exponential random graph (ERG) models”.
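To make the ensemble idea concrete, the following minimal sketch enumerates every bipartite graph on a tiny node set and assigns each the probability \(P(G) \propto \exp [H(G)]\). It assumes, purely for illustration, the simplest Hamiltonian \(H(G) = \theta E(G)\), where E(G) is the number of links; the function name and the toy sizes are hypothetical and not part of the original study.

```python
import itertools
import math


def ensemble_probabilities(n_rows, n_cols, theta):
    """Enumerate all bipartite graphs on n_rows x n_cols nodes.

    Returns (probs, z), where probs maps each graph (a tuple of 0/1 link
    indicators) to P(G) and z is the partition function.
    Assumption for illustration: H(G) = theta * (number of links).
    """
    n_pairs = n_rows * n_cols
    weights = {}
    for links in itertools.product((0, 1), repeat=n_pairs):
        hamiltonian = theta * sum(links)
        weights[links] = math.exp(hamiltonian)  # unnormalised weight exp[H(G)]
    z = sum(weights.values())  # partition function
    probs = {graph: w / z for graph, w in weights.items()}
    return probs, z


# Toy example: 2 x 2 nodes, hence 2**4 = 16 possible graphs.
probs, z = ensemble_probabilities(2, 2, theta=-1.0)
```

Brute-force enumeration is only feasible for very small graphs, since the number of graphs grows as \(2^{NM}\); this is why sampling methods are needed for networks of realistic size.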
The ERG model was first introduced by Holland and Leinhardt [3], based on the framework laid by Besag [4]. Since the introduction of the ERG models, a variety of network Hamiltonians have been studied, including models of random networks [1], the reciprocity model of directed networks [1], the two-star model [5, 6], and the Strauss model of networks with clustering [7, 8]. Far more complex Hamiltonians that include endogenous as well as exogenous observables of the network have also been studied in the social network literature [9, 10, 11]. Moreover, there are many tools, such as the ERGM [12] and SIENA [13] packages, for fitting ERG models to social data. The problem with complex non-linear Hamiltonians is that they cannot be solved exactly; only linear Hamiltonian models can be solved exactly in the large system size limit. A non-linear Hamiltonian can be solved only approximately, either analytically using mean field theory or perturbation theory, or by numerical simulation.
The ERG model has been studied extensively for monopartite networks, but there are only a few studies of bipartite networks [14]. In this paper, our focus is on the Japanese bipartite network of banks and firms. We model the bipartite network using exponential random graph theory. We study the well-known Bernoulli model and the two-star model to gain a deeper understanding of the structure of the Japanese bipartite network of banks and firms.
Data
We use the Nikkei data set for the banks–firms lending–borrowing links in Japan. Lending data are available only for the listed firms and are restricted in our work to the long-term loans during 2005. Each node in this bipartite network (firms and banks) has its financial statements. However, only listed banks have available financial statements. Therefore, we consider the unweighted and undirected simple bipartite network for the long-term lending–borrowing links between listed firms and listed banks during 2005. The network is formed by \(M = 127\) banks, \(N = 2198\) firms and \(L = 11,842\) unweighted long-term links.
Method
Exponential random graph model
Markov chain Monte Carlo (MCMC) sampling algorithm
Let \(x_\mathrm{{obs}}\) be the observed graph. We would like to solve the moment equation \({\mathrm {E}}_{\theta }(z(X)) - z(x_{\mathrm{{obs}}}) = 0\), where X represents the networks sampled with MCMC.
MCMC sampling is used to estimate network statistics \({\mathrm {E}}_{\theta }(z(X))\). The most commonly used MCMC sampler is the Metropolis–Hastings algorithm, which was introduced in [15].
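As an illustration of how such a sampler can work, the sketch below implements a Metropolis–Hastings chain that proposes toggling one randomly chosen bank–firm pair per step and accepts the move with probability \(\min \{1, \exp [H(x') - H(x)]\}\). It assumes the simple Hamiltonian \(H = \theta E\) (number of links) so that the change in H is easy to compute; the function name, the toy sizes and the seed are hypothetical choices, and this is not necessarily the exact sampler used for the results in this paper.

```python
import math
import random


def metropolis_hastings_links(n_firms, n_banks, theta, n_steps, seed=0):
    """Sample bipartite graphs with weight exp(theta * links) by link toggling."""
    rng = random.Random(seed)
    adj = [[0] * n_banks for _ in range(n_firms)]  # start from the empty graph
    n_links = 0
    trace = []  # number of links after each step
    for _ in range(n_steps):
        i = rng.randrange(n_firms)
        j = rng.randrange(n_banks)
        delta_links = 1 - 2 * adj[i][j]  # +1 if adding a link, -1 if removing
        delta_h = theta * delta_links    # H(x') - H(x) for the proposed toggle
        # Symmetric proposal, so accept with probability min(1, exp(delta_h)).
        if delta_h >= 0 or rng.random() < math.exp(delta_h):
            adj[i][j] ^= 1
            n_links += delta_links
        trace.append(n_links)
    return adj, trace


# Toy run: 20 firms, 5 banks; discard the first 5000 steps as burn-in.
adj, trace = metropolis_hastings_links(20, 5, theta=-1.0, n_steps=20000, seed=1)
samples = trace[5000:]
mean_connectance = sum(samples) / len(samples) / (20 * 5)
```

For this linear Hamiltonian the estimated connectance can be compared with the exact value \(e^\theta /(1+e^\theta )\); for Hamiltonians with additional terms, only the computation of `delta_h` would need to change.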
Stochastic approximation: the Robbins–Monro algorithm
Snijders proposed a stochastic approximation based on the Robbins–Monro algorithm to obtain the maximum likelihood estimate (MLE) for the ERG model [16]. According to [9], this approach is robust and does not require any particular starting point. The stochastic approximation algorithm consists of three phases, described in the following.
Initialization phase
With the initial parameter \(\tilde{\theta }\), this phase determines the scaling matrix \(D_0\). Let \(z_{\tilde{\theta }}(x_1), z_{\tilde{\theta }}(x_2),...,z_{\tilde{\theta }}(x_{M_i})\) be the statistics of the networks \(x_1, x_2,...,x_{M_i}\) generated with the MCMC sampler based on \(\tilde{\theta }\). Let \({\mathrm {E}}_{\tilde{\theta }}\) be the expectation vector of these network statistics, and let D be their covariance matrix. The scaling matrix is defined as \(D_0 = {\mathrm {diag}}(D)\), and \(\theta\) is initialized for the second phase as follows: \(\theta _0 = \tilde{\theta } - a {\,\cdot\,} D_0^{-1} {\,\cdot\,} ({\mathrm {E}}_{\tilde{\theta }} - z(x_{\mathrm{{obs}}}))\). Here a is the gain factor, which controls the size of the updating steps (\(a = 0.1\) at initialization).
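The initialization step can be sketched as follows. Because \(D_0\) is diagonal, the update decouples into one scalar update per parameter. The function name and the toy numbers are hypothetical, and the use of the sample variance (dividing by \(M_i - 1\)) for the diagonal of D is an assumption made here for illustration.

```python
import statistics


def initialize_theta(theta_tilde, sampled_stats, observed_stats, gain=0.1):
    """Return (theta_0, d0) from statistics of networks sampled at theta_tilde.

    sampled_stats: list of statistic vectors z(x_1), ..., z(x_Mi).
    d0 holds the diagonal of the covariance matrix D
    (assumption: unbiased sample variance).
    """
    n_params = len(theta_tilde)
    columns = [[z[k] for z in sampled_stats] for k in range(n_params)]
    means = [statistics.fmean(col) for col in columns]   # E_theta_tilde
    d0 = [statistics.variance(col) for col in columns]   # diag(D)
    theta_0 = [
        theta_tilde[k] - gain * (means[k] - observed_stats[k]) / d0[k]
        for k in range(n_params)
    ]
    return theta_0, d0


# Toy example with a single statistic (e.g. the number of links).
theta_0, d0 = initialize_theta(
    theta_tilde=[0.0],
    sampled_stats=[[4.0], [6.0], [5.0], [5.0]],
    observed_stats=[3.0],
)
```

In this toy example the sampled mean (5) exceeds the observed value (3), so the parameter is moved downwards, towards sparser graphs.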
Optimization phase
The goal is to solve the moment equation \({\mathrm {E}}_{\theta }(z(X)) - z(x_\mathrm{{obs}}) = 0\) based on the Newton–Raphson minimization scheme. To this end, \(\theta\) is updated over several sub-phases, where each sub-phase r reduces the gain factor \(a_r\).
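A minimal sketch of such an iteration is given below for a single parameter. Several simplifications are assumptions made for illustration and are not taken from the procedure described here: the model is the Bernoulli model with \(H = \theta E\), so that the number of links can be drawn exactly instead of by MCMC; the gain factor is halved after each sub-phase; and each sub-phase restarts from the average of the parameter values it visited. The function name and default values are hypothetical.

```python
import math
import random


def robbins_monro_bernoulli(observed_links, n_pairs, theta0, d0,
                            gain=0.1, n_subphases=4,
                            steps_per_subphase=500, seed=0):
    """Solve E_theta[links] = observed_links by stochastic approximation."""
    rng = random.Random(seed)
    theta = theta0
    for _ in range(n_subphases):
        running_sum = 0.0
        for _ in range(steps_per_subphase):
            p = 1.0 / (1.0 + math.exp(-theta))
            # Exact draw of the link count (stand-in for one MCMC sample).
            sampled_links = sum(rng.random() < p for _ in range(n_pairs))
            # Robbins-Monro step: theta <- theta - a * D0^{-1} * (z(x) - z_obs).
            theta -= gain * (sampled_links - observed_links) / d0
            running_sum += theta
        theta = running_sum / steps_per_subphase  # assumption: sub-phase average
        gain /= 2.0                               # assumption: halve the gain
    return theta


# Toy problem: 200 possible links, 40 observed, so the target is ln(0.2 / 0.8).
theta_hat = robbins_monro_bernoulli(observed_links=40, n_pairs=200,
                                    theta0=-1.0, d0=32.0)
```

Decreasing the gain trades speed for precision: early sub-phases move quickly towards the solution, while later ones average out the sampling noise.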
Convergence phase
We want to check whether the returned value \(\hat{\theta }\) from the optimization phase is close to the true MLE. Therefore, \(M_c\) networks are sampled based on the MCMC sampler with a value of \(\hat{\theta }\). The convergence condition is reached when
Results
Bernoulli model of a bipartite network
Fig. 1 The variation of connectance p of the Bernoulli model is plotted as a function of \(\theta\) for the Japanese bipartite network of banks and firms. The solid line represents the exact solution and red circles are the Monte Carlo simulation results. The filled red circle indicates the simulation result for the observed snapshot of the Japanese bipartite network of banks and firms
A bipartite network consisting of two distinct node sets \(\mathcal {N}\) and \(\mathcal {M}\) can be represented by a rectangular adjacency matrix with elements \(A_{ij}\) \((1 \le i \le N;\ 1 \le j \le M)\), where \(A_{ij} = 1\) if and only if the ith node of one node set is connected to the jth node of the other set and \(A_{ij} = 0\) otherwise. The total number of links of the bipartite network is \(E(G) = \sum _{i=1}^{N} \sum _{j=1}^{M} A_{ij}\).
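As a small illustration of this representation, the sketch below stores a hypothetical toy network as a rectangular 0/1 matrix and computes the total number of links together with the degrees of the nodes in each set. The matrix and the helper names are invented for the example.

```python
# Hypothetical toy network: 3 nodes in one set (rows), 2 in the other (columns).
adjacency = [
    [1, 0],
    [1, 1],
    [0, 0],
]


def total_links(adj):
    """E(G): sum of all entries of the rectangular adjacency matrix."""
    return sum(sum(row) for row in adj)


def degrees(adj):
    """Degrees of the row nodes and of the column nodes."""
    row_degrees = [sum(row) for row in adj]
    col_degrees = [sum(col) for col in zip(*adj)]
    return row_degrees, col_degrees


n_links = total_links(adjacency)
row_degrees, col_degrees = degrees(adjacency)
```

Both degree sequences sum to the same total, E(G), because every link has exactly one end in each node set.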
The free energy of the network is \(F=\ln {Z} = NM \ln (1 + e^\theta )\).
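As a worked step, differentiating the free energy with respect to \(\theta\) gives the expected number of links and hence the connectance of the Bernoulli model. The short derivation below uses only the symbols already defined; the inverse relation in the last line is a rearrangement added here for convenience.

```latex
\begin{align}
\langle E \rangle &= \frac{\partial F}{\partial \theta}
                   = NM \, \frac{e^{\theta}}{1 + e^{\theta}}, \\
p &= \frac{\langle E \rangle}{NM} = \frac{1}{1 + e^{-\theta}}, \\
\theta &= \ln \frac{p}{1 - p}.
\end{align}
```

A sparse network (\(p < 1/2\)) therefore corresponds to a negative value of \(\theta\).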
Fig. 2 The degree distributions P(k) are plotted against degree k for (a) banks and (b) firms. Empirical, simulated and analytic results are indicated with different legends
Figure 1 shows the connectance \(p=\langle E \rangle /MN\) as a function of \(\theta\) for the model. The results indicate an excellent match between the analytic solution and the simulation results. Simulation results are obtained using the Markov chain Monte Carlo method as explained in Sect. 3. The data points are averaged over 1000 independent runs. The maximum standard deviation in the data points is found to be \(\sigma =0.001\). For the Japanese bipartite network of banks and firms, the analytic result gives \(\theta = -3.1167\) and our simulation estimates \(\theta = -3.1166 \pm 0.0015\); the negative value reflects the sparse nature of the network.
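The analytic value of \(\theta\) can be checked directly from the network sizes given in the Data section. The sketch below assumes the Bernoulli-model relation \(p = e^\theta /(1+e^\theta )\), which follows from the free energy above; the function name is hypothetical.

```python
import math


def bernoulli_theta(n_links, n_banks, n_firms):
    """Return (connectance, theta) for the Bernoulli model.

    Uses p = L / (M * N) and the inverse relation theta = ln(p / (1 - p)).
    """
    p = n_links / (n_banks * n_firms)
    theta = math.log(p / (1.0 - p))
    return p, theta


# Values from the Data section: M = 127 banks, N = 2198 firms, L = 11842 links.
p, theta = bernoulli_theta(11842, 127, 2198)
print(f"p = {p:.4f}, theta = {theta:.4f}")
```

With these inputs the connectance is about 0.042 and \(\theta\) is about \(-3.1167\); the sign is negative because fewer than half of all possible bank–firm pairs are linked.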
The degree distribution P(k), i.e. the fraction of nodes with degree k, has a binomial form in this model. For the bank–firm network, the degree distribution of the banks can be written as \(P_\mathrm{{bank}}(k) = \left( {\begin{array}{c}N\\ k\end{array}}\right) p^k(1-p)^{(N-k)}\) and that of the firms as \(P_\mathrm{{firm}}(k) = \left( {\begin{array}{c}M\\ k\end{array}}\right) p^k(1-p)^{(M-k)}\). As can be seen from Fig. 2, the degree distribution of the model does not fit the empirical distribution, which has a much broader shape for both banks and firms. We conclude that the Bernoulli model is a poor model for the Japanese bipartite network of banks and firms.
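For reference, the binomial predictions can be evaluated numerically as in the sketch below. The probability mass is computed in log space with `math.lgamma`, an implementation choice made here because the binomial coefficients for \(N = 2198\) are too large for direct floating-point evaluation. The function name is hypothetical, and the connectance is taken from the sizes in the Data section.

```python
import math


def binomial_pmf(k, n, p):
    """P(k) = C(n, k) p^k (1 - p)^(n - k), evaluated in log space."""
    log_pmf = (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log(1.0 - p)
    )
    return math.exp(log_pmf)


M, N, L = 127, 2198, 11842   # banks, firms, links (Data section)
p = L / (M * N)              # connectance of the Bernoulli model

# A bank can connect to up to N firms; a firm to up to M banks.
bank_pmf = [binomial_pmf(k, N, p) for k in range(N + 1)]
firm_pmf = [binomial_pmf(k, M, p) for k in range(M + 1)]

mean_bank_degree = sum(k * pk for k, pk in enumerate(bank_pmf))
mean_firm_degree = sum(k * pk for k, pk in enumerate(firm_pmf))
```

The model means equal \(Np = L/M\) for banks and \(Mp = L/N\) for firms, so the Bernoulli model reproduces the average degrees by construction; its narrow binomial shape is what fails to match the empirical distributions.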
Two-star model of a bipartite network
As the Hamiltonian becomes linear in \(A_{ij}\), we can easily calculate the partition function \(\kappa = [1+\exp ( \Theta )]^{NM}\).
From the partition function, we can calculate other network observables:
Free energy: \(F=\ln (\kappa )=NM \ln [1+\exp ( \Theta )]\)
Total expected number of links: \(\langle Z_L \rangle = \frac{\partial F}{\partial \theta _L} = NM \frac{\exp ( \Theta )}{1+\exp ( \Theta )}\)
Fig. 3 The phase diagram for the two-star model. The red circle indicates the position of the Japanese bipartite network of banks and firms for the year 2005
Table 1 Estimated values of the coupling parameters of the two-star model for the Japanese bipartite network for the year 2005
Parameters | Estimated values | Standard deviation |
---|---|---|
\(\theta _L\) | \(-3.974\) | \(4.040 \times 10^{-3}\) |
\(\theta _{2SF}\) | \(6.307 \times 10^{-2}\) | \(3.445\times 10^{-4}\) |
\(\theta _{2SB}\) | \(3.334 \times 10^{-3}\) | \(1.328 \times 10^{-6}\) |
Fig. 4 The hysteresis plot for the two-star model of the bank–firm network. The black curve indicates the variation of connectance p when \(\theta _L\) increases from low to high and the red curve indicates the variation when \(\theta _L\) decreases from high to low. The values of \(\theta _{2SF}\) and \(\theta _{2SB}\) are kept constant as in Table 1. The error bars indicate the standard deviation in p
Fig. 5 The degree distributions P(k) are plotted against degree k for (a) banks and (b) firms. Empirical and simulated results are indicated with different legends
Figure 5 shows the degree distribution for the two-star model. The distribution is bimodal [19], although the second peak near \(k=N\) in the degree distribution of banks is very small. This model also cannot replicate the empirical degree distribution. In the future, we will consider more complex Hamiltonians that include endogenous as well as exogenous parameters to describe the system more accurately.
Conclusions
We have studied the Japanese bipartite network of banks and firms using the Bernoulli model and the two-star model. The Bernoulli model assumes that links are formed between banks and firms independently. However, this model does not fit the empirical network structure well, indicating that a relationship is present between the network structure and some hidden variables. As a first approximation, we consider the two-star model, which assumes that adjacent links play a role in link formation. This model indicates that the Japanese bipartite network of banks and firms has a fragile nature. However, this model also cannot fully capture the empirical network structure.
In the future, we would like to consider more complex Hamiltonians as well as the temporal evolution of the system in the phase space. We believe such complex Hamiltonians will be useful to understand the network structure in detail.
Footnotes
- 1. x and \(x'\) are network states at simulation steps t and t+1, respectively.
References
- 1. Park, J., & Newman, M. E. (2004). Statistical mechanics of networks. Physical Review E, 70, 066117.
- 2. Barabási, A. L., & Albert, R. (1999). Emergence of scaling in random networks. Science, 286, 509–512.
- 3. Holland, P. W., & Leinhardt, S. (1981). An exponential family of probability distributions for directed graphs. Journal of the American Statistical Association, 76, 33–50.
- 4. Besag, J. (1974). Spatial interaction and the statistical analysis of lattice systems. Journal of the Royal Statistical Society. Series B (Methodological), 36(2), 192–236.
- 5. Park, J., & Newman, M. E. (2004). Solution of the two-star model of a network. Physical Review E, 70, 066146.
- 6. Annibale, A., & Courtney, O. T. (2015). The two-star model: exact solution in the sparse regime and condensation transition. Journal of Physics A: Mathematical and Theoretical, 48, 365001.
- 7. Strauss, D. (1986). On a general class of models for interaction. SIAM Review, 28, 513–527.
- 8. Park, J., & Newman, M. (2005). Solution for the properties of a clustered network. Physical Review E, 72, 026136.
- 9. Lusher, D., Koskinen, J., & Robins, G. (2013). Exponential random graph models for social networks: Theory, methods, and applications. Cambridge: Cambridge University Press.
- 10. Wong, L. H. H., Gygax, A. F., & Wang, P. (2015). Board interlocking network and the design of executive compensation packages. Social Networks, 41, 85–100.
- 11. Simpson, S. L., Hayasaka, S., & Laurienti, P. J. (2011). Exponential random graph modeling for complex brain networks. PLoS ONE, 6, e20039.
- 12. Hunter, D. R., Handcock, M. S., Butts, C. T., Goodreau, S. M., & Morris, M. (2008). ergm: A package to fit, simulate and diagnose exponential-family models for networks. Journal of Statistical Software, 24(3), nihpa54860.
- 13. Ripley, R. M., Snijders, T. A., & Preciado, P. (2011). Manual for SIENA version 4.0. Oxford: University of Oxford.
- 14. Wang, P., Pattison, P., & Robins, G. (2013). Exponential random graph model specifications for bipartite networks: A dependence hierarchy. Social Networks, 35, 211–222.
- 15. Hastings, W. K. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57(1), 97–109.
- 16. Snijders, T. A. (2002). Markov chain Monte Carlo estimation of exponential random graph models. Journal of Social Structure, 3, 1–40.
- 17. Solomonoff, R., & Rapoport, A. (1951). Connectivity of random nets. The Bulletin of Mathematical Biophysics, 13, 107–117.
- 18. Erdős, P., & Rényi, A. (1960). On the evolution of random graphs. Publications of the Mathematical Institute of the Hungarian Academy of Sciences, 5, 17–60.
- 19. Coolen, A. C., Annibale, A., & Roberts, E. (2017). Generating random networks and graphs. Oxford: Oxford University Press.