Portfolio Correlations in the Bank-Firm Credit Market of Japan

The recent global financial crisis has highlighted portfolio correlations between agents as one of the major channels of risk contagion and amplification. In this work, we analyse the structure and dynamics of the cross-correlation matrix of banks’ loan portfolios in the yearly bank-firm credit network of Japan over the period from 1980 to 2012. Using the methods of Random Matrix Theory (RMT), Principal Component Analysis and complex networks, we aim to detect non-random patterns in the empirical cross-correlations and to identify different states of these correlations over time. Our findings suggest that although most of the portfolio correlations between banks in lending to firms are attributable to noise, the few largest eigenvalues always deviate from the random bulk predicted by RMT, indicating the presence of non-random patterns governing the correlation dynamics. In particular, we show that these dynamics are mainly driven by a global common factor and a couple of “groups” factors. Furthermore, different states of the credit market can be identified from the evolution of the eigenvalues and the associated eigenvectors. For example, during the asset price bubble period in Japan from 1986 to 1991, we find that banks’ loan portfolios tend to be more correlated, indicating a significant increase in the level of systemic risk in the credit market. In addition, by building Planar Maximally Filtered Graphs from the correlations of different eigenmodes, we observe that the local interaction structure between banks changes across periods. Typically, when the dominance of a group of banks in one period gradually vanishes, the credit market starts to build up a different structure in the next period, in which another group of banks becomes the main actors in the backbone of the cross-correlations.


Introduction
Over the last few years, catastrophic cascades of failures in interdependent systems have received remarkable attention in network science. Findings in this line of research show that the robustness of a system crucially depends on both its internal structure and its pattern of relations to other interdependent systems (e.g. Smart et al. 2008; Buldyrev et al. 2010; Huang et al. 2011; Brummitt et al. 2012; Reis et al. 2014; Liu et al. 2016; Bianconi 2018).
In the banking system, the recent global financial turmoil has demonstrated that, besides direct credit linkages 1, indirect relations between banks are also relevant channels of risk contagion and amplification. For instance, overlapping portfolios due to common asset holdings have been widely considered one of the major sources of risk contagion (e.g. May and Arinaminpathy 2010; Beale et al. 2011; Huang et al. 2013; Caccioli et al. 2014, 2015; Greenwood et al. 2015; Lillo and Pirino 2015; Levy-Carciente et al. 2015; Corsi et al. 2016). Analogously, credit relationships between banks and firms have been suggested as another important channel for the propagation of distress between the financial system and the real sector of the economy (e.g. Aoyama et al. 2013; Aoyama 2014; de Castro Miranda and Tabak 2013; Lux 2016). Accordingly, since banks can be indirectly linked through a set of joint exposures to firms (i.e. through loan portfolio overlaps), distress originating from a group of banks or firms can propagate through the whole system. For these reasons, a number of empirical studies have been devoted to uncovering the complex topological structure of bank-firm credit markets worldwide (e.g. Fujiwara et al. 2009; De Masi et al. 2011; De Masi and Gallegati 2012; Fricke 2016; Marotta et al. 2015, 2016; Fricke and Roukny 2020; Lux 2018, 2019; Lux 2020).
It is worth emphasizing that, besides portfolio overlaps, portfolio correlations also play a crucial role in risk management and have been seen as one of the major sources of financial instability. From the perspective of bankers, understanding the structure of correlations (or co-movements) between banks in providing loans to non-financial firms, as well as identifying the factors governing such correlations, is essential for managing credit risk (e.g. Bülbül and Lambert 2012; Fenech et al. 2015). The systematic factors driving the correlations represent market risks that cannot be diversified away. In contrast, idiosyncratic risks associated with non-systematic components can be diversified away as the dimension of the system grows (e.g. Chamberlain and Rothschild 1983). Furthermore, from a macroeconomic perspective, in a banking system where bank portfolios are highly unified or coupled, a common shock might trigger the entire system to follow similar adjustment strategies. Even if the original shock is relatively small, it could result in a significant change in the credit supply to the other sectors of the economy, which in turn might have a large impact on macroeconomic fluctuations (e.g. Kiyotaki and Moore 1997; Khwaja and Mian 2008; Gertler and Kiyotaki 2010; Jorda et al. 2013; Bassett et al. 2014; Brunnermeier and Schnabel 2016; Cingano et al. 2016; Gambetti and Musso 2017; Bentolila et al. 2018; Azariadis 2018; Alfaro et al. 2018; Amiti and Weinstein 2018). However, so far, the structure and the key drivers of correlations between banks in lending to firms remain poorly understood.
This study, therefore, aims to contribute to a deeper understanding of the structure and dynamics of the cross-correlation matrix of banks' loan portfolios. In particular, we seek to answer the following research questions, which have not been intensively studied in the related literature: (i) how can non-random patterns and latent factors (if they exist) be detected in the empirical cross-correlation matrix of banks' loan portfolios? (ii) how can different states in the correlation dynamics be identified over time? and (iii) how can systemic risk be quantified from loan portfolio correlations in the credit market?
To guide our analysis, we employ the methods of Random Matrix Theory (RMT) and Principal Component Analysis (PCA). The methods of RMT have been widely used to filter noise and to extract latent information embedded in empirical correlations of time series data on stock prices/returns (e.g. Laloux et al. 1999, 2000; Plerou et al. 2000, 2002; Kim and Jeong 2005; Bouchaud and Potters 2009; Wang et al. 2011; Meng et al. 2014; Jiang et al. 2014; Uechi et al. 2015; MacMahon and Garlaschelli 2015), on financial transactions (e.g. Kyriakopoulos et al. 2009; Fricke 2012), and on other types of data. Empirical evidence also shows that RMT is a useful tool for enhancing the Markowitz mean-variance optimization procedure (e.g. Plerou et al. 2000; Laloux et al. 2000; Sharifi et al. 2004; Daly et al. 2008; Bai et al. 2009; Quintana et al. 2015). Recently, RMT has been used to study synchronization between different macroeconomic indicators (e.g. Ormerod and Mounfield 2000; Ormerod 2008; Iyetomi et al. 2011; Yoshikawa et al. 2015; Lux et al. 2020). Applications of RMT to the spectral analysis of various types of complex networks have also received considerable attention in the related literature (Bandyopadhyay and Jalan 2007; de Carvalho et al. 2009; Potestio et al. 2009; Jalan 2009; Tran et al. 2013; Dumitriu and Johnson 2016), but to the best of our knowledge none of these studies has examined the spectral properties of cross-correlations between nodes in one set of a bipartite network with respect to their links to, or the strength of their interactions with, nodes in the other set (e.g. the correlations between banks in providing loans to firms in the other sectors of the economy, as in the present study).
Concretely, in this study we first compare the eigenvalue spectrum of the empirical cross-correlation matrix with the theoretical one implied by RMT. Second, focusing on a group of the largest eigenvalues that deviate significantly from the random bulk predicted by RMT, we extract latent information and infer the significant factors governing the correlation dynamics. This approach allows us to select the strongest factors, i.e. those containing genuine information about the properties of the observed data that differ from what would be expected under a null model. Next, in order to identify different states of the credit market, we examine the evolution of the largest eigenvalues and their corresponding eigenvectors over time. In addition, by building Planar Maximally Filtered Graphs (PMFG) (Tumminello et al. 2005; Pozzi et al. 2008; Di Matteo et al. 2010) from the correlation structure of different eigenmodes, we analyse the network structure of correlations and investigate how this network changes across periods. Furthermore, to infer the systemic risk arising from credit extension from banks to firms, we measure the absorption ratios, which capture the fraction of the total variance explained by a finite number of the first principal components (Kritzman et al. 2011; Billio et al. 2012; Zheng et al. 2012; Lux et al. 2020). It should be emphasized that the procedure used in the present paper can be employed to filter noise and extract significant patterns from other large bipartite networks, including those in economics and finance such as bank-borrower credit networks and investor-asset networks.
Among the different studies on bank-firm credit relations, the work of Fujiwara et al. (2009) is perhaps closest to ours. Based on eigenvalue analysis, Fujiwara et al. (2009) measure fragility scores in terms of the dynamical propagation of influences from lenders to borrowers and vice versa. However, there are important distinctions between their approach and ours. First, and most importantly, instead of analysing the correlations between banks in lending to firms, Fujiwara et al. (2009) define a new weighted matrix P capturing the accumulated interactions between the two sectors. Hence, their approach cannot answer whether the loan portfolios of banks tend to be more or less correlated in the credit market in different periods. Second, in such a setting, the largest eigenvalue is always equal to 1, and only the next few largest eigenvalues (i.e. the second and subsequent largest eigenvalues of P) matter for the propagation of influences from banks to firms and vice versa over a finite time-scale. 2 Different from their approach, we focus on the largest eigenvalues that significantly deviate from the spectral distribution explained by an appropriate null model (e.g. the distribution given by the so-called Marchenko-Pastur law (Marčenko and Pastur 1967; Tao and Vu 2012)). 3 Such eigenvalues and the corresponding eigenvectors are analysed because they might contain genuine information about particular economic mechanisms responsible for an unexpectedly high level of correlation between banks in lending to the non-financial sector of the economy.
The remainder of this paper is structured as follows. Section 2 briefly describes the data and the methods used to analyse portfolio correlations. Section 3 reports the main findings for the weighted version of the bank-firm credit network of Japan. Discussion and concluding remarks are in Sect. 4. In the “Appendix”, we provide additional results for portfolio overlaps and similarities, the fractions of eigenvalues in different ranges, the distributions as well as the evolution of the eigenvector elements, and the more detailed correlation network structure between banks based on the method of PMFG. We also summarize the main results for the binary version of the credit network.

2 In their study, briefly, from the matrix of weights $W^{bf}$ representing loans from banks to firms (defined in the next part of the present study), they obtain a matrix $A$ that stands for the relative amounts of lending by banks to firms and a matrix $B$ that stands for the relative amounts of borrowing by firms from banks. The matrix $P = AB$ then captures the accumulated reflections and influences between banks and firms. Assuming that $\lambda^{(P)}_1 \ge \lambda^{(P)}_2 \ge \dots \ge \lambda^{(P)}_N$ are the eigenvalues of $P$, it can be shown that $\lambda^{(P)}_1 = 1$. Therefore, over a finite time-scale, only the next few largest eigenvalues $\lambda^{(P)}_2, \dots$ are important in the dynamics of the propagation of influences from banks to firms and vice versa. For more details, we refer readers to Fujiwara et al. (2009).

3 Note that to deal with potential effects of extreme loan values on the largest eigenvalues, we also consider other upper bounds obtained from heavy-tailed random matrices (see Sect. 2.2.2).

Data
We use a yearly data set containing credit relations between banks and large firms in Japan over the period from 1980 to 2012. The data were obtained from the firms' financial statements and a survey by Nikkei Media Marketing, Inc. in Tokyo (commercially available). 4 The sample covers the most critical periods and events of the past few decades in the Japanese economy, e.g. the bubble period from 1986 to 1991, the burst of the bubble economy and the subsequent stagnation (the so-called “lost decade”) (e.g. see Hoshi and Kashyap 2000; Brunnermeier and Schnabel 2016; Himino 2021), and the recent global financial crisis. 5 Based on the maturity of loans, we separately consider three layers of lending relationships from banks to firms: total lending relationships (layer 1), short-term lending relationships (layer 2), and long-term lending relationships (layer 3). Layer 2 captures all loan contracts with a maturity of one year or less, whereas layer 3 consists only of loan contracts exceeding one year.
For the sake of conciseness, we exclude other financial institutions including insurance companies and a small number of aggregated banks. 6 In addition, it should be emphasized that for each maturity-based layer, we focus on the aggregate amount of loans from each bank to each firm at the end of the fiscal year, meaning that we ignore contract-specific frequencies (i.e. separate times that a bank lends to a firm during the fiscal year).

Representation of a Bank-Firm Network, Portfolio Overlaps and Projection Matrices
We start with some basic notation and definitions for a bank-firm credit network. Assume that in each year we have a weighted bipartite network consisting of $N$ banks lending to $N_F$ firms. Let $W^{bf} = \{w^{bf}_{ij}\}_{N \times N_F}$ be the matrix of weights ($i$ runs from 1 to $N$, indexing the $N$ banks, and $j$ runs from 1 to $N_F$, indexing the $N_F$ firms). Each element $w^{bf}_{ij}$ is non-negative and gives the monetary amount of the loan 7 that bank $i$ extends to firm $j$. From the matrix $W^{bf}$, we obtain the adjacency matrix $A^{bf} = \{a^{bf}_{ij}\}_{N \times N_F}$, where $a^{bf}_{ij} = 1$ if $w^{bf}_{ij} > 0$ and zero otherwise. This adjacency matrix is associated with the binary version of the bank-firm credit network.

4 For a more detailed explanation of the data set, we refer readers to, for example, Fujiwara et al. (2009) and Marotta et al. (2015, 2016).

5 Although additional analysis of more recent data could provide more insight into the evolution of the portfolio correlations among banks in lending to non-financial firms, we leave it for future study once such data are available.

6 Each aggregated bank consists of a group of small/local banks. However, the main conclusions do not change much when we include insurance companies and aggregated banks in the data, since the market shares of these companies and banks are relatively small.
From a network-theoretic perspective, one can measure the structural similarity between banks' loan portfolios using various measures of similarity between two vectors (e.g. Cai et al. 2018; Pool et al. 2015; Alvarez-Socorro et al. 2015; Fricke 2016; Lux 2018, 2019). Briefly, for each pair of nodes in the same set of the bipartite network, their similarity is based on the similarity of their neighborhoods. For instance, in the weighted version, we can define the similarity matrix of banks and that of firms, respectively, based on the generalized Jaccard index as

$$S^{B-B}_{ij} = \frac{\sum_{k=1}^{N_F} \min(w^{bf}_{ik}, w^{bf}_{jk})}{\sum_{k=1}^{N_F} \max(w^{bf}_{ik}, w^{bf}_{jk})}$$

and

$$S^{F-F}_{ij} = \frac{\sum_{k=1}^{N} \min(w^{bf}_{ki}, w^{bf}_{kj})}{\sum_{k=1}^{N} \max(w^{bf}_{ki}, w^{bf}_{kj})}.$$

With non-negative weights in the bank-firm credit network, we have $0 \le \sum_{k=1}^{N_F} \min(w^{bf}_{ik}, w^{bf}_{jk}) \le \sum_{k=1}^{N_F} \max(w^{bf}_{ik}, w^{bf}_{jk})$ and $0 \le \sum_{k=1}^{N} \min(w^{bf}_{ki}, w^{bf}_{kj}) \le \sum_{k=1}^{N} \max(w^{bf}_{ki}, w^{bf}_{kj})$. Furthermore, since in the present study we focus on banks that lend to at least one firm and firms that borrow from at least one bank, the two denominators $\sum_{k=1}^{N_F} \max(w^{bf}_{ik}, w^{bf}_{jk})$ and $\sum_{k=1}^{N} \max(w^{bf}_{ki}, w^{bf}_{kj})$ are strictly positive. Therefore, $S^{B-B}_{ij}$ and $S^{F-F}_{ij}$ range in [0, 1]. Notice that $S^{B-B}_{ij} = 1$ if and only if $w^{bf}_{ik} = w^{bf}_{jk}$ for all firms $k$, and $S^{B-B}_{ij} = 0$ if and only if the portfolios of banks $i$ and $j$ do not overlap at all. Therefore, each element of the matrix $S^{B-B}$ measures the degree of similarity between the lending portfolios of a pair of banks. In a similar vein, each entry of the matrix $S^{F-F}$ indicates the degree of similarity between the borrowing portfolios of two firms.
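As an illustration, the generalized Jaccard index above can be computed for all pairs at once with NumPy broadcasting. This is a minimal sketch, assuming the weights are stored as a dense N × N_F array `W` (banks in rows, firms in columns); the function name `generalized_jaccard` and the toy loan amounts are our own. Applying the same function to `W.T` gives the firm-firm matrix $S^{F-F}$.

```python
import numpy as np


def generalized_jaccard(W):
    """Generalized Jaccard similarity between the rows of a non-negative matrix.

    Entry (i, j) is sum_k min(W[i, k], W[j, k]) / sum_k max(W[i, k], W[j, k]).
    Assumes every row has at least one positive entry (non-zero denominators).
    """
    W = np.asarray(W, dtype=float)
    # Broadcast to shape (rows, rows, columns) and reduce over the last axis.
    mins = np.minimum(W[:, None, :], W[None, :, :]).sum(axis=2)
    maxs = np.maximum(W[:, None, :], W[None, :, :]).sum(axis=2)
    return mins / maxs


# Hypothetical toy example: 3 banks (rows) lending to 3 firms (columns).
W = np.array([[1.0, 2.0, 0.0],
              [1.0, 2.0, 0.0],
              [0.0, 0.0, 3.0]])
S_BB = generalized_jaccard(W)    # bank-bank similarity S^{B-B}
S_FF = generalized_jaccard(W.T)  # firm-firm similarity S^{F-F}
```

Banks 0 and 1 hold identical portfolios, so `S_BB[0, 1]` equals 1, while bank 2 shares no firm with them, so `S_BB[0, 2]` equals 0. The broadcast uses memory proportional to N²·N_F, which is fine for a sketch but would call for a loop over bank pairs on very large networks.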
Furthermore, the one-mode projection matrices obtained from the original bank-firm credit network also indicate the lending portfolio overlaps between banks or the borrowing portfolio overlaps between firms (e.g. Lux 2018, 2019). More specifically, defining

$$A^{B-B} = A^{bf} (A^{bf})^T,$$

we obtain a matrix in which each off-diagonal element is the number of firms that a pair of banks co-finance. In a similar way, each off-diagonal element of the matrix $A^{F-F}$, with

$$A^{F-F} = (A^{bf})^T A^{bf},$$

indicates the number of common lenders that a pair of firms share. A major disadvantage of the network projection approach is that information about the monetary value of loans from banks to firms is partially or completely discarded (e.g. see Lehmann et al. 2008; Luu and Lux 2018). In addition, the relatively sparse bipartite structure of credit links is often projected onto much denser one-mode networks. More importantly, from an economic perspective, a network-based analysis of the two projection matrices (and the similarity matrices) alone cannot identify which pairs of banks tend to be more or less correlated in lending to firms. This motivates an appropriate procedure for measuring and analysing the loan portfolio correlations between banks, which is briefly explained in the following. 8

7 Depending on the considered layer, it can be the long-term loan, the short-term loan, or the total loan.
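A short NumPy sketch of the adjacency matrix and the two one-mode projections follows; the helper names and the toy loan amounts are hypothetical. The last line of the usage shows the information loss mentioned above: rescaling every loan leaves the projections unchanged.

```python
import numpy as np


def adjacency(W):
    """Binary adjacency matrix A^{bf}: 1 where a loan exists (w > 0), else 0."""
    return (np.asarray(W) > 0).astype(int)


def projections(W):
    """One-mode projections A^{B-B} = A A^T and A^{F-F} = A^T A."""
    A = adjacency(W)
    A_BB = A @ A.T  # off-diagonal: number of firms co-financed by banks i and j
    A_FF = A.T @ A  # off-diagonal: number of lenders shared by firms i and j
    return A_BB, A_FF


# Hypothetical toy example: 2 banks lending to 3 firms.
W = np.array([[5.0, 0.0, 2.0],
              [1.0, 3.0, 0.0]])
A_BB, A_FF = projections(W)
A_BB_scaled, _ = projections(10 * W)  # identical to A_BB: loan sizes are discarded
```

The diagonal entries, which the text sets aside, are the degrees: the number of firms a bank lends to in `A_BB`, and the number of lenders a firm has in `A_FF`.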

Correlation Matrix
We now introduce the method used to measure the empirical cross-correlation matrix of banks' loan portfolios. Here we focus on the weighted version of bank-firm credit relationships. Additional explanations and results for the binary version (i.e. based on the elements of the matrix $A^{bf} = \{a^{bf}_{ij}\}_{N \times N_F}$) are provided in the “Appendix”. Similar to the procedure often implemented in the analysis of stock returns/prices, for $w^{bf}_{ij} > 0$, the normalized amount of the loan (in log-scale) from bank $i$ to firm $j$ is defined as

$$X_{ij} = \frac{\ln(w^{bf}_{ij}) - \langle \ln(w^{bf}_{ij}) \rangle}{\sigma_i},$$

where, for each bank $i$, $\langle \ln(w^{bf}_{ij}) \rangle$ and $\sigma_i$ are the average and the standard deviation of $\ln(w^{bf}_{ij})$ over all $N_F$ firms, respectively. As a special case, we let $X_{ij} = 0$ if $w^{bf}_{ij} = 0$. 9 Denoting $X = \{X_{ij}\}_{N \times N_F}$, the cross-correlation matrix of the $N$ banks is then defined as

$$C = \frac{1}{N_F} X X^T, \qquad (6)$$

where $X^T$ is the transpose of $X$. Note that $-1 \le C_{ij} \le 1$ (for $1 \le i < j \le N$) and $C_{ii} = 1$ (for all $i$ from 1 to $N$). From an economic point of view, the value of $C_{ij}$ indicates the correlation between bank $i$ and bank $j$ in lending to all firms. In particular, $C_{ij} > 0$ ($< 0$) implies that banks $i$ and $j$ are positively (negatively) correlated, while $C_{ij} = 0$ indicates that there is no correlation between the two banks. Note that in the particular application to the bank-firm credit network of Japan in the present work, the ratio $N_F/N$ is larger than 1, since the number of banks is always smaller than the number of firms over the sample period. However, if the number of borrowers were smaller than the number of lenders, the estimated covariance matrix between banks in lending to firms would become singular, since $X X^T$ would then have rank at most $N_F < N$. 10 Similarly, in analyses of economic and financial time series, when the number of observations over a relatively short period is smaller than the number of indices (e.g. those representing different stock prices or macroeconomic indicators), the same singularity problem emerges. In this case, the estimated covariance matrix has a singular Wishart distribution (e.g. see Srivastava 2003; Bouchaud et al. 2007; Bodnar et al. 2013, 2014). An important consequence for portfolio analysis is that, if the estimated covariance matrix is singular, it cannot be inverted to estimate the portfolio weights. To tackle this problem, several transformation methods with practical applications to portfolio theory have been proposed (see Pappas and Kaimakamis 2010; Bodnar et al. 2016, 2017, 2018, 2019, among others, for a more detailed discussion of different transformation methods and their practical relevance).

8 Note that although in the present work we focus on loan portfolio correlations between banks, we also summarize the main results for loan portfolio overlaps and similarities in the “Appendix”.

9 It should be emphasized that, in general, almost all main conclusions from this study still hold if we define $X_{ij}$ using $\tilde{w}^{bf}_{ij} = w^{bf}_{ij} + 1$ in place of $w^{bf}_{ij}$. One can also use the original weight $w^{bf}_{ij}$ instead of the loan amount in log-scale to define the normalized quantity $X_{ij}$. Although, for the sake of brevity, the results for these alternative definitions of $X_{ij}$ are not reported here, they are available from the author upon request.
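The normalization and Eq. (6) can be sketched in NumPy as follows. Two choices in this sketch are our assumptions rather than details stated in the text: because $\ln 0$ is undefined, each bank's mean and standard deviation are computed from its positive loans only; and each row of $X$ is re-standardized over all $N_F$ firms before forming $XX^T/N_F$, so that $C_{ii} = 1$ holds exactly (the result then coincides with `np.corrcoef(X)`). The helper names and the synthetic loans are hypothetical. The last lines illustrate the singularity remark: with fewer columns than rows, $C$ cannot have full rank.

```python
import numpy as np


def normalized_log_loans(W):
    """X_ij = (ln w_ij - mean_i) / std_i for w_ij > 0, and X_ij = 0 otherwise.

    Assumption: mean_i and std_i are taken over bank i's positive loans only.
    Banks whose positive loans are all equal (or single) keep a zero row.
    """
    W = np.asarray(W, dtype=float)
    X = np.zeros_like(W)
    for i, row in enumerate(W):
        pos = row > 0
        logs = np.log(row[pos])
        if logs.size > 1 and logs.std() > 0:
            X[i, pos] = (logs - logs.mean()) / logs.std()
    return X


def correlation_matrix(X):
    """C = Z Z^T / N_F, where Z re-standardizes each row of X over all N_F columns."""
    N_F = X.shape[1]
    Z = X - X.mean(axis=1, keepdims=True)
    Z = Z / Z.std(axis=1, keepdims=True)  # requires every row to be non-constant
    return Z @ Z.T / N_F


rng = np.random.default_rng(0)
# Synthetic loans: 5 banks, 50 firms, roughly 40% of bank-firm pairs without a loan.
W = rng.lognormal(mean=3.0, sigma=1.0, size=(5, 50)) * (rng.random((5, 50)) < 0.6)
C = correlation_matrix(normalized_log_loans(W))

# Fewer "firms" (4) than "banks" (6): C has rank below 6, hence it is singular.
C_singular = correlation_matrix(rng.standard_normal((6, 4)))
```

Re-standardizing the rows is one way to reconcile the stated properties $C_{ii} = 1$ and $-1 \le C_{ij} \le 1$ with the zero entries for missing loans; other conventions are possible.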

Random Matrix Theory and Selection of Factors
We now introduce the relevant results of RMT that will be used to analyse the cross-correlation matrix defined in Eq. (6). Suppose that $\{\lambda_i\}_{i=1}^{N}$ are the eigenvalues of a correlation matrix $C$ and $f_C(\lambda)$ is the probability density function of the eigenvalues of $C$. According to RMT (e.g. Marčenko and Pastur 1967; Tao and Vu 2012), if the elements of $X$ are i.i.d. $\mathcal{N}(0, \sigma^2)$, then in the limit $N, N_F \to \infty$ with $Q = N_F/N \to a$ ($a$ is a constant and $a > 1$), the eigenvalue distribution $f_C(\lambda)$ is given by the Marchenko-Pastur law

$$f_C(\lambda) = \frac{Q}{2\pi\sigma^2} \frac{\sqrt{(\lambda^*_{max} - \lambda)(\lambda - \lambda^*_{min})}}{\lambda}, \qquad \lambda \in [\lambda^*_{min}, \lambda^*_{max}], \qquad (7)$$

and $f_C(\lambda) = 0$ otherwise. In (7), $\sigma^2$ is the variance of the elements of $X$ ($\sigma^2 = 1$ in our case), and $\lambda^*_{max}$ and $\lambda^*_{min}$ are the upper and lower bounds of the eigenvalues predicted by RMT, respectively. With unit variance, these two bounds are given by

$$\lambda^*_{max,min} = 1 + \frac{1}{Q} \pm 2\sqrt{\frac{1}{Q}}. \qquad (8)$$

Generally speaking, eigenvalues of the empirical cross-correlation matrix lying outside the interval $[\lambda^*_{min}, \lambda^*_{max}]$ carry genuine information about the correlations between banks (Plerou et al. 2000, 2002). Nevertheless, it should be emphasized that extreme values among the elements of $X$ (e.g. when $X$ is a heavy-tailed random matrix) might push the largest eigenvalues above the upper bound implied by the Marchenko-Pastur law (e.g. Biroli et al. 2007; Bouchaud and Potters 2009). In this case, some eigenvalues larger than $\lambda^*_{max}$ might actually contain no genuine information.
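To make Eqs. (7) and (8) concrete, the sketch below evaluates the Marchenko-Pastur bounds and density and compares them with the spectrum of a purely random correlation matrix. The function names and the simulated i.i.d. Gaussian data (standing in for the null model, not the loan data) are our own choices.

```python
import numpy as np


def mp_bounds(Q, sigma2=1.0):
    """Marchenko-Pastur edges (lambda*_min, lambda*_max) for Q = N_F / N > 1."""
    r = np.sqrt(1.0 / Q)
    return sigma2 * (1.0 - r) ** 2, sigma2 * (1.0 + r) ** 2


def mp_density(lam, Q, sigma2=1.0):
    """Marchenko-Pastur density f_C(lambda) of Eq. (7); zero outside the bulk."""
    lam = np.asarray(lam, dtype=float)
    lo, hi = mp_bounds(Q, sigma2)
    dens = np.zeros_like(lam)
    inside = (lam > lo) & (lam < hi)
    x = lam[inside]
    dens[inside] = Q / (2.0 * np.pi * sigma2) * np.sqrt((hi - x) * (x - lo)) / x
    return dens


# Null-model check: i.i.d. Gaussian data with N = 200 rows and N_F = 800 columns.
rng = np.random.default_rng(1)
N, N_F = 200, 800
X = rng.standard_normal((N, N_F))
eigvals = np.linalg.eigvalsh(X @ X.T / N_F)
lam_min, lam_max = mp_bounds(N_F / N)  # (0.25, 2.25) for Q = 4
```

For finite $N$ the empirical eigenvalues scatter slightly around the theoretical edges, so in practice a small tolerance is used when judging whether an eigenvalue "deviates" from the bulk.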
As shown in Biroli et al. (2007), whenever the largest entry of $X$ in absolute value, $S$, satisfies $S \le (N \cdot N_F)^{1/4}$, the largest eigenvalue remains stuck at the upper limit $\lambda^*_{max}$. However, if $S > (N \cdot N_F)^{1/4}$, a new upper bound, $\lambda^{(1)}_{max}$, emerges, whose explicit form is derived in Biroli et al. (2007). Furthermore, when the elements of $X$ have power-law tails with exponent $\mu$, the large values among $\{X_{ij}\}$ dominate the top largest eigenvalues, and the corresponding upper bound $\lambda^{(2)}_{max}$ depends on $\mu$. Hence, in the present study, to capture the potential effects of extreme values of loans from banks to firms, we also compare the top largest eigenvalues with these different bounds. As shown later, in our analysis the top five largest eigenvalues of the empirical correlation matrix $C$ are always larger than all three upper bounds $\lambda^*_{max}$, $\lambda^{(1)}_{max}$ and $\lambda^{(2)}_{max}$.
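As a small sketch, the extreme-entry condition of Biroli et al. (2007) and the counting of eigenvalues against a chosen bound can be coded as below. The explicit formulas for $\lambda^{(1)}_{max}$ and $\lambda^{(2)}_{max}$ are not reproduced in this section, so the sketch takes their numerical values as inputs; the function names and the toy numbers are hypothetical.

```python
import numpy as np


def extreme_entry_regime(X):
    """Return S = max|X_ij|, the threshold (N * N_F)^(1/4), and whether S exceeds it."""
    N, N_F = X.shape
    S = float(np.max(np.abs(X)))
    threshold = (N * N_F) ** 0.25
    return S, threshold, S > threshold


def conservative_bound(lam_star_max, lam1_max, lam2_max):
    """lambda^all_max = max(lambda*_max, lambda^(1)_max, lambda^(2)_max)."""
    return max(lam_star_max, lam1_max, lam2_max)


def count_above(eigvals, bound):
    """Number of eigenvalues strictly larger than the selected upper bound."""
    return int(np.sum(np.asarray(eigvals) > bound))


# Hypothetical matrix with a single huge entry: the threshold is 400 ** 0.25, about 4.47.
X = np.zeros((10, 40))
X[3, 7] = 100.0
S, thr, heavy = extreme_entry_regime(X)
```

When `heavy` is true, eigenvalues just above $\lambda^*_{max}$ should be treated with caution, which is why the text also reports counts against the more conservative $\lambda^{all}_{max}$.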

Decomposition of Correlation Matrix
We now introduce one of the most important applications of RMT, namely how to extract latent information from the largest eigenvalues deviating from the random bulk. Without loss of generality, we assume that $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_N$ are the eigenvalues of the empirical correlation matrix $C$, and $u_1, u_2, \dots, u_N$ are the corresponding eigenvectors. The correlation matrix $C$ can be diagonalized as

$$C = U \Lambda U^T, \qquad (11)$$

with $\Lambda = \mathrm{diag}\{\lambda_1, \dots, \lambda_N\}$, where $U$ is an $N \times N$ orthonormal matrix whose $i$th column is the normalized eigenvector $u_i$ corresponding to the eigenvalue $\lambda_i$. As a result, for each $\lambda_i$ we have

$$C u_i = \lambda_i u_i. \qquad (12)$$

Letting $y_i = u_i^T X$ and denoting by $X_i$ the $i$th row of $X$, we obtain the following property for the total variance of $\{X_i\}_{i=1}^{N}$:

$$\sum_{i=1}^{N} \mathrm{Var}(X_i) = \sum_{i=1}^{N} \mathrm{Var}(y_i) = \sum_{i=1}^{N} \lambda_i = N, \qquad (13)$$

since $\mathrm{Var}(y_i) = u_i^T C u_i = \lambda_i$ and $\mathrm{tr}(C) = N$. From Eqs. (12) and (13), we can easily show that the ratio $\lambda_i/N$ indicates the fraction of the total variance $\sum_{i=1}^{N}\mathrm{Var}(X_i)$ explained by the principal component $y_i$ (e.g. Jolliffe 1986; Wang et al. 2011; Gewers et al. 2018).
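The following NumPy sketch checks Eqs. (11) to (13) numerically. Synthetic standardized data stand in for the loan data (an assumption for illustration only); the helper `sorted_eigh` is our own name. It sorts the eigenpairs in descending order, forms the principal components $y_i = u_i^T X$, and confirms that their variances equal the eigenvalues and sum to $N$.

```python
import numpy as np


def sorted_eigh(C):
    """Eigenvalues in descending order (lambda_1 >= ... >= lambda_N) with matching eigenvectors."""
    vals, vecs = np.linalg.eigh(C)  # eigh returns ascending eigenvalues for symmetric C
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]  # column i of U is u_i


rng = np.random.default_rng(2)
N, N_F = 8, 60
X = rng.standard_normal((N, N_F))
# Standardize rows so that C = X X^T / N_F has a unit diagonal, as in Eq. (6).
X = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)
C = X @ X.T / N_F
lam, U = sorted_eigh(C)

y = U.T @ X                    # principal components y_i = u_i^T X (one per row)
var_y = (y ** 2).mean(axis=1)  # variances of the y_i, equal to lambda_i
explained = lam / N            # fraction of the total variance explained by each y_i
```

Because $C$ is symmetric, `np.linalg.eigh` is the appropriate solver: it returns real eigenvalues and an orthonormal set of eigenvectors, which is exactly the matrix $U$ of Eq. (11).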
In addition, since Eq. (11) can be rewritten as

$$C = \sum_{i=1}^{N} \lambda_i u_i u_i^T, \qquad (14)$$

we can decompose the empirical correlation matrix $C$ into different parts capturing the effects of different factors. First, if the largest eigenvalue $\lambda_1$ deviates significantly from the random bulk, the so-called “market” mode, i.e. the global common factor (or systematic factor) that influences the lending of all banks in the credit market, can be extracted from this eigenvalue and its corresponding eigenvector (Plerou et al. 2000, 2002). Its effect on $C$ is represented by

$$C_m = \lambda_1 u_1 u_1^T, \qquad (15)$$

and the rest of $C$ is

$$C - C_m = \sum_{i=2}^{N} \lambda_i u_i u_i^T, \qquad (16)$$

which is the filtered matrix after the market-wide effect is subtracted. Next, genuine information can be extracted from the next largest eigenvalues if they are still larger than the selected upper bound (e.g. the bound $\lambda^*_{max}$ implied by RMT or the more “conservative” one given by $\lambda^{all}_{max} = \max(\lambda^*_{max}, \lambda^{(1)}_{max}, \lambda^{(2)}_{max})$). Such information may be associated with sub-groups of banks (the so-called “groups” modes) (e.g. Kim and Jeong 2005; Shen and Zheng 2009; Jiang and Zheng 2012). In that case, following Kim and Jeong (2005), we can further decompose $C$ into three parts, $C_m$, $C_g$, and $C_r$, as

$$C = C_m + C_g + C_r, \qquad (17)$$

where $C_g$ and $C_r$ indicate the effect of the “groups” modes and the rest of $C$, respectively. From an economic perspective, the matrix $C_g$ captures the influence of non-systematic components (e.g. sectoral factors, regional factors, etc.) that only affect the lending of a subset of banks in the credit market. The residual represented by the matrix $C_r$ is referred to as the random noise part. 11

11 Indeed, the interpretations of the different factors used in the decomposition of the matrix $C$ are somewhat similar to those used for dynamic factor models, which have also become one of the important dimension reduction techniques in macro-econometrics as well as financial econometrics (e.g.
Mathematically,

$$C_g = \sum_{i=2}^{N_g+1} \lambda_i u_i u_i^T \qquad (18)$$

and the rest

$$C_r = \sum_{i=N_g+2}^{N} \lambda_i u_i u_i^T, \qquad (19)$$

where $N_g$ is the number of the next largest eigenvalues (after $\lambda_1$) that are larger than the selected upper bound (see also Kim and Jeong (2005) and MacMahon and Garlaschelli (2015) for further discussion of the selection of $N_g$). From Eqs. (17) to (19), we can define $FC_{m,g} = C - C_m - C_g = C_r$ as the filtered matrix in which the effects of the “market” and “groups” modes are subtracted.
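A minimal sketch of the decomposition in Eqs. (15) to (19) is given below. The helper names, the choice of $\lambda^*_{max}$ as the bound, and the synthetic data (one common factor plus one extra factor shared by the first ten "banks") are assumptions made only for illustration; they are not the paper's data or code.

```python
import numpy as np


def sorted_eigh(C):
    """Eigenpairs of a symmetric matrix, eigenvalues in descending order."""
    vals, vecs = np.linalg.eigh(C)
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]


def decompose(C, bound):
    """Split C into market (C_m), groups (C_g) and residual (C_r) parts.

    N_g counts the eigenvalues lambda_2, lambda_3, ... above `bound`
    (e.g. lambda*_max or lambda^all_max).
    """
    lam, U = sorted_eigh(C)
    n_g = int(np.sum(lam[1:] > bound))

    def part(idx):
        # Sum over i in idx of lambda_i u_i u_i^T.
        return (U[:, idx] * lam[idx]) @ U[:, idx].T

    C_m = part(slice(0, 1))
    C_g = part(slice(1, 1 + n_g))
    C_r = part(slice(1 + n_g, None))
    return C_m, C_g, C_r, n_g


# Synthetic data: 30 "banks", 600 "firms", a common factor and one group factor.
rng = np.random.default_rng(3)
N, N_F = 30, 600
market = rng.standard_normal(N_F)
group = rng.standard_normal(N_F)
X = rng.standard_normal((N, N_F)) + 0.8 * market
X[:10] += group  # the first 10 rows share an extra "group" factor
X = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)
C = X @ X.T / N_F
lam_star_max = (1 + np.sqrt(N / N_F)) ** 2  # Eq. (8) with Q = N_F / N
C_m, C_g, C_r, n_g = decompose(C, lam_star_max)
noise_filtered = C_m + C_g  # equals C - C_r
```

Because the eigenvectors are orthonormal, the three parts always add back to $C$ exactly; the choice of bound only moves eigenmodes between $C_g$ and $C_r$.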

Absorption Ratios and Systemic Risk
From Eq. (13), for each $i = 1, 2, \dots, N$, we define the absorption ratio $E_i$ as

$$E_i = \frac{\sum_{j=1}^{i} \lambda_j}{\sum_{j=1}^{N} \lambda_j} = \frac{1}{N}\sum_{j=1}^{i} \lambda_j, \qquad (20)$$

which indicates the fraction of the total variance of $\{X_j\}_{j=1}^{N}$ explained (or absorbed) by the first $i$ principal components (e.g. Pukthuanthong and Roll 2009; Wang et al. 2011; Billio et al. 2012). The ratio $E_N$ is equal to 1, representing 100% of the total variance explained by all components together. Hence, comparing the absorption ratios $E_1, \dots, E_k$ with $E_N = 1$ allows us to assess the contribution of the significant principal components to the total variance. 12 On top of this, we can use the absorption ratios to analyse systemic risk in the market. For example, if $E_1$ accounts for a larger part of $E_N$ as the largest eigenvalue $\lambda_1$ increases, this indicates a higher level of systemic risk: banks in the credit market have become more unified and strongly coupled, so a shock can be quickly and widely propagated throughout the whole system (e.g. Pukthuanthong and Roll 2009; Billio et al. 2012; Zheng et al. 2012; Meng et al. 2014).
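Eq. (20) amounts to a normalized cumulative sum of the sorted eigenvalues. A minimal sketch, with a hypothetical function name and made-up eigenvalues that sum to $N = 4$ as they would for a 4 × 4 correlation matrix:

```python
import numpy as np


def absorption_ratios(eigvals):
    """E_i = (lambda_1 + ... + lambda_i) / (lambda_1 + ... + lambda_N), eigenvalues sorted descending."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    return np.cumsum(lam) / lam.sum()


# Made-up eigenvalues of a 4 x 4 correlation matrix (they sum to N = 4).
E = absorption_ratios([0.4, 2.0, 0.6, 1.0])  # approximately [0.5, 0.75, 0.9, 1.0]
```

Computing `E[k - 1]` year by year, with $k$ set by the chosen bound, produces the kind of series used to track systemic risk: a rise in $E_1$ together with $\lambda_1$ signals that the market mode absorbs a larger share of the total variance.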

Inverse Participation Ratios
We use the Inverse Participation Ratio (IPR) as an indicator of how many components contribute to each eigenvector of the correlation matrix (Plerou et al. 1999, 2002). It is given by

$$IPR_i = \sum_{j=1}^{N} u_i(j)^4, \qquad (21)$$

where $u_i(j)$ is the $j$th component of the eigenvector $u_i$. The inverse of $IPR_i$ indicates the number of elements that contribute significantly to $u_i$. More specifically, a lower value of IPR implies that banks contribute more equally (i.e. a less localized behavior). In contrast, a higher value of IPR shows that a smaller number of banks dominate the eigenvector (i.e. a more localized behavior), signaling the presence of a more hierarchical structure.

12 Here $k$ is the largest integer such that $\lambda_k$ is larger than the selected bound, e.g. $\lambda^*_{max}$ implied by RMT or the more “conservative” one, $\lambda^{all}_{max} = \max(\lambda^*_{max}, \lambda^{(1)}_{max}, \lambda^{(2)}_{max})$.
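A short sketch of Eq. (21), assuming unit-norm eigenvectors stored as the columns of a matrix (as returned by `np.linalg.eigh`); the function name is our own. The two toy vectors span the range of IPR, from $1/N$ for a fully delocalized eigenvector to 1 for a fully localized one.

```python
import numpy as np


def inverse_participation_ratios(U):
    """IPR_i = sum_j u_i(j)^4 for each unit-norm eigenvector stored as a column of U."""
    U = np.asarray(U, dtype=float)
    return np.sum(U ** 4, axis=0)


N = 4
uniform = np.full(N, 1.0 / np.sqrt(N))  # all banks contribute equally
localized = np.eye(N)[:, 0]             # a single bank carries the whole vector
ipr = inverse_participation_ratios(np.column_stack([uniform, localized]))
participation = 1.0 / ipr               # effective number of contributing banks: about 4 and 1
```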
In the normalized version of eigenvectors, i.e. if $\sum_{j=1}^{N} u_i(j)^2 = 1$ ($\forall i = 1, 2, \dots, N$), the ratio $IPR_i$ has two limiting cases:

$$IPR_i = \begin{cases} 1/N, & \text{if } |u_i(j)| = 1/\sqrt{N} \ \forall j \text{ (all banks contribute equally)}, \\ 1, & \text{if } |u_i(j)| = 1 \text{ for one } j \text{ and } u_i(j') = 0 \ \forall j' \ne j \text{ (a single bank dominates)}. \end{cases} \qquad (22)$$

Notice … Massara et al. 2015). Based on the decomposition of the empirical correlations into different eigenmodes (see Eq. (17)), we can also construct the PMFG for different correlation matrices, e.g. the filtered correlation matrix preserving only the “market” mode (i.e. $C_m$), the filtered correlation matrix retaining only the “groups” modes (i.e. $C_g$), or the noise-filtered correlation matrix maintaining all significant correlations (i.e. $C_m + C_g$).

The construction algorithm of the PMFG is similar to the one used for the so-called Minimal Spanning Tree (MST), one of the simplest connected graphs widely used to capture a minimal set of relevant interactions associated with the strongest correlations (e.g. Mantegna 1999). To construct the MST graph, one can proceed as follows. First, sorting the pairwise correlations $\{C_{ij}\}$ ($i < j$) in descending order, we obtain a list $L$ of the $N(N-1)/2$ reordered correlation elements. Second, choosing the two banks $i$ and $j$ associated with the first element in $L$, we add an edge between banks $i$ and $j$ to the MST graph. Third, we select the next element in the list $L$ and add the corresponding link (together with any new nodes) to the graph only if it does not form a cycle. We then repeat the third step until all the correlation elements in the list $L$ have been examined.
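The three steps above are Kruskal's algorithm applied to correlations sorted in descending order. The sketch below implements them with a union-find structure to detect cycles; the function name and the toy 4 × 4 correlation matrix are hypothetical. Because $d_{ij} = \sqrt{2(1 - C_{ij})}$ decreases monotonically in $C_{ij}$, the same tree is obtained as the minimum spanning tree on these distances (Mantegna 1999).

```python
import itertools

import numpy as np


def mst_edges(C):
    """Spanning tree of the strongest correlations, following the three steps in the text.

    Pairs (i, j), i < j, are scanned in descending order of C[i, j]; an edge is
    kept only if it joins two different components (i.e. it creates no cycle).
    """
    N = C.shape[0]
    parent = list(range(N))

    def find(a):
        # Union-find root lookup with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    pairs = sorted(itertools.combinations(range(N), 2), key=lambda p: C[p], reverse=True)
    edges = []
    for i, j in pairs:
        ri, rj = find(i), find(j)
        if ri != rj:  # adding (i, j) does not close a cycle
            parent[ri] = rj
            edges.append((i, j))
        if len(edges) == N - 1:
            break  # a spanning tree is complete; remaining pairs would all form cycles
    return edges


C = np.array([[1.0, 0.9, 0.1, 0.2],
              [0.9, 1.0, 0.8, 0.3],
              [0.1, 0.8, 1.0, 0.7],
              [0.2, 0.3, 0.7, 1.0]])
edges = mst_edges(C)  # [(0, 1), (1, 2), (2, 3)]
```

Stopping once $N - 1$ edges are in place gives the same tree as scanning the whole list, since every later pair would close a cycle.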
Notice that since, in the third step of the MST construction, all new edges that would form a cycle are excluded, the resulting filtered MST is a tree consisting of $N$ banks and $(N-1)$ edges. Under such a strong acyclicity constraint, significant correlations that would close loops are necessarily discarded (e.g. Tumminello et al. 2005; Di Matteo et al. 2010; Wang et al. 2018). To provide a richer structure, Tumminello et al. (2005) propose the PMFG, which replaces the acyclicity condition in the third step of the aforementioned MST construction algorithm with the requirement that the graph remain planar. The resulting connected planar graph retains $(3N-6)$ edges and may contain 3- and 4-cliques. In fact, it can be shown that the MST is always a subgraph of the PMFG (e.g. Tumminello et al. 2005; Pozzi et al. 2008).
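The PMFG construction can be sketched by swapping the cycle test for a planarity test. This minimal version relies on the `networkx` library's `check_planarity` function and simply re-tests planarity after each tentative insertion, which is straightforward but slow for large $N$ (dedicated PMFG implementations are faster). The function name and the synthetic 10 × 10 correlation matrix are our own.

```python
import itertools

import networkx as nx
import numpy as np


def pmfg(C):
    """Planar Maximally Filtered Graph via greedy insertion under a planarity constraint.

    Pairs are scanned in descending order of C[i, j]; an edge is kept only if the
    graph stays planar. The loop stops once 3N - 6 edges are present (needs N >= 3).
    """
    N = C.shape[0]
    G = nx.Graph()
    G.add_nodes_from(range(N))
    pairs = sorted(itertools.combinations(range(N), 2), key=lambda p: C[p], reverse=True)
    for i, j in pairs:
        G.add_edge(i, j, weight=float(C[i, j]))
        is_planar, _ = nx.check_planarity(G)
        if not is_planar:
            G.remove_edge(i, j)  # this edge would break planarity: discard it
        if G.number_of_edges() == 3 * N - 6:
            break
    return G


rng = np.random.default_rng(4)
C = np.corrcoef(rng.standard_normal((10, 40)))  # synthetic 10 x 10 correlation matrix
G = pmfg(C)
```

Edges rejected early can never become admissible later, because adding more edges cannot restore planarity; the greedy pass therefore ends with a maximal planar graph, which has exactly $3N - 6$ edges.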

Portfolio Correlations in the Bank-Firm Credit Market of Japan
To begin with, we briefly summarize the main statistical properties of the elements of the correlation matrices in the three lending layers of the bank-firm credit market of Japan over the period from 1980 to 2012. The time-varying distributions of $C_{ij}$ in the different layers are shown in Fig. 1, while the basic statistics of $C_{ij}$, including the mean, median, skewness and kurtosis, are reported in Fig. 2. Overall, we observe that the average correlations in all layers are relatively small but always positive. In addition, these distributions exhibit positive skewness and high kurtosis.
In order to investigate whether non-random patterns can be identified in the correlations between loan portfolios, in each layer we first compute the eigenvalues and eigenvectors of the empirical correlation matrix $C$. We then compare the distribution of the eigenvalues of $C$ with the one predicted by RMT. Note that in our analysis we also consider the potential effects of extreme values on the upper limit of the eigenvalues by taking into account two further upper bounds, $\lambda^{(1)}_{\max}$ and $\lambda^{(2)}_{\max}$, besides the bound $\lambda^*_{\max}$ implied by RMT. As shown in Fig. 3 and in Fig. 11 (see the "Appendix"), a majority of the eigenvalues of the empirical correlation matrix lie within the random bulk $[\lambda^*_{\min}, \lambda^*_{\max}]$. In addition, roughly another 10% of the eigenvalues fall within the broader range $[\lambda^*_{\min}, \lambda^{all}_{\max}]$ when the more conservative upper bound $\lambda^{all}_{\max} = \max(\lambda^*_{\max}, \lambda^{(1)}_{\max}, \lambda^{(2)}_{\max})$ is used. Furthermore, we also observe a group of the smallest eigenvalues deviating from the RMT's left edge (i.e. less than the lower bound $\lambda^*_{\min}$). However, these eigenvalues are not of interest in the present work, as their contribution to the total variance is relatively small.
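As an illustration of the RMT benchmark, the sketch below computes the edges of the random bulk and the fractions of eigenvalues below, inside and above it. It assumes the standard Marchenko-Pastur form $\lambda^*_{\pm} = (1 \pm \sqrt{1/Q})^2$ for a correlation matrix of $N$ banks estimated over $N_F$ firms, with $Q = N_F/N \ge 1$. The paper's exact definition of $\lambda^*_{\max}$ and of the conservative bounds $\lambda^{(1)}_{\max}$, $\lambda^{(2)}_{\max}$ is not reproduced in this section, so the function names and inputs are hypothetical; to apply the conservative bound, pass $\lambda^{all}_{\max}$ as the upper edge.

```python
import math


def mp_bounds(n_banks, n_firms):
    """Marchenko-Pastur edges (lambda_min, lambda_max) for unit-variance data,
    assuming Q = n_firms / n_banks >= 1."""
    q = n_firms / n_banks
    if q < 1:
        raise ValueError("this sketch assumes at least as many firms as banks")
    lam_min = (1 - math.sqrt(1 / q)) ** 2
    lam_max = (1 + math.sqrt(1 / q)) ** 2
    return lam_min, lam_max


def eigen_fractions(eigs, lam_min, lam_max):
    """Fractions of eigenvalues below, inside and above [lam_min, lam_max]."""
    n = len(eigs)
    below = sum(e < lam_min for e in eigs) / n
    above = sum(e > lam_max for e in eigs) / n
    return below, 1 - below - above, above


lo, hi = mp_bounds(n_banks=100, n_firms=400)
print(lo, hi)  # 0.25 2.25
print(eigen_fractions([0.1, 0.5, 1.0, 2.0, 3.0, 5.0], lo, hi))
```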
Moving on to the largest eigenvalues exceeding the RMT's right edge, typically 10% to 20% of the eigenvalues are larger than $\lambda^*_{\max}$ in every year, while a smaller fraction exceeds $\lambda^{all}_{\max}$. Taken together, the inspection of the empirical distribution of the eigenvalues suggests that genuine information in the bank-bank correlation matrices, and the significant factors driving the correlation dynamics over the years in the three layers, can be extracted from some of the largest eigenvalues.
In the following, for the sake of conciseness, we select the five largest eigenvalues of the empirical correlation matrix in each lending layer and investigate their evolution over time. As shown in panels (a), (c) and (e) of Fig. 4, these eigenvalues persistently deviate from the random bulk over the years, and they are always larger than all three upper bounds $\lambda^*_{\max}$, $\lambda^{(1)}_{\max}$ and $\lambda^{(2)}_{\max}$. To show the contribution of the first five principal components to the total variance and to infer systemic risk, we also report the associated absorption ratios defined in Eq. (20). Overall, layer 1 (total loans) and layer 3 (long-term loans) display broadly similar trends, and, as demonstrated in Fig. 4, different regimes in the bank-firm credit market of Japan can be identified from these two layers. For example, almost all of the five largest eigenvalues and the absorption ratios increase sharply during the period of the lending boom and the Japanese asset price bubble (1986-1991).¹³ This implies that in this period the five leading factors become more important driving forces of the correlation dynamics, and that Japanese banks tend to be more tightly coupled in lending to the non-financial sector of the economy. Hence, it indicates a higher level of systemic risk in the credit market during the bubble, in the sense that banks' lending becomes more vulnerable to negative shocks (Kritzman et al. 2011; Billio et al. 2012; Meng et al. 2014). In the years following the bursting of the bubble, all of these eigenvalues and the corresponding absorption ratios decline substantially, and they decrease even further after the Asian financial crisis (1997-1998). This downward trend largely stops in 2002, when the Bank of Japan (BOJ) was implementing a monetary stimulus policy (e.g. Bowman et al. 2015).
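Eq. (20) is not reproduced in this section. Assuming it follows the usual absorption-ratio definition of Kritzman et al. (2011), i.e. the share of total variance absorbed by the $k$ largest eigenvalues, a minimal sketch is shown below; for an $N \times N$ correlation matrix the denominator equals $\mathrm{tr}(C) = N$. Whether $E_1, \ldots, E_5$ in Fig. 4 are cumulative or per-eigenvalue shares is not stated here, so the hypothetical function `absorption_ratio` returns the cumulative version, and the toy spectrum is invented for illustration.

```python
def absorption_ratio(eigs, k):
    """Fraction of total variance absorbed by the k largest eigenvalues."""
    s = sorted(eigs, reverse=True)
    return sum(s[:k]) / sum(s)


def absorption_ratio_significant(eigs, lam_max):
    """Absorption ratio of all eigenvalues above an RMT upper edge lam_max."""
    k = sum(e > lam_max for e in eigs)
    return absorption_ratio(eigs, k)


# Toy spectrum of a 6 x 6 correlation matrix (sums to N = 6).
eigs = [3.0, 1.5, 0.6, 0.4, 0.3, 0.2]
print(round(absorption_ratio(eigs, 1), 3))                # 0.5
print(round(absorption_ratio_significant(eigs, 1.2), 3))  # 0.75
```

A rising absorption ratio means that fewer factors explain more of the co-movement, which is the sense in which the text links it to tighter coupling and higher systemic risk.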
In addition, as a consequence of the economic contraction in Japan caused by the global financial crisis (e.g. Kawai and Takagi 2011), we observe a significant decrease in all of these eigenvalues and absorption ratios in 2009 in all three layers. Interestingly, compared with layers 1 and 3, layer 2 (short-term loans) exhibits different dynamics: the largest eigenvalues and their contributions to the total variance decline from the Asian financial crisis onwards, suggesting that from then on Japanese banks tend to be less unified in this layer. Notice that, as shown in panel (b) of Fig. 2, the correlations between banks in layer 2 exhibit similar dynamics: on average, banks' short-term loans to firms have become less correlated since the aftermath of that crisis. In fact, a closer inspection from a network perspective shows that credit linkages in this lending layer also become sparser and that the average weight (in absolute terms and in log scale) drops substantially from then on.

Fig. 2: … The values of the skewness of $C_{ij}$ in the three layers. Panels g-i: the values of the kurtosis of $C_{ij}$ in the three layers.
Having discussed the main statistics of the eigenvalues of the correlation matrix $C$ in each lending layer, we now move on to the analysis of the eigenvectors of $C$. Note that the $i$th element $u_k(i)$ of an eigenvector $u_k$ indicates the contribution of bank $i$ to $u_k$, and thus the role of bank $i$ in the $k$th principal component.
To begin with, we compare the distributions of the empirical eigenvector elements with the theoretical distribution predicted by RMT. According to RMT, the elements of each eigenvector, normalized such that $\sum_{j=1}^{N} u_k(j)^2 = N$, follow a Gaussian distribution with zero mean and unit variance (e.g. Guhr et al. 1998; Laloux et al. 1999; Plerou et al. 2002):

$$P_{RMT}(u) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{u^2}{2}\right).$$

The detailed comparison results are shown in Fig. 12 in the "Appendix". Overall, we again find that the distributions of the elements of the eigenvectors associated with the largest eigenvalues deviate significantly from the Gaussian prediction of RMT. In contrast, the distributions of the elements of the eigenvectors associated with eigenvalues in the random bulk are closer to $P_{RMT}$. This observation agrees with those reported in analyses of stock returns (e.g. Laloux et al. 1999; Plerou et al. 1999).
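A minimal sketch of one way to quantify the comparison with $P_{RMT}$: each eigenvector is rescaled so that $\sum_j u_k(j)^2 = N$ (the normalization under which the Gaussian prediction holds), its sign is fixed so that the elements sum to a non-negative value, and the fourth moment of its elements is compared with the Gaussian value 3. The helper names are hypothetical, and the fourth moment is only one convenient summary; the paper compares full distributions (Fig. 12).

```python
import math


def rescale(u):
    """Rescale u so that sum_j u(j)^2 = N and sum_j u(j) >= 0."""
    n = len(u)
    norm = math.sqrt(sum(x * x for x in u))
    sign = 1.0 if sum(u) >= 0 else -1.0
    return [sign * x * math.sqrt(n) / norm for x in u]


def p_rmt(x):
    """Gaussian RMT density for elements of an eigenvector rescaled to norm N."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def fourth_moment(u):
    """<u(j)^4> of the rescaled elements: about 3 for a Gaussian (random,
    delocalized) eigenvector, much larger for a localized one."""
    v = rescale(u)
    return sum(x ** 4 for x in v) / len(v)


print(fourth_moment([1, 1, 1, 1]))  # 1.0 (perfectly uniform)
print(fourth_moment([1, 0, 0, 0]))  # 4.0 (fully localized; equals N in general)
```

Because the rescaled elements satisfy $\langle u^2 \rangle = 1$, this fourth moment equals $N \cdot IPR_k$ under the usual IPR definition, which links the distributional comparison to the IPR analysis that follows.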
In the next step, based on the eigenvector elements, we investigate the degree of homogeneity or heterogeneity of the banks' contributions to $C$. To this end, we compute the Inverse Participation Ratios $\{IPR_k\}_{k=1}^{N}$ and then examine the relationship between these ratios and the corresponding eigenvalues. A large value of $IPR_k$ reveals that only a few banks contribute to $u_k$, suggesting a localized behavior of banks. In contrast, a small value of $IPR_k$ indicates that many banks contribute to $u_k$ together, showing that the factor (if it exists) extracted from that eigenvector has a pervasive effect in the market.

Fig. 4: Evolution of the five largest eigenvalues and absorption ratios for the correlation matrices in the three layers. All panels on the left show the evolution of the five largest eigenvalues, which are persistently larger than the bound $\lambda^*_{\max}$ implied by RMT as well as the two more conservative bounds $\lambda^{(1)}_{\max}$ and $\lambda^{(2)}_{\max}$. The right panels show the five associated absorption ratios ($E_1$ to $E_5$), the absorption ratio of the ten largest eigenvalues ($E_{10}$), and that of all eigenvalues that exceed $\lambda^*_{\max}$ ($E_{all\,sig.\,factors}$).
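The IPR formula is not restated in this section. Assuming the standard definition $IPR_k = \sum_{j=1}^{N} u_k(j)^4$ for eigenvectors normalized to unit length, which ranges from $1/N$ for a perfectly delocalized vector to $1$ for a vector concentrated on a single bank, a minimal sketch of the computation and of flagging eigenvectors whose IPR exceeds the average over all $k$ is shown below. The function names and toy vectors are hypothetical (the toy vectors are not eigenvectors of a common matrix; they only illustrate the scale of IPR).

```python
def ipr(u):
    """Inverse participation ratio sum_j u(j)^4 of u normalized to unit length."""
    norm2 = sum(x * x for x in u)
    return sum((x * x / norm2) ** 2 for x in u)


def above_average(iprs):
    """Indices k whose IPR_k exceeds the average over all k (localized candidates)."""
    mean = sum(iprs) / len(iprs)
    return [k for k, value in enumerate(iprs) if value > mean]


vectors = [
    [0.5, 0.5, 0.5, 0.5],   # delocalized: IPR = 1/N = 0.25
    [1.0, 0.0, 0.0, 0.0],   # fully localized: IPR = 1
    [0.7, 0.7, 0.1, 0.1],   # intermediate
]
iprs = [ipr(u) for u in vectors]
print(above_average(iprs))  # [1]
```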
All panels of Fig. 5 demonstrate a common feature among the three layers: some of the smallest eigenvalues have the highest levels of IPR, lying above the average of $\{IPR_k\}_{k=1}^{N}$. This indicates that their associated eigenvectors are more localized, i.e. only a small group of banks contributes to them. Nevertheless, these smallest eigenvalues are less than 1, and the contribution of their associated factors/components to the total variance (recalling also Eqs. (12) and (13)) is negligible. Moving on to the eigenvalues located in the center of the eigenvalue distribution, we find that they often have a relatively low level of IPR. These features of the smallest eigenvalues and of those located at the center of the distribution are analogous to what is commonly found in similar analyses of stock return correlations (e.g. Plerou et al. 1999).
However, we observe a distinct feature for the eigenvector elements of the largest eigenvalue: over the years, $IPR_1$ is typically smaller than the average of $\{IPR_k\}_{k=1}^{N}$, showing that many banks contribute to the eigenvector associated with the largest eigenvalue. From an economic perspective, this indicates that the "market" mode has a broad effect on the banking system. Such a behavior also emerges in some of the next largest eigenvalues that deviate from the random bulk. This result implies that besides the global common factor, some "groups" factors also play a role in explaining the correlation dynamics. Furthermore, it signals the presence of large communities (clusters) of banks whose members tend to be more correlated in their lending activities, since they are influenced by some common pervasive factors.
It should be emphasized that on closer inspection, in each year we also observe a more localized behavior in at least one of the eigenvectors associated with the largest eigenvalues other than $\lambda_1$. In this case, the associated factor drives the co-movements only within a smaller subset of banks. For instance, as shown in panels (b) and (f) of Fig. 5, in 2012 $IPR_4$ is relatively larger than the average of $\{IPR_k\}_{k=1}^{N}$ in the first and third layers, while in the second layer such a behavior is found for $IPR_8$.¹⁴ Nevertheless, since distinct localized eigenvectors typically appear in different years, it is hard to track the economic drivers responsible for this behavior. We therefore leave this issue for future research.
Furthermore, if the largest eigenvalue and the next largest ones stand for the "market" and "groups" modes (i.e. systematic and "non-systematic" factors), respectively, it is interesting to examine their effects by comparing the raw correlation matrix with the filtered ones. As can be seen in panels (a) to (f) of Fig. 6, after removing the influence of the "market" mode from the cross-correlation matrix $C$, some significant correlations still remain in the filtered matrix $FC_m$. These remaining correlations are mainly driven by the "groups" modes associated with the next largest eigenvalues. A further decomposition splits the "market"-mode-filtered matrix into the correlations driven by the "groups" modes and those driven by noise. The matrices filtered of both the "market" and "groups" modes, $FC_{m,g}$, in the three layers are shown in panels (g) to (h) of Fig. 6. We observe that almost all significant correlations are removed in $FC_{m,g}$.
Taken together, the results obtained so far demonstrate that the methods of RMT and PCA can be used to identify significant patterns in the correlations between banks in lending to firms in the other sectors of the economy. In particular, we have shown that the correlations are mainly driven by a global common factor influencing the lending of all banks and by some non-systematic factors that affect only the lending of a subset of banks in the credit market of Japan. To gain a deeper understanding of the structure and dynamics of these correlations, we go one step further by combining these methods with those of complex networks. More specifically, for each year, we use the PMFG to extract the skeleton structure from the different parts of the cross-correlation matrices, i.e.
the part representing the effects of the "market" mode ($C_m$), the part capturing the effects of the "groups" modes ($C_g$), and the part capturing the effects of both the "market" and the "groups" modes ($C_{m,g} = C_m + C_g$). We then study the persistence of and changes in the skeleton structure over the years. For illustration, Figs. 7, 8 and 9 show the PMFG graphs extracted from different parts of the correlation matrices in the two years 1980 and 2012. In each panel of these figures, each number corresponds to a bank, and each link (in green) indicates the presence of a significant relation between a pair of banks. The numbers in red represent the hub banks, i.e. those with the highest degrees in the PMFG graph. Overall, the correlations between banks reveal different backbone structures in different periods. Indeed, on closer inspection over the years (for example, see Figs. 17 and 18 in the "Appendix"), we often observe that when the dominance of a group of banks in one period gradually disappears, the credit market starts to build up a different structure in the next period, in which another group of banks emerges as the key players in the backbone of the cross-correlations.

Fig. 8: The interaction structure between banks for $C_m$, $C_g$ and $C_{m,g}$ obtained with the PMFG graph, in layer 2.

Fig. 9: The interaction structure between banks for $C_m$, $C_g$ and $C_{m,g}$ obtained with the PMFG graph, in layer 3.

In this paper, we have examined the structure and dynamics of the correlation matrices of banks' loan portfolios in a large data set of the bank-firm credit network of Japan during the period from 1980 to 2012. Our results show that only a subset of the eigenvalues contains genuine information, while the rest correspond to noise or have only a small effect on the correlations between banks in lending to firms in the other sectors of the economy.
This implies that the methods of RMT and PCA can be used for filtering noise from the empirical correlation matrices of banks' loan portfolios.
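As an illustration of this noise filtering, the sketch below splits a correlation matrix into a "market" part $C_m$ (largest eigenvalue), a "groups" part $C_g$ (the remaining eigenvalues above an RMT edge) and a residual part, assuming that Eq. (17) is the standard spectral decomposition $C = \sum_k \lambda_k u_k u_k^T$. The function name, the toy matrix and the choice of threshold (e.g. $\lambda^*_{\max}$ or $\lambda^{all}_{\max}$) are assumptions for illustration; it uses NumPy. The filtered matrices $FC_m = C - C_m$ and $FC_{m,g} = C - C_m - C_g$ then follow directly.

```python
import numpy as np


def decompose(C, lam_max):
    """Return (C_m, C_g, C_r) with C = C_m + C_g + C_r.

    C_m: largest eigenmode ("market"); C_g: other eigenmodes above lam_max
    ("groups"); C_r: all remaining modes (treated as noise).
    """
    C = np.asarray(C, dtype=float)
    vals, vecs = np.linalg.eigh(C)        # eigh returns ascending eigenvalues
    order = np.argsort(vals)[::-1]        # reorder to descending
    vals, vecs = vals[order], vecs[:, order]
    n_sig = int(np.sum(vals > lam_max))   # number of eigenvalues above the edge

    def part(indices):
        out = np.zeros_like(C)
        for k in indices:
            out += vals[k] * np.outer(vecs[:, k], vecs[:, k])
        return out

    C_m = part([0])
    C_g = part(range(1, n_sig))
    C_r = part(range(max(n_sig, 1), len(vals)))  # never double-count mode 0
    return C_m, C_g, C_r


C = np.array([
    [1.0, 0.6, 0.5, 0.1],
    [0.6, 1.0, 0.4, 0.2],
    [0.5, 0.4, 1.0, 0.3],
    [0.1, 0.2, 0.3, 1.0],
])
C_m, C_g, C_r = decompose(C, lam_max=1.5)
FC_m = C - C_m          # "market"-mode-filtered matrix
FC_mg = C - C_m - C_g   # "market"- and "groups"-filtered matrix (equals C_r)
print(np.allclose(C_m + C_g + C_r, C))  # True
```

Since the eigenvectors returned by `eigh` are orthonormal, the three parts always add back to $C$; the threshold only decides which modes are labelled "groups" rather than noise.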
The dynamics of the largest eigenvalues and the associated absorption ratios reveal different milestones in the credit market of Japan over the years. More specifically, there is a dramatic increase in the level of systemic risk during the period of the Japanese asset price bubble (1986-1991). In contrast, we observe a significant decrease in all of these eigenvalues and the associated absorption ratios in 2009, which may be a consequence of the economic contraction in Japan caused by the global crisis (e.g. Kawai and Takagi 2011).
Based on the eigenvector elements, we have further investigated the localization and clustering behaviours of banks. On the one hand, we find that only a few banks contribute to the eigenvectors corresponding to a group of the smallest eigenvalues. Such a localization behavior is similar to what is often found in analyses of stock price changes (e.g. Plerou et al. 1999). However, since these eigenvalues are smaller than 1, and thus much smaller than the largest eigenvalues, the contribution of their corresponding principal components to the total variance is negligible. On the other hand, we find that the eigenvectors corresponding to the largest eigenvalues are typically not strongly localized. This indicates that the latent factors extracted from them have a more pervasive effect on banks in providing credit to firms.
Interestingly, building Planar Maximally Filtered Graphs (PMFG) from the correlations driven by different eigenmodes, we find that the local interaction structure between banks changes across periods. More specifically, when the dominance of a group of banks in one period gradually disappears, the credit market starts to build up a different structure in the next period, in which another group of banks becomes the new hubs in the backbone of the cross-correlations.
Several directions for future research can be suggested from our present work. First, the analysis of eigenvector elements shows that in a few of the eigenvectors corresponding to the largest eigenvalues, a large group of banks tends to have a higher degree of correlation among themselves. Therefore, the clustering behaviours and the community structure of banks in the correlation matrices of the different lending layers should be studied further (e.g. Almog et al. 2015; MacMahon and Garlaschelli 2015). Second, it would also be interesting to examine the effects of bank characteristics (e.g. bank locations, bank types, balance sheet information) on the formation of such clusters and communities. Last but not least, another important direction for future research is to analyse the multilayer architecture of the interdependencies between banks, e.g. to consider together various layers including the network of loan portfolio overlaps, the network of loan portfolio correlations, the correlation network of bank stock returns, and the physical interbank trading network (e.g. Brunetti et al. 2015; Montagna and Lux 2017). We believe that this direction will help to gain a deeper understanding of the complex structure of interactions within the banking system and between the banking system and the other sectors of the economy.

Appendix
Here we provide additional results for portfolio overlaps and similarities, the fractions of eigenvalues in different ranges, the distributions as well as the evolution of the eigenvector elements, and the more detailed correlation network structure between banks based on the method of PMFG. We also report the main results for the binary version of the bank-firm credit network of Japan.

Portfolio Overlaps and Similarities
In this part, we briefly summarize the main results for portfolio overlaps and portfolio similarities between banks. Recall that in each lending layer, each element of the binary version of the bank-bank projection matrix $A^{B-B}$ defined in Eq. (3) indicates whether a pair of banks shares at least one common borrower. Meanwhile, each element of the similarity matrix $S^{B-B}$, which is based on the generalized Jaccard index defined in Eq. (1), measures the degree of structural similarity between two loan portfolios. Figure 10 shows the binary bank-bank projection matrices (panels (a) to (c)) and the similarity matrices (panels (d) to (f)) in the three lending layers, using the 2012 data as an example. We observe that many pairs of banks have overlapping loan portfolios, especially in layers 1 and 3. Furthermore, the loan portfolios also tend to have a relatively high degree of similarity.¹⁵

¹⁵ To illustrate the similarity matrices, for convenience we use the same order of banks as in the analysis of binary overlaps in Fig. 10a-c, i.e. banks are sorted in descending order of eigenvector centrality (e.g. see Bonacich (2007)) in the adjacency matrix obtained from $A^{B-B}$. We draw no conclusion about the relationship between that order and the elements of the similarity matrix $S^{B-B}$.

Figure 11 shows the fractions of eigenvalues in different ranges, i.e. larger than the considered upper bound ($\lambda^*_{\max}$ or $\lambda^{all}_{\max} = \max(\lambda^*_{\max}, \lambda^{(1)}_{\max}, \lambda^{(2)}_{\max})$), within the interval from the lower bound to the considered upper bound ($[\lambda^*_{\min}, \lambda^*_{\max}]$ or $[\lambda^*_{\min}, \lambda^{all}_{\max}]$), and less than the lower bound $\lambda^*_{\min}$. Although the fractions change over the years, in general we find that a majority of the eigenvalues lie within the random bulk or are smaller than the lower bound $\lambda^*_{\min}$. In contrast, only a small subset of the top eigenvalues persistently exceeds the upper bound $\lambda^*_{\max}$ or $\lambda^{all}_{\max}$ over the whole period.

Fig. 11: Fractions of eigenvalues in different ranges for the three layers. Panels a, d, g: fractions of eigenvalues larger than the considered upper bound $\lambda^*_{\max}$ (blue lines) or $\lambda^{all}_{\max}$ (red lines) for layers 1, 2, 3, respectively. Panels b, e, h: fractions of eigenvalues within $[\lambda^*_{\min}, \lambda^*_{\max}]$ (blue lines) or $[\lambda^*_{\min}, \lambda^{all}_{\max}]$ (red lines) for layers 1, 2, 3, respectively. Panels c, f, i: fractions of eigenvalues less than the lower bound $\lambda^*_{\min}$ for layers 1, 2, 3, respectively.
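Eqs. (1) and (3) are not reproduced in this section. Assuming the generalized Jaccard index takes the common form $\sum_k \min(w_{ik}, w_{jk}) / \sum_k \max(w_{ik}, w_{jk})$ over the loan amounts $w_{ik}$ of banks $i$ and $j$ to firms $k$, and that an element of the binary projection $A^{B-B}$ equals 1 when two banks share at least one borrower, a minimal sketch of both quantities for one pair of banks is given below. The function names and toy loan vectors are hypothetical.

```python
def generalized_jaccard(x, y):
    """Weighted Jaccard similarity of two non-negative loan vectors."""
    num = sum(min(a, b) for a, b in zip(x, y))
    den = sum(max(a, b) for a, b in zip(x, y))
    return num / den if den > 0 else 0.0


def shares_borrower(x, y):
    """1 if both banks lend to at least one common firm, else 0."""
    return int(any(a > 0 and b > 0 for a, b in zip(x, y)))


bank_i = [3.0, 0.0, 2.0, 1.0]   # loans of bank i to four firms
bank_j = [1.0, 0.0, 2.0, 0.0]   # loans of bank j to the same firms
print(generalized_jaccard(bank_i, bank_j))  # 0.5
print(shares_borrower(bank_i, bank_j))      # 1
```

The binary overlap only records whether a common borrower exists, while the Jaccard index also reflects how similar the loan amounts are, which is why the two matrices in Fig. 10 can look different.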

Distribution of Eigenvector Elements
In Fig. 12, we compare the distributions of the elements of four selected eigenvectors of $C$, i.e. the eigenvector $u_N$ associated with the smallest eigenvalue $\lambda_N$, the eigenvector $u_{RMT}$ corresponding to a representative eigenvalue located in the random bulk, the eigenvector $u_4$ associated with $\lambda_4$, and the eigenvector $u_1$ corresponding to the largest eigenvalue $\lambda_1$. Note that, for illustration, in this figure we re-normalize the eigenvector elements such that $\sum_{j=1}^{N} u_i(j)^2 = N$ ($\forall i$). We can see that $u_N$ has only a few significant elements, revealing that this eigenvector is highly localized. In addition, the distributions of the elements of $u_1$ and $u_4$ differ from the distribution of the elements of $u_{RMT}$.

Fig. 13: The eigenvectors of $\lambda_1$ and $\lambda_4$ (elements sorted in descending order) in the three layers, in 2012. Without loss of generality, we fix the signs of the eigenvector elements such that their sum is non-negative. Panels a, b: eigenvectors of $\lambda_1$ and $\lambda_4$ in layer 1. Panels c, d: eigenvectors of $\lambda_1$ and $\lambda_4$ in layer 2. Panels e, f: eigenvectors of $\lambda_1$ and $\lambda_4$ in layer 3.

Localization Behavior
As two examples, one with a more localized and one with a less localized behavior of the eigenvector elements, Fig. 13 shows the entries of $u_1$ and $u_4$ in descending order. We can see that in the first and third layers, the elements of $u_1$ are much more homogeneous than those of $u_4$. However, it should be emphasized that this does not imply that all banks contribute equally to the factor extracted from $\lambda_1$ and $u_1$. Indeed, as shown in Fig. 13a, c and e, we still observe a certain level of heterogeneity among banks.

Evolution of Eigenvector Elements
Figs. 14, 15 and 16 show the evolution of the eigenvectors (in absolute terms) associated with the four largest eigenvalues in the three lending layers. Here we focus on the 115 banks that are active in the bank-firm credit market of Japan in all years from 1980 to 2012. As we can see, some banks have larger eigenvector elements in certain sub-periods, but overall, in almost all years, the range of these elements is not very wide. This result is consistent with the IPR analysis.

Additional Results for PMFG Graphs Over the Years
In the following we provide additional results on the evolution of PMFG graphs based on the matrix $C_{m,g} = C_m + C_g$, which captures the effects of the significant factors (i.e. the global common factor as well as the "groups" factors) on the correlations between banks in lending to firms. Again, we focus on the 115 banks active in the bank-firm credit market of Japan in all years from 1980 to 2012. For the sake of conciseness, here we only show the PMFG graphs in layer 1 (the total lending layer).¹⁶ In each panel of Figs. 17 and 18, the green links indicate significant pairwise correlations retained in the PMFG graph. The numbers represent banks, and those in red are the banks with the highest degrees in the PMFG network. We can see that the correlation structure between banks changes across periods. In particular, when the dominant role of a group of banks in one period gradually vanishes, the credit market starts to build up a different structure in the next period, in which another group of banks becomes the new hubs in the backbone of the cross-correlations.

Fig. 17: The interaction structure between banks for the correlation matrix $C_{m,g}$ obtained with the PMFG graph, in layer 1, from 1980 to 1997.

Fig. 18: The interaction structure between banks for the correlation matrix $C_{m,g}$ obtained with the PMFG graph, in layer 1, from 1998 to 2012.

Binary Analysis
Using similar methods, we can also analyse the binary version of the bank-firm credit relationships. Interestingly, our results suggest that in general this version also contains genuine information about the structure and dynamics of the cross-correlation matrix of banks' loan portfolios.
Recall that in the binary case, for each pair $(i, j)$ we have

$$a^{bf}_{i,j} = \begin{cases} 1 & \text{if bank } i \text{ lends to firm } j, \\ 0 & \text{otherwise.} \end{cases}$$

We then define the matrix $X_{bin} = \{X_{i,j,bin}\}_{N \times N_F}$, with each element given by

$$X_{i,j,bin} = \frac{a^{bf}_{i,j} - \langle a^{bf}_{i,j} \rangle}{\sigma(a^{bf}_{i,j})},$$

where, for each bank $i$, $\langle a^{bf}_{i,j} \rangle$ and $\sigma(a^{bf}_{i,j})$ are the average and the standard deviation of $a^{bf}_{i,j}$ over all $N_F$ firms, respectively. The correlation matrix $C_{bin}$ of the binary data is then defined as

$$C_{bin} = \frac{1}{N_F} X_{bin} X_{bin}^{T},$$

where $X_{bin}^{T}$ is the transpose of $X_{bin}$. For the sake of conciseness, in the following we only summarize the main results of the analysis of the binary data. First, we find that a group of the largest eigenvalues (typically 10% to 15% of them) always deviates from the random bulk predicted by RMT (see Fig. 19). In addition, we often observe that more than half of the eigenvalues lie between the two RMT bounds $\lambda^*_{\min}$ and $\lambda^*_{\max}$. Compared with the weighted data, we also find similar dynamics for the largest eigenvalues and their absorption ratios, although in general the factors extracted from these eigenvalues contribute slightly less to the total variance in the binary case. Furthermore, as demonstrated in Fig. 21, the localization behavior still emerges in the eigenvectors corresponding to the smallest eigenvalues, as their IPR ratios are often much higher than the average IPR. Besides this, again many banks contribute to the eigenvectors corresponding to the largest eigenvalues, signalling that the "market" mode as well as some of the "groups" modes extracted from the binary version of the credit network also have a broad effect.
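A minimal pure-Python sketch of the binary construction, assuming population standard deviations so that $C_{bin}$ has a unit diagonal under the $1/N_F$ normalization. The function name and toy matrix are hypothetical. A bank that lends to all firms or to none has zero standard deviation and undefined correlations; the sketch simply raises an error in that case.

```python
import math


def binary_correlation(A):
    """C_bin = X X^T / N_F for a 0/1 bank-by-firm matrix A (rows: banks).

    Each row is standardized by its mean and population standard deviation
    over the N_F firms, so the diagonal of C_bin equals 1.
    """
    n_f = len(A[0])
    X = []
    for row in A:
        mean = sum(row) / n_f
        sd = math.sqrt(sum((a - mean) ** 2 for a in row) / n_f)
        if sd == 0:
            raise ValueError("bank lends to all or none of the firms; correlation undefined")
        X.append([(a - mean) / sd for a in row])
    return [[sum(x * y for x, y in zip(xi, xj)) / n_f for xj in X] for xi in X]


A = [
    [1, 0, 1, 0],   # bank 0 lends to firms 0 and 2
    [1, 0, 1, 0],   # bank 1 has the same borrowers as bank 0
    [0, 1, 0, 1],   # bank 2 lends to the complementary firms
]
print(binary_correlation(A))  # [[1.0, 1.0, -1.0], [1.0, 1.0, -1.0], [-1.0, -1.0, 1.0]]
```

For large data sets the same matrix is usually computed with a numerical library, e.g. `numpy.corrcoef` applied to the rows of the bank-firm matrix, which gives the same result because the choice of degrees of freedom cancels in a correlation.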