
Tackling Information Asymmetry in Networks: A New Entropy-Based Ranking Index


Information is a valuable asset in socio-economic systems, a significant part of which is embedded in the network of connections between agents. The different interlinkage patterns that agents establish may, in fact, lead to asymmetries in the knowledge of the network structure; since this entails a different ability to quantify relevant systemic properties (e.g. the risk of contagion in a network of liabilities), agents capable of providing a better estimation of otherwise inaccessible network properties ultimately have a competitive advantage. In this paper, we address the issue of quantifying the information asymmetry of nodes: to this aim, we define a novel index, InfoRank, intended to rank nodes according to their information content. To do so, each node's ego-network is enforced as a constraint of an entropy-maximization problem, and the resulting uncertainty reduction is used to quantify the node-specific accessible information. We then test the performance of our ranking procedure in terms of reconstruction accuracy and show that it outperforms other centrality measures in identifying the “most informative” nodes. Finally, we discuss the socio-economic implications of network information asymmetry.


Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8


References

  1. Newman, M.E.J.: Networks: An Introduction. Oxford University Press, New York (2010)

  2. Bloch, F., Jackson, M.O., Tebaldi, P.: Centrality measures in networks (2017). arXiv:1608.05845

  3. Borgatti, S.P.: Centrality and network flow. Soc. Netw. 27, 55–71 (2005)

  4. Benzi, M., Klymko, C.: A matrix analysis of different centrality measures. SIAM J. Matrix Anal. Appl. 36, 686–706 (2013)

  5. Sabidussi, G.: The centrality index of a graph. Psychometrika 31, 581–603 (1966)

  6. Langville, A.N., Meyer, C.: Google’s PageRank and Beyond. Princeton University Press, Princeton (2006)

  7. Squartini, T., Cimini, G., Gabrielli, A., Garlaschelli, D.: Network reconstruction via density sampling. Appl. Netw. Sci. 2(3) (2017)

  8. Zhang, Q., Li, M., Du, Y., Deng, Y.: Local structure entropy of complex networks (2014). arXiv:1412.3910v1

  9. Bianconi, G., Pin, P., Marsili, M.: Assessing the relevance of node features for network structure. PNAS 106(28), 11433–11438 (2009)

  10. Bianconi, G.: The entropy of randomized network ensembles. Europhys. Lett. 81(2), 28005 (2007)

  11. Borgatti, S.P.: Identifying sets of key players in a social network. Comput. Math. Organ. Theory 12, 21–34 (2006)

  12. Park, J., Newman, M.E.J.: The statistical mechanics of networks. Phys. Rev. E 70, 066117 (2004)

  13. Squartini, T., Garlaschelli, D.: Maximum-Entropy Networks. Pattern Detection, Network Reconstruction and Graph Combinatorics. SpringerBriefs in Complexity. Springer, Cham (2018)

  14. Oshio, K., Iwasaki, Y., Morita, S., Osana, Y., Gomi, S., Akiyama, E., Omata, K., Oka, K., Kawamura, K.: Tech. Rep. of CCeP, Keio Future 3. Keio University, Tokyo (2003)

  15. Colizza, V., Pastor-Satorras, R., Vespignani, A.: Reaction-diffusion processes and metapopulation models in heterogeneous networks. Nat. Phys. 3, 276–282 (2007)

  16. Martinez, N.D.: Artifacts or attributes? Effects of resolution on the Little Rock Lake food web. Ecol. Monogr. 61(4), 367–392 (1991)

  17. Fortunato, S., Boguñá, M., Flammini, A., Menczer, F.: Approximating PageRank from in-degree. In: Lecture Notes in Computer Science, vol. 4936. Springer, Berlin (2008)

  18. Gleditsch, K.S.: Expanded trade and GDP data. J. Confl. Resolut. 46, 712–724 (2002)

  19. Squartini, T., Fagiolo, G., Garlaschelli, D.: Randomizing world trade. I. A binary network analysis. Phys. Rev. E 84, 046117 (2011)

  20. Wittenberg-Moerman, R.: The role of information asymmetry and financial reporting quality in debt trading: evidence from the secondary loan market. J. Account. Econ. 46(2), 240–260 (2008)

  21. Eisenberg, L., Noe, T.H.: Systemic risk in financial systems. Manag. Sci. 47(2), 236–249 (2001)

  22. Rogers, L.C.G., Veraart, L.A.M.: Failure and rescue in an interbank network. Manag. Sci. 59(4), 882–898 (2013)

  23. Barucca, P., Lillo, F.: The organization of the interbank network and how ECB unconventional measures affected the e-MID overnight market (2015). arXiv:1511.08068

  24. Glasserman, P., Young, P.H.: Contagion in financial networks. J. Econ. Lit. 54(3), 779–831 (2016)

  25. Barucca, P., Bardoscia, M., Caccioli, F., D’Errico, M., Visentin, G., Battiston, S., Caldarelli, G.: Network valuation in financial systems (2016). arXiv:1606.05164



PB and TS acknowledge support from the FET Project DOLFINS (No. 640772) and the FET IP Project MULTIPLEX (No. 317532).

Author information



Corresponding author

Correspondence to Tiziano Squartini.


Appendix A

Here we show how the computation of \(S_0^{(i)}\) can be simplified in two cases of general interest. The first concerns sparse networks: in this case, the probability coefficients defined by Eq. (1) satisfy \(p_{ij}\ll 1\), so that the factorization \(p_{ij}\simeq x_ix_j\) holds and, to first order, \((1-p_{ij})\ln (1-p_{ij})\simeq -p_{ij}\), further implying that

$$\begin{aligned} S_0^{(i)}\simeq -\sum _{j(\ne i)}[p_{ij}\ln p_{ij}-p_{ij}]=-k_i\ln \left( \frac{k_i}{\sqrt{2L}}\right) +k_i. \end{aligned}$$
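A quick numerical check of the first approximation can be sketched as follows. This is an illustration, not the authors' code: it takes \(S_0^{(i)}\) to be the sum of the Bernoulli entropies of node i's potential links (the form underlying the mean-value approximation that follows) and compares it with the first-order expression \(-\sum _j[p_{ij}\ln p_{ij}-p_{ij}]\) on a hypothetical row of small probabilities. The helper names are hypothetical, and the closed form in \(k_i\) and L is not tested, since it additionally depends on how the \(x_i\) are fitted.

```python
import math

def node_entropy_exact(p_row):
    """S_0^{(i)} as the sum of -[p ln p + (1 - p) ln(1 - p)] over node i's links."""
    total = 0.0
    for p in p_row:
        if 0.0 < p < 1.0:  # links with p = 0 or p = 1 carry no uncertainty
            total -= p * math.log(p) + (1.0 - p) * math.log(1.0 - p)
    return total

def node_entropy_sparse(p_row):
    """First-order form valid for p << 1: -sum_j [p ln p - p]."""
    return -sum(p * math.log(p) - p for p in p_row if p > 0.0)

# Hypothetical link probabilities of one node in a sparse network (N - 1 = 100 entries).
sparse_row = [0.001, 0.002, 0.005, 0.01, 0.003] * 20
exact = node_entropy_exact(sparse_row)
approx = node_entropy_sparse(sparse_row)
print(f"exact = {exact:.5f}, sparse = {approx:.5f}, relative error = {(approx - exact) / exact:.1e}")
```

Since \(-(1-p)\ln (1-p)\le p\) for \(0<p<1\), the first-order expression slightly overestimates the exact node entropy; the gap is of order \(\sum _j p_{ij}^2/2\) and vanishes as the network becomes sparser.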

The second approximation is valid whenever the node i-specific probability coefficients are well represented by their average value, i.e. \(p_{ij}\simeq \frac{k_i}{N-1}\equiv \overline{p}_{ij}\); in this case,

$$\begin{aligned} S_0^{(i)}\simeq -(N-1)\left[ \overline{p}_{ij}\ln \overline{p}_{ij}+\left( 1-\overline{p}_{ij}\right) \ln \left( 1-\overline{p}_{ij}\right) \right] . \end{aligned}$$
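The mean-value approximation can be sketched in the same spirit. The helper names and the probabilities below are hypothetical; the sketch compares the exact sum of Bernoulli entropies with \(N-1\) times the entropy of \(\overline{p}_{ij}=k_i/(N-1)\), taking \(k_i\) as the expected degree \(\sum _j p_{ij}\).

```python
import math

def bernoulli_entropy(p):
    """Entropy (nats) of one link present with probability p; zero at p = 0 or 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))

def node_entropy_exact(p_row):
    """S_0^{(i)} as the sum of the Bernoulli entropies of node i's N - 1 links."""
    return sum(bernoulli_entropy(p) for p in p_row)

def node_entropy_mean_field(k_i, n):
    """Mean-value approximation: every p_ij replaced by p_bar = k_i / (N - 1)."""
    return (n - 1) * bernoulli_entropy(k_i / (n - 1))

# Hypothetical, heterogeneous coefficients of node i in a network of N = 6 nodes.
p_row = [0.9, 0.6, 0.3, 0.15, 0.05]
k_i = sum(p_row)  # expected degree of node i
print(node_entropy_exact(p_row), node_entropy_mean_field(k_i, n=6))
```

Because the binary entropy is concave, Jensen's inequality guarantees that the mean-value expression is an upper bound on the exact node entropy, with equality when all of node i's coefficients coincide; the approximation is therefore accurate precisely when the \(p_{ij}\) are well represented by their average.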

Appendix B

This second appendix collects the details of the derivation of our proposed methodology. Let us focus on the simplest case of a single node (hereafter indexed by l): computing its InfoRank can be thought of as solving two different problems. The first concerns the maximization of the functional

$$\begin{aligned} S_0=-\sum _\mathbf {G} P(\mathbf {G})\ln P(\mathbf {G})-\sum _i\eta _i\left[ \sum _\mathbf {G}P(\mathbf {G})C_i(\mathbf {G})-C_i^*\right] \end{aligned}\tag{14}$$

i.e. the Shannon entropy constrained by the benchmark information accessible to all nodes, represented by the vector \(\vec {C}^*\) of M constraints (notice that the normalization condition of the probability distribution to be determined, \(P(\mathbf {G}|\vec {\eta })\), can be rewritten as an \((M+1)\)-th constraint of the kind \(C_{M+1}(\mathbf {G})=C_{M+1}^*=1\)) [13]. By solving the constrained-optimization problem in (14), node l finds that

$$\begin{aligned} S_0=\vec {\eta }\cdot \vec {C}^*+\ln Z(\vec {\eta }). \end{aligned}$$

(where \(Z(\vec {\eta })=\sum _{\mathbf {G}}e^{-\vec {\eta }\cdot \vec {C}(\mathbf {G})}\) is the so-called partition function, which depends on the unknown Lagrange multipliers \(\vec {\eta }\)). The second optimization problem that node l has to solve, on the other hand, concerns the functional

$$\begin{aligned} S_{(l)}=S_0-\sum _{m}\psi _{lm}\left[ \sum _\mathbf {G}P(\mathbf {G})a_{lm}(\mathbf {G})-a^*_{lm}\right] \end{aligned}$$

with \(S_{(l)}\) being nothing else than the functional in (14), further constrained by imposing the ego-network of node l as well (i.e. the values of the link-specific variables \(a^*_{lm}\), each either 0 or 1). Upon solving the second problem, the expression

$$\begin{aligned} S_{(l)}=\vec {\theta }\cdot \vec {C}^*+\sum _m\psi _{lm}a^*_{lm}+\ln Z'(\vec {\theta },\vec {\psi }) \end{aligned}$$

(where \(Z'(\vec {\theta },\vec {\psi })=\sum _{\mathbf {G}}e^{-\vec {\theta }\cdot \vec {C}(\mathbf {G})-\sum _m\psi _{lm}a_{lm}(\mathbf {G})}\)) is found. Notice that although \(S_{(l)}\) and \(S_0\) are defined by the same vector of constraints, \(\vec {C}\), the numerical values of the Lagrange multipliers ensuring that \(\langle \vec {C}\rangle =\vec {C}^*\) will, in general, differ, whence the use of different symbols, i.e. \(\vec {\eta }\) and \(\vec {\theta }\).
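The identity \(S_0=\vec {\eta }\cdot \vec {C}^*+\ln Z(\vec {\eta })\) can be checked by brute force on a toy ensemble. The sketch below is an illustration under simplifying assumptions, not the paper's implementation: it enumerates all undirected graphs on N = 4 nodes, uses the total number of links L(G) as the single benchmark constraint (a hypothetical choice; the paper's \(\vec {C}^*\) may differ), solves \(\langle L\rangle =L^*\) for η by bisection, and compares the Shannon entropy with \(\eta L^*+\ln Z\). It also checks by finite differences that \(d^2\ln Z/d\eta ^2\) equals the variance of L, the covariance property invoked below to show that the stationary point is a minimum.

```python
import itertools
import math

N = 4
PAIRS = list(itertools.combinations(range(N), 2))             # the 6 undirected pairs
GRAPHS = list(itertools.product([0, 1], repeat=len(PAIRS)))  # all 2**6 adjacency patterns

def n_links(g):
    """Constraint C(G): total number of links (hypothetical benchmark)."""
    return sum(g)

def ensemble(eta, graphs):
    """Return (probabilities, ln Z) for P(G) = exp(-eta * L(G)) / Z over `graphs`."""
    weights = [math.exp(-eta * n_links(g)) for g in graphs]
    z = sum(weights)
    return [w / z for w in weights], math.log(z)

def solve_eta(l_star, graphs, lo=-30.0, hi=30.0, tol=1e-13):
    """Bisection on eta so that <L> = l_star (<L> decreases as eta grows)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        probs, _ = ensemble(mid, graphs)
        mean_l = sum(p * n_links(g) for p, g in zip(probs, graphs))
        if mean_l > l_star:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

L_STAR = 2.0
eta = solve_eta(L_STAR, GRAPHS)
probs, ln_z = ensemble(eta, GRAPHS)
shannon = -sum(p * math.log(p) for p in probs if p > 0.0)
print(shannon, eta * L_STAR + ln_z)

# Convexity: d^2 ln Z / d eta^2 should equal Var[L] (central finite difference).
h = 1e-4
second_derivative = (ensemble(eta + h, GRAPHS)[1] - 2 * ln_z + ensemble(eta - h, GRAPHS)[1]) / h**2
mean_l = sum(p * n_links(g) for p, g in zip(probs, GRAPHS))
var_l = sum(p * (n_links(g) - mean_l) ** 2 for p, g in zip(probs, GRAPHS))
print(second_derivative, var_l)
```

In this toy every pair is an independent link with probability \(e^{-\eta }/(1+e^{-\eta })\), so \(L^*=2\) out of 6 pairs gives \(\eta =\ln 2\); brute-force enumeration is used only because it mirrors the sums over \(\mathbf {G}\) literally.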

Both functionals, regarded as functions of the Lagrange multipliers, attain a minimum at their stationary point (consistently with our aim of minimizing each node's residual uncertainty). This is easily proven by noticing that the Hessian matrix of both \(S_0\) and \(S_{(l)}\) with respect to the multipliers is the covariance matrix of the constraints and, as such, is positive semidefinite. To find the stationary point of \(S_{(l)}\), node l must solve the equations

$$\begin{aligned} \frac{\partial S_{(l)}}{\partial \theta _i}=0,\,\forall \,i\,\,\,\text{ and }\,\,\,\frac{\partial S_{(l)}}{\partial \psi _{lm}}=0,\,\forall \,m \end{aligned}$$

which lead to the system of equations in (4). More explicitly, the second group of conditions reads

$$\begin{aligned} \sum _{\mathbf {G}}\left( \frac{e^{-\vec {\theta }\cdot \vec {C}(\mathbf {G})-\sum _m\psi _{lm}a_{lm}(\mathbf {G})}}{Z'(\vec {\theta },\vec {\psi })}\right) a_{lm}(\mathbf {G})=a^*_{lm},\,\forall \,m; \end{aligned}\tag{19}$$

in order to numerically evaluate the parameters \(\vec {\psi }\), let us focus on a specific one, e.g. \(\psi _{l1}\), which controls the value of the entry \(a_{l1}\). Let us now explicitly distinguish the configurations characterized by \(a_{l1}=0\) from those with \(a_{l1}=1\): upon doing so, condition (19) can be rewritten as

$$\begin{aligned} \sum _{\mathbf {G}_1}\left( \frac{e^{-\vec {\theta }\cdot \vec {C}(\mathbf {G}_1)-\psi _{l1}-\sum _{m(\ne 1)}\psi _{lm}a_{lm}}}{Z'(\vec {\theta },\vec {\psi })}\right) =a^*_{l1} \end{aligned}\tag{20}$$

i.e. as a sum over only the configurations with \(a_{l1}=1\) (indicated with the symbol \(\mathbf {G}_1\)). Analogously, we can split \(Z'(\vec {\theta },\vec {\psi })\) into the sum of two terms, i.e. \(Z'(\vec {\theta },\vec {\psi })=Z_0'(\vec {\theta },\vec {\psi })+e^{-\psi _{l1}}Z_1'(\vec {\theta },\vec {\psi })\), where the first sum

$$\begin{aligned} Z_0'(\vec {\theta },\vec {\psi })=\sum _{\mathbf {G}_0}e^{-\vec {\theta }\cdot \vec {C}(\mathbf {G}_0)-\sum _{m(\ne 1)}\psi _{lm}a_{lm}} \end{aligned}$$

runs over the networks having \(a_{l1}=0\) and the second sum

$$\begin{aligned} Z_1'(\vec {\theta },\vec {\psi })=\sum _{\mathbf {G}_1}e^{-\vec {\theta }\cdot \vec {C}(\mathbf {G}_1)-\sum _{m(\ne 1)}\psi _{lm}a_{lm}} \end{aligned}$$

runs over the networks having \(a_{l1}=1\).
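The split of \(Z'\) and the resulting form of Eq. (20) can be verified on the same kind of toy ensemble. In this illustrative sketch, node l is node 0 of a 4-node graph, the total number of links plays the role of the benchmark constraint (an assumption), and θ and \(\vec {\psi }\) are given arbitrary values rather than being solved for; the function names are hypothetical.

```python
import itertools
import math

N = 4
PAIRS = list(itertools.combinations(range(N), 2))             # 6 undirected pairs
GRAPHS = list(itertools.product([0, 1], repeat=len(PAIRS)))  # all 64 graphs
NODE_L = 0                                                    # hypothetical node l
EGO = [k for k, pair in enumerate(PAIRS) if NODE_L in pair]   # positions of a_{lm}
FIRST = EGO[0]                                                # position of a_{l1}

def weight(g, theta, psi):
    """exp(-theta * L(G) - sum_m psi_m * a_{lm}(G)); L(G) = total links (assumed constraint)."""
    return math.exp(-theta * sum(g) - sum(p * g[k] for p, k in zip(psi, EGO)))

def split(theta, psi):
    """Return (Z', Z'_0, Z'_1), where Z'_0 and Z'_1 omit the psi_{l1} factor."""
    psi_rest = [0.0] + list(psi[1:])
    z = sum(weight(g, theta, psi) for g in GRAPHS)
    z0 = sum(weight(g, theta, psi_rest) for g in GRAPHS if g[FIRST] == 0)
    z1 = sum(weight(g, theta, psi_rest) for g in GRAPHS if g[FIRST] == 1)
    return z, z0, z1

THETA, PSI = 0.3, [0.5, -0.2, 1.1]  # arbitrary multiplier values, not solved for
z, z0, z1 = split(THETA, PSI)

def mean_a_l1(psi_l1):
    """<a_{l1}> from the split form: e^{-psi} Z'_1 / (Z'_0 + e^{-psi} Z'_1)."""
    return math.exp(-psi_l1) * z1 / (z0 + math.exp(-psi_l1) * z1)

def mean_a_l1_direct(psi):
    """<a_{l1}> by direct summation over all graphs, for comparison."""
    total = sum(weight(g, THETA, psi) for g in GRAPHS)
    return sum(weight(g, THETA, psi) * g[FIRST] for g in GRAPHS) / total

print(z, z0 + math.exp(-PSI[0]) * z1)
print(mean_a_l1(PSI[0]), mean_a_l1_direct(PSI))
for value in (-20.0, 0.0, 20.0):
    print(value, mean_a_l1(value))  # tends to 1 as psi -> -inf and to 0 as psi -> +inf
```

Since \(Z_0'\) and \(Z_1'\) do not depend on \(\psi _{l1}\), they are computed once and reused while \(\psi _{l1}\) is varied, which makes the limiting behaviour discussed next easy to inspect.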

Solving Eq. (20) in the case \(a^*_{l1}=0\) leads to \(\psi _{l1}=+\infty \). As a consequence, in this case \(S_{(l)}=\vec {\theta }\cdot \vec {C}^*+\ln Z_0'(\vec {\theta },\vec {\psi })\), since the term \(Z_1'(\vec {\theta },\vec {\psi })\) is suppressed by the coefficient \(e^{-\psi _{l1}}\), which converges to zero. On the other hand, solving Eq. (20) in the case \(a^*_{l1}=1\) leads to \(\psi _{l1}=-\infty \) and \(S_{(l)}=\vec {\theta }\cdot \vec {C}^*+\ln Z_1'(\vec {\theta },\vec {\psi })\), since the term \(Z_0'(\vec {\theta },\vec {\psi })\) is now suppressed by the coefficient \(e^{\psi _{l1}}\) (this is readily seen by multiplying both the numerator and the denominator on the left-hand side of Eq. (20) by \(e^{\psi _{l1}}\)).

In other words, specifying the node-specific ego-network reduces the number of configurations over which the estimation of the constraints is carried out: \(Z'(\vec {\theta })\) thus runs over a smaller number of configurations than \(Z(\vec {\eta })\). The remaining parameters \(\psi _{l2}\dots \psi _{lN}\) are estimated analogously, by applying the same line of reasoning to the “surviving” partition functions.
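The effect of the diverging multipliers can be reproduced directly: once node l reveals its ego-network, only the configurations consistent with the revealed entries survive. The sketch below uses toy assumptions (N = 4, the total number of links as the single benchmark constraint with \(L^*=3\), and a hypothetical ego-network for node 0). It builds the restricted set explicitly instead of taking the limits \(\psi _{lm}\rightarrow \pm \infty \), re-solves for θ so that \(\langle L\rangle =L^*\), and compares the resulting entropy with \(\theta L^*+\ln Z'\) and with \(S_0\).

```python
import itertools
import math

N = 4
PAIRS = list(itertools.combinations(range(N), 2))
GRAPHS = list(itertools.product([0, 1], repeat=len(PAIRS)))
L_STAR = 3.0  # hypothetical benchmark: expected total number of links

def fit(graphs, l_star):
    """Solve <L> = l_star over `graphs` by bisection; return (theta, probabilities, ln Z)."""
    lo, hi = -30.0, 30.0
    for _ in range(200):
        theta = 0.5 * (lo + hi)
        weights = [math.exp(-theta * sum(g)) for g in graphs]
        z = sum(weights)
        mean_l = sum(w * sum(g) for w, g in zip(weights, graphs)) / z
        lo, hi = (theta, hi) if mean_l > l_star else (lo, theta)
    return theta, [w / z for w in weights], math.log(z)

def entropy(probs):
    """Shannon entropy (nats) of a distribution over configurations."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

# Benchmark problem over all 2**6 graphs.
eta, p0, ln_z = fit(GRAPHS, L_STAR)
s0 = entropy(p0)

# Node l = 0 reveals a hypothetical ego-network: linked to 1 and 3, not to 2.
ego_star = {(0, 1): 1, (0, 2): 0, (0, 3): 1}
restricted = [g for g in GRAPHS
              if all(g[PAIRS.index(pair)] == a for pair, a in ego_star.items())]
theta, pl, ln_z_restricted = fit(restricted, L_STAR)
s_l = entropy(pl)

print(len(GRAPHS), len(restricted))  # 64 and 8 configurations
print(s0, s_l, theta * L_STAR + ln_z_restricted)
```

Here the three free pairs must carry one link on average, so each is present with probability 1/3 and \(S_{(l)}=3\ln 3-2\ln 2\), well below \(S_0=6\ln 2\).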

Let us now evaluate the expressions \(Z(\vec {\eta })\) and \(Z'(\vec {\theta })\) for the same value of the parameters (say \(\vec {\mu }\)): since \(Z'(\vec {\mu })\) is a sum over a subset of the (positive) addenda of \(Z(\vec {\mu })\), it holds that \(\ln Z(\vec {\mu })\ge \ln Z'(\vec {\mu })\), in turn implying the inequality \(S_0(\vec {\mu })\ge S_{(l)}(\vec {\mu })\). Let us now choose a particular value of the parameters, i.e. the point of minimum of \(S_0\): \(\vec {\mu }=\vec {\eta }^*\). Thus,

$$\begin{aligned} S_0(\vec {\eta }^*)\ge S_{(l)}(\vec {\eta }^*)\ge S_{(l)}(\vec {\theta }^*) \end{aligned}$$

where the second inequality follows from the very definition of minimum. This ensures that the ratio \(S_{(l)}/S_0\) never exceeds one and that the InfoRank index in Eq. (7) is always well defined.
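On a toy model the inequality can be checked node by node and turned into a ranking. The sketch below is a simplification under stated assumptions: the only benchmark constraint is the expected total number of links, for which the maximum-entropy ensemble treats every unconstrained pair as an independent link with a common probability, so both entropies have closed forms. It scores each node by the ratio \(S_{(l)}/S_0\) and sorts by it, following the criterion that the node leaving the least residual uncertainty is the most informative; it does not reproduce the exact definition of InfoRank in Eq. (7), and the network and function names are hypothetical.

```python
import math

def h(p):
    """Binary entropy in nats, with h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))

def entropy_ratios(edges, n):
    """S_(l)/S_0 for every node under a single (assumed) constraint, <L> = L*.

    Closed forms: S_0 = M h(L/M) with M = N(N-1)/2 and, once node l fixes its
    N - 1 pairs, S_(l) = M' h((L - k_l)/M') with M' = M - (N - 1).
    """
    m = n * (n - 1) // 2
    m_free = m - (n - 1)
    degree = [0] * n
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    s0 = m * h(len(edges) / m)  # assumes 0 < L < M, so that S_0 > 0
    return {node: m_free * h((len(edges) - degree[node]) / m_free) / s0
            for node in range(n)}

EDGES = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (3, 4), (4, 5)]  # hypothetical network
ratios = entropy_ratios(EDGES, n=6)
ranking = sorted(ratios, key=ratios.get)  # smallest residual uncertainty first
print(ranking[0], ratios)
```

With only the link count as benchmark, the ratio depends on node l solely through its degree \(k_l\), so this toy checks the inequality but cannot show how InfoRank differs from degree centrality; richer benchmark constraints are needed for that.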

Our ranking procedure builds upon the observation that, by imposing more information on top of the common one, each node further reduces its uncertainty about the unknown network structure: the node reducing the residual uncertainty to the largest extent is identified as the “most informative” one.

The same line of reasoning applies when subsets of nodes are considered, although solving such a problem may be computationally demanding: given a network of size N, quantifying the InfoRank of all possible subsets of s nodes would require computing \(\binom{N}{s}\) different Shannon entropies.
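To make the combinatorial cost concrete, the sketch below extends the same toy (the expected number of links as the only benchmark constraint, a hypothetical six-node network) to subsets: it evaluates one residual entropy per subset, i.e. \(\binom{N}{s}\) of them, and returns the subset leaving the least residual uncertainty. Exhaustive enumeration is our illustrative choice, not a method proposed in the text, and the function names are hypothetical.

```python
import itertools
import math

def h(p):
    """Binary entropy in nats, with h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))

def subset_entropy(edges, n, subset):
    """Residual entropy once every node in `subset` reveals its ego-network.

    Toy assumption: the only benchmark constraint is the expected number of links,
    so all pairs not touching the subset are independent links with a common probability.
    """
    inside = set(subset)
    free_pairs = [(i, j) for i, j in itertools.combinations(range(n), 2)
                  if i not in inside and j not in inside]
    if not free_pairs:
        return 0.0
    free_links = sum(1 for i, j in edges if i not in inside and j not in inside)
    return len(free_pairs) * h(free_links / len(free_pairs))

def best_subset(edges, n, s):
    """Brute force: one entropy evaluation for each of the C(n, s) subsets."""
    return min(itertools.combinations(range(n), s),
               key=lambda sub: subset_entropy(edges, n, sub))

EDGES = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (3, 4), (4, 5)]  # hypothetical network
print(math.comb(6, 2), best_subset(EDGES, 6, 2))
```

The count grows quickly: `math.comb(100, 5)` is 75,287,520, so exhaustive search over subsets is only viable for small N or s.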


About this article


Cite this article

Barucca, P., Caldarelli, G. & Squartini, T. Tackling Information Asymmetry in Networks: A New Entropy-Based Ranking Index. J Stat Phys 173, 1028–1044 (2018).



  • Complex networks
  • Shannon entropy
  • Information theory
  • Ranking algorithm