Abstract
Many real systems can be described through time-varying networks of interactions that encapsulate information sharing between individual units over time. These interactions can be classified as being either reducible or irreducible: reducible interactions pertain to node-specific properties, while irreducible interactions reflect dyadic relationships between nodes that form the network backbone. The process of filtering reducible links to detect the backbone network could allow for identifying family members and friends in social networks or social structures from contact patterns of individuals. A pervasive hypothesis in existing methods of backbone discovery is that the specific properties of the nodes are constant in time, such that reducible links have the same statistical features at any time during the observation. In this work, we relax this assumption and propose a new methodology for detecting network backbones against time variations in node properties. Through analytical insight and numerical evidence on synthetic and real datasets, we demonstrate the viability of the proposed approach to aid in the discovery of network backbones from time series. By critically comparing our approach with existing methods in the technical literature, we show that neglecting time variations in node-specific properties may produce false positives in the inference of the network backbone.
Notes
According to the weighted configuration model [31, 32], Eq. (9) represents the expected number of links formed between node i and j in each of the \(\tau (\varDelta )\) snapshots of the \(\varDelta \)th interval. Since most temporal networks are sparse, we can assume that \(p_{i j} (t) \in \left[ 0, 1\right) \) and refer to it as a probability.
References
Holme, P., Saramäki, J.: Temporal networks. Phys. Rep. 519(3), 97 (2012)
Holme, P.: Modern temporal network theory: a colloquium. Eur. Phys. J. B 88, 1 (2015)
Masuda, N., Lambiotte, R.: A Guide to Temporal Networks, vol. 4. World Scientific, Singapore (2016)
Lazer, D., Pentland, A., Adamic, L., Aral, S., Barabási, A.L., Brewer, D., Christakis, N., Contractor, N., Fowler, J., Gutmann, M., et al.: Computational social science. Science 323(5915), 721 (2009)
Ivancevic, T., Jain, L., Pattison, J., Hariz, A.: Nonlinear dynamics and chaos methods in neurodynamics and complex data analysis. Nonlinear Dyn. 56(1–2), 23 (2009)
Battiston, S., Farmer, J.D., Flache, A., Garlaschelli, D., Haldane, A.G., Heesterbeek, H., Hommes, C., Jaeger, C., May, R., Scheffer, M.: Complexity theory and financial regulation. Science 351(6275), 818–819 (2016)
Kobayashi, T., Takaguchi, T., Barrat, A.: The structured backbone of temporal social ties. Nat. Commun. 10(1), 220 (2019)
Wu, Z., Braunstein, L.A., Havlin, S., Stanley, H.E.: Transport in weighted networks: partition into superhighways and roads. Phys. Rev. Lett. 96(14), 148702 (2006)
Serrano, M.Á., Boguñá, M., Vespignani, A.: Extracting the multiscale backbone of complex weighted networks. Proc. Natl. Acad. Sci. 106(16), 6483 (2009)
Tumminello, M., Miccichè, S., Lillo, F., Piilo, J., Mantegna, R.N.: Statistically validated networks in bipartite complex systems. PLoS ONE 6(3), e17994 (2011)
Li, M.X., Palchykov, V., Jiang, Z.Q., Kaski, K., Kertész, J., Miccichè, S., Tumminello, M., Zhou, W.X., Mantegna, R.N.: Statistically validated mobile communication networks: the evolution of motifs in European and Chinese data. New J. Phys. 16(8), 083038 (2014)
Gemmetto, V., Cardillo, A., Garlaschelli, D.: Irreducible network backbones: unbiased graph filtering via maximum entropy (2017). arXiv preprint arXiv:1706.00230
Cimini, G., Squartini, T., Saracco, F., Garlaschelli, D., Gabrielli, A., Caldarelli, G.: The statistical physics of real-world networks. Nat. Rev. Phys. 1(1), 58 (2019)
Marcaccioli, R., Livan, G.: A Pólya urn approach to information filtering in complex networks. Nat. Commun. 10(1), 745 (2019)
Perra, N., Gonçalves, B., Pastor-Satorras, R., Vespignani, A.: Activity driven modeling of time varying networks. Sci. Rep. 2, 469 (2012)
Zino, L., Rizzo, A., Porfiri, M.: An analytical framework for the study of epidemic models on activity driven networks. J. Complex Netw. 5(6), 924 (2017)
Sun, K., Baronchelli, A., Perra, N.: Contrasting effects of strong ties on SIR and SIS processes in temporal networks. Eur. Phys. J. B 88(12), 326 (2015)
Zino, L., Rizzo, A., Porfiri, M.: Modeling memory effects in activity-driven networks. SIAM J. Appl. Dyn. Syst. 17(4), 2830 (2018)
Nadini, M., Sun, K., Ubaldi, E., Starnini, M., Rizzo, A., Perra, N.: Epidemic spreading in modular time-varying networks. Sci. Rep. 8(1), 2352 (2018)
Liu, Q.H., Xiong, X., Zhang, Q., Perra, N.: Epidemic spreading on time-varying multiplex networks. Phys. Rev. E 98(6), 062303 (2018)
Lei, Y., Jiang, X., Guo, Q., Ma, Y., Li, M., Zheng, Z.: Contagion processes on the static and activity-driven coupling networks. Phys. Rev. E 93(3), 032308 (2016)
Rizzo, A., Frasca, M., Porfiri, M.: Effect of individual behavior on epidemic spreading in activity-driven networks. Phys. Rev. E 90(4), 042801 (2014)
Nadini, M., Rizzo, A., Porfiri, M.: Epidemic spreading in temporal and adaptive networks with static backbone. In: IEEE Transactions on Network Science and Engineering. IEEE (2018)
Rizzo, A., Pedalino, B., Porfiri, M.: A network model for Ebola spreading. J. Theor. Biol. 394, 212 (2016)
Moinet, A., Starnini, M., Pastor-Satorras, R.: Burstiness and aging in social temporal networks. Phys. Rev. Lett. 114(10), 108701 (2015)
Eguiluz, V.M., Chialvo, D.R., Cecchi, G.A., Baliki, M., Apkarian, A.V.: Scale-free brain functional networks. Phys. Rev. Lett. 94(1), 018102 (2005)
Musciotto, F., Marotta, L., Piilo, J., Mantegna, R.N.: Long-term ecology of investors in a financial market. Palgrave Commun. 4(1), 92 (2018)
Curme, C., Tumminello, M., Mantegna, R.N., Stanley, H.E., Kenett, D.Y.: Emergence of statistically validated financial intraday lead-lag relationships. Quant. Finance 15(8), 1375 (2015)
Challet, D., Chicheportiche, R., Lallouache, M., Kassibrakis, S.: Statistically validated lead-lag networks and inventory prediction in the foreign exchange market. Adv. Complex Syst. 21, 1850019 (2018)
Bongiorno, C., London, A., Miccichè, S., Mantegna, R.N.: Core of communities in bipartite networks. Phys. Rev. E 96(2), 022321 (2017)
Serrano, M.Á., Boguñá, M.: Weighted configuration model. AIP Conf. Proc. 776(1), 101 (2005)
Newman, M.E.J.: Networks: An Introduction. Oxford University Press, Oxford (2010)
Gordevičius, J., Gamper, J., Böhlen, M.: Parsimonious temporal aggregation. VLDB J. 21(3), 309 (2012)
Konno, H., Kuno, T.: Best piecewise constant approximation of a function of single variable. Oper. Res. Lett. 7(4), 205 (1988)
Jagadish, H.V., Koudas, N., Muthukrishnan, S., Poosala, V., Sevcik, K.C., Suel, T.: Optimal histograms with quality guarantees. In: VLDB, vol. 98, pp. 24–27 (1998)
Mahlknecht, G., Böhlen, M.H., Dignös, A., Gamper, J.: VISOR: visualizing summaries of ordered data. In: Proceedings of the 29th International Conference on Scientific and Statistical Database Management, p. 40. ACM (2017)
Scargle, J.D., Norris, J.P., Jackson, B., Chiang, J.: Studies in astronomical time series analysis. VI. Bayesian block representations. Astrophys. J. 764(2), 167 (2013)
Barbour, A., Eagleson, G.: Poisson approximation for some statistics based on exchangeable trials. Adv. Appl. Prob. 15(3), 585 (1983)
Steele, J.M.: Le Cam’s inequality and Poisson approximations. Am. Math. Mon. 101(1), 48 (1994)
Le Cam, L., et al.: An approximation theorem for the Poisson binomial distribution. Pac. J. Math. 10(4), 1181 (1960)
Shaffer, J.P.: Multiple hypothesis testing. Annu. Rev. Psychol. 46(1), 561 (1995)
Hochberg, Y., Tamhane, A.: Multiple Comparison Procedures. Wiley, New York (2009)
Benjamini, Y., Hochberg, Y.: Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B (Methodol.) 57, 289–300 (1995)
Perra, N., Balcan, D., Gonçalves, B., Vespignani, A.: Towards a characterization of behavior-disease models. PLoS ONE 6(8), e23084 (2011)
Manning, C.D., Raghavan, P., Schütze, H.: Introduction to Information Retrieval. Cambridge University Press, Cambridge (2008)
Jaccard, P.: Étude comparative de la distribution florale dans une portion des Alpes et des Jura. Bull. Soc. Vaud. Sci. Nat. 37, 547 (1901)
Vijaymeena, M., Kavitha, K.: A survey on similarity measures in text mining. Mach. Learn. Appl. Int. J. 3, 19 (2016)
Kossinets, G., Watts, D.J.: Empirical analysis of an evolving social network. Science 311(5757), 88 (2006)
Ribeiro, B., Perra, N., Baronchelli, A.: Quantifying the effect of temporal resolution on time-varying networks. Sci. Rep. 3, 3006 (2013)
Zhou, D.D., Hu, B., Guan, Z.H., Liao, R.Q., Xiao, J.W.: Finite-time topology identification of complex spatio-temporal networks with time delay. Nonlinear Dyn. 91(2), 785 (2018)
Chen, J., Lu, Ja, Zhou, J.: Topology identification of complex networks from noisy time series using ROC curve analysis. Nonlinear Dyn. 75(4), 761 (2014)
Xu, Y., Zhou, W., Fang, J.: Topology identification of the modified complex dynamical network with non-delayed and delayed coupling. Nonlinear Dyn. 68(1–2), 195 (2012)
Funding
The authors acknowledge financial support from the National Science Foundation under Grant No. CMMI-1561134. A.R. acknowledges financial support from Compagnia di San Paolo, Italy, and the Italian Ministry of Foreign Affairs and International Cooperation, within the project, “Macro to Micro: uncovering the hidden mechanisms driving network dynamics”.
Ethics declarations
Conflict of Interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
1.1 Backbone detection methods
Here, we succinctly summarize the temporal fitness model (TFM) [7], the temporal fitness model with rhythm (\(\hbox {TFM}_{\mathrm{rhythm}}\)) [7], and the statistically validated network (SVN) [10].
1.1.1 Temporal fitness model
The TFM considers a temporal network formed by N nodes evolving over T discrete time steps. All multiple links occurring within the same time step are removed, so that the total number of temporal links between node i and j is bounded by T. First, individual activities are computed according to
Then, their values are refined through a maximum likelihood approach, which requires the solution of N equations
where \(\varvec{a^*} = \left( a_1^*, \ldots , a_N^* \right) \) contains the optimal values for the individual activities. Finally, the p-value \(\alpha _{i j}\) for the link generated between node i and j is computed from the cumulative function of the Binomial distribution as
All p-values, one for each link in the network, are compared with a threshold value \(\beta \), properly corrected by using a multiple hypotheses correction [42, 43], and any value lower than \(\beta \) adds a link to the backbone network.
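The thresholding step can be illustrated with a minimal sketch. It assumes that, under the null model, nodes i and j are linked in each time step with probability \(a_i a_j\), and it uses a Bonferroni correction as one possible multiple hypotheses correction; both choices, as well as the function names `binomial_tail` and `tfm_backbone`, are illustrative assumptions and not a reproduction of the implementation in [7].

```python
from math import exp, lgamma, log, log1p

def binomial_tail(m, T, p):
    """Return P(X >= m) for X ~ Binomial(T, p), with 0 < p < 1."""
    if m <= 0:
        return 1.0
    total = 0.0
    for k in range(m, T + 1):
        # Work in log space so that large T does not overflow.
        log_pmf = (lgamma(T + 1) - lgamma(k + 1) - lgamma(T - k + 1)
                   + k * log(p) + (T - k) * log1p(-p))
        total += exp(log_pmf)
    return min(1.0, total)

def tfm_backbone(counts, activity, T, beta=0.01):
    """counts maps a node pair (i, j) to the number of time steps with a link.

    Assumption: the null link probability per step is activity[i] * activity[j],
    and beta is Bonferroni-corrected by the number of tested pairs.
    """
    threshold = beta / len(counts)
    return {(i, j) for (i, j), m in counts.items()
            if binomial_tail(m, T, activity[i] * activity[j]) < threshold}

# Toy example: three nodes with equal activity, observed over T = 100 steps.
activity = {0: 0.1, 1: 0.1, 2: 0.1}
counts = {(0, 1): 30, (0, 2): 1, (1, 2): 2}
print(tfm_backbone(counts, activity, T=100))  # prints {(0, 1)}
```

Only the pair (0, 1), which is observed far more often than its expected single contact, passes the corrected threshold.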
For our purposes, we also compute the expected total number of temporal links in the overall temporal evolution
1.1.2 Temporal fitness model with rhythm
The \(\hbox {TFM}_{\mathrm{rhythm}}\) adds to the TFM T time-varying coefficients, one for each time step, \(\varvec{\xi } = \left( \xi (1), \ldots , \xi (T) \right) \). First, every element in the time-varying vector is manually set to 0.999, with the exception of \(\xi (1)\) that is set equal to one. Individual activities are estimated according to Eq. (22). To determine the optimal values \((\varvec{a^*}, \varvec{\xi }^{*})\) in the maximum likelihood sense, we solve the system of \(N+T-1\) equations
where \(A_{ij}^{\text {ts}}(t)\) is the adjacency matrix at time t estimated from the time series. The expected number of links is computed as
Finally, the p-value \(\alpha _{i j}\) for the link generated between node i and j is computed from the cumulative function of the Poisson distribution as
All the p-values, one for each link in the network, are compared to a threshold value \(\beta \), properly corrected by using a multiple hypotheses correction [42, 43]. Any value lower than \(\beta \) leads to a link in the backbone network.
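A Poisson p-value of this kind can be sketched as follows. The sketch assumes that the expected number of links between i and j over the whole observation is \(a_i a_j \sum_t \xi (t)\); this form, and the names `poisson_tail` and `rhythm_p_value`, are assumptions made for illustration only.

```python
from math import exp

def poisson_tail(m, mu):
    """Return P(X >= m) for X ~ Poisson(mu), computed as 1 - P(X <= m - 1).

    Adequate for the moderate expected counts of sparse networks; the
    subtraction limits absolute accuracy to roughly 1e-16.
    """
    if m <= 0:
        return 1.0
    term = exp(-mu)  # pmf at k = 0
    cdf = term
    for k in range(1, m):
        term *= mu / k  # pmf at k from pmf at k - 1
        cdf += term
    return max(0.0, 1.0 - cdf)

def rhythm_p_value(m_ij, a_i, a_j, xi):
    """p-value for m_ij observed links, given activities and coefficients xi(t).

    Assumption: the expected number of links is a_i * a_j * sum_t xi(t).
    """
    mu = a_i * a_j * sum(xi)
    return poisson_tail(m_ij, mu)

# Initial guess described in the text: xi(1) = 1 and xi(t) = 0.999 otherwise.
xi = [1.0] + [0.999] * 99
p = rhythm_p_value(5, 0.1, 0.1, xi)
print(p)
```

With roughly one expected link, observing five yields a p-value of a few thousandths, which may or may not survive the multiple hypotheses correction depending on the number of tested links.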
For our purposes, we also compute the expected total number of temporal links in the overall temporal evolution
1.1.3 Statistically validated network
The SVN considers a temporal network of N nodes evolving over an observation time window that can be either discrete or continuous in time. Temporal links are aggregated to form a weighted static network. The p-value \(\alpha _{i j}\) for the link generated between node i and j is computed from the cumulative function of the hypergeometric distribution as
The p-values are compared with a threshold value \(\beta \), properly corrected by using a multiple hypotheses correction [42, 43], and a link is added to the backbone network if its p-value is less than \(\beta \).
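As a hedged illustration of a hypergeometric test on the aggregated network, the sketch below treats the total number of temporal links as the population, the strength of node i as the number of marked items, and the strength of node j as the number of draws. This parametrization and the name `hypergeom_tail` are assumptions chosen for the example; the exact form used by the SVN is the one given in [10].

```python
from math import comb

def hypergeom_tail(w, total, s_i, s_j):
    """Return P(X >= w) when s_j items are drawn without replacement from
    `total` items, s_i of which are marked (X counts the marked draws)."""
    upper = min(s_i, s_j)
    denom = comb(total, s_j)
    numer = sum(comb(s_i, k) * comb(total - s_i, s_j - k)
                for k in range(max(w, 0), upper + 1))
    return numer / denom  # exact integer arithmetic until the final division

# Toy example: 1000 temporal links in total, strengths 20 and 30.
print(hypergeom_tail(1, 1000, 20, 30))  # a single co-occurrence is unremarkable
print(hypergeom_tail(8, 1000, 20, 30))  # eight co-occurrences are very unlikely
```

Because the sums are carried out on Python integers, the result does not suffer from overflow even when the aggregated weights are large.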
1.2 On the similarity among the \(\hbox {EADM}_{I=1}\), SVN, and TFM
Here, we discuss why these three methods yield similar results for both synthetic and real datasets. First, we show that the \(\hbox {EADM}_{I=1}\) is a valid approximation of the TFM for large networks (hundreds of nodes or more). Then, we analytically examine the convergence of the SVN to the \(\hbox {EADM}_{I=1}\).
1.2.1 On the similarity between the TFM and \(\hbox {EADM}_{I=1}\)
We consider a long observation window T, for which the Binomial distribution in Eq. (24) converges to a Poisson distribution used in our method in Eq. (12). While in the \(\hbox {EADM}_{I=1}\) activities are estimated from the dataset using Eq. (8), in the TFM they are identified in a maximum likelihood sense [7]
In Fig. 7, we assess the ability of the \(\hbox {EADM}_{I=1}\) and the TFM to estimate the total number of temporal links. We compute the expected number of links for the \(\hbox {EADM}_{I=1}\) as \(\text {E}\left[ {\overline{W}} \right] = \sum _{i,j=1;i<j}^{N} T p_{ij} \), while we use Eq. (25) for the TFM. These values are compared with the total number of temporal links observed in the time series, \({\overline{W}}^{\text {ts}}\). As expected, the TFM works well for any network size, owing to the maximum likelihood estimation. Nevertheless, the maximum likelihood approach becomes computationally demanding for networks of around 1,000 nodes and beyond, and is thus impractical for very large networks. On the other hand, the \(\hbox {EADM}_{I=1}\) performs poorly for small networks, while matching the accuracy of the TFM for networks of 100 nodes. This improvement in performance of the \(\hbox {EADM}_{I=1}\) is explained in [31], where it is shown that Eq. (14) is in excellent agreement with numerical simulations for large networks.
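The Binomial-to-Poisson limit invoked at the start of this subsection can be checked numerically. The sketch below holds the expected number of links \(\mu = T p\) fixed, increases T, and records the gap between the two tail probabilities; the values \(\mu = 2\) and \(m = 6\) are arbitrary choices for illustration. By Le Cam's inequality [40], the gap should not exceed \(\mu ^2 / T\).

```python
from math import exp, lgamma, log, log1p

def binomial_tail(m, T, p):
    """Return P(X >= m) for X ~ Binomial(T, p), with 0 < p < 1."""
    if m <= 0:
        return 1.0
    total = 0.0
    for k in range(m, T + 1):
        log_pmf = (lgamma(T + 1) - lgamma(k + 1) - lgamma(T - k + 1)
                   + k * log(p) + (T - k) * log1p(-p))
        total += exp(log_pmf)
    return min(1.0, total)

def poisson_tail(m, mu):
    """Return P(X >= m) for X ~ Poisson(mu), computed as 1 - P(X <= m - 1)."""
    if m <= 0:
        return 1.0
    term = exp(-mu)
    cdf = term
    for k in range(1, m):
        term *= mu / k
        cdf += term
    return max(0.0, 1.0 - cdf)

mu, m = 2.0, 6  # illustrative values: expected and observed number of links
gaps = {}
for T in (10, 100, 1000, 10000):
    gaps[T] = abs(binomial_tail(m, T, mu / T) - poisson_tail(m, mu))
    print(T, gaps[T])
```

The printed gap shrinks as T grows, consistent with the use of a Poisson p-value for long observation windows.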
1.2.2 On the similarity between the SVN and \(\hbox {EADM}_{I=1}\)
When \({\overline{W}}^{\mathrm {ts}} \gg 1\), the hypergeometric distribution in Eq. (30) converges to a Poisson distribution and its p-value becomes equivalent to the p-value for the \(\hbox {EADM}_{I=1}\)
In all the synthetic and real data studied herein \({\overline{W}}^{\mathrm {ts}}\) is very large, so that Eq. (30) converges to Eq. (31).
1.3 Generation of synthetic temporal networks
To examine the precision and recall of irreducible links, we generate synthetic networks. The procedure of network generation is given as follows:
1. We consider a temporal network evolving in an observation window of length T, divided into I different intervals. We randomly select without replacement \(I-1\) time steps in \(\{1,\ldots ,T\}\), which we sort as \(t_{\mathrm {in}}(2), \ldots , t_{\mathrm {in}}(I)\), and we set \(t_{\mathrm {in}}(1)=1\). Each interval \(\varDelta \) has different length \(\tau (\varDelta )\), so that, in general, the average length of the interval is \(\langle \tau (\varDelta ) \rangle = T/I\).
2. The N nodes in the network have a time-varying, piece-wise constant, individual activity. We extract activity values from a power law distribution, \(F(a) \sim a^{-2.1}\), with \(a \in [a_{{\min }}, 1]\). The time-varying activity \(a_i (t)\) is selected according to the following procedure:
   - When \(\varDelta =1\), N activity values, one for each node in the network, are randomly extracted from F(a), and held constant within \([t_{\text {in}}(1), t_{\text {in}}(1)+\tau (1)-1]\).
   - When \(2 \le \varDelta \le I\), activities might be correlated between two successive intervals, \(t_{1}\in [t_{\text {in}}(\varDelta -1), t_{\text {in}}(\varDelta -1)+\tau (\varDelta -1)-1]\) and \(t_{2} \in [t_{\text {in}}(\varDelta ), t_{\text {in}}(\varDelta )+\tau (\varDelta )-1]\) according to Eq. (19) in the main text.
3. We generate a temporal network in the observation window [1, T]. Each pair of nodes ij within an interval \(\varDelta \) is connected with probability \(a_i(\varDelta ) a_j(\varDelta )\). As a result, we obtain a sequence of T undirected and unweighted networks, with adjacency matrices \({\hat{A}}(1)\), \(\ldots \), \({\hat{A}}(T)\). These networks are generated only as a function of the individual activities.
4. Based on the node pairs that are connected at least once over T time steps of the observation window, we define the synthetic backbone. Specifically, we randomly assign a fraction \(\delta \) of these node pairs to the backbone.
5. We construct T new networks A(1), A(2), \(\ldots \), A(T) from \({\hat{A}}(1)\), \({\hat{A}}(2)\), \(\ldots \), \({\hat{A}}(T)\) by accounting for the synthetic backbone above. First, we set \(A_{ij}(t) = {\hat{A}}_{ij}(t)\) for \(t = 1, \ldots , T\) for all the pairs that do not belong to the backbone. Then, for the generic link ij in the backbone, we initialize \(A_{i j}(1) = {\hat{A}}_{ij}(1)\) and we iterate the following steps for \(t = 2, \ldots , T\):
   - if \({\hat{A}}_{ij}(t) = 1\), we maintain \(A_{ij}(t) = 1\);
   - if \({\hat{A}}_{ij}(t) = 0\), we set \(A_{ij}(t) = 1\) with probability \(\lambda \) and \(A_{ij}(t) = 0\) with probability \(1-\lambda \).
The parameter \(\lambda \) measures the preponderance of links associated with the backbone during the observation window.
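The generation procedure can be sketched compactly as follows. The sketch makes two simplifying assumptions that depart from the text: activities are redrawn independently in every interval instead of following the correlation rule of Eq. (19), and interval starting points are drawn from \(\{2,\ldots ,T\}\) so that no interval is empty. Function names and the representation of each snapshot as a set of links are illustrative choices.

```python
import random

def sample_power_law(a_min, rng, gamma=2.1):
    """Inverse-transform sample from F(a) ~ a^(-gamma) on [a_min, 1]."""
    e = 1.0 - gamma
    u = rng.random()
    return (a_min ** e + u * (1.0 - a_min ** e)) ** (1.0 / e)

def generate(N, T, I, a_min, delta, lam, seed=0):
    rng = random.Random(seed)
    # Step 1: starting times of the I intervals, with t_in(1) = 1.
    starts = [1] + sorted(rng.sample(range(2, T + 1), I - 1))
    ends = starts[1:] + [T + 1]
    pairs = [(i, j) for i in range(N) for j in range(i + 1, N)]
    # Steps 2-3: piece-wise constant activities and activity-driven snapshots.
    # Assumption: activities are redrawn independently in each interval.
    A_hat = []
    for s, e in zip(starts, ends):
        a = [sample_power_law(a_min, rng) for _ in range(N)]
        for _ in range(s, e):
            A_hat.append({(i, j) for (i, j) in pairs
                          if rng.random() < a[i] * a[j]})
    # Step 4: a fraction delta of the pairs seen at least once forms the backbone.
    seen = sorted(set().union(*A_hat))
    backbone = set(rng.sample(seen, int(delta * len(seen))))
    # Step 5: backbone pairs gain an extra link with probability lam for t >= 2.
    A = [set(A_hat[0])]
    for t in range(1, T):
        extra = {ij for ij in sorted(backbone)
                 if ij not in A_hat[t] and rng.random() < lam}
        A.append(A_hat[t] | extra)
    return A_hat, A, backbone, starts

A_hat, A, backbone, starts = generate(N=20, T=50, I=5, a_min=0.2,
                                      delta=0.3, lam=0.05, seed=1)
print(len(backbone), sum(len(x) for x in A) - sum(len(x) for x in A_hat))
```

The second printed number counts the temporal links that exist only because of the backbone, which grows with \(\lambda \).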
1.4 Insights on the interval estimation
The EADM+R requires that the number of intervals is known a priori. Nevertheless, when dealing with real networks, our estimate, \(I_e\), might differ from the true value, I. This mismatch might diminish the accuracy of the backbone inference, as examined below for synthetic data. We focus on two sets of parameters, which represent two possible scenarios. In the first case, \(a_{{\min }} = [\sqrt{\langle \tau (\varDelta ) \rangle }]^{-1}\) and \(\lambda = 0.025\), which corresponds to a “dense” ADN with an easily detectable backbone. In the second case, \(a_{{\min }} = [\langle \tau (\varDelta ) \rangle ]^{-1}\) and \(\lambda = 0.002\), which represents a “sparse” ADN with a partially hidden backbone.
In Fig. 8a, c, we show that if the number of estimated intervals, \(I_e\), is greater than or equal to the true value, I, precision and recall are close to one. On the contrary, in Fig. 8b, d, we observe a more dramatic scenario, in which increasing \(I_e\) hinders the performance of the method, leading to filtering out most of the links that belong to the backbone network.
1.5 Analysis of all available real datasets
1.5.1 Significant links
We compare the backbone networks from seven real-world datasets inferred by the five methods under consideration in terms of the number of significant links. The EADM+BB always finds fewer links than any other method (Fig. 9).
1.5.2 Jaccard index
In Fig. 10, we assess differences in the backbone networks detected by the EADM+BB and the other four methods on seven real-world datasets, in terms of the Jaccard index. We observe that the EADM+BB finds backbones different from those of the \(\hbox {EADM}_{I=1}\), SVN, TFM, and \(\hbox {TFM}_{\mathrm{rhythm}}\), which are equivalent.
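For reference, the Jaccard index of two backbones is the size of the intersection of their link sets divided by the size of their union. A minimal sketch, assuming undirected links stored as node pairs (the helper name `canonical` is illustrative):

```python
def canonical(links):
    """Represent undirected links as sorted tuples so (i, j) equals (j, i)."""
    return {tuple(sorted(link)) for link in links}

def jaccard_index(links_a, links_b):
    a, b = canonical(links_a), canonical(links_b)
    union = a | b
    # Convention chosen here: two empty backbones are considered identical.
    return len(a & b) / len(union) if union else 1.0

backbone_a = {(1, 2), (2, 3), (3, 4)}
backbone_b = {(2, 1), (3, 4), (4, 5), (5, 6)}
print(jaccard_index(backbone_a, backbone_b))  # prints 0.4
```

The two toy backbones share two links out of five distinct ones, giving an index of 0.4.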
1.5.3 Overlap coefficient
Similar to Fig. 10, we examine the overlap coefficient of the backbone networks determined by our method and the other four in Fig. 4, confirming that the EADM+BB tends to detect a subset of the links predicted by the other methods, which are thus prone to false positives (Fig. 11).
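The overlap coefficient divides the size of the intersection by the size of the smaller set, so it equals one whenever one backbone is contained in the other. A minimal sketch, again assuming undirected links stored as node pairs:

```python
def canonical(links):
    """Represent undirected links as sorted tuples so (i, j) equals (j, i)."""
    return {tuple(sorted(link)) for link in links}

def overlap_coefficient(links_a, links_b):
    a, b = canonical(links_a), canonical(links_b)
    smaller = min(len(a), len(b))
    # Convention chosen here: the coefficient is 0 if either set is empty.
    return len(a & b) / smaller if smaller else 0.0

small = {(1, 2), (3, 4)}
large = {(2, 1), (3, 4), (4, 5), (5, 6)}
print(overlap_coefficient(small, large))  # prints 1.0
```

In this toy case the smaller backbone is a subset of the larger one, so the overlap coefficient is 1.0 even though the Jaccard index of the same pair would be only 0.5.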
1.5.4 Temporal links
In Fig. 12, we display the total number of temporal links estimated in the time series, \({\overline{W}}^{\mathrm{ts}}\), for all the considered methods on all the seven real-world datasets. We confirm that the number of links decreases as we increase the time resolution of the dataset.
1.5.5 Relative error
We analyze the accuracy of the methods in describing the overall system evolution. We compare the expected total number of temporal links, E\(\left[ {\overline{W}} \right] \), with the number observed in the time series, \({\overline{W}}^{\mathrm {ts}}\). All methods are accurate for the datasets studied herein, with a relative error up to 5% (Fig. 13).
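A minimal sketch of this comparison, assuming the relative error is defined as \(|\text {E}[{\overline{W}}] - {\overline{W}}^{\mathrm {ts}}| / {\overline{W}}^{\mathrm {ts}}\) and using the \(\hbox {EADM}_{I=1}\) expression \(\text {E}[{\overline{W}}] = \sum _{i<j} T p_{ij}\); the link probabilities and the observed count below are made-up illustrative numbers.

```python
def expected_total_links(p, T):
    """p maps each pair (i, j) with i < j to its link probability p_ij."""
    return T * sum(p.values())

def relative_error(expected, observed):
    return abs(expected - observed) / observed

# Illustrative values only, not taken from any dataset.
p = {(0, 1): 0.01, (0, 2): 0.02, (1, 2): 0.03}
expected = expected_total_links(p, T=1000)
print(expected, relative_error(expected, observed=58))
```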
Cite this article
Nadini, M., Bongiorno, C., Rizzo, A. et al. Detecting network backbones against time variations in node properties. Nonlinear Dyn 99, 855–878 (2020). https://doi.org/10.1007/s11071-019-05134-y
DOI: https://doi.org/10.1007/s11071-019-05134-y