Detecting network backbones against time variations in node properties

Abstract

Many real systems can be described through time-varying networks of interactions that encapsulate information sharing between individual units over time. These interactions can be classified as either reducible or irreducible: reducible interactions pertain to node-specific properties, while irreducible interactions reflect dyadic relationships between nodes that form the network backbone. Filtering out reducible links to detect the backbone network could allow for identifying family members and friends in social networks, or social structures from the contact patterns of individuals. A pervasive hypothesis in existing methods of backbone discovery is that the specific properties of the nodes are constant in time, such that reducible links have the same statistical features at any time during the observation. In this work, we relax this assumption and propose a new methodology for detecting network backbones against time variations in node properties. Through analytical insight and numerical evidence on synthetic and real datasets, we demonstrate the viability of the proposed approach to aid in the discovery of network backbones from time series. By critically comparing our approach with existing methods in the technical literature, we show that neglecting time variations in node-specific properties may produce false positives in the inference of the network backbone.



Notes

  1. According to the weighted configuration model [31, 32], Eq. (9) represents the expected number of links formed between node i and j in each of the \(\tau (\varDelta )\) snapshots of the \(\varDelta \)th interval. Since most temporal networks are sparse, we can assume that \(p_{i j} (t) \in \left[ 0, 1\right) \) and refer to it as a probability.

References

  1. Holme, P., Saramäki, J.: Temporal networks. Phys. Rep. 519(3), 97 (2012)
  2. Holme, P.: Modern temporal network theory: a colloquium. Eur. Phys. J. B 88, 1 (2015)
  3. Masuda, N., Lambiotte, R.: A Guide to Temporal Networks, vol. 4. World Scientific, Singapore (2016)
  4. Lazer, D., Pentland, A., Adamic, L., Aral, S., Barabási, A.L., Brewer, D., Christakis, N., Contractor, N., Fowler, J., Gutmann, M., et al.: Computational social science. Science 323(5915), 721 (2009)
  5. Ivancevic, T., Jain, L., Pattison, J., Hariz, A.: Nonlinear dynamics and chaos methods in neurodynamics and complex data analysis. Nonlinear Dyn. 56(1–2), 23 (2009)
  6. Battiston, S., Farmer, J.D., Flache, A., Garlaschelli, D., Haldane, A.G., Heesterbeek, H., Hommes, C., Jaeger, C., May, R., Scheffer, M.: Complexity theory and financial regulation. Science 351(6275), 818–819 (2016)
  7. Kobayashi, T., Takaguchi, T., Barrat, A.: The structured backbone of temporal social ties. Nat. Commun. 10(1), 220 (2019)
  8. Wu, Z., Braunstein, L.A., Havlin, S., Stanley, H.E.: Transport in weighted networks: partition into superhighways and roads. Phys. Rev. Lett. 96(14), 148702 (2006)
  9. Serrano, M.Á., Boguñá, M., Vespignani, A.: Extracting the multiscale backbone of complex weighted networks. Proc. Natl. Acad. Sci. 106(16), 6483 (2009)
  10. Tumminello, M., Miccichè, S., Lillo, F., Piilo, J., Mantegna, R.N.: Statistically validated networks in bipartite complex systems. PLoS ONE 6(3), e17994 (2011)
  11. Li, M.X., Palchykov, V., Jiang, Z.Q., Kaski, K., Kertész, J., Miccichè, S., Tumminello, M., Zhou, W.X., Mantegna, R.N.: Statistically validated mobile communication networks: the evolution of motifs in European and Chinese data. New J. Phys. 16(8), 083038 (2014)
  12. Gemmetto, V., Cardillo, A., Garlaschelli, D.: Irreducible network backbones: unbiased graph filtering via maximum entropy (2017). arXiv preprint arXiv:1706.00230
  13. Cimini, G., Squartini, T., Saracco, F., Garlaschelli, D., Gabrielli, A., Caldarelli, G.: The statistical physics of real-world networks. Nat. Rev. Phys. 1(1), 58 (2019)
  14. Marcaccioli, R., Livan, G.: A Pólya urn approach to information filtering in complex networks. Nat. Commun. 10(1), 745 (2019)
  15. Perra, N., Gonçalves, B., Pastor-Satorras, R., Vespignani, A.: Activity driven modeling of time varying networks. Sci. Rep. 2, 469 (2012)
  16. Zino, L., Rizzo, A., Porfiri, M.: An analytical framework for the study of epidemic models on activity driven networks. J. Complex Netw. 5(6), 924 (2017)
  17. Sun, K., Baronchelli, A., Perra, N.: Contrasting effects of strong ties on SIR and SIS processes in temporal networks. Eur. Phys. J. B 88(12), 326 (2015)
  18. Zino, L., Rizzo, A., Porfiri, M.: Modeling memory effects in activity-driven networks. SIAM J. Appl. Dyn. Syst. 17(4), 2830 (2018)
  19. Nadini, M., Sun, K., Ubaldi, E., Starnini, M., Rizzo, A., Perra, N.: Epidemic spreading in modular time-varying networks. Sci. Rep. 8(1), 2352 (2018)
  20. Liu, Q.H., Xiong, X., Zhang, Q., Perra, N.: Epidemic spreading on time-varying multiplex networks. Phys. Rev. E 98(6), 062303 (2018)
  21. Lei, Y., Jiang, X., Guo, Q., Ma, Y., Li, M., Zheng, Z.: Contagion processes on the static and activity-driven coupling networks. Phys. Rev. E 93(3), 032308 (2016)
  22. Rizzo, A., Frasca, M., Porfiri, M.: Effect of individual behavior on epidemic spreading in activity-driven networks. Phys. Rev. E 90(4), 042801 (2014)
  23. Nadini, M., Rizzo, A., Porfiri, M.: Epidemic spreading in temporal and adaptive networks with static backbone. In: IEEE Transactions on Network Science and Engineering. IEEE (2018)
  24. Rizzo, A., Pedalino, B., Porfiri, M.: A network model for Ebola spreading. J. Theor. Biol. 394, 212 (2016)
  25. Moinet, A., Starnini, M., Pastor-Satorras, R.: Burstiness and aging in social temporal networks. Phys. Rev. Lett. 114(10), 108701 (2015)
  26. Eguiluz, V.M., Chialvo, D.R., Cecchi, G.A., Baliki, M., Apkarian, A.V.: Scale-free brain functional networks. Phys. Rev. Lett. 94(1), 018102 (2005)
  27. Musciotto, F., Marotta, L., Piilo, J., Mantegna, R.N.: Long-term ecology of investors in a financial market. Palgrave Commun. 4(1), 92 (2018)
  28. Curme, C., Tumminello, M., Mantegna, R.N., Stanley, H.E., Kenett, D.Y.: Emergence of statistically validated financial intraday lead-lag relationships. Quant. Finance 15(8), 1375 (2015)
  29. Challet, D., Chicheportiche, R., Lallouache, M., Kassibrakis, S.: Statistically validated lead-lag networks and inventory prediction in the foreign exchange market. Adv. Complex Syst. 21, 1850019 (2018)
  30. Bongiorno, C., London, A., Miccichè, S., Mantegna, R.N.: Core of communities in bipartite networks. Phys. Rev. E 96(2), 022321 (2017)
  31. Serrano, M.Á., Boguñá, M.: Weighted configuration model. AIP Conf. Proc. 776(1), 101 (2005)
  32. Newman, M.E.J.: Networks: An Introduction. Oxford University Press, Oxford (2010)
  33. Gordevičius, J., Gamper, J., Böhlen, M.: Parsimonious temporal aggregation. VLDB J. 21(3), 309 (2012)
  34. Konno, H., Kuno, T.: Best piecewise constant approximation of a function of single variable. Oper. Res. Lett. 7(4), 205 (1988)
  35. Jagadish, H.V., Koudas, N., Muthukrishnan, S., Poosala, V., Sevcik, K.C., Suel, T.: Optimal histograms with quality guarantees. In: VLDB, vol. 98, pp. 24–27 (1998)
  36. Mahlknecht, G., Böhlen, M.H., Dignös, A., Gamper, J.: VISOR: visualizing summaries of ordered data. In: Proceedings of the 29th International Conference on Scientific and Statistical Database Management, p. 40. ACM (2017)
  37. Scargle, J.D., Norris, J.P., Jackson, B., Chiang, J.: Studies in astronomical time series analysis. VI. Bayesian block representations. Astrophys. J. 764(2), 167 (2013)
  38. Barbour, A., Eagleson, G.: Poisson approximation for some statistics based on exchangeable trials. Adv. Appl. Prob. 15(3), 585 (1983)
  39. Steele, J.M.: Le Cam’s inequality and Poisson approximations. Am. Math. Mon. 101(1), 48 (1994)
  40. Le Cam, L., et al.: An approximation theorem for the Poisson binomial distribution. Pac. J. Math. 10(4), 1181 (1960)
  41. Shaffer, J.P.: Multiple hypothesis testing. Annu. Rev. Psychol. 46(1), 561 (1995)
  42. Hochberg, Y., Tamhane, A.: Multiple Comparison Procedures. Wiley, New York (2009)
  43. Benjamini, Y., Hochberg, Y.: Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B (Methodol.) 57, 289–300 (1995)
  44. www.sociopatterns.org
  45. https://snap.stanford.edu/data/index.html
  46. https://www.cs.cmu.edu/~./enron/
  47. Perra, N., Balcan, D., Gonçalves, B., Vespignani, A.: Towards a characterization of behavior-disease models. PLoS ONE 6(8), e23084 (2011)
  48. Manning, C.D., Raghavan, P., Schütze, H.: Introduction to Information Retrieval. Cambridge University Press, Cambridge (2008)
  49. Jaccard, P.: Étude comparative de la distribution florale dans une portion des Alpes et des Jura. Bull. Soc. Vaud. Sci. Nat. 37, 547 (1901)
  50. Vijaymeena, M., Kavitha, K.: A survey on similarity measures in text mining. Mach. Learn. Appl. Int. J. 3, 19 (2016)
  51. Kossinets, G., Watts, D.J.: Empirical analysis of an evolving social network. Science 311(5757), 88 (2006)
  52. Ribeiro, B., Perra, N., Baronchelli, A.: Quantifying the effect of temporal resolution on time-varying networks. Sci. Rep. 3, 3006 (2013)
  53. Zhou, D.D., Hu, B., Guan, Z.H., Liao, R.Q., Xiao, J.W.: Finite-time topology identification of complex spatio-temporal networks with time delay. Nonlinear Dyn. 91(2), 785 (2018)
  54. Chen, J., Lu, Ja, Zhou, J.: Topology identification of complex networks from noisy time series using ROC curve analysis. Nonlinear Dyn. 75(4), 761 (2014)
  55. Xu, Y., Zhou, W., Fang, J.: Topology identification of the modified complex dynamical network with non-delayed and delayed coupling. Nonlinear Dyn. 68(1–2), 195 (2012)
  56. https://github.com/matnado/Backbone-Detection


Funding

The authors acknowledge financial support from the National Science Foundation under Grant No. CMMI-1561134. A.R. acknowledges financial support from Compagnia di San Paolo, Italy, and the Italian Ministry of Foreign Affairs and International Cooperation, within the project, “Macro to Micro: uncovering the hidden mechanisms driving network dynamics”.

Author information

Corresponding authors

Correspondence to Alessandro Rizzo or Maurizio Porfiri.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix


Backbone detection methods

Here, we succinctly summarize the temporal fitness model (TFM) [7], the temporal fitness model with rhythm (\(\hbox {TFM}_{\mathrm{rhythm}}\)) [7], and the statistically validated network (SVN) [10].

Temporal fitness model

The TFM considers a temporal network formed by N nodes evolving over T discrete time steps. All multiple links occurring within the same time step are removed, so that the total number of temporal links between node i and j is bounded by T. First, individual activities are computed according to

$$\begin{aligned} a_i = \frac{{\overline{s}}^{\text {ts}}_i}{\sqrt{2 {\overline{W}}^{\text {ts}} T}}. \end{aligned}$$
(22)

Then, their values are refined through a maximum likelihood approach, which requires the solution of N equations

$$\begin{aligned} \sum _{j = 1; j\ne i}^{N} \frac{{\overline{w}}_{ij}^{\text {ts}} - T a_i^*a_j^*}{1 - a_i^*a_j^*} = 0, \quad i = 1, \ldots , N, \end{aligned}$$
(23)

where \(\varvec{a^*} = \left( a_1^*, \ldots , a_N^* \right) \) contains the optimal values for the individual activities. Finally, the p-value \(\alpha _{i j}\) for the link generated between node i and j is computed from the cumulative function of the Binomial distribution as

$$\begin{aligned} \alpha _{i j} \equiv 1 - \sum _{ x = 0}^{{\overline{w}}^{\text {ts}}_{i j}-1} B \left( x; T, a_i^*a_j^* \right) . \end{aligned}$$
(24)
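The three steps of Eqs. (22)–(24) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, the input `w` is assumed to be a symmetric matrix of link counts with a zero diagonal and at least one link, and the damped fixed-point iteration used to solve Eq. (23) is an assumed solver (any root finder could be used instead, and convergence is not guaranteed in general).

```python
import math


def initial_activities(w, T):
    """Eq. (22): a_i = s_i / sqrt(2 W T), where s_i is the strength of node i."""
    strengths = [sum(row) for row in w]  # assumes a zero diagonal
    total_links = sum(strengths) / 2  # every link appears in two strengths
    return [s / math.sqrt(2 * total_links * T) for s in strengths]


def refine_activities(w, T, sweeps=200, tol=1e-12):
    """Solve Eq. (23) by a damped fixed-point iteration (assumed solver)."""
    n = len(w)
    a = initial_activities(w, T)
    for _ in range(sweeps):
        new_a = []
        for i in range(n):
            others = [j for j in range(n) if j != i]
            num = sum(w[i][j] / (1 - a[i] * a[j]) for j in others)
            den = T * sum(a[j] / (1 - a[i] * a[j]) for j in others)
            # Eq. (23) holds when a_i == num / den; the geometric mean damps
            # the oscillations of the plain update a_i <- num / den.
            new_a.append(math.sqrt(a[i] * num / den) if den > 0 else 0.0)
        shift = max(abs(x - y) for x, y in zip(a, new_a))
        a = new_a
        if shift < tol:
            break
    return a


def binomial_p_value(w_ij, T, p):
    """Eq. (24): probability of at least w_ij links in T steps, for 0 < p < 1."""
    below = 0.0
    for x in range(w_ij):
        log_pmf = (math.lgamma(T + 1) - math.lgamma(x + 1) - math.lgamma(T - x + 1)
                   + x * math.log(p) + (T - x) * math.log1p(-p))
        below += math.exp(log_pmf)
    return 1 - below


# Toy example: three nodes, each pair linked in 10 of T = 1000 time steps.
w = [[0, 10, 10], [10, 0, 10], [10, 10, 0]]
a_star = refine_activities(w, T=1000)
p_01 = binomial_p_value(w[0][1], 1000, a_star[0] * a_star[1])
```

In this symmetric toy example, Eq. (23) reduces to \(a^* = \sqrt{{\overline{w}}/T}\) for every node, which gives a quick sanity check of the solver.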

All p-values, one for each link in the network, are compared with a threshold value \(\beta \), corrected for multiple hypothesis testing [42, 43]; every link whose p-value is lower than the corrected threshold is added to the backbone network.
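The text does not state here which multiple hypothesis correction is applied. As an illustration only, the sketch below shows two common choices, the Bonferroni correction and the false discovery rate procedure of Benjamini and Hochberg [43]. The function names and the dictionary layout (node pair mapped to p-value) are assumptions made for this example.

```python
def bonferroni_links(p_values, beta):
    """Keep links whose p-value is below beta divided by the number of tests."""
    if not p_values:
        return set()
    threshold = beta / len(p_values)
    return {link for link, p in p_values.items() if p < threshold}


def fdr_links(p_values, beta):
    """Benjamini-Hochberg step-up procedure at false discovery rate beta."""
    ranked = sorted(p_values.items(), key=lambda item: item[1])
    m = len(ranked)
    cutoff = 0
    for rank, (_, p) in enumerate(ranked, start=1):
        if p <= rank * beta / m:
            cutoff = rank  # largest rank that satisfies the condition
    return {link for link, _ in ranked[:cutoff]}


# Hypothetical p-values for four candidate links.
example = {(0, 1): 0.001, (0, 2): 0.02, (1, 2): 0.04, (1, 3): 0.5}
strict = bonferroni_links(example, beta=0.05)
lenient = fdr_links(example, beta=0.05)
```

The Bonferroni threshold is the more conservative of the two, so it never retains more links than the false discovery rate procedure at the same \(\beta \).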

For our purposes, we also compute the expected total number of temporal links in the overall temporal evolution

$$\begin{aligned} \text {E} \left[ {\overline{W}} \right] = T \sum _{i,j=1;i<j}^{N} a_i^*a_j^*. \end{aligned}$$
(25)
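Equation (25) can be evaluated directly from the optimal activities. The sketch below also computes a relative error with respect to the observed number of links; the definition \(|\text {E}[{\overline{W}}] - {\overline{W}}^{\text {ts}}| / {\overline{W}}^{\text {ts}}\) and the function names are assumptions made for illustration.

```python
def expected_total_links(a_star, T):
    """Eq. (25): E[W] = T * sum over pairs i < j of a_i* a_j*."""
    n = len(a_star)
    pair_sum = sum(a_star[i] * a_star[j] for i in range(n) for j in range(i + 1, n))
    return T * pair_sum


def relative_error(expected, observed):
    """Assumed definition: |expected - observed| / observed."""
    return abs(expected - observed) / observed


# Three nodes with activity 0.1 over T = 1000 steps and 30 observed links.
expected = expected_total_links([0.1, 0.1, 0.1], T=1000)
error = relative_error(expected, observed=30)
```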

Temporal fitness model with rhythm

The \(\hbox {TFM}_{\mathrm{rhythm}}\) extends the TFM with T time-varying coefficients, one for each time step, \(\varvec{\xi } = \left( \xi (1), \ldots , \xi (T) \right) \). First, every element of \(\varvec{\xi }\) is manually set to 0.999, with the exception of \(\xi (1)\), which is set equal to one. Individual activities are estimated according to Eq. (22). To determine the optimal values \((\varvec{a^*}, \varvec{\xi }^{*})\) in the maximum likelihood sense, we solve the system of \(N+T-1\) equations

$$\begin{aligned} \begin{aligned}&\sum _{t=1}^{T} \sum _{j = 1; j\ne i}^{N} \frac{A_{ij}^{\text {ts}}(t) - a_i^*a_j^*\xi ^*(t)}{1 - a_i^*a_j^*\xi ^*(t)} = 0,&i = 1, \ldots , N,\\&\sum _{i,j = 1; j\ne i}^{N} \frac{A_{ij}^{\text {ts}}(t) - a_i^*a_j^*\xi ^*(t)}{1 - a_i^*a_j^*\xi ^*(t)} = 0,&t = 2, \ldots , T, \end{aligned} \end{aligned}$$
(26)

where \(A_{ij}^{\text {ts}}(t)\) is the adjacency matrix at time t estimated from the time series. The expected number of links is computed as

$$\begin{aligned} \text {E} \left[ {\overline{w}}_{ij} \right] = \sum _{t = 1}^{T} a_i^*a_j^*\xi ^*(t). \end{aligned}$$
(27)

Finally, the p-value \(\alpha _{i j}\) for the link generated between node i and j is computed from the cumulative function of the Poisson distribution as

$$\begin{aligned} \alpha _{i j} \equiv 1 - \sum _{ x = 0}^{{\overline{w}}^{\text {ts}}_{i j}-1} P \left( x; \text {E}\left[ {\overline{w}}_{ij} \right] \right) . \end{aligned}$$
(28)
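Given the optimal values \((\varvec{a^*}, \varvec{\xi }^{*})\), Eqs. (27) and (28) can be evaluated as in the following sketch. The function names are hypothetical, and the Poisson tail is accumulated with the recurrence \(P(x+1) = P(x)\, \mu / (x+1)\), which is adequate only for moderate values of the mean \(\mu \).

```python
import math


def expected_pair_links(a_i, a_j, xi):
    """Eq. (27): expected number of links between i and j over all time steps."""
    return sum(a_i * a_j * xi_t for xi_t in xi)


def poisson_p_value(w_ij, mean):
    """Eq. (28): probability of at least w_ij links under a Poisson law."""
    term = math.exp(-mean)  # P(x = 0)
    below = 0.0
    for x in range(w_ij):
        below += term
        term *= mean / (x + 1)  # P(x + 1) = P(x) * mean / (x + 1)
    return 1 - below


# Two time steps with coefficients 1.0 and 0.5, and one observed link.
mean_01 = expected_pair_links(0.1, 0.2, [1.0, 0.5])
alpha_01 = poisson_p_value(1, mean_01)
```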

All the p-values, one for each link in the network, are compared to a threshold value \(\beta \), properly corrected by using a multiple hypotheses correction [42, 43]. Any value lower than \(\beta \) leads to a link in the backbone network.

For our purposes, we also compute the expected total number of temporal links in the overall temporal evolution

$$\begin{aligned} \text {E} \left[ {\overline{W}} \right] = \sum _{i,j=1;i<j}^{N} \sum _{t = 1}^{T} a_i^*a_j^*\xi ^*(t). \end{aligned}$$
(29)

Statistically validated network

The SVN considers a temporal network of N nodes evolving over an observation time window that can be either discrete or continuous in time. Temporal links are aggregated to form a weighted static network. The p-value \(\alpha _{i j}\) for the link generated between node i and j is computed from the cumulative function of the hypergeometric distribution as

$$\begin{aligned} \begin{aligned} \alpha _{i j}&\equiv 1 - \sum _{ x = 0}^{{\overline{w}}^{\mathrm {ts}}_{i j}-1} H \left( x \bigg |2 {\overline{W}}^{\mathrm {ts}}, {\overline{s}}_{i}^{\mathrm {ts}}, {\overline{s}}_{j}^{\mathrm {ts}} \right) . \end{aligned} \end{aligned}$$
(30)
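A minimal sketch of the p-value in Eq. (30) is given below. It assumes the standard parameterization of the hypergeometric distribution, with a population of size \(2 {\overline{W}}^{\mathrm {ts}}\) and integer strengths \({\overline{s}}_{i}^{\mathrm {ts}}\) and \({\overline{s}}_{j}^{\mathrm {ts}}\); the function names are hypothetical.

```python
import math


def hypergeometric_pmf(x, population, s_i, s_j):
    """C(s_i, x) * C(population - s_i, s_j - x) / C(population, s_j)."""
    if x < 0 or x > s_i or x > s_j:
        return 0.0
    favourable = math.comb(s_i, x) * math.comb(population - s_i, s_j - x)
    return favourable / math.comb(population, s_j)


def svn_p_value(w_ij, total_links, s_i, s_j):
    """Eq. (30): probability of at least w_ij links, with population 2 * W."""
    population = 2 * total_links
    below = sum(hypergeometric_pmf(x, population, s_i, s_j) for x in range(w_ij))
    return 1 - below


# Toy example: W = 2 links in total, both nodes have strength 2.
alpha = svn_p_value(1, total_links=2, s_i=2, s_j=2)
```

In the toy example, the probability of observing no link is \(1/6\), so the p-value for one observed link is \(5/6\).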

The p-values are compared with a threshold value \(\beta \), properly corrected by using a multiple hypotheses correction [42, 43], and a link is added to the backbone network if its p-value is less than \(\beta \).

On the similarity among the \(\hbox {EADM}_{I=1}\), SVN, and TFM

Here, we discuss why these three methods yield similar results for both synthetic and real datasets. First, we show that the \(\hbox {EADM}_{I=1}\) is a valid approximation of the TFM for large networks (hundreds of nodes or more). Then, we analytically examine the convergence of the SVN to the \(\hbox {EADM}_{I=1}\).

On the similarity between the TFM and \(\hbox {EADM}_{I=1}\)

We consider a long observation window T, for which the Binomial distribution in Eq. (24) converges to the Poisson distribution used in our method in Eq. (12). While in the \(\hbox {EADM}_{I=1}\) activities are estimated from the dataset using Eq. (8), in the TFM they are identified in a maximum likelihood sense [7].

Fig. 7

Accuracy of the \(\hbox {EADM}_{I=1}\) and the TFM in estimating the total number of temporal links in the overall time series. A perfect identification should yield a ratio between E\( \left[ {\overline{W}}\right] \) and \({\overline{W}}^{\text {ts}}\) of one (black solid line). In these simulations, we use our artificial network where no backbone is present (\(\delta = \lambda = 0\)) and activities are constant in time (\(T = 5000\), \(I=1\), \(\langle \tau (\varDelta ) \rangle = T/I = 5000\), \(a_{{\min }} = [\sqrt{\langle \tau (\varDelta ) \rangle }]^{-1}\), and \(p=0\)). Markers indicate the average of \(10^2\) independent simulations; 95% confidence interval is displayed in gray

Fig. 8

Sensitivity analysis of the EADM+R to the number of estimated intervals, \(I_e\), from \(I_e=1\) to \(I_e = T-1\). In a, c, we set \(a_{{\min }} = [\sqrt{\langle \tau (\varDelta ) \rangle }]^{-1}\) and \(\lambda = 0.025\), to attain dense ADNs and an easy-to-discover backbone. On the contrary, in b, d, we set \(a_{{\min }} = [\langle \tau (\varDelta ) \rangle ]^{-1}\) and \(\lambda = 0.002\), to attain sparse ADNs and a partially hidden backbone. Other parameter values are: \(N=100\), \(T = 5000\), \(I=20\), \(\langle \tau (\varDelta ) \rangle = T/I = 250\), \(\delta = 0.01\), and \(p=0.4\). Markers indicate the average of \(10^2\) independent simulations; 95% confidence interval is displayed in gray

Fig. 9

Number of significant links as a function of the resolution for all real datasets under consideration. Inferences not reported correspond to simulations that exceed our time limit of 24 h

Fig. 10

Jaccard index between EADM+BB and all the other methods as a function of the resolution for all datasets under consideration. Inferences not reported correspond to simulations that exceed our time limit of 24 h

Fig. 11

Overlap coefficient between EADM+BB and all the other methods as a function of the resolution for all datasets under consideration. Inferences not reported correspond to simulations that exceed our time limit of 24 h

In Fig. 7, we assess the ability of the \(\hbox {EADM}_{I=1}\) and the TFM to estimate the total number of temporal links. We compute the expected number of links for the \(\hbox {EADM}_{I=1}\) as \(\text {E}\left[ {\overline{W}} \right] = \sum _{i,j=1;i<j}^{N} T p_{ij} \), while we use Eq. (25) for the TFM. These values are compared with the total number of temporal links observed in the time series, \({\overline{W}}^{\text {ts}}\). As expected, the TFM works well for any network size, owing to its maximum likelihood estimation. Nevertheless, the maximum likelihood approach becomes computationally demanding for networks of around 1,000 nodes and beyond, and is thus impractical for very large networks. On the other hand, the \(\hbox {EADM}_{I=1}\) performs poorly for small networks, but matches the TFM for networks of 100 nodes. This improvement in the performance of the \(\hbox {EADM}_{I=1}\) is explained in [31], where it is shown that Eq. (14) is in excellent agreement with numerical simulations for large networks.

On the similarity between the SVN and \(\hbox {EADM}_{I=1}\)

When \({\overline{W}}^{\mathrm {ts}} \gg 1\), the hypergeometric distribution in Eq. (30) converges to a Poisson distribution and its p-value becomes equivalent to the p-value for the \(\hbox {EADM}_{I=1}\)

$$\begin{aligned} \begin{aligned} \alpha _{ij} = 1 - \sum _{x = 0}^{{\overline{w}}_{i j}^{\mathrm {ts}}-1} P \left( x; \frac{{\overline{s}}_{i}^{\mathrm {ts}} {\overline{s}}_{j}^{\mathrm {ts}}}{2 {\overline{W}}^{\mathrm {ts}}} \right) . \end{aligned} \end{aligned}$$
(31)

In all the synthetic and real data studied herein \({\overline{W}}^{\mathrm {ts}}\) is very large, so that Eq. (30) converges to Eq. (31).
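The convergence of Eq. (30) to Eq. (31) can be checked numerically. The sketch below is an illustration under stated assumptions: it uses the standard hypergeometric parameterization with population \(2 {\overline{W}}^{\mathrm {ts}}\), hypothetical function names, and toy parameter values that are not taken from the datasets of the paper.

```python
import math


def hypergeometric_tail(w, population, s_i, s_j):
    """p-value of Eq. (30) under the standard hypergeometric parameterization."""
    below = sum(
        math.comb(s_i, x) * math.comb(population - s_i, s_j - x)
        / math.comb(population, s_j)
        for x in range(w)
    )
    return 1 - below


def poisson_tail(w, mean):
    """p-value of Eq. (31) for a Poisson law with the given mean."""
    below = sum(math.exp(-mean) * mean**x / math.factorial(x) for x in range(w))
    return 1 - below


def tail_gap(w, total_links, s_i, s_j):
    """Absolute difference between the p-values of Eqs. (30) and (31)."""
    population = 2 * total_links
    mean = s_i * s_j / population
    return abs(hypergeometric_tail(w, population, s_i, s_j) - poisson_tail(w, mean))


small_gap = tail_gap(1, total_links=50000, s_i=50, s_j=40)  # strengths << 2 W
large_gap = tail_gap(8, total_links=10, s_i=10, s_j=10)  # strengths comparable to 2 W
```

With the strengths held fixed, the gap between the two p-values shrinks as the total number of links grows, whereas it remains sizeable when the strengths are comparable to \(2 {\overline{W}}^{\mathrm {ts}}\).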

Generation of synthetic temporal networks

To examine the precision and recall of irreducible links, we generate synthetic networks. The procedure of network generation is given as follows:

  1. We consider a temporal network evolving in an observation window of length T, divided into I different intervals. We randomly select without replacement \(I-1\) time steps in \(\{1,\ldots ,T\}\), which we sort as \(t_{\mathrm {in}}(2) \ldots t_{\mathrm {in}}(I)\), and we set \(t_{\mathrm {in}}(1)=1\). Each interval \(\varDelta \) has a different length \(\tau (\varDelta )\), so that, in general, the average length of the interval is \(\langle \tau (\varDelta ) \rangle = T/I\).

  2. The N nodes in the network have a time-varying, piece-wise constant, individual activity. We extract activity values from a power law distribution, \(F(a) \sim a^{-2.1}\), with \(a \in [a_{{\min }}, 1]\). The time-varying activity \(a_i (t)\) is selected according to the following procedure:

    • When \(\varDelta =1\), N activity values, one for each node in the network, are randomly extracted from F(a), and held constant within \([t_{\text {in}}(1), t_{\text {in}}(1)+\tau (1)-1]\).

    • When \(2 \le \varDelta \le I\), activities might be correlated between two successive intervals, \(t_{1}\in [t_{\text {in}}(\varDelta -1), t_{\text {in}}(\varDelta -1)+\tau (\varDelta -1)-1]\) and \(t_{2} \in [t_{\text {in}}(\varDelta ), t_{\text {in}}(\varDelta )+\tau (\varDelta )-1]\) according to Eq. (19) in the main text.

  3. We generate a temporal network in the observation window [1, T]. Each pair of nodes ij within an interval \(\varDelta \) is connected with probability \(a_i(\varDelta ) a_j(\varDelta )\). As a result, we obtain a sequence of T undirected and unweighted networks, with adjacency matrices \({\hat{A}}(1)\), \(\ldots \), \({\hat{A}}(T)\). These networks are generated only as a function of the individual activities.

  4. Based on the node pairs that are connected at least once over T time steps of the observation window, we define the synthetic backbone. Specifically, we randomly assign a fraction \(\delta \) of these node pairs to the backbone.

  5. We construct T new networks A(1), A(2), \(\ldots \), A(T) from \({\hat{A}}(1)\), \({\hat{A}}(2)\), \(\ldots \), \({\hat{A}}(T)\) by accounting for the synthetic backbone above. First, we set \(A_{ij}(t) = {\hat{A}}_{ij}(t)\) for \(t = 1, \ldots T\) for all the pairs that do not belong to the backbone. Then, for the generic link ij in the backbone, we initialize \(A_{i j}(1) = {\hat{A}}_{ij}(1)\) and we iterate the following steps for \(t = 2, \ldots , T\):

    • if \({\hat{A}}_{ij}(t) = 1\), we maintain \(A_{ij}(t) = 1\);

    • if \({\hat{A}}_{ij}(t) = 0\), we set \(A_{ij}(t) = 1\) with probability \(\lambda \) and \(A_{ij}(t) = 0\) with probability \(1-\lambda \).

    The parameter \(\lambda \) measures the preponderance of links associated with the backbone during the observation window.
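The generation procedure above can be sketched in Python as follows. This is an illustration rather than the authors' code, and it rests on two simplifying assumptions: activities are redrawn independently in each interval, because Eq. (19) is not reproduced in this appendix, and the interval starts are sampled from \(\{2,\ldots ,T\}\) so that the first interval is never empty. All function and variable names are hypothetical; each snapshot is stored as a set of node pairs (i, j) with i < j.

```python
import random


def power_law_activity(a_min, rng, gamma=2.1):
    """Inverse-transform sample from F(a) ~ a^(-gamma) on [a_min, 1]."""
    low = a_min ** (1 - gamma)
    return (low + rng.random() * (1 - low)) ** (1 / (1 - gamma))


def generate_temporal_network(N, T, I, a_min, delta, lam, seed=0):
    rng = random.Random(seed)
    # Step 1: interval starts t_in(1) = 1 < t_in(2) < ... < t_in(I).
    starts = [1] + sorted(rng.sample(range(2, T + 1), I - 1))
    # Step 2: piece-wise constant activities, redrawn in every interval
    # (assumption: no correlation between successive intervals).
    activity = [[power_law_activity(a_min, rng) for _ in range(N)] for _ in range(I)]
    # Step 3: activity-driven snapshots, one set of pairs per time step.
    base = []
    interval = 0
    for t in range(1, T + 1):
        if interval + 1 < I and t == starts[interval + 1]:
            interval += 1
        a = activity[interval]
        base.append({(i, j) for i in range(N) for j in range(i + 1, N)
                     if rng.random() < a[i] * a[j]})
    # Step 4: a fraction delta of the pairs seen at least once is the backbone.
    seen = sorted(set().union(*base))
    backbone = set(rng.sample(seen, round(delta * len(seen))))
    # Step 5: for t >= 2, backbone pairs gain a link with probability lam
    # whenever the activity-driven snapshot has none.
    final = [set(base[0])]
    for t in range(1, T):
        extra = {pair for pair in sorted(backbone)
                 if pair not in base[t] and rng.random() < lam}
        final.append(base[t] | extra)
    return base, final, backbone


base, final, backbone = generate_temporal_network(
    N=20, T=40, I=4, a_min=0.2, delta=0.5, lam=0.3, seed=1)
```

The function returns the activity-driven snapshots \({\hat{A}}(t)\), the final snapshots A(t), and the set of backbone pairs, so that precision and recall of a detection method can be computed against a known ground truth.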

Insights on the interval estimation

The EADM+R requires that the number of intervals be known a priori. Nevertheless, when dealing with real networks, our estimate, \(I_e\), might differ from the true value, I. This mismatch might diminish the accuracy of the backbone inference, as examined below for synthetic data. We focus on two sets of parameters, which represent two possible scenarios. In the first case, \(a_{{\min }} = [\sqrt{\langle \tau (\varDelta ) \rangle }]^{-1}\) and \(\lambda = 0.025\), which correspond to “dense” ADNs with an easily detectable backbone. In the second case, \(a_{{\min }} = [\langle \tau (\varDelta ) \rangle ]^{-1}\) and \(\lambda = 0.002\), which represent “sparse” ADNs with a partially hidden backbone.

In Fig. 8a, c, we show that if the number of estimated intervals, \(I_e\), is greater than or equal to the true value, I, precision and recall are close to one. On the contrary, in Fig. 8b, d, we observe a more dramatic scenario, in which increasing \(I_e\) hinders the performance of the method, filtering out most of the links that belong to the backbone network.

Fig. 12

Total number of temporal links estimated in the time series \({\overline{W}}^{\text {ts}}\) as a function of the resolution for all datasets under consideration

Fig. 13

Relative error between the total number of temporal links, \({\overline{W}}^{\mathrm {ts}}\), and the number of total temporal links estimated from the backbone detection algorithms under consideration. The SVN is discarded from this analysis because it uses \({\overline{W}}^{\mathrm {ts}}\) as an input. Inferences not reported correspond to simulations that exceed our time limit of 24 h

Analysis of all available real datasets

Significant links

We compare the backbone networks from seven real-world datasets inferred by the five methods under consideration in terms of the number of significant links. The EADM+BB always finds fewer links than any other method (Fig. 9).

Jaccard index

In Fig. 10, we assess differences between the backbone networks detected by the EADM+BB and by the other four methods on seven real-world datasets, in terms of the Jaccard index. We observe that the EADM+BB finds backbones that differ from those of the \(\hbox {EADM}_{I=1}\), SVN, TFM, and \(\hbox {TFM}_{\mathrm{rhythm}}\), which are equivalent to one another.

Overlap coefficient

Similar to Fig. 10, we examine the overlap coefficient of backbone networks determined by our method and the other four in Fig. 4, confirming that the EADM+BB tends to detect a subset of the links predicted by other methods—which are thus prone to false positives (Fig. 11).
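For reference, the two similarity measures used in Figs. 10 and 11 can be computed on sets of backbone links as in the sketch below. The function names and the conventions adopted for empty sets are assumptions made for this example.

```python
def jaccard_index(links_a, links_b):
    """Size of the intersection divided by the size of the union."""
    union = links_a | links_b
    return len(links_a & links_b) / len(union) if union else 1.0


def overlap_coefficient(links_a, links_b):
    """Size of the intersection divided by the size of the smaller set."""
    smaller = min(len(links_a), len(links_b))
    return len(links_a & links_b) / smaller if smaller else 0.0


# Hypothetical backbones: the first is a subset of the second.
backbone_a = {(0, 1), (1, 2)}
backbone_b = {(0, 1), (1, 2), (2, 3), (0, 3)}
jaccard = jaccard_index(backbone_a, backbone_b)
overlap = overlap_coefficient(backbone_a, backbone_b)
```

When one backbone is a subset of the other, the overlap coefficient equals one even though the Jaccard index is smaller than one, which is why the two measures are informative together.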

Temporal links

In Fig. 12, we display the total number of temporal links estimated in the time series, \({\overline{W}}^{\mathrm{ts}}\), for all the considered methods on all seven real-world datasets. We confirm that the number of links decreases as we increase the time resolution of the dataset.

Relative error

We analyze the accuracy of the methods in describing the overall system evolution. We compare the expected total number of temporal links, E\(\left[ {\overline{W}} \right] \), with the observed value, \({\overline{W}}^{\mathrm {ts}}\). All methods are accurate for the datasets studied herein, with a relative error of up to 5% (Fig. 13).


Cite this article

Nadini, M., Bongiorno, C., Rizzo, A. et al. Detecting network backbones against time variations in node properties. Nonlinear Dyn 99, 855–878 (2020). https://doi.org/10.1007/s11071-019-05134-y


Keywords

  • Activity-driven
  • Backbone network
  • Statistical filtering
  • Time-varying network