Combinatorial approach to spreading processes on networks

Abstract

Stochastic spreading models defined on complex network topologies are used to mimic the diffusion of diseases, information, and opinions in real-world systems. Existing theoretical approaches to the characterization of the models in terms of microscopic configurations rely on some approximation of independence among dynamical variables, thus introducing a systematic bias in the prediction of the ground-truth dynamics. Here, we develop a combinatorial framework based on the approximation that spreading may occur only along the shortest paths connecting pairs of nodes. The approximation overestimates dynamical correlations among node states and leads to biased predictions. This systematic bias, however, points in the direction opposite to that of existing approximations. We show that the combination of the two biased approaches generates predictions of the ground-truth dynamics that are more accurate than the ones given by either approximation when used in isolation. We further take advantage of the combinatorial approximation to characterize theoretical properties of some inference problems, and show that the reconstruction of microscopic configurations is very sensitive to both the place where and the time when partial knowledge of the system is acquired.


Data Availability Statement

This manuscript has no associated data or the data will not be deposited. [Authors’ comment: Data of the real-world network considered in this paper can be found in Ref. [37].]

References

  1. R. Pastor-Satorras, C. Castellano, P. Van Mieghem, A. Vespignani, Rev. Mod. Phys. 87, 925 (2015)
  2. C.T. Butts, Science 325, 414 (2009)
  3. M.O. Jackson, Social and Economic Networks (Princeton University Press, Princeton, 2010)
  4. A. Vespignani, Nat. Phys. 8, 32 (2012)
  5. A.L. Lloyd, R.M. May, Science 292, 1316 (2001)
  6. K.T. Eames, M.J. Keeling, Proc. Natl. Acad. Sci. 99, 13330 (2002)
  7. L. Weng, F. Menczer, Y.-Y. Ahn, in Eighth International AAAI Conference on Weblogs and Social Media (2014)
  8. C. Castellano, S. Fortunato, V. Loreto, Rev. Mod. Phys. 81, 591 (2009)
  9. Y. Moreno, M. Nekovee, A.F. Pacheco, Phys. Rev. E 69, 066130 (2004)
  10. L. Dall’Asta, A. Baronchelli, A. Barrat, V. Loreto, Phys. Rev. E 74, 036105 (2006)
  11. G. Brandi, R. Di Clemente, G. Cimini, Phys. A Stat. Mech. Appl. 507, 255 (2018)
  12. I. Dobson, B.A. Carreras, D.E. Newman, J.M. Reynolds-Barredo, IEEE Trans. Power Syst. 31, 4831 (2016)
  13. C.A. Hidalgo, B. Klinger, A.-L. Barabási, R. Hausmann, Science 317, 482 (2007)
  14. T.P. Vogels, K. Rajan, L.F. Abbott, Annu. Rev. Neurosci. 28, 357 (2005)
  15. Y. Moreno, R. Pastor-Satorras, A. Vespignani, Eur. Phys. J. B Condens. Matter Complex Syst. 26, 521 (2002)
  16. J.L. Payne, K.D. Harris, P.S. Dodds, Phys. Rev. E 84, 016110 (2011)
  17. C. Castellano, R. Pastor-Satorras, Phys. Rev. Lett. 105, 218701 (2010)
  18. L. Buzna, K. Peters, D. Helbing, Phys. A Stat. Mech. Appl. 363, 132 (2006)
  19. F. Altarelli, A. Braunstein, L. Dall’Asta, A. Lage-Castellanos, R. Zecchina, Phys. Rev. Lett. 112, 118701 (2014)
  20. A.Y. Lokhov, M. Mézard, H. Ohta, L. Zdeborová, Phys. Rev. E 90, 012801 (2014)
  21. F. Radicchi, C. Castellano, Phys. Rev. Lett. 120, 198301 (2018)
  22. D. Kempe, J. Kleinberg, É. Tardos, in Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2003), pp. 137–146
  23. Y. Wang, D. Chakrabarti, C. Wang, C. Faloutsos, in Proceedings of the 22nd International Symposium on Reliable Distributed Systems (IEEE, 2003), pp. 25–34
  24. D. Chakrabarti, Y. Wang, C. Wang, J. Leskovec, C. Faloutsos, ACM Transactions on Information and System Security 10, 1 (2008). https://doi.org/10.1145/1284680.1284681. ISSN 1094-9224
  25. B. Karrer, M.E. Newman, Phys. Rev. E 82, 016101 (2010)
  26. A.Y. Lokhov, M. Mézard, L. Zdeborová, Phys. Rev. E 91, 012811 (2015)
  27. E. Cator, P. Van Mieghem, Phys. Rev. E 89, 052802 (2014)
  28. J.P. Gleeson, Phys. Rev. X 3, 021004 (2013)
  29. D. Brockmann, D. Helbing, Science 342, 1337 (2013)
  30. M.E. Newman, Phys. Rev. E 66, 016128 (2002)
  31. R.M. Anderson, B. Anderson, R.M. May, Infectious Diseases of Humans: Dynamics and Control (Oxford University Press, Oxford, 1992)
  32. J.P. Gleeson, Phys. Rev. Lett. 107, 068701 (2011)
  33. K.E. Hamilton, L.P. Pryadko, Phys. Rev. Lett. 113, 208701 (2014)
  34. B. Karrer, M.E. Newman, L. Zdeborová, Phys. Rev. Lett. 113, 208702 (2014)
  35. F. Radicchi, Nat. Phys. 11, 597 (2015)
  36. F. Radicchi, C. Castellano, Nat. Commun. 6, 1 (2015)
  37. V. Colizza, R. Pastor-Satorras, A. Vespignani, Nat. Phys. 3, 276 (2007)
  38. H. Prüfer, Arch. Math. Phys 27, 742 (1918)
  39. S. Pemmaraju, S. Skiena, Computational Discrete Mathematics: Combinatorics and Graph Theory with Mathematica® (Cambridge University Press, Cambridge, 2003)
  40. D. Shah, T. Zaman, in Proceedings of the ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems (2010), pp. 203–214
  41. D. Shah, T. Zaman, IEEE Trans. Inf. Theory 57, 5163 (2011)
  42. W. Luo, W.P. Tay, M. Leng, IEEE Trans. Signal Process. 61, 2850 (2013)
  43. K. Zhu, L. Ying, IEEE/ACM Trans. Netw. 24, 408 (2014)

Acknowledgements

DM and FR acknowledge support from the US Army Research Office (W911NF-16-1-0104). FR acknowledges support from the National Science Foundation (CMMI-1552487).

Author information

Corresponding author

Correspondence to Filippo Radicchi.

Appendices

Appendix A: Magnitude of the error associated with the shortest-path combinatorial approximation

In Figs. 2 and 5, we considered a hypothetical setting in which a generic node i is connected to the source node s by two independent paths of length \(\ell _{si}\) and \(\ell _{si} + d \ell \), with \(d \ell \ge 0\). The paths are independent in the sense that they do not share any node except for s and i. This fact allows us to compute the exact probabilities for the ground-truth scenario by simply combining the probabilities of the individual paths. The setting is useful to understand the magnitude of the error that we should expect when using SPCA in a non-tree network, where multiple paths among nodes may exist. For simplicity of notation, but without loss of generality, we will use \(\ell = \ell _{si}\) in the following description.

Susceptible-infected model

For the SI model, the probability that the infection reaches a certain node along a path of length \(\ell \) in t time steps or less is given by

$$\begin{aligned} q_{1} (\ell , t) = \sum _{r=\ell }^{t} \, \binom{r-1}{\ell -1} \, \beta ^{\ell } \, (1-\beta )^{r-\ell }. \end{aligned}$$

Each term of the sum is the probability that the infection covers the \(\ell \) edges of the path in exactly r time steps. The previous expression is simply the combination of Eqs. (2) and (7) of the main text, where we omitted the explicit dependence on the source and target nodes to simplify the notation. In the presence of two independent paths, the probability that the infection reaches the target node is given by

$$\begin{aligned} q_{2}(\ell , \ell + d \ell , t) = 1 - [1 - q_{1} (\ell , t)] [1 - q_{1} (\ell + d \ell , t)], \end{aligned}$$

thus equal to the probability that spreading occurs at least on one of the two independent paths. The relative error of Fig. 2 is finally quantified as

$$\begin{aligned} \epsilon (\ell , d \ell , t) = 1 - \frac{q_{1}(\ell , t)}{q_{2}(\ell , \ell + d \ell , t)}. \end{aligned}$$
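As a rough numerical illustration of the three SI expressions above, the following Python sketch evaluates \(q_1\), \(q_2\) and \(\epsilon \). The function names are hypothetical, and returning zero error when \(q_2 = 0\) (which happens for \(t < \ell \)) is a convention assumed here, not one stated in the text. The sum over r uses the arrival-time probability \(\binom{r-1}{\ell -1} \beta ^{\ell } (1-\beta )^{r-\ell }\).

```python
from math import comb


def q1(length, t, beta):
    """Probability that the infection covers a path of `length` edges
    in t time steps or less (SI model, single path)."""
    # Terms with r < length vanish, so the sum starts at r = length.
    return sum(
        comb(r - 1, length - 1) * beta**length * (1 - beta) ** (r - length)
        for r in range(length, t + 1)
    )


def q2(length, d_length, t, beta):
    """Probability that at least one of two independent paths, of
    lengths `length` and `length + d_length`, delivers the infection."""
    miss_short = 1 - q1(length, t, beta)
    miss_long = 1 - q1(length + d_length, t, beta)
    return 1 - miss_short * miss_long


def relative_error(length, d_length, t, beta):
    """Relative error made by keeping only the shortest path."""
    two_paths = q2(length, d_length, t, beta)
    if two_paths == 0:
        # Assumed convention: no error when the target is unreachable.
        return 0.0
    return 1 - q1(length, t, beta) / two_paths


if __name__ == "__main__":
    for d_length in (0, 1, 2, 4):
        print(d_length, relative_error(3, d_length, 10, 0.3))
```

Under these assumptions the error should be largest for \(d \ell = 0\), where the two paths contribute equally, and should shrink as the second path gets longer.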

Susceptible-infected-recovered model

For the SIR model, the calculation is a bit more cumbersome than for the SI model.

Suppose node s is initially in the infected state, and suppose that two independent paths of length \(\ell \) and \(\ell + d \ell \) connect node i to node s. The probability \(q_{2}(\ell , \ell + d \ell , t)\) that node i becomes infected by time t is given by the probability that the infection spreads along at least one of these paths. We remark that we know the analytical form of the probability \(q_{1}(\ell , t)\) that the infection spreads along a single path of length \(\ell \) in t time steps or less, see main text. However, this expression can be used to combine the contributions of the two paths only if the paths are dynamically independent. This condition is satisfied only once the infection has performed at least one step towards the target along at least one of the paths.

Indicate with v the neighbor of node s along the path of length \(\ell \) towards i, and with w the neighbor of node s along the path of length \(\ell + d \ell \) towards i. The initial configuration at time \(t=0\) is such that \(\sigma _s^{(0)} = I\) and \(\sigma _j^{(0)} = S\) for all \(j \ne s\). At time \(t=1\), the states of nodes may change as a result of spreading and recovery events. The only nodes that can change their states are s, v and w. For example, we can reach the configuration \(\mathbf {\sigma }^{(1)} = (I, I, S, \ldots )\), i.e., such that \(\sigma _s^{(1)} = I\), \(\sigma _v^{(1)} = I\) and \(\sigma _w^{(1)} = S\), with probability \(\text {Prob.}[\mathbf {\sigma }^{(1)} = (\sigma _s^{(1)} = I, \sigma _v^{(1)} = I, \sigma _w^{(1)} = S, S, \ldots , S) ] = \beta (1-\beta ) (1- \gamma )\). After this first step, the spreading of the infection happens independently along the two paths, thus we can write \(q_2[\ell , \ell + d \ell , t | \mathbf {\sigma }^{(1)} = (\sigma _s^{(1)} = I, \sigma _v^{(1)} = I, \sigma _w^{(1)} = S, S, \ldots , S) ] = 1 - [1-q_1(\ell -1, t-1)] [1-q_1(\ell + d \ell , t-1)]\). There are eight such configurations in total, listed in Table 1. In general, we can write that

$$\begin{aligned} q_2 (\ell , \ell + d \ell , t) = \sum _{\mathbf {\sigma }} \, q_2(\ell , \ell + d \ell , t | \mathbf {\sigma }) \, \text {Prob.}(\mathbf {\sigma }), \end{aligned}$$
(A1)

where the sum runs over all eight configurations \(\mathbf {\sigma }\) of Table 1. The expressions of the probabilities appearing in Table 1 are then used to solve Eq. (A1) by iteration, starting from the initial condition \(q_2(\ell , \ell + d \ell , t =0 ) = 0\).
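The eight first-step configurations can be enumerated mechanically. The Python sketch below is only an illustration: it assumes that, within one time step, the recovery of s (probability \(\gamma \)) and the infections of v and w (probability \(\beta \) each) are independent events, which is consistent with the example probability \(\beta (1-\beta )(1-\gamma )\) quoted above but is not a reproduction of Table 1. The function name is hypothetical, and the conditional probabilities \(q_2(\cdot | \mathbf {\sigma })\) needed to complete Eq. (A1) are not computed here.

```python
from itertools import product


def first_step_configurations(beta, gamma):
    """Map each (state of s, state of v, state of w) at t = 1 to its
    probability, assuming independent recovery and infection events."""
    configurations = {}
    # s starts infected: it either stays I or recovers to R.
    # v and w start susceptible: each either stays S or becomes I.
    for s_state, v_state, w_state in product("IR", "SI", "SI"):
        p_s = gamma if s_state == "R" else 1 - gamma
        p_v = beta if v_state == "I" else 1 - beta
        p_w = beta if w_state == "I" else 1 - beta
        configurations[(s_state, v_state, w_state)] = p_s * p_v * p_w
    return configurations


if __name__ == "__main__":
    for states, probability in first_step_configurations(0.3, 0.1).items():
        print(states, round(probability, 4))
```

The probabilities of the eight configurations sum to one, which is a quick sanity check on any weights plugged into Eq. (A1).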

Appendix B: Joint probability of infection from a single source

Susceptible-infected model

Here, we illustrate how to compute the joint probability \(Q^{(t)}_{s\rightarrow i,j}\) that nodes i and j are infected at time t or earlier given that the source of spreading is node s. The computation still takes advantage of Eqs. (2) and (7), by properly accounting for the position of the source node s relative to the positions of the target nodes i and j (see Fig. 12).

Fig. 12 Schematic illustration for the computation of the joint probability. The shaded areas highlight different parts of the network where the source node can be located, relative to the positions of the target nodes i and j. Red areas denote regions where one of the two spreading paths is dependent on the other. The blue shaded area indicates locations of the source node leading to spreading paths that are partially independent.

If node j sits between nodes s and i, then the infection can reach node i only by first passing through node j. Thus, we can safely write that \(Q^{(t)}_{s \rightarrow i,j} = Q^{(t)}_{s \rightarrow i}\). The same argument leads us to write \(Q^{(t)}_{s \rightarrow i,j} = Q^{(t)}_{s \rightarrow j}\) if node i sits between nodes s and j.

A less straightforward computation is required when the source node s is connected to nodes i and j by partially independent paths. Part of the spreading path can be shared by the two trajectories, say up to node k as indicated in Fig. 12. After this node, however, the two paths are dynamically independent of each other and the two contributions are computed separately. Specifically, we can write

$$\begin{aligned} Q^{(t)}_{s\rightarrow i,j} = \sum _{r=0}^{t-\max (\ell _{ki},\ell _{kj})} P^{(r)}_{s \rightarrow k} \, Q^{(t-r)}_{k\rightarrow i} \, Q^{(t-r)}_{k\rightarrow j} \; , \end{aligned}$$
(B1)

where \(P^{(r)}_{s \rightarrow k}\) is the usual probability that the infection reached node k in exactly r stages of the dynamics. The sum on the r.h.s. of Eq. (B1) runs over all possible values of r compatible with the quantity that we want to estimate.
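Equation (B1) can be checked numerically for the SI model. The Python sketch below is a minimal illustration with hypothetical function names; it takes the path lengths \(\ell _{sk}\), \(\ell _{ki}\) and \(\ell _{kj}\) as inputs and assumes the single-path SI forms \(P^{(r)} = \binom{r-1}{\ell -1} \beta ^{\ell } (1-\beta )^{r-\ell }\) and \(Q^{(t)} = \sum _{r \le t} P^{(r)}\), with a path of length zero treated as reached at \(r = 0\) with probability one (an assumption made here to cover \(k = s\)).

```python
from math import comb


def p_exact(length, r, beta):
    """Probability that the infection covers `length` edges in exactly
    r steps; a zero-length path is reached at r = 0 with certainty."""
    if length == 0:
        return 1.0 if r == 0 else 0.0
    if r < length:
        return 0.0
    return comb(r - 1, length - 1) * beta**length * (1 - beta) ** (r - length)


def q_cumulative(length, t, beta):
    """Probability that the infection covers `length` edges by time t."""
    return sum(p_exact(length, r, beta) for r in range(0, t + 1))


def joint_si(l_sk, l_ki, l_kj, t, beta):
    """Joint probability of Eq. (B1): both i and j infected by time t,
    with the paths from s sharing their first l_sk edges up to node k."""
    r_max = t - max(l_ki, l_kj)
    return sum(
        p_exact(l_sk, r, beta)
        * q_cumulative(l_ki, t - r, beta)
        * q_cumulative(l_kj, t - r, beta)
        for r in range(0, r_max + 1)
    )


if __name__ == "__main__":
    print(joint_si(2, 3, 1, 10, 0.3))
```

As a consistency check, setting \(\ell _{kj} = 0\) (node j coincides with k) should recover the single-target probability for a path of length \(\ell _{sk} + \ell _{ki}\), in line with the nested case discussed before Eq. (B1).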

Susceptible-infected-recovered model

In the SIR model, we can compute \(Q^{(t)}_{s\rightarrow i,j}\) using the same method as for the SI model, with the only caveat that Eq. (A1) and Table 1 must be taken into account whenever the source lies between i and j or the two shortest paths become independent.

Cite this article

Mazzilli, D., Radicchi, F. Combinatorial approach to spreading processes on networks. Eur. Phys. J. B 94, 15 (2021). https://doi.org/10.1140/epjb/s10051-020-00029-z