### Abstract

Stochastic spreading models defined on complex network topologies are used to mimic the diffusion of diseases, information, and opinions in real-world systems. Existing theoretical approaches to characterizing these models in terms of microscopic configurations rely on some approximation of independence among dynamical variables, thus introducing a systematic bias in the prediction of the ground-truth dynamics. Here, we develop a combinatorial framework based on the approximation that spreading may occur only along the shortest paths connecting pairs of nodes. This approximation overestimates dynamical correlations among node states and therefore also leads to biased predictions. Its systematic bias, however, points in the opposite direction to that of existing approximations. We show that combining the two biased approaches yields predictions of the ground-truth dynamics that are more accurate than those given by either approximation used in isolation. We further take advantage of the combinatorial approximation to characterize theoretical properties of some inference problems, and show that the reconstruction of microscopic configurations is very sensitive to both where and when partial knowledge of the system is acquired.

## Data Availability Statement

This manuscript has no associated data or the data will not be deposited. [Authors’ comment: Data of the real-world network considered in this paper can be found in Ref. [37].]

## References

1. R. Pastor-Satorras, C. Castellano, P. Van Mieghem, A. Vespignani, Rev. Mod. Phys. **87**, 925 (2015)
2. C.T. Butts, Science **325**, 414 (2009)
3. M.O. Jackson, *Social and Economic Networks* (Princeton University Press, Princeton, 2010)
4. A. Vespignani, Nat. Phys. **8**, 32 (2012)
5. A.L. Lloyd, R.M. May, Science **292**, 1316 (2001)
6. K.T. Eames, M.J. Keeling, Proc. Natl. Acad. Sci. **99**, 13330 (2002)
7. L. Weng, F. Menczer, Y.-Y. Ahn, in *Eighth International AAAI Conference on Weblogs and Social Media* (2014)
8. C. Castellano, S. Fortunato, V. Loreto, Rev. Mod. Phys. **81**, 591 (2009)
9. Y. Moreno, M. Nekovee, A.F. Pacheco, Phys. Rev. E **69**, 066130 (2004)
10. L. Dall’Asta, A. Baronchelli, A. Barrat, V. Loreto, Phys. Rev. E **74**, 036105 (2006)
11. G. Brandi, R. Di Clemente, G. Cimini, Phys. A Stat. Mech. Appl. **507**, 255 (2018)
12. I. Dobson, B.A. Carreras, D.E. Newman, J.M. Reynolds-Barredo, IEEE Trans. Power Syst. **31**, 4831 (2016)
13. C.A. Hidalgo, B. Klinger, A.-L. Barabási, R. Hausmann, Science **317**, 482 (2007)
14. T.P. Vogels, K. Rajan, L.F. Abbott, Annu. Rev. Neurosci. **28**, 357 (2005)
15. Y. Moreno, R. Pastor-Satorras, A. Vespignani, Eur. Phys. J. B Condens. Matter Complex Syst. **26**, 521 (2002)
16. J.L. Payne, K.D. Harris, P.S. Dodds, Phys. Rev. E **84**, 016110 (2011)
17. C. Castellano, R. Pastor-Satorras, Phys. Rev. Lett. **105**, 218701 (2010)
18. L. Buzna, K. Peters, D. Helbing, Phys. A Stat. Mech. Appl. **363**, 132 (2006)
19. F. Altarelli, A. Braunstein, L. Dall’Asta, A. Lage-Castellanos, R. Zecchina, Phys. Rev. Lett. **112**, 118701 (2014)
20. A.Y. Lokhov, M. Mézard, H. Ohta, L. Zdeborová, Phys. Rev. E **90**, 012801 (2014)
21. F. Radicchi, C. Castellano, Phys. Rev. Lett. **120**, 198301 (2018)
22. D. Kempe, J. Kleinberg, É. Tardos, in *Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining* (2003), pp. 137–146
23. Y. Wang, D. Chakrabarti, C. Wang, C. Faloutsos, in *Proceedings of the 22nd International Symposium on Reliable Distributed Systems* (IEEE, 2003), pp. 25–34
24. D. Chakrabarti, Y. Wang, C. Wang, J. Leskovec, C. Faloutsos, ACM Transactions on Information and System Security **10**, 1 (2008). https://doi.org/10.1145/1284680.1284681
25. B. Karrer, M.E. Newman, Phys. Rev. E **82**, 016101 (2010)
26. A.Y. Lokhov, M. Mézard, L. Zdeborová, Phys. Rev. E **91**, 012811 (2015)
27. E. Cator, P. Van Mieghem, Phys. Rev. E **89**, 052802 (2014)
28. J.P. Gleeson, Phys. Rev. X **3**, 021004 (2013)
29. D. Brockmann, D. Helbing, Science **342**, 1337 (2013)
30. M.E. Newman, Phys. Rev. E **66**, 016128 (2002)
31. R.M. Anderson, B. Anderson, R.M. May, *Infectious Diseases of Humans: Dynamics and Control* (Oxford University Press, Oxford, 1992)
32. J.P. Gleeson, Phys. Rev. Lett. **107**, 068701 (2011)
33. K.E. Hamilton, L.P. Pryadko, Phys. Rev. Lett. **113**, 208701 (2014)
34. B. Karrer, M.E. Newman, L. Zdeborová, Phys. Rev. Lett. **113**, 208702 (2014)
35. F. Radicchi, Nat. Phys. **11**, 597 (2015)
36. F. Radicchi, C. Castellano, Nat. Commun. **6**, 1 (2015)
37. V. Colizza, R. Pastor-Satorras, A. Vespignani, Nat. Phys. **3**, 276 (2007)
38. H. Prüfer, Arch. Math. Phys. **27**, 742 (1918)
39. S. Pemmaraju, S. Skiena, *Computational Discrete Mathematics: Combinatorics and Graph Theory with Mathematica®* (Cambridge University Press, Cambridge, 2003)
40. D. Shah, T. Zaman, in *Proceedings of the ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems* (2010), pp. 203–214
41. D. Shah, T. Zaman, IEEE Trans. Inf. Theory **57**, 5163 (2011)
42. W. Luo, W.P. Tay, M. Leng, IEEE Trans. Signal Process. **61**, 2850 (2013)
43. K. Zhu, L. Ying, IEEE/ACM Trans. Netw. **24**, 408 (2014)

## Acknowledgements

DM and FR acknowledge support from the US Army Research Office (W911NF-16-1-0104). FR acknowledges support from the National Science Foundation (CMMI-1552487).

## Appendices

### Appendix A: Magnitude of the error associated with the shortest-path combinatorial approximation

In Figs. 2 and 5, we considered a hypothetical setting where the generic node *i* is connected to the source node *s* by two independent paths of length \(\ell _{si}\) and \(\ell _{si} + d \ell \), with \(d \ell \ge 0\). The paths are independent in the sense that they do not share any node except for *s* and *i*. This allows us to compute the exact ground-truth probabilities by simply combining the probabilities of the individual paths. The setting is useful to gauge the magnitude of the error to be expected when the shortest-path combinatorial approximation (SPCA) is applied to a non-tree network, where multiple paths may connect a pair of nodes. For simplicity of notation, but without loss of generality, we use \(\ell = \ell _{si}\) in the following description.
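
The ground-truth probability in this two-path setting can also be estimated by direct simulation. The Python sketch below is illustrative only: it assumes a synchronous discrete-time update in which every infected node tries to infect each susceptible neighbor with probability \(\beta\) and, independently, recovers with probability \(\gamma\) (\(\gamma = 0\) gives the SI model). The graph construction and all function names are hypothetical, not taken from the paper.

```python
import random


def two_path_graph(l1, l2):
    """Adjacency lists for a source (node 0) and a target (node 1) joined by two
    paths of l1 and l2 edges that share no node other than 0 and 1."""
    assert min(l1, l2) >= 1 and max(l1, l2) >= 2, "paths must be node-disjoint"
    adj = {0: [], 1: []}
    next_id = 2
    for length in (l1, l2):
        prev = 0
        for _ in range(length - 1):  # intermediate nodes of this path
            adj[next_id] = []
            adj[prev].append(next_id)
            adj[next_id].append(prev)
            prev, next_id = next_id, next_id + 1
        adj[prev].append(1)  # the last edge reaches the target
        adj[1].append(prev)
    return adj


def target_reached(adj, beta, gamma, t_max, rng):
    """One synchronous SIR run started from node 0; True if node 1 is infected by t_max."""
    state = {n: "S" for n in adj}
    state[0] = "I"
    for _ in range(t_max):
        infected = [n for n, s in state.items() if s == "I"]
        newly_infected = set()
        for n in infected:
            for m in adj[n]:
                if state[m] == "S" and rng.random() < beta:
                    newly_infected.add(m)
        for n in infected:  # recovery is independent of transmission in the same step
            if rng.random() < gamma:
                state[n] = "R"
        for m in newly_infected:  # new infections become active at the next step
            state[m] = "I"
        if state[1] != "S":
            return True
    return False


def estimate_q2(l1, l2, beta, gamma, t, runs=20_000, seed=1):
    """Monte Carlo estimate of the probability that the target is infected by time t."""
    rng = random.Random(seed)
    adj = two_path_graph(l1, l2)
    hits = sum(target_reached(adj, beta, gamma, t, rng) for _ in range(runs))
    return hits / runs
```

For example, `estimate_q2(3, 4, beta=0.5, gamma=0.0, t=6)` estimates the SI ground truth for paths of lengths 3 and 4; comparing it with the single-path probability gives an empirical view of the SPCA error in this setting.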

### Susceptible-infected model

For the SI model, the probability that the infection reaches a certain node along a path of length \(\ell \) in *t* time steps or less is given by
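
Assuming discrete-time SI dynamics in which an infected node transmits to a susceptible neighbor with probability \(\beta\) at each time step, the infection front advances by one edge per step with probability \(\beta\), so the arrival time along the path is a sum of \(\ell\) geometric waiting times. A sketch of the resulting expression, which may differ in notation from Eqs. (2) and (7), reads

\[
q_1(\ell, t) = \sum_{r=\ell}^{t} \binom{r-1}{\ell-1} \beta^{\ell} (1-\beta)^{r-\ell},
\]

where each summand is the probability of arriving in exactly \(r\) steps: the last step must be a successful transmission, and the other \(\ell - 1\) successes can occur in any of the previous \(r-1\) steps.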

The previous expression is simply a combination of Eqs. (2) and (7) of the main text, written without explicit dependence on the source and target nodes to simplify the notation. In the presence of two independent paths, the probability that the infection reaches the target node is given by
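
Because the SI source never recovers, spreading along the two node-disjoint paths consists of statistically independent events. A sketch of this expression, in the notation \(q_2\) adopted for the SIR case, is

\[
q_2(\ell, \ell + d\ell, t) = 1 - \left[1 - q_1(\ell, t)\right]\left[1 - q_1(\ell + d\ell, t)\right].
\]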

i.e., the probability that spreading occurs along at least one of the two independent paths. The relative error shown in Fig. 2 is finally quantified as
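
The SI quantities of this subsection can be evaluated numerically. The Python sketch below assumes that an infected node transmits to a susceptible neighbor with probability \(\beta\) per time step, so that the arrival time along a path of \(\ell\) edges is negative-binomially distributed. The function names are illustrative, and `relative_error` uses a hypothetical definition (the share of the two-path probability missed by keeping only the shortest path), which may differ from the one used for Fig. 2.

```python
from math import comb


def q1_si(ell: int, t: int, beta: float) -> float:
    """Assumed single-path SI probability: the infection covers `ell` edges in
    at most t steps when each step advances it with probability beta."""
    if ell == 0:
        return 1.0  # the target coincides with the source
    total = 0.0
    for r in range(ell, t + 1):  # r = exact arrival time at the target
        total += comb(r - 1, ell - 1) * beta**ell * (1 - beta) ** (r - ell)
    return total


def q2_si(ell: int, d_ell: int, t: int, beta: float) -> float:
    """Ground truth for two node-disjoint paths of lengths ell and ell + d_ell."""
    return 1.0 - (1.0 - q1_si(ell, t, beta)) * (1.0 - q1_si(ell + d_ell, t, beta))


def relative_error(ell: int, d_ell: int, t: int, beta: float) -> float:
    """Hypothetical error measure: share of the ground-truth probability that
    the shortest-path approximation (keeping only the path of length ell) misses."""
    exact = q2_si(ell, d_ell, t, beta)
    return 0.0 if exact == 0.0 else (exact - q1_si(ell, t, beta)) / exact
```

The error vanishes when the longer path cannot be traversed within \(t\) steps (\(\ell + d\ell > t\)) and is largest when the two paths have comparable lengths.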

### Susceptible-infected-recovered model

For the SIR model, the calculation is a bit more cumbersome than for the SI model.

Suppose node *s* is initially in the infected state, and suppose that two independent paths of length \(\ell \) and \(\ell + d \ell \) connect node *i* to node *s*. The probability \(q_{2}(\ell , \ell + d \ell , t)\) that node *i* becomes infected by time *t* is given by the probability that the infection spreads along at least one of these paths. We know the analytical form of the probability \(q_{1}(\ell , t)\) that the infection spreads along a single path of length \(\ell \) in *t* time steps or less (see main text). However, this expression can be used to combine the contributions of the two paths only if the paths are dynamically independent. This is not the case at the start, because both paths are fed by the same source, whose recovery stops spreading along both of them. Independence is reached only once the infection has advanced by at least one step along at least one of the paths.

Indicate with *v* the neighbor of node *s* along the path of length \(\ell \) towards *i*, and with *w* the neighbor of node *s* along the path of length \(\ell + d \ell \) towards *i*. The initial configuration at time \(t=0\) is such that \(\sigma _s^{(0)} = I\) and \(\sigma _j^{(0)} = S\) for all \(j \ne s\). At time \(t=1\), the states of nodes may change as the result of spreading and recovery events. The only nodes that can change their states are *s*, *v* and *w*. For example, we can go to the configuration \(\mathbf {\sigma }^{(1)}\) such that \(\sigma _v^{(1)} = I\), \(\sigma _w^{(1)} = S\) and \(\sigma _s^{(1)} = I\), with probability \(\text {Prob.}[\mathbf {\sigma }^{(1)} = (\sigma _v^{(1)} = I, \sigma _w^{(1)} = S, \sigma _s^{(1)} = I, S, \ldots , S) ] = \beta (1-\beta ) (1- \gamma )\). After this first step, the infection spreads independently along the two paths, so we can write \(q_2[\ell , \ell + d \ell , t | \mathbf {\sigma }^{(1)} = (\sigma _v^{(1)} = I, \sigma _w^{(1)} = S, \sigma _s^{(1)} = I, S, \ldots , S) ] = 1 - [1-q_1(\ell -1, t-1)] [1-q_1(\ell + d \ell , t-1)]\). There are eight such configurations in total; they are listed in Table 1. In general, we can write that
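
Conditioning on the configuration reached at \(t=1\), a sketch of this relation, reconstructed from the surrounding description and possibly differing in form from the authors' Eq. (A1), is

\[
q_2(\ell, \ell + d\ell, t) = \sum_{\mathbf{\sigma}} \text{Prob.}[\mathbf{\sigma}^{(1)} = \mathbf{\sigma}] \; q_2[\ell, \ell + d\ell, t \,|\, \mathbf{\sigma}^{(1)} = \mathbf{\sigma}].
\]

For the configuration in which neither *v* nor *w* has been infected and *s* is still infected, the system is back in its initial state with one step fewer, so the conditional term equals \(q_2(\ell, \ell + d\ell, t-1)\); this is why the relation has to be iterated.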

where the sum runs over all eight configurations \(\mathbf {\sigma }\) of Table 1. The expressions of the probabilities appearing in Table 1 are then used to solve Eq. (A1) by iteration, starting from the initial condition \(q_2(\ell , \ell + d \ell , t =0 ) = 0\).
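
The iteration can be sketched in Python as follows. The sketch assumes that, within a time step, transmission (probability \(\beta\) per susceptible neighbor) and recovery (probability \(\gamma\)) of an infected node are independent events, consistent with the example probability \(\beta (1-\beta)(1-\gamma)\) given above, so that the single-path probability takes the form \(q_1(\ell,t)=\sum_{r=\ell}^{t}\binom{r-1}{\ell-1}\beta^{\ell}[(1-\beta)(1-\gamma)]^{r-\ell}\). Both this closed form and the function names are assumptions, not the paper's code.

```python
from math import comb


def q1_sir(ell: int, t: int, beta: float, gamma: float) -> float:
    """Assumed single-path SIR probability of covering `ell` edges in at most t steps."""
    if ell == 0:
        return 1.0
    x = (1 - beta) * (1 - gamma)  # a step with neither transmission nor recovery
    total = 0.0
    for r in range(ell, t + 1):
        total += comb(r - 1, ell - 1) * beta**ell * x ** (r - ell)
    return total


def q2_sir(ell: int, d_ell: int, t: int, beta: float, gamma: float) -> float:
    """Sketch of the iteration of Eq. (A1) over the eight first-step configurations."""
    ell2 = ell + d_ell
    assert ell >= 1 and ell2 >= 2, "two node-disjoint paths need ell + d_ell >= 2"
    q = 0.0  # initial condition q2(ell, ell + d_ell, 0) = 0
    for tau in range(1, t + 1):
        new_q = 0.0
        for v_inf in (True, False):          # v: neighbor of s on the path of length ell
            for w_inf in (True, False):      # w: neighbor of s on the path of length ell2
                for s_rec in (True, False):  # has s recovered during the first step?
                    p = ((beta if v_inf else 1 - beta)
                         * (beta if w_inf else 1 - beta)
                         * (gamma if s_rec else 1 - gamma))
                    if not (v_inf or w_inf):
                        cond = 0.0 if s_rec else q  # same state with one step fewer
                    else:
                        # paths now evolve independently; a recovered s feeds neither
                        a = q1_sir(ell - 1, tau - 1, beta, gamma) if v_inf else (
                            0.0 if s_rec else q1_sir(ell, tau - 1, beta, gamma))
                        b = q1_sir(ell2 - 1, tau - 1, beta, gamma) if w_inf else (
                            0.0 if s_rec else q1_sir(ell2, tau - 1, beta, gamma))
                        cond = 1.0 - (1.0 - a) * (1.0 - b)  # at least one path succeeds
                    new_q += p * cond
        q = new_q
    return q
```

Setting `gamma = 0` removes the coupling through the source, and the iteration then collapses to the SI product formula, which is a convenient consistency check.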

### Appendix B: Joint probability of infection from a single source

### Susceptible-infected model

Here, we illustrate how to compute the joint probability \(Q^{(t)}_{s\rightarrow i,j}\) that nodes *i* and *j* are infected at time *t* or earlier given that the source of spreading is node *s*. The computation again relies on Eqs. (2) and (7), properly accounting for the position of the source node *s* relative to the positions of the target nodes *i* and *j* (see Fig. 12).

If node *j* sits between nodes *s* and *i*, then the infection can reach node *i* only by passing first through node *j*. Thus, we can safely write \(Q^{(t)}_{s \rightarrow i,j} = Q^{(t)}_{s \rightarrow i}\). The same argument gives \(Q^{(t)}_{s \rightarrow i,j} = Q^{(t)}_{s \rightarrow j}\) if node *i* sits between nodes *s* and *j*.

A less straightforward computation is required when the source node *s* is connected to nodes *i* and *j* by partially independent paths. Part of the spreading path can be shared by the two trajectories, say up to node *k* as indicated in Fig. 12. After this node, however, the two paths are dynamically independent of each other, and the two contributions can be computed separately. Specifically, we can write
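
In the SI model node *k* stays infected once reached, so, conditioned on its infection at stage \(r\), the branches towards *i* and *j* evolve independently during the remaining \(t-r\) steps. A sketch of the resulting expression (a reconstruction from the surrounding text; the authors' Eq. (B1) may be written differently) is

\[
Q^{(t)}_{s \rightarrow i,j} = \sum_{r} P^{(r)}_{s \rightarrow k} \, Q^{(t-r)}_{k \rightarrow i} \, Q^{(t-r)}_{k \rightarrow j}.
\]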

where \(P^{(r)}_{s \rightarrow k}\) is the usual probability that the infection reaches node *k* in exactly *r* stages of the dynamics. The sum on the r.h.s. of Eq. (B1) runs over all values of *r* compatible with the quantity we want to estimate, i.e., over all arrival times at node *k* not exceeding *t*.
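
A Python sketch of the SI joint probability follows. It assumes negative-binomial arrival times along each segment (per-step transmission probability \(\beta\)); `l_sk` is the length of the shared segment from *s* to *k*, and `l_ki`, `l_kj` are the lengths of the two branches. All names are hypothetical.

```python
from math import comb


def p_exact(ell: int, r: int, beta: float) -> float:
    """Assumed SI probability that the infection covers `ell` edges in exactly
    r steps (negative binomial); ell = 0 means the node is already infected."""
    if ell == 0:
        return 1.0 if r == 0 else 0.0
    if r < ell:
        return 0.0
    return comb(r - 1, ell - 1) * beta**ell * (1 - beta) ** (r - ell)


def q_cum(ell: int, t: int, beta: float) -> float:
    """Probability that the infection covers `ell` edges in at most t steps."""
    return sum(p_exact(ell, r, beta) for r in range(t + 1))


def joint_si(l_sk: int, l_ki: int, l_kj: int, t: int, beta: float) -> float:
    """Joint probability that i and j are both infected by time t when the paths
    s->i and s->j share the segment s->k and then split into two branches."""
    # k cannot be reached before stage l_sk; after that the branches are independent
    return sum(p_exact(l_sk, r, beta) * q_cum(l_ki, t - r, beta) * q_cum(l_kj, t - r, beta)
               for r in range(l_sk, t + 1))
```

Setting `l_ki = 0` places *i* on the path from *s* to *j*, which reproduces \(Q^{(t)}_{s \rightarrow j}\); setting `l_sk = 0` puts the source at the branching point, where the joint probability factorizes into the two single-path terms.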

### Susceptible-infected-recovered model

In the SIR model, \(Q^{(t)}_{s\rightarrow i,j}\) can be computed with the same method used for SI, with the only caveat that Eq. (A1) and Table 1 must be taken into account whenever the source lies between *i* and *j*, or whenever the two shortest paths split and become independent.
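
As an illustration of the case in which the source lies between *i* and *j*, the Python sketch below reuses the eight first-step configurations of Table 1 but multiplies the branch probabilities instead of combining them with an "at least one" rule, since both targets must be reached. It assumes that transmission (probability \(\beta\)) and recovery (probability \(\gamma\)) of an infected node within a time step are independent events; the single-path form and all names are assumptions.

```python
from math import comb


def q1_sir(ell: int, t: int, beta: float, gamma: float) -> float:
    """Assumed single-path SIR probability of covering `ell` edges in at most t steps."""
    if ell == 0:
        return 1.0
    x = (1 - beta) * (1 - gamma)  # a step with neither transmission nor recovery
    total = 0.0
    for r in range(ell, t + 1):
        total += comb(r - 1, ell - 1) * beta**ell * x ** (r - ell)
    return total


def joint_sir_source_between(l_i: int, l_j: int, t: int, beta: float, gamma: float) -> float:
    """Probability that both i and j are infected by time t when the source s lies
    between them, at distances l_i >= 1 and l_j >= 1."""
    assert l_i >= 1 and l_j >= 1
    q = 0.0  # at tau = 0 neither target can be infected yet
    for tau in range(1, t + 1):
        new_q = 0.0
        for v_inf in (True, False):          # v: neighbor of s towards i
            for w_inf in (True, False):      # w: neighbor of s towards j
                for s_rec in (True, False):  # has s recovered during the first step?
                    p = ((beta if v_inf else 1 - beta)
                         * (beta if w_inf else 1 - beta)
                         * (gamma if s_rec else 1 - gamma))
                    if not (v_inf or w_inf):
                        cond = 0.0 if s_rec else q  # same state with one step fewer
                    else:
                        a = q1_sir(l_i - 1, tau - 1, beta, gamma) if v_inf else (
                            0.0 if s_rec else q1_sir(l_i, tau - 1, beta, gamma))
                        b = q1_sir(l_j - 1, tau - 1, beta, gamma) if w_inf else (
                            0.0 if s_rec else q1_sir(l_j, tau - 1, beta, gamma))
                        cond = a * b  # independent branches, both must succeed
                    new_q += p * cond
        q = new_q
    return q
```

For a branching node \(k\) distinct from *i* and *j*, the same routine, called with the branch lengths measured from *k*, could stand in for the joint contribution of the two branches beyond *k* in Eq. (B1), with SIR arrival probabilities used for \(P^{(r)}_{s \rightarrow k}\); this extension is likewise an assumption.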

## About this article

### Cite this article

Mazzilli, D., Radicchi, F. Combinatorial approach to spreading processes on networks.
*Eur. Phys. J. B* **94**, 15 (2021). https://doi.org/10.1140/epjb/s10051-020-00029-z
