The fastest spreader in SIS epidemics on networks

Identifying the fastest spreaders in epidemics on a network helps to ensure efficient spreading. By ranking the average spreading time for different spreaders, we show that the fastest spreader may change with the effective infection rate of an SIS epidemic process, which means that the time-dependent influence of a node is usually strongly coupled to the dynamic process and the underlying network. With increasing effective infection rate, we illustrate that the fastest spreader changes from the node with the largest degree to the node with the shortest flooding time. (The flooding time is the minimum time needed to reach all other nodes if the process is reduced to a flooding process.) Furthermore, by taking the local topology around the spreader and the average flooding time into account, we propose the spreading efficiency as a metric to quantify the efficiency of a spreader and identify the fastest spreader, which is adaptive to different infection rates in general networks.


Introduction
Identifying the most influential initial spreaders in a network constitutes a basic endeavor in network science, which helps to optimize the utility of resources and to ensure an efficient diffusion [1]. Injecting information into the fastest spreaders results in the most efficient spreading performance. The knowledge of the fastest spreader can be applied in direct marketing [2] or idea spreading [3], where limited resources mean that the spreading must start from a small number of spreaders.
Many topological metrics have been proposed to measure the influence of nodes in networks [4], such as degree, betweenness, closeness [5], eigenvector centrality [6] and the square eigenvector component [7]. Kitsak et al. [8] suggest that coreness constitutes a better topological descriptor to identify influential spreaders in epidemics. However, many nodes that perform differently in a spreading process may have the same k-core value. Therefore, new metrics based on the existing centralities have been proposed to improve the identification of influential nodes by coreness [9,10]. Considering the removal of the nodes that cause the biggest drop in the energy function, Morone and Makse [11] propose the metric of collective influence through optimal percolation, which performs well in locally tree-like networks. Van Mieghem et al. [12] propose that the best conduction node in a resistor network is the minimizer of the diagonal elements of the pseudoinverse matrix $Q^\dagger$ of the weighted Laplacian matrix of the graph.
In the Susceptible-Infected-Removed (SIR) model [13], Sikić et al. [14] show that the ranking of nodal influences is sensitive to the spreading dynamics, which depends on the infection rate and the curing rate. Measured by the cumulative infection probabilities of nodes, the degree centrality can better identify influential spreaders when the spreading rate is very small. However, the eigenvector centrality performs better when the spreading rate is close to the epidemic threshold [15]. Holme [16] discovers similar results and proposes an exact method to identify the best spreaders for influence maximization (the expected outbreak size) in the SIR model, but the method is only tractable in small graphs. In the Susceptible-Infected-Susceptible (SIS) model, Qu et al. [17] unveil that the ranking of the nodal metastable infection probability also changes with the effective infection rate.
a) e-mail: Z.He@tudelft.nl
The "influence" of the spreader in the SIS model is not well defined. In this paper, we confine ourselves to the spreading time $T_m(i)$, defined as the time [18] when the number of infected nodes in the metastable state is first reached, starting from one initially infected node $i$. The spreading time of an epidemic process generally determines the preferred period to take immunization actions to eradicate the spreading [19]. We investigate the average spreading time $E[T_m(i)]$ to identify the fastest spreader in an SIS epidemic on a general network. This paper is organized as follows. Section 2 introduces the spreading time and shows that the average spreading time depends on the topological metrics in an ER random graph. Section 3 shows that the fastest spreader changes with the dynamic process in SIS epidemics. Further, we propose the spreading efficiency to identify the fastest spreader. We show the performance in four artificial and real networks in Section 4. Finally, we conclude our results in Section 5.

The spreading time in epidemics on networks
We concentrate on Markovian SIS epidemics [20] on networks, where both the curing and infection processes are Poisson processes. In the SIS epidemic model on a network $G$ with $N$ nodes and $L$ links, the ratio between the infection rate $\beta$ and the curing rate $\delta$ is called the effective infection rate $\tau = \beta/\delta$. The SIS model features a phase transition [21] around the epidemic threshold $\tau_c$. Viruses with an effective infection rate $\tau$ above the epidemic threshold $\tau_c$ can infect a sizable portion of the population and stay in the network for a long time. The first-order mean-field approximation of the epidemic threshold, $\tau_c^{(1)} = 1/\lambda_1$, where $\lambda_1$ is the spectral radius of the adjacency matrix $A$ of the network $G$, was shown to be a lower bound for the epidemic threshold [20]. We denote by $x = \tau/\tau_c^{(1)}$ the normalized effective infection rate. The spreading time $T_m(i)$ of the Markovian SIS process resembles a lognormal-like distribution with deep tails [18]. The average spreading time $E[T_m(i)]$ approximates the average hitting time at which the average fraction $y_\infty$ of infected nodes in the metastable state is reached. Physically, the spreading time $T_m(i)$ describes the spreading velocity in the early stage of the spreading process, which depends on the local topology around the initial spreader $i$. The analytic expression of the spreading time in a general graph is hard to derive in closed form [19]. Because of this limitation of analytical methods, an event-driven simulator SSIS for the SIS spreading process, based on the Gillespie algorithm, is used to determine the spreading time [18].
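To make the simulation procedure concrete, the following is a minimal Python sketch of a Gillespie-style simulation of the Markovian SIS process that records the first time a target number of infected nodes is reached. It is not the SSIS simulator of [18]. The function names, the adjacency-dictionary input and the `target` argument (standing in for the metastable number of infected nodes) are assumptions made for illustration, and runs that die out are discarded, following the "without extinction" convention.

```python
import random

def sis_spreading_time(adj, beta, delta, seed_node, target, rng, max_events=10**6):
    """First time the number of infected nodes reaches `target`,
    or None if the epidemic dies out (or stalls) before that."""
    infected = {seed_node}
    t = 0.0
    for _ in range(max_events):
        if len(infected) >= target:
            return t
        if not infected:
            return None  # extinction
        # One infection event per (infected, susceptible) link, each at rate beta.
        si_links = [(u, v) for u in sorted(infected) for v in adj[u] if v not in infected]
        rate_inf = beta * len(si_links)
        rate_cure = delta * len(infected)  # each infected node cures at rate delta
        total = rate_inf + rate_cure
        if total == 0:
            return None  # nothing can happen any more
        t += rng.expovariate(total)  # exponential waiting time to the next event
        if rng.random() * total < rate_inf:
            infected.add(rng.choice(si_links)[1])
        else:
            infected.remove(rng.choice(sorted(infected)))
    return None

def average_spreading_time(adj, beta, delta, seed_node, target, runs=2000, seed=0):
    """Average over the runs that reach `target` before dying out."""
    rng = random.Random(seed)
    times = [sis_spreading_time(adj, beta, delta, seed_node, target, rng)
             for _ in range(runs)]
    times = [x for x in times if x is not None]
    return sum(times) / len(times) if times else float("nan")
```

Rebuilding the list of infected-susceptible links at every event keeps the sketch short but costs time proportional to the number of such links per event; an event-driven simulator for large graphs would update this list incrementally.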
A faster initial spreader speeds up the spreading in the outbreak period and leads to a shorter average spreading time, which measures the efficiency of the spreader. We can identify the fastest nodes by ranking the average spreading time. We first show the effect of the topological properties of the spreader $i$ on the average spreading time $E[T_m(i)]$ in an SIS epidemic on an Erdős-Rényi (ER) random network. Figure 1 shows the normalized topological metrics of node $i$ versus the average spreading time $E[T_m(i)]$, which demonstrates that the average spreading time $E[T_m(i)]$ depends on the topological properties of the initial spreader $i$. Specifically, the degree and the closeness of the initial spreader behave similarly to the average spreading time in the ER random graph, while the betweenness of the initial spreader has a weaker correlation with the average spreading time. The reciprocal of the diagonal element $Q^\dagger_{ii}$ of the pseudoinverse matrix $Q^\dagger$ also performs well in ranking the fastest spreaders and behaves similarly to the degree in the ER random graph [12]. Figure 1 also illustrates that the nodes with the same coreness may occupy a large proportion of the network, so that the fastest spreader cannot be identified well by coreness.

The fastest spreader in SIS epidemics
In this section, we further investigate the fastest spreader in SIS epidemics. The change of the fastest spreader with the effective infection rate $\tau$ is illustrated in an example barbell-like graph. Then, we propose a new metric to identify the fastest spreader.

Change of the fastest spreader with τ in a barbell-like graph
We generate an asymmetric barbell-like graph $G_{20}$ in which a path graph $L_2$ connects an ER random graph $G_{0.5}(10)$ and a star graph $K_{1,7}$, as shown in Figure 2. The barbell-like graph helps us to trace the fastest spreader as the effective infection rate $\tau$ changes. Figure 2 illustrates the probability that each node is infected at the spreading time. It shows that the infected nodes are usually localized around the initial spreader at the spreading time, e.g., the viruses seldom reach node 14 for a small normalized effective infection rate $x = 4$. Figure 3 exemplifies that the fastest spreader changes with the effective infection rate $\tau$ in $G_{20}$. The fastest spreader changes dramatically from the highest degree node to the lowest degree node with increasing effective infection rate $\tau$. Specifically, we observe three different cases in Figure 3. If the effective infection rate $\tau$ is relatively small, the fastest spreader tends to be located in the dense part (the ER random subgraph) of the network. With increasing effective infection rate $\tau$, the fastest spreader moves to nodes with a larger closeness in the path subgraph. Finally, the process approximates a flooding process if the effective infection rate $\tau$ is large enough. Since the average time to infect all nodes in the star subgraph is larger than that in the ER random subgraph (footnote 1), the fastest spreader should be closer to the star subgraph.

Fig. 2. The probability that a node is infected at the spreading time for different normalized effective infection rates $x = \tau/\tau_c$. Node 9 is the initially infected spreader. The darkness of a node represents the probability. The results are based on $10^5$ realizations.

Fig. 3. Illustration of the change of the fastest spreader with $\tau$. The size of a node represents its degree, and a darker node represents a faster spreader. The orange node is the fastest initial spreader.

In Figure 4, the crossings of the average spreading time $E[T_m(i)]$ as a function of the effective infection rate $\tau$ for different initial spreaders demonstrate that neither the fastest spreader nor the ranking of spreaders is fixed across effective infection rates $\tau$. Therefore, we conclude that the fastest initial spreader in the SIS model cannot be determined from its location in the underlying graph alone. Our finding implies that the time-dependent "importance or centrality" of a node is usually strongly coupled to the dynamic process and the underlying graph itself.

Footnote 1. The average time to infect all nodes [22] in an ER random graph $G_p(N)$ is estimated to be [expression missing in the source]. The average time to infect all nodes in a star graph $K_{1,N}$ from the center is estimated to be the maximum of $N$ exponentially distributed random variables with mean $1/\beta$, which approximates [expression missing in the source].

A heuristic topological metric for the fastest spreader
In this section, we discuss the topological property of the fastest spreader as the effective infection rate $\tau$ increases, i.e., for $\tau \downarrow \tau_c$, $\tau > \tau_c$ and very large $\tau$.

Case: $\tau \downarrow \tau_c$
Invoking the infection probability vector $V(t) = (v_1(t), v_2(t), \ldots, v_N(t))^T$, we approximate the spreading dynamics in the early stage of the spreading [23] and obtain

$$V(t) \approx e^{(\beta A - \delta I)t}\, V(0). \tag{1}$$

The average fraction $y(t_m, \tau)$ of infected nodes at the spreading time $t_m$ with $\tau$ obeys

$$y(t_m, \tau) = \frac{1}{N} u^T V(t_m) \approx \frac{1}{N} u^T e^{(\beta A - \delta I)t_m}\, V(0), \tag{2}$$

where $u^T = (1, 1, \ldots, 1)$. If the effective infection rate $\tau = \beta/\delta$ approaches the first-order mean-field approximation of the epidemic threshold $\tau_c^{(1)} = 1/\lambda_1$, only a very small proportion $y(t_m, \tau)$ of nodes will be infected in the metastable state. The spreading time $t_m$, defined as the first hitting time when $N y(t_m, \tau)$ nodes are infected in the Markovian SIS process without extinction [18], is finite. Figure 4 also exemplifies that the average spreading time is relatively small if $\tau \downarrow \tau_c^{(1)}$. The matrix $(\beta A - \delta I)t_m$ in (2) is dominated by the largest eigenvalue $\delta(\tau\lambda_1 - 1)t_m$, which tends to 0 if the effective infection rate $\tau \downarrow \tau_c^{(1)}$ (by the Perron-Frobenius Theorem [5]). Expanding the matrix exponential in (2) to first order, and invoking the degree vector $d = Au$ and $V(0) = e_i$, we arrive at

$$N y(t_m, \tau) \approx u^T \left( I + (\beta A - \delta I)t_m \right) e_i = 1 + (\beta d_i - \delta)\, t_m. \tag{3}$$

Relation (3) exhibits that the degree of the spreader dominates the spreading time $t_m$ for unaltered rates $\beta$, $\delta$ and their corresponding $y(t_m) = y_\infty$. This result differs from the result that the eigenvector of the adjacency matrix $A$ belonging to the largest eigenvalue determines the infection probability vector in the metastable state [5]. We here exemplify an extreme case: if the effective infection rate $\tau$ approaches $\tau_c$, and there is only one infected node in the metastable state, i.e., $y(t_m) = 1/N$, the spreading time $t_m$ equals the minimum time at which any one of the neighbors of the spreader $i$ is infected. Then, the spreading time $T_m$ is the minimum of $d_i$ exponentially distributed random variables with mean $1/\beta$, where $d_i$ is the degree of the spreader $i$. Thus, the average spreading time follows $E[T_m] = \frac{1}{\beta d_i}$, which is determined by the degree of the initial spreader.
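The role of the degree near the threshold can be checked numerically. The sketch below compares the linearized expected number of infected nodes, $u^T e^{(\beta A - \delta I)t} e_i$, with its first-order expansion $1 + (\beta d_i - \delta)t$. Reading the degree relation as this first-order expansion is our assumption, and the function names and the symmetric-matrix input are illustrative choices rather than anything taken from the paper.

```python
import numpy as np

def expected_infected_linearized(A, beta, delta, i, t):
    """u^T exp((beta*A - delta*I) t) e_i for a symmetric adjacency matrix A,
    computed through the eigendecomposition of A."""
    w, X = np.linalg.eigh(A)            # A = X diag(w) X^T
    e_i = np.zeros(A.shape[0])
    e_i[i] = 1.0
    coeffs = X.T @ e_i                  # coordinates of e_i in the eigenbasis
    growth = np.exp((beta * w - delta) * t)
    return float(np.ones(A.shape[0]) @ (X @ (growth * coeffs)))

def first_order_estimate(A, beta, delta, i, t):
    """First-order expansion 1 + (beta*d_i - delta)*t, with d_i the degree of i."""
    d_i = A[i].sum()
    return 1.0 + (beta * d_i - delta) * t

def mean_time_to_first_neighbor(beta, d_i):
    """Mean of the minimum of d_i independent exponentials with rate beta."""
    return 1.0 / (beta * d_i)
```

For small $t$ the two estimates agree up to terms of order $t^2$, so at a fixed small target $N y$ a spreader with a larger degree $d_i$ reaches it sooner, in line with $E[T_m] = 1/(\beta d_i)$ in the extreme case.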

Case: increasing τ
We then investigate the case of an increasing effective infection rate $\tau$. Inspired by the illustration in Section 3.1, we postulate that the fastest spreader depends on the local topology around itself, i.e., the number of nodes and the connectivity of nodes around the spreader. We first consider the number of nodes around the initial spreader and assume that the efficiency of the initial spreader is related to the expansion [5] of the subgraph centered at the spreader. Specifically, assuming that the hop count $h$ is the farthest distance from the initial spreader $i$ that the viruses can reach before the spreading time, the expansion of the subgraph is the number of nodes $|C_i(h)|$ within $h$ hops from the initial spreader $i$.
We then consider the connectivity of the nodes around the initial spreader. An epidemic behaves like a continuous-time Markov branching process in the early stage [24]. For a branching process, we obtain that the number of infected nodes follows

$$N y(t) \approx u^T e^{\beta A t}\, V(0) \le e^{\beta \lambda_1 t}\, N y(0), \tag{4}$$

which implies that the lower bound of the time to infect $N y(t)$ nodes around the initial spreader follows $t \ge \frac{\log(N y(t))}{\beta \lambda_1}$. Inspired by (4), we propose $\frac{\lambda_i(h)}{\log |C_i(h)|}$ as an indication of the connectivity of the local topology around the spreader $i$ for a fixed infection rate $\beta$, where $\lambda_i(h)$ is the largest eigenvalue of the subgraph within $h$ hops around the initial spreader $i$. A larger $\frac{\lambda_i(h)}{\log |C_i(h)|}$ implies a higher connectivity that leads to a faster spreading in the local network within $h$ hops. Considering the above two factors, namely the expansion $|C_i(h)|$ of the subgraph and the connectivity indication $\frac{\lambda_i(h)}{\log |C_i(h)|}$ within the subgraph, we propose the spreading efficiency as a new metric to measure the efficiency of the initial spreader in the SIS model. The spreading efficiency of node $i$ is defined as [equation (5) missing in the source]. If the subgraph expansions $|C_i(h)|$ of two initial spreaders are the same, a larger sub-eigenvalue $\lambda_i(h)$ leads to a higher spreading efficiency in the subgraph due to a higher connectivity of nodes. The hop count $h$ describes the average farthest distance of the infected nodes from the spreader at the spreading time for the effective infection rate $\tau$, which is difficult to determine precisely in a general network. Morone and Makse [11] identify the influential spreaders by the Ball (subgraph) centered at the spreader, where the optimal radius of the Ball is 3 or 4.
The optimal hop count $h = f(\tau)$ in our method is more flexible, as it is a function of the effective infection rate $\tau$. We hereby proceed with an approximation. First, the average fraction of infected nodes $y_\infty$ in the metastable state can be estimated by the NIMFA approach for a given $\tau$. The number $N_C$ of nodes in a branching process follows [equation missing in the source], where $H$ is the largest hop count from the root and $\mu = E[D] - 1$ is the mean degree minus 1 in this graph [5]. In that case, we have the largest hop count [equation missing in the source]. Invoking the fact that a spreading process approximates a branching process in the early stage, we can estimate the hop count $h$ in a sparse, large graph by [equation (6) missing in the source].
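The two local ingredients used in this subsection, the expansion $|C_i(h)|$ and the largest eigenvalue $\lambda_i(h)$ of the subgraph within $h$ hops, can be computed as in the following sketch. Several choices here are assumptions rather than the paper's definitions: the ball includes the spreader itself, the subgraph is the induced subgraph on the ball, and `spreading_efficiency_assumed` combines the expansion with the connectivity indication as a plain product, $|C_i(h)| \cdot \lambda_i(h) / \log |C_i(h)|$, which is only one plausible reading of how the two factors are merged.

```python
from collections import deque
import math
import numpy as np

def ball(adj, i, h):
    """Sorted list of nodes within h hops of node i (including i), found by BFS."""
    dist = {i: 0}
    queue = deque([i])
    while queue:
        u = queue.popleft()
        if dist[u] == h:
            continue  # do not expand beyond radius h
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return sorted(dist)

def local_ingredients(adj, i, h):
    """Return (|C_i(h)|, lambda_i(h)) for the induced subgraph on the h-hop ball."""
    nodes = ball(adj, i, h)
    index = {v: k for k, v in enumerate(nodes)}
    A = np.zeros((len(nodes), len(nodes)))
    for u in nodes:
        for v in adj[u]:
            if v in index:
                A[index[u], index[v]] = 1.0
    lam = float(np.linalg.eigvalsh(A)[-1])  # largest eigenvalue of a symmetric matrix
    return len(nodes), lam

def spreading_efficiency_assumed(adj, i, h):
    """ASSUMED combination: expansion times lambda_i(h) / log|C_i(h)|."""
    size, lam = local_ingredients(adj, i, h)
    if size < 2:
        raise ValueError("the ball must contain at least two nodes")
    return size * lam / math.log(size)
```

For a star $K_{1,4}$ and $h = 1$, the center sees the whole graph ($|C| = 5$, $\lambda = 2$) while a leaf sees only itself and the center ($|C| = 2$, $\lambda = 1$), so the center scores higher under this assumed combination.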

Case: large τ
As the effective infection rate $\tau$ and the average fraction $y_\infty$ of infected nodes in the metastable state increase, the nodes that need relatively more time to be reached gradually dominate the spreading time. Thus, the fastest spreader could be closer to the sparser subgraph of the network. Finally, if the effective infection rate $\tau$ is large enough, the SIS process reduces to a flooding process [22]. The average flooding time $E[T_N(i)]$ of an initial spreader $i$ is the average minimum time for the virus to reach all other nodes in a flooding process. Therefore, we regard the reciprocal of the average flooding time as the quantity that determines the fastest spreader if $\tau$ is very large.
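A Monte Carlo estimate of the average flooding time can be sketched as follows, assuming that each link carries an independent exponentially distributed weight with mean $1/\beta$ and that the flooding time from node $i$ is the largest shortest-path weight from $i$ to any other node. The function names, the adjacency-dictionary input with integer node labels and the use of Dijkstra's algorithm are illustrative assumptions. The graph is assumed to be connected.

```python
import heapq
import random

def flooding_time(adj, source, beta, rng):
    """One realization: the largest shortest-path weight from `source`
    when every link has an independent Exp(beta) weight (mean 1/beta)."""
    weights = {}

    def weight(u, v):
        key = (u, v) if u < v else (v, u)   # undirected link, sampled once
        if key not in weights:
            weights[key] = rng.expovariate(beta)
        return weights[key]

    dist = {source: 0.0}
    done = set()
    heap = [(0.0, source)]
    while heap:                              # Dijkstra's algorithm
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v in adj[u]:
            nd = d + weight(u, v)
            if v not in done and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return max(dist.values())

def average_flooding_time(adj, source, beta, runs=1000, seed=0):
    """Monte Carlo estimate of the average flooding time from `source`."""
    rng = random.Random(seed)
    return sum(flooding_time(adj, source, beta, rng) for _ in range(runs)) / runs
```

A convenient sanity check is a star flooded from its center: the flooding time is then the maximum of $N$ independent exponentials with mean $1/\beta$, whose expectation is $(1/\beta)(1 + 1/2 + \cdots + 1/N)$.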
Assuming that $\lambda_i(0) = 1$ and $|C_i(0)| = d_i$, the spreading efficiency in (5) with $h < 1$ follows the same rank as the degree $d_i$. In summary, we simplify and propose the overall metric "spreading efficiency" to identify the fastest initial spreader in an SIS epidemic as [equation (7) missing in the source], where $y^*_\infty$ is a prescribed parameter indicating that the process approximates a flooding process if $y_\infty > y^*_\infty$. We set $y^*_\infty = 0.8$ in the simulations of this paper.
Figure 5 shows the Kendall rank correlation coefficient $\kappa$ between the average spreading time and the metrics discussed above, including the degree, the spreading efficiency in (5) and the reciprocal of the average flooding time $\phi_i$ obtained via Monte Carlo estimation. If the effective infection rate $\tau$ is close to the epidemic threshold $\tau_c \approx 0.17$, the degree centrality could be a better metric. We then observe that the best hop count $h$ increases with the effective infection rate $\tau$, and the spreading efficiency $E_i$ with the proposed hop count $h$ in (6) leads to the maximum correlation coefficient $\kappa$ over a wide range of $\tau$. Finally, the reciprocal of the average flooding time shows its advantage when $\tau$ is large enough.
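The rank comparison behind the Kendall coefficient can be illustrated with a short sketch. It implements the tau-a variant, in which tied pairs count as neither concordant nor discordant. Which variant the paper uses is not stated, so this is an assumption, as is the sign convention of correlating a metric with the negated spreading time (a faster spreader has a smaller spreading time and should have a larger metric). The function names are illustrative.

```python
def kendall_tau_a(x, y):
    """Kendall tau-a: (concordant - discordant) / (number of pairs)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    score = 0
    for a in range(n):
        for b in range(a + 1, n):
            s = (x[a] - x[b]) * (y[a] - y[b])
            score += (s > 0) - (s < 0)   # +1 concordant, -1 discordant, 0 tie
    return score / (n * (n - 1) / 2)

def rank_agreement(metric, avg_spreading_time):
    """Correlate a centrality metric with spreading speed (negated time)."""
    return kendall_tau_a(metric, [-t for t in avg_spreading_time])

# Toy, made-up values for three hypothetical spreaders (not data from the paper):
toy_degree = [3, 2, 1]
toy_spreading_time = [0.5, 1.0, 2.0]
toy_kappa = rank_agreement(toy_degree, toy_spreading_time)
```

With these toy values every pair is ordered consistently, so the coefficient equals 1; a metric that ranks the spreaders in the opposite order would give -1.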

Numerical results
We evaluate the performance by ranking the fastest initial spreaders in four networks, both artificial and real, with different sizes and topologies: co-appearances of characters in Les Misérables [25], a small-world citation network (SmallWCitation) [26], the artificial barbell network $G_{20}$ and a co-authorship network of scientists (NetScience) [27]. Table 1 shows some properties of the giant component of the four networks, including the number of nodes $N$, the number of links $L$, the diameter $\rho$, the largest eigenvalue $\lambda_1$, the clustering coefficient $C_G$ and the Pearson degree correlation coefficient $\rho_D$.
We extract the giant component of each of the above networks and select 10 nodes randomly in each network. In each simulation, only one of the selected nodes is infected initially, and then the virus spreads in the network according to the Markovian SIS model. After obtaining the average spreading time via SSIS for the different initial spreaders, we compare the Kendall rank correlation coefficients $\kappa$ between the average spreading time and several other metrics, including degree, closeness, betweenness, coreness and the proposed spreading efficiency in (7). Physically, the identification of the fastest spreaders in a flooding process is a 1-center problem [28] in a graph, where the weights of the links in the graph are exponentially distributed random variables with mean $1/\beta$. Thus, we estimate the average flooding time $E[T_N(i)]$ by a Monte Carlo approach combined with an efficient shortest path algorithm [29]. Figure 6 shows the performance of the centrality metrics for ranking the fastest initial spreader in the four networks. We observe that, in the networks with a small diameter (e.g., Les Misérables and the SmallWCitation network), the spreading efficiency performs similarly to the coreness, and both are better than the other centrality metrics. In addition, the spreading efficiency shows its advantage over the coreness if the effective infection rate $\tau$ is relatively large, because the reciprocal of the average flooding time determines the fastest spreader in that case.
However, the degree and the coreness perform poorly in the community networks with a large diameter (e.g., the Barbell and NetScience networks). Meanwhile, the closeness, which considers the average length of the path between the spreader and all other nodes, becomes a better metric. In particular, in the Barbell $G_{20}$, we observe that the performance of the centrality metrics changes with increasing effective infection rate $\tau$. When $\tau$ is small, the degree and the coreness perform better, but the closeness and the betweenness become better if $\tau$ is large enough, which further convinces us that a single existing centrality metric fails to identify the fastest spreader in the SIS model. The results suggest that, in the real world, viruses or information may spread more efficiently starting from a spreader with a large degree within the community for a small $\tau$, but that it is better to choose a spreader with a high closeness for a large $\tau$.
In summary, we observe that the proposed spreading efficiency, which is adaptive to different topologies and different dynamic processes, generally performs better than the compared topological metrics. We find that the accuracy of the spreading efficiency drops a little around the effective infection rate corresponding to the transition parameter $y^*_\infty$. We expect that a better transition method and a better estimation of the hop count $h = f(\tau)$ can improve the performance.

Conclusion
We investigated the properties of the fastest initial spreader, i.e., the one with the shortest average spreading time, in the SIS model. We showed that the fastest spreader changes from the node with the largest degree to the node with the shortest flooding time as the effective infection rate increases, which implies that the fastest spreader is coupled not only to the underlying graph but also to the dynamic process.
By considering the expansion and the largest eigenvalue of the subgraph around the spreader, we proposed the spreading efficiency as a metric to rank the fastest spreaders. The spreading efficiency depends on the effective infection rate τ , and reduces to the reciprocal of the flooding time for a large τ . The simulation results on four networks show that the spreading efficiency can better rank the fastest spreaders than some existing topological metrics including degree, closeness, betweenness, and coreness, in different topologies and dynamic processes.