Data-driven efficient network and surveillance-based immunization


Given a contact network and coarse-grained diagnostic information such as electronic Healthcare Reimbursement Claims (eHRC) data, can we develop efficient intervention policies from data to control an epidemic? Immunization is an important problem in multiple areas, especially epidemiology and public health. However, most existing studies assume a prior epidemiological model in order to develop pre-emptive strategies, which may fail to adapt to new epidemiological patterns or to exploit rich data such as eHRC. In practice, disease spread is usually complicated, so an assumed underlying model may deviate from the true spreading patterns and lead to inaccurate interventions. Additionally, the abundance of health care surveillance data (such as eHRC) makes it possible to study data-driven strategies without too many restrictive assumptions. Hence, such a data-driven intervention approach can help public-health experts make more practical decisions. In this paper, we take into account propagation logs and contact networks for controlling propagation. Unlike previous model-based approaches, our solutions are purely data-driven in the sense that we develop immunization strategies directly from the network and eHRC data without assuming classical epidemiological models. In particular, we formulate the novel and challenging data-driven immunization problem. To solve it, we first propose an efficient sampling approach to align surveillance data with contact networks, then develop an efficient algorithm with a provable approximation guarantee for immunization. Finally, we show the effectiveness and scalability of our methods via extensive experiments on multiple datasets, and conduct case studies on nation-wide real medical surveillance data.




  1. Code in Python:

  2.

  3. We extract vaccine reports based on ICD-9 code V04.81. These are actual vaccine allocations as given in the eHRC data.


  1. Medlock J, Galvani AP (2009) Optimizing influenza vaccine distribution. Science 325:1705–1708
  2. Halloran ME, Ferguson NM, Eubank S, Longini IM, Cummings DAT, Lewis B, Xu S, Fraser C, Vullikanti A, Germann TC, Wagener D, Beckman R, Kadau K, Barrett C, Macken CA, Burke DS, Cooley P (2008) Modeling targeted layered containment of an influenza pandemic in the United States. In: Proceedings of the National Academy of Sciences (PNAS), March 10 2008, pp 4639–4644
  3. Tong H, Prakash BA, Tsourakakis CE, Eliassi-Rad T, Faloutsos C, Chau DH (2010) On the vulnerability of large graphs. In: ICDM
  4. Zhang Y, Adiga A, Vullikanti A, Prakash BA (2015) Controlling propagation at group scale on networks. In: 2015 IEEE international conference on data mining (ICDM). IEEE, pp 619–628
  5. Zhang Y, Prakash BA (2014) Dava: distributing vaccines over networks under prior information. In: Proceedings of the SIAM data mining conference (SDM’14)
  6. Pellis L, Ball F, Bansal S, Eames K, House T, Isham V, Trapman P (2015) Eight challenges for network epidemic models. Epidemics 10:58–62
  7. Ramanathan A, Pullum LL, Hobson TC, Steed CA, Quinn SP, Chennubhotla CS, Valkova S (2015) Orbit: Oak Ridge biosurveillance toolkit for public health dynamics. BMC Bioinform 16(17):S4
  8. Ozmen O, Pullum LL, Ramanathan A, Nutaro JJ (2016) Augmenting epidemiological models with point-of-care diagnostics data. PLoS ONE 11(4):1–13
  9. Barrett CL, Beckman RJ, Khan M, Anil Kumar VS, Marathe MV, Stretz PE, Dutta T, Lewis B (2009) Generation and analysis of large synthetic social contact networks. In: Winter simulation conference, pp 1003–1014
  10. Eubank S, Guclu H, Anil Kumar VS, Marathe MV, Srinivasan A, Toroczkai Z, Wang N (2004) Modelling disease outbreaks in realistic urban social networks. Nature 429(6988):180–184
  11. Prakash BA, Chakrabarti D, Faloutsos M, Valler N, Faloutsos C (2012) Threshold conditions for arbitrary cascade models on arbitrary networks. Knowl Inf Syst 33:549–575
  12. Tong H, Prakash BA, Eliassi-Rad T, Faloutsos M, Faloutsos C (2012) Gelling, and melting, large graphs by edge manipulation. In: Proceedings of CIKM
  13. Anderson RM, May RM (1991) Infectious diseases of humans. Oxford University Press, Oxford
  14. Karp RM (1972) Reducibility among combinatorial problems. In: Complexity of computer computations. Springer, pp 85–103
  15. Nemhauser GL, Wolsey LA, Fisher ML (1978) An analysis of approximations for maximizing submodular set functions—I. Math Program 14(1):265–294
  16. Palmer CR, Gibbons PB, Faloutsos C (2002) Anf: a fast and scalable tool for data mining in massive graphs. In: KDD ’02. ACM, New York, NY, USA, pp 81–90
  17. Flajolet P, Martin GN (1985) Probabilistic counting algorithms for data base applications. J Comput Syst Sci 31(2):182–209
  18. McDaid AF, Murphy B, Friel N, Hurley N (2012) Clustering in networks with the collapsed stochastic block model. arXiv preprint arXiv:1203.3083
  19. Kumar R, Novak J, Raghavan P, Tomkins A (2003) On the bursty evolution of blogspace. In: WWW’03, pp 568–576
  20. Kempe D, Kleinberg J, Tardos E (2003) Maximizing the spread of influence through a social network. In: KDD’03
  21. Goyal A, Bonchi F, Lakshmanan LV (2011) A data-based approach to social influence maximization. Proc VLDB Endow 5(1):73–84
  22. Hethcote HW (2000) The mathematics of infectious diseases. SIAM Rev 42:599–653
  23. Ganesh A, Massoulie L, Towsley D (2005) The effect of network topology on the spread of epidemics. In: Proceedings of INFOCOM
  24. Cohen R, Havlin S, Ben Avraham D (2003) Efficient immunization strategies for computer networks and populations. Phys Rev Lett 91(24):247901
  25. Aspnes J, Chang K, Yampolskiy A (2005) Inoculation strategies for victims of viruses and the sum-of-squares partition problem. In: Proceedings of the sixteenth annual ACM-SIAM symposium on discrete algorithms (SODA’05), pp 43–52
  26. Van Mieghem P, Stevanović D, Kuipers F, Li C, Van De Bovenkamp R, Liu D, Wang H (2011) Decreasing the spectral radius of a graph by link removals. Phys Rev E 84(1):016101
  27. Prakash BA, Adamic LA, Iwashyna TJ, Tong H, Faloutsos C (2013) Fractional immunization in networks. In: Proceedings of SDM, pp 659–667
  28. Shim E (2013) Optimal strategies of social distancing and vaccination against seasonal influenza. Math Biosci Eng 10(5):1615–1634
  29. Khalil EB, Dilkina B, Song L (2014) Scalable diffusion-aware optimization of network topology. In: KDD 2014. ACM, pp 1226–1235
  30. Saha B, Gupta S, Phung D, Venkatesh S (2017) Effective sparse imputation of patient conditions in electronic medical records for emergency risk predictions. Knowl Inf Syst 53(1):179–206
  31. Patwardhan A, Bilkovski R (2012) Comparison: flu prescription sales data from a retail pharmacy in the US with Google Flu Trends and US ILINet (CDC) data as flu activity indicator. PLoS ONE 7(8):e43611
  32. Gog JR, Ballesteros S, Viboud C, Simonsen L, Bjornstad ON, Shaman J, Chao DL, Khan F, Grenfell BT (2014) Spatial transmission of 2009 pandemic influenza in the US. PLoS Comput Biol 10(6):e1003635
  33. Malhotra K, Hobson TC, Valkova S, Pullum LL, Ramanathan A (2015) Sequential pattern mining of electronic healthcare reimbursement claims: experiences and challenges in uncovering how patients are treated by physicians. In: 2015 IEEE international conference on big data (big data). IEEE, pp 2670–2679


This paper is based on work partially supported by the NSF (IIS-1353346, CAREER IIS-1750407), the NEH (HG-229283-15), ORNL, the Maryland Procurement Office (H98230-14-C-0127), and a Facebook faculty gift to BAP. AV is partially supported by the following grants: DTRA CNIMS Contract HDTRA1-11-D-0016-0010, NSF BIG DATA Grant IIS-1633028, NSF DIBBS Grant ACI-1443054, and NSF EAGER SSDIM-1745207. Publication of this article was also funded by ORNL LDRD funding to AR. Oak Ridge National Laboratory (ORNL) (Grant No. Order 4000143330) is operated by UT-Battelle, LLC, for the US Department of Energy under contract DE-AC05-00OR22725. The US Government retains and the publisher, by accepting the article for publication, acknowledges that the US Government retains a non-exclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US Government purposes.

Author information



Corresponding author

Correspondence to Yao Zhang.




Proof of Lemma 4.4

First, when \(\alpha _{\mathbf {M}, \ell }\) is optimal, \(\alpha _{\mathbf {M}, \ell }=\alpha ^*_{\mathbf {M}, \ell }\).

Second, let \(\beta _{S_{\ell }}\) be the number of nodes without any parents. Maximizing \(\alpha _{\mathbf {M}, \ell }\) for Problem 3.1 is equivalent to minimizing \(\beta _{S_{\ell }}\) at location \(L_{\ell }\). Let \(\beta ^*_{S_{\ell }}\) be the minimum number of nodes without any parents in the sample at location \(L_{\ell }\). Clearly, \(\beta ^*_{S_{\ell }} = CN(L_{\ell }, t_0)= |S_L|\). For each timestep \(t_i\), if \(CF_i (S_{\ell }) <CN(L_{\ell }, t_i)\), then \(CN(L_{\ell }, t_i)- CF_i(S_{\ell })\) is the number of nodes that cannot be mapped to the cascade generated by \(S_{\ell }\) at timestep \(t_i\). Hence, \(\theta (S_{\ell })\) is the total number of nodes that cannot be mapped to the cascade generated by \(S_{\ell }\). If there exists any \(t_i\) such that \(CF_i (S_{\ell }) <CN(L_{\ell }, t_i)\), we can always generate a cascade by mapping all \(CF_i (S_{\ell }) \) nodes into the cascade, and then mapping the other \(\theta (S_{\ell })\) nodes into the cascade uniformly at random. In this way, the number of nodes without any parents satisfies \(\beta _{S_{\ell }} \le \beta ^*_{S_{\ell }}+ \theta (S_{\ell }) \), since the \(\theta (S_{\ell })\) nodes may be connected among themselves. Since \(\beta _{S_{\ell }} + \alpha _{\mathbf {M}, \ell } = \sum _{t_i} N(L_{\ell }, t_i)\), we get \( \alpha _{\mathbf {M}, \ell } \ge \alpha ^*_{\mathbf {M}, \ell } - \theta (S_{\ell })\). Hence, \(\alpha ^*_{\mathbf {M}, \ell } - \theta (S_{\ell }) \le \alpha _{\mathbf {M}, \ell } \le \alpha ^*_{\mathbf {M}, \ell }\). When \(\alpha _{\mathbf {M}, \ell }=\alpha ^*_{\mathbf {M}, \ell }\), \(\theta (S_{\ell }) =0\). \(\square \)

Proof of Lemma 4.5

First, it is clear that \(g(\emptyset )=0\).

Second, to prove that g(S) is monotonically increasing, we need to prove that \(\theta (S)\) is a monotonically decreasing function. To do that, we first show that \(CF_{i}(S_{\ell })\) is a monotone non-decreasing and submodular function for any i and \(L_{\ell }\). Define \(f_{i}(S_{\ell })\) as the number of nodes in \(L_{\ell }\) that \(S_{\ell }\) can reach in i hops; then \(f_{i}(S_{\ell }) \le f_{i}(S_k)\) whenever \(S_{\ell }\subseteq S_k\), so \(f_{i}\) is monotone non-decreasing. Moreover, given \(S_{\ell }\subseteq S_k\) and a node u, \(f_{i}(S_{\ell }\cup \{u\}) - f_{i}(S_{\ell })\) is the marginal gain of a set union. Since the objective function of the set union problem is submodular [14], \(f_{i}(S_{\ell })\) is also submodular. Since \(f_{i}(S_{\ell })\) is monotone non-decreasing and submodular, the cumulative function \(CF_{i}(S_{\ell })\) is also monotone non-decreasing and submodular.
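The monotonicity and diminishing-returns behaviour described above can be illustrated on a toy graph. The following is a minimal sketch, not the authors' code: it assumes that \(CF_i(S)\) counts all nodes within i hops of S (seeds included) and ignores the restriction to a location \(L_{\ell }\); the graph `adj` and the helper names `reach_within` and `cf` are hypothetical.

```python
from collections import deque

def reach_within(adj, seeds, hops):
    """Return the set of nodes within `hops` steps of any seed (seeds included)."""
    seen = set(seeds)
    frontier = deque((s, 0) for s in seeds)  # multi-source BFS
    while frontier:
        node, dist = frontier.popleft()
        if dist == hops:
            continue  # do not expand past the hop limit
        for nbr in adj.get(node, ()):
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, dist + 1))
    return seen

def cf(adj, seeds, hops):
    """Assumed cumulative reach: number of nodes within `hops` hops of `seeds`."""
    return len(reach_within(adj, seeds, hops))

# Hypothetical directed toy graph (adjacency lists).
adj = {0: [1, 2], 1: [3], 2: [3], 3: [4], 4: [], 5: [4]}

small, large, u = {0}, {0, 5}, 3  # small is a subset of large
gain_small = cf(adj, small | {u}, 1) - cf(adj, small, 1)
gain_large = cf(adj, large | {u}, 1) - cf(adj, large, 1)

# Monotone: adding a node never reduces reach.
# Submodular: the gain of u shrinks as the base set grows.
print(gain_small, gain_large)
```

Here node 4 is already covered by seed 5 in the larger set, so adding node 3 contributes less there than it does to the smaller set.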

Let \(X_i=[{\mathbb {1}}_{CF_i (A \cup B) <CN_i } (CN_i - CF_i(A \cup B))]\) and \(Y_i=[{\mathbb {1}}_{CF_i (A) <CN_i } (CN_i - CF_i(A))] \). For any sets A and B,

$$\begin{aligned} \theta (A \cup B) - \theta (A)= \sum _{i=1}^T (X_i - Y_i ) \end{aligned}$$

For any i, let us consider the following two cases:

(1) If \({\mathbb {1}}_{CF_i (A) <CN_i }=0\), it means \(CF_i (A) \ge CN_i\), then \(CF_i (A\cup B) \ge CN_i\); hence, \({\mathbb {1}}_{CF_i (A \cup B) <CN_i }=0\). We have \(X_i - Y_i=0\).

(2) If \({\mathbb {1}}_{CF_i (A) <CN_i }=1\), we have two cases:

(2a) \({\mathbb {1}}_{CF_i (A \cup B) <CN_i }=0\), then \(X_i -Y_i = -Y_i = - (CN_i - CF_i(A)) <0\);

(2b) \({\mathbb {1}}_{CF_i (A \cup B) <CN_i }=1\), then \( X_i - Y_i = (CN_i - CF_i(A \cup B))- (CN_i - CF_i(A)) = CF_i(A) - CF_i(A\cup B) \le 0\) (using Claim 2).

Putting the cases together, we have \(\theta (A \cup B) \le \theta (A)\). Hence, \(\theta (S)\) is monotonically decreasing, and therefore g(S) is monotonically increasing.

Third, to prove that g(S) is submodular, we need to show that, for any location \(L_{\ell }\) and given \(S \subseteq T\), \(g(S \cup \{a\}) - g(S) \ge g(T \cup \{a\}) - g(T)\), which is equivalent to \(\theta (S\cup \{a\})-\theta (S) \le \theta (T\cup \{a\})-\theta (T) \) (supermodularity of \(\theta \)). Let us write

\(\delta (S,a,i)= [{\mathbb {1}}_{CF_i (S \cup \{a\})<CN_i } (CN_i - CF_i(S \cup \{a\}))] -[{\mathbb {1}}_{CF_i (S) <CN_i } (CN_i - CF_i(S))]\), and

\(\delta (T,a,i) = [{\mathbb {1}}_{CF_i (T \cup \{a\})<CN_i } (CN_i - CF_i(T \cup \{a\}))] -[{\mathbb {1}}_{CF_i (T) <CN_i } (CN_i - CF_i(T))]\), then,

\(\theta (S \cup \{a\}) - \theta (S) = \sum _{i=1}^t \delta (S,a,i) \), and \(\theta (T \cup \{a\}) - \theta (T) = \sum _{i=1}^t \delta (T,a,i) \).

It therefore suffices to show that \(\delta (S,a,i) \le \delta (T,a,i)\) for every i. For any i, let us consider the following two cases:

(1) If \({\mathbb {1}}_{CF_i (S) <CN_i }=0\), then \({\mathbb {1}}_{CF_i (S \cup \{a\})<CN_i }={\mathbb {1}}_{CF_i (T)<CN_i }={\mathbb {1}}_{CF_i (T\cup \{a\}) <CN_i }=0\). Hence, \(\delta (S,a,i)=\delta (T,a,i)=0\).

(2) If \({\mathbb {1}}_{CF_i (S) <CN_i }=1\), we have the following cases:

(2a) If \({\mathbb {1}}_{CF_i (T) <CN_i }=0\), then we have \({\mathbb {1}}_{CF_i (T \cup \{a\}) <CN_i }=0\), so \(\delta (T,a,i)=0\). Let us consider the value of \({\mathbb {1}}_{CF_i (S \cup \{a\}) <CN_i }\):

If \({\mathbb {1}}_{CF_i (S \cup \{a\}) <CN_i }=0\), then \(\delta (S,a,i) = -(CN_i - CF_i(S)) < 0 = \delta (T,a,i) \).

If \({\mathbb {1}}_{CF_i (S \cup \{a\}) <CN_i }=1\), then \(\delta (S,a,i) = CF_i (S) - CF_i(S \cup \{a\}) \le 0 = \delta (T,a,i)\).

(2b) If \({\mathbb {1}}_{CF_i (T) <CN_i }=1\), let us consider the value of \({\mathbb {1}}_{CF_i (S \cup \{a\}) <CN_i }\):

If \({\mathbb {1}}_{CF_i (S \cup \{a\}) <CN_i }=0\), then \({\mathbb {1}}_{CF_i (T \cup \{a\}) <CN_i }=0\), and then \(\delta (S,a,i) = -(CN_i - CF_i(S)) \le -(CN_i - CF_i(T)) = \delta (T,a,i) \) (using Claim 2).

If \({\mathbb {1}}_{CF_i (S \cup \{a\}) <CN_i }=1\), then for \({\mathbb {1}}_{CF_i (T \cup \{a\}) <CN_i }\):

If \({\mathbb {1}}_{CF_i (T \cup \{a\}) <CN_i }=1\), then \(\delta (S,a,i)= CF_i(S) - CF_i(S \cup \{a\}) \le CF_i(T) - CF_i(T \cup \{a\}) =\delta (T, a,i)\) (using Claim 2 that \(CF_i(S)\) is a submodular function).

Otherwise, \({\mathbb {1}}_{CF_i (T \cup \{a\}) <CN_i }=0\), and then since we have \(CF_i (T \cup \{a\}) \ge CN_i\), \(\delta (S,a,i)= CF_i(S) - CF_i(S \cup \{a\}) \le CF_i(T) - CF_i(T \cup \{a\}) \le CF_i(T) - CN_i = \delta (T, a,i)\) (using Claim 2).

Putting all cases together, we have \(\theta (S \cup \{a\}) - \theta (S) \le \theta (T \cup \{a\}) - \theta (T)\). Hence, \(g(S \cup \{a\}) - g(S) \ge g(T \cup \{a\}) - g(T)\).

Therefore, g(S) is a submodular function. \(\square \)
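The case analysis above can be sanity-checked by brute force on a small instance. The sketch below is illustrative only and rests on stated assumptions: \(\theta (S)=\sum _i {\mathbb {1}}_{CF_i(S)<CN_i}(CN_i - CF_i(S))\) as used in the proof, \(g(S)=\theta (\emptyset )-\theta (S)\) (chosen so that \(g(\emptyset )=0\)), and \(CF_i\) modelled as a set-union coverage count. The coverage sets `cover` and the counts `cn` are made-up toy data.

```python
from itertools import combinations

# Hypothetical coverage sets: cover[i][v] = nodes that v reaches within i hops.
cover = [
    {"a": {"a"}, "b": {"b"}, "c": {"c"}},
    {"a": {"a", "x"}, "b": {"b", "x", "y"}, "c": {"c", "y"}},
    {"a": {"a", "x", "y"}, "b": {"b", "x", "y", "z"}, "c": {"c", "y", "z"}},
]
cn = [2, 4, 5]  # hypothetical cumulative case counts CN_i

def cf(i, S):
    """Coverage count at step i: size of the union of the members' coverage sets."""
    covered = set()
    for v in S:
        covered |= cover[i][v]
    return len(covered)

def theta(S):
    """Shortfall: cases that the cascade from S cannot account for."""
    return sum(max(0, cn[i] - cf(i, S)) for i in range(len(cn)))

def g(S):
    """Assumed form g(S) = theta(empty) - theta(S), so that g(empty) = 0."""
    return theta(frozenset()) - theta(S)

nodes = ["a", "b", "c"]
subsets = [frozenset(c) for r in range(len(nodes) + 1)
           for c in combinations(nodes, r)]

# Monotone: g never decreases when the seed set grows.
monotone = all(g(S) <= g(T) for S in subsets for T in subsets if S <= T)

# Submodular: marginal gains shrink as the base set grows.
submodular = all(
    g(S | {v}) - g(S) >= g(T | {v}) - g(T)
    for S in subsets for T in subsets if S <= T
    for v in nodes if v not in T
)
print(monotone, submodular)
```

Under these assumptions g(S) equals \(\sum _i \min (CN_i, CF_i(S))\), a truncated coverage function, which is why both checks succeed on this instance.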

Proof of Lemma 4.9

Since we uniformly randomly allocate \(\mathbf {x}\), \(\rho _{G, \mathbf {M}_i}(\mathbf {x})\) can be written as \(\rho _{G, \mathbf {M}_i}(\mathbf {x}) = \sum _S \Pr (S) r_{G, \mathbf {M}_i}(S)\), where S is a node set sampled from the random process of distributing \(\mathbf {x}\) (\(|S| = ||\mathbf {x}||_1\)), and \(r_{G, \mathbf {M}_i}(S)\) is the number of nodes \(SI_{\mathbf {M}_i}\) can reach after removing S.

Since \(\zeta _{G, \mathbf {M}_i}(\mathbf {x}) = \sum _{S} \Pr (S) C_{G, \mathbf {M}_i}(S)\) and \(\rho _{G, \mathbf {M}_i}(\mathbf {x}) = \sum _S \Pr (S) r_{G, \mathbf {M}_i}(S)\), we need to show that \( r_{G, \mathbf {M}_i}(S) \le C_{G, \mathbf {M}_i}(S)\). Recall that \( r_{G, \mathbf {M}_i}(S)\) is the number of nodes S can save in \(\mathbf {M}_i\). We can show that, for any node u that \(SI_\mathbf {M}\) can save, the credit u gives to \(SI_\mathbf {M}\) must be 1. This is because if we can save u, every path from \(SI_\mathbf {M}\) to u has been removed when S is removed. Hence, all nodes on the paths from \(SI_\mathbf {M}\) have been removed. These are exactly the nodes that propagate u’s credit to \(SI_\mathbf {M}\), so all of u’s credit is contributed to \(C_{G, \mathbf {M}_i}(S)\). Hence, \(C_{G, \mathbf {M}_i}(S)\) is at least equal to \(r_{G, \mathbf {M}_i}(S)\). On the other hand, nodes that S cannot save also contribute credit to \(C_{G, \mathbf {M}_i}(S)\). Hence, \(C_{G, \mathbf {M}_i}(S) \ge r_{G, \mathbf {M}_i}(S)\), which leads to \(\rho _{G, \mathbf {M}_i}(\mathbf {x}) \le \zeta _{G, \mathbf {M}_i}(\mathbf {x}) \). \(\square \)

Proof of Lemma 4.11

We use a technique similar to that in [4], given properties \(P_1\), \(P_2\) and \(P_3\) of \(\zeta _{G, \mathbf {M}_i} (\mathbf {x})\). For brevity, we write \(\zeta _{G, \mathbf {M}_i} (\mathbf {x})\) as \(\zeta (\mathbf {x})\).

First, we show that if \(\mathbf {y}=(y_1, \ldots ,y_n)^T\) where \(\sum _j y_j=m\), then \(\zeta (\mathbf {x}+ \mathbf {y}) - \zeta (\mathbf {x}) \le \sum _j y_j (\zeta (\mathbf {x}+\mathbf {e}_j)-\zeta (\mathbf {x}))\).

Note that \(\mathbf {y}\) can be obtained recursively from a sequence \(\mathbf {e}^{(1)}, \ldots , \mathbf {e}^{(m)}\) (\(\mathbf {e}^{(i)} \in \{\mathbf {e}_1,\ldots ,\mathbf {e}_n\}\)) such that \(\mathbf {y}=\mathbf {y}^{(m)}=\mathbf {y}^{(m-1)}+\mathbf {e}^{(m)}\), \(\mathbf {y}^{(i)}=\mathbf {y}^{(i-1)}+\mathbf {e}^{(i)}\) (\(i \le m\)) and \(\mathbf {y}^{(0)}=\mathbf {0}\).

Obviously, \(\sum _{i=1}^m \mathbf {e}^{(i)}= \sum _j y_j \mathbf {e}_j =\mathbf {y}\). Then,

$$\begin{aligned}&\zeta (\mathbf {x}+ \mathbf {y}) -\zeta (\mathbf {x}) \\&\quad = \sum _{i=1}^m \left[ \zeta (\mathbf {x}+ \mathbf {y}^{(i)})-\zeta (\mathbf {x}+ \mathbf {y}^{(i-1)}) \right] \\&\quad = \sum _{i=1}^m \left[ \zeta (\mathbf {x}+ \mathbf {y}^{(i-1)}+\mathbf {e}^{(i)})-\zeta (\mathbf {x}+ \mathbf {y}^{(i-1)}) \right] \\&\quad \le \sum _{i=1}^m \left[ \zeta (\mathbf {x}+\mathbf {e}^{(i)})-\zeta (\mathbf {x}) \right] \quad \text {(diminishing returns)} \\&\quad = \sum _{j=1}^n y_j (\zeta (\mathbf {x}+\mathbf {e}_j)-\zeta (\mathbf {x})) \end{aligned}$$

Now, let us prove that ImmuNaiveGreedy gives a \((1-1/e)\)-approximate solution. Suppose \(\mathbf {x}\) is the solution from ImmuNaiveGreedy, and \(\mathbf {x}^*\) is the optimal solution. Clearly, we have \(\sum _{j}x_j=\sum _{j}x^*_j=m\). Let us define \(\mathbf {x}^{(i)}\) as the solution obtained after the ith iteration of the greedy algorithm; hence, \(\mathbf {x}=\mathbf {x}^{(m)}\). Moreover, \(\mathbf {x}^*\) can be represented as \(\sum _j x^*_j \mathbf {e}_j\). We have

$$\begin{aligned} \zeta (\mathbf {x}^*)&\le \zeta ( \mathbf {x}^*+\mathbf {x}^{(i)}) \\&= \zeta (\mathbf {x}^{(i)}) + ( \zeta ( \mathbf {x}^*+\mathbf {x}^{(i)}) - \zeta (\mathbf {x}^{(i)}) )\\&\le \zeta (\mathbf {x}^{(i)}) + \sum _j {x^*_j} (\zeta (\mathbf {x}^{(i)}+\mathbf {e}_j)- \zeta (\mathbf {x}^{(i)}) ) \\&\le \zeta (\mathbf {x}^{(i)}) + \sum _j {x^*_j} ( \zeta (\mathbf {x}^{(i+1)}) - \zeta (\mathbf {x}^{(i)})) \\&= \zeta (\mathbf {x}^{(i)}) +m( \zeta (\mathbf {x}^{(i+1)}) - \zeta (\mathbf {x}^{(i)})) \end{aligned}$$

Hence, \(\zeta (\mathbf {x}^{(i+1)}) \ge (1-\frac{1}{m})\zeta (\mathbf {x}^{(i)}) +\frac{1}{m} \zeta (\mathbf {x}^*)\). By induction on i, starting from \(\zeta (\mathbf {x}^{(0)}) \ge 0\), we get \(\zeta (\mathbf {x}^{(i)}) \ge (1-(1-\frac{1}{m})^i) \zeta (\mathbf {x}^*)\). Therefore, \(\zeta (\mathbf {x})=\zeta (\mathbf {x}^{(m)}) \ge (1-(1-\frac{1}{m})^m) \zeta (\mathbf {x}^*) \ge (1-1/e) \zeta (\mathbf {x}^*)\), where the last step uses \((1-\frac{1}{m})^m \le 1/e\). \(\square \)
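The greedy argument above can be illustrated numerically. The following sketch is not the authors' ImmuNaiveGreedy implementation. It runs a generic unit-increment greedy over integer allocation vectors on a made-up objective `zeta` that is monotone and has diminishing returns, then compares the result with a brute-force optimum and with the bound \(1-(1-\frac{1}{m})^m\). The weights, the parameter `p`, and all function names are assumptions for illustration.

```python
from itertools import product
import math

weights = [5.0, 3.0, 2.0]  # hypothetical per-location benefit weights
p = 0.5                    # hypothetical per-vaccine effectiveness

def zeta(x):
    """Stand-in objective: monotone with diminishing returns in each coordinate."""
    return sum(w * (1 - (1 - p) ** k) for w, k in zip(weights, x))

def naive_greedy(n, m):
    """Add one unit at a time to the coordinate with the largest objective value."""
    x = [0] * n
    for _ in range(m):
        best_j = max(range(n),
                     key=lambda j: zeta(x[:j] + [x[j] + 1] + x[j + 1:]))
        x[best_j] += 1
    return x

def brute_force(n, m):
    """Enumerate every allocation of exactly m units and return the best one."""
    candidates = (x for x in product(range(m + 1), repeat=n) if sum(x) == m)
    return max(candidates, key=zeta)

m = 4
x_greedy = naive_greedy(len(weights), m)
x_opt = brute_force(len(weights), m)
ratio = zeta(x_greedy) / zeta(x_opt)
bound = 1 - (1 - 1 / m) ** m  # guarantee from the proof for budget m

print(x_greedy, ratio, bound)
```

Because this stand-in objective is separable and concave, greedy happens to be optimal here; the lemma only guarantees the weaker ratio `bound`, which itself is at least \(1-1/e\).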


Cite this article

Zhang, Y., Ramanathan, A., Vullikanti, A. et al. Data-driven efficient network and surveillance-based immunization. Knowl Inf Syst 61, 1667–1693 (2019).



  • Graph mining
  • Social networks
  • Immunization
  • Diffusion