Data-driven efficient network and surveillance-based immunization

  • Yao Zhang
  • Arvind Ramanathan
  • Anil Vullikanti
  • Laura Pullum
  • B. Aditya Prakash
Regular article


Given a contact network and coarse-grained diagnostic information such as electronic Healthcare Reimbursement Claims (eHRC) data, can we develop efficient intervention policies from data to control an epidemic? Immunization is an important problem in multiple areas, especially epidemiology and public health. However, most existing studies assume a prior epidemiological model when developing pre-emptive strategies, and such strategies may fail to adapt to changing epidemiological patterns or to exploit rich data such as eHRC. In practice, disease spread is usually complicated, so an assumed underlying model may deviate from the true spreading patterns and lead to inaccurate interventions. Additionally, the abundance of health care surveillance data (such as eHRC) makes it possible to study data-driven strategies without too many restrictive assumptions. Hence, such a data-driven intervention approach can help public-health experts make more practical decisions. In this paper, we use both propagation logs and contact networks to control propagation. Unlike previous model-based approaches, our solutions are purely data-driven in the sense that we develop immunization strategies directly from the network and eHRC data without assuming classical epidemiological models. In particular, we formulate the novel and challenging data-driven immunization problem. To solve it, we first propose an efficient sampling approach to align surveillance data with contact networks, then develop an efficient immunization algorithm with a provable approximation guarantee. Finally, we show the effectiveness and scalability of our methods via extensive experiments on multiple datasets, and conduct case studies on real nationwide medical surveillance data.


Graph mining, Social networks, Immunization, Diffusion



This paper is based on work partially supported by the NSF (IIS-1353346, CAREER IIS-1750407), the NEH (HG-229283-15), ORNL, the Maryland Procurement Office (H98230-14-C-0127), and a Facebook faculty gift to BAP. AV is partially supported by the following grants: DTRA CNIMS Contract HDTRA1-11-D-0016-0010, NSF BIG DATA Grant IIS-1633028 and NSF DIBBS Grant ACI-1443054, NSF EAGER SSDIM-1745207. Publication of this article was also funded by ORNL LDRD funding to AR. Oak Ridge National Laboratory (ORNL) (Grant No. Order 4000143330) is operated by UT-Battelle, LLC, for the US Department of Energy under contract DE-AC05-00OR22725. The US Government retains and the publisher, by accepting the article for publication, acknowledges that the US Government retains a non-exclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US Government purposes.



Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2019

Authors and Affiliations

  1. Department of Computer Science, Virginia Tech, Blacksburg, USA
  2. Department of Computer Science, Biocomplexity Institute and Initiative, University of Virginia, Charlottesville, USA
  3. Oak Ridge National Laboratory, Oak Ridge, USA
