Annals of Operations Research, Volume 263, Issue 1–2, pp 551–564

Predicting social response to infectious disease outbreaks from internet-based news streams

  • Shannon M. Fast
  • Louis Kim
  • Emily L. Cohn
  • Sumiko R. Mekaru
  • John S. Brownstein
  • Natasha Markuzon
Data Mining and Analytics


Infectious disease outbreaks often have consequences beyond human health, including public concern, economic instability, and sometimes violence. Decision makers urgently need a warning system that can anticipate the social disruptions caused by disease outbreaks so that they can prepare appropriately. We designed a system that operates in near real time to identify and predict social response. HealthMap provided over 150,000 Internet-based news articles related to outbreaks of 16 diseases in 72 countries and territories. These articles were automatically tagged with indicators of disease activity and population reaction. We applied an anomaly detection algorithm to the population reaction indicators to identify periods of unusually severe social response. We then developed a model to predict the probability that such a period will occur 1, 2, and 3 weeks ahead. The model performed strongly for diseases with substantial media coverage: for country–disease pairs with a median of 20 or more articles per year, the onset of social response in the next week was correctly predicted over 60% of the time, and 87% of weeks were correctly predicted. Performance was weaker for diseases with little media coverage; for these diseases, the main utility of our system is in identifying social response when it occurs rather than predicting when it will happen. Overall, this near real-time prediction approach is a promising step toward predictive models that inform responders of the likely social consequences of disease spread.
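As an illustration only, the sketch below shows one common way to flag unusually severe weeks in a single indicator series and to score weekly predictions against those flags. It assumes an EWMA (exponentially weighted moving average) control chart from statistical process control; the paper's actual anomaly detection algorithm, indicators, and parameters may differ. The function names `ewma_flags`, `onsets`, and `evaluate`, the smoothing weight `lam`, the limit width `L`, the baseline length, and the weekly counts are all hypothetical. "Onset correctly predicted" is assumed here to mean that the forecast for the first week of an anomalous period was positive.

```python
from statistics import mean, pstdev


def ewma_flags(counts, lam=0.3, L=3.0, baseline=8):
    """Flag weeks whose EWMA statistic exceeds an upper control limit.

    Hypothetical parameters: lam is the EWMA smoothing weight, L is the
    control-limit width in standard deviations, and baseline is the number
    of initial weeks used to estimate the in-control mean and spread.
    """
    mu = mean(counts[:baseline])
    sigma = pstdev(counts[:baseline]) or 1.0  # guard against a zero-width limit
    z = mu  # the EWMA starts at the in-control mean
    flags = []
    for t, x in enumerate(counts, start=1):
        z = lam * x + (1 - lam) * z
        # Exact (time-varying) variance of the EWMA statistic at week t.
        var = sigma ** 2 * (lam / (2 - lam)) * (1 - (1 - lam) ** (2 * t))
        # One-sided limit: only unusually high social response is flagged.
        flags.append(z > mu + L * var ** 0.5)
    return flags


def onsets(flags):
    """Indices of weeks that start an anomalous period."""
    return {t for t, f in enumerate(flags) if f and (t == 0 or not flags[t - 1])}


def evaluate(predicted, actual):
    """Return (weekly accuracy, fraction of actual onsets that were predicted)."""
    weekly_acc = sum(p == a for p, a in zip(predicted, actual)) / len(actual)
    true_onsets = onsets(actual)
    if not true_onsets:
        return weekly_acc, float("nan")
    hits = sum(predicted[t] for t in true_onsets)  # forecast positive at the onset week
    return weekly_acc, hits / len(true_onsets)


# Hypothetical weekly counts of articles tagged with a social-response indicator.
counts = [5, 6, 4, 5, 6, 5, 4, 6, 5, 30, 35, 28, 6, 5, 4]
actual = ewma_flags(counts)
# A hypothetical forecast that picks up the episode one week late.
late = [False] * 10 + [True] * 5
print(onsets(actual))  # {9}
print(evaluate(late, actual))
```

In this toy example the one-week-late forecast still agrees with the flags in 14 of 15 weeks (about 93% weekly accuracy) yet misses the only onset, illustrating how weekly accuracy and onset detection can diverge. Note also that the EWMA keeps the period flagged for a few weeks after the spike subsides, a lag inherent to the smoothing.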


Keywords: Biosurveillance; Social response; Epidemics; Anomaly detection; Near real-time prediction

Supplementary material

10479_2017_2480_MOESM1_ESM.pdf: Supplementary material 1 (PDF, 219 KB)
10479_2017_2480_MOESM2_ESM.pdf: Supplementary material 2 (PDF, 15 KB)
10479_2017_2480_MOESM3_ESM.pdf: Supplementary material 3 (PDF, 446 KB)


Copyright information

© Springer Science+Business Media New York 2017

Authors and Affiliations

  • Shannon M. Fast (1)
  • Louis Kim (1)
  • Emily L. Cohn (2)
  • Sumiko R. Mekaru (2)
  • John S. Brownstein (2)
  • Natasha Markuzon (1)

  1. Information and Decision Systems Division, The Charles Stark Draper Laboratory, Cambridge, USA
  2. Boston Children’s Hospital, Harvard Medical School, Boston, USA
