Social Media for Nowcasting Flu Activity: Spatio-Temporal Big Data Analysis

  • Amir Hassan Zadeh
  • Hamed M. Zolbanin
  • Ramesh Sharda
  • Dursun Delen


Contagious diseases pose significant challenges to public healthcare systems all over the world. The rise in emerging contagious and infectious diseases has led to calls for new techniques and technologies capable of detecting, tracking, mapping, and managing behavioral patterns in such diseases. In this study, we used Big Data technologies to analyze two sets of flu (influenza) activity data: Twitter data were used to extract behavioral patterns from a location-based social network and to monitor flu outbreaks (and their locations) in the US, and the Cerner HealthFacts data warehouse was used to track real-world clinical encounters. We expected that integrating (mashing up) social media data with real-world clinical encounters could be a valuable enhancement to existing surveillance systems. Our results verified that flu-related traffic on social media is closely related to actual flu outbreaks. However, rather than using a simple Pearson correlation, which assumes a zero lag between online and real-world activities, we used a multi-method data analytics approach to obtain the spatio-temporal cross-correlation between the two flu trends and to explain behavioral patterns during the flu season. We found that clinical flu encounters lag behind online posts. We also identified several public locations from which a majority of posts originated. These findings can help health authorities develop more effective interventions (behavioral and/or otherwise) during outbreaks to reduce their spread and impact, and to inform individuals about the locations they should avoid during those periods.
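The difference between a zero-lag Pearson correlation and a lagged cross-correlation can be illustrated with a minimal sketch. This is not the multi-method approach used in the study; it only shows the general idea of scanning over time lags. The function names (`pearson`, `lagged_correlation`, `best_lag`) are hypothetical, and the two weekly series are synthetic, constructed so that the "clinical" curve trails the "online" curve by two weeks.

```python
import math


def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def lagged_correlation(online, clinical, max_lag):
    """Correlate online[t] with clinical[t + lag] for each lag.

    A positive lag means the clinical series trails the online series.
    Lag 0 is the ordinary (zero-lag) Pearson correlation.
    """
    n = len(online)
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            x, y = online[: n - lag], clinical[lag:]
        else:
            x, y = online[-lag:], clinical[: n + lag]
        result[lag] = pearson(x, y)
    return result


def best_lag(correlations):
    """Return the lag with the highest correlation."""
    return max(correlations, key=correlations.get)


# Synthetic data (an assumption for illustration only): two bell-shaped
# weekly curves, with the clinical peak two weeks after the online peak.
weeks = range(30)
online = [math.exp(-0.5 * ((w - 12) / 3.0) ** 2) for w in weeks]
clinical = [math.exp(-0.5 * ((w - 14) / 3.0) ** 2) for w in weeks]

corr = lagged_correlation(online, clinical, max_lag=5)
print("zero-lag correlation:", round(corr[0], 3))
print("best lag (weeks):", best_lag(corr))
```

On this toy data the zero-lag correlation understates the relationship, while the scan over lags recovers the two-week delay that was built into the series. A spatio-temporal analysis would additionally repeat such a comparison across locations, which this sketch does not attempt.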


Keywords: Business analytics, Big data, Public health, Social media, Behavioral analytics, Location analytics



This study was conducted with the data provided by, and the support from, the Center for Health Systems Innovation (CHSI) at Oklahoma State University (OSU) and the Cerner Corporation. The contents of this work are solely the responsibility of the authors and do not necessarily represent the official views of CHSI, OSU, or the Cerner Corporation. We also thank the anonymous reviewers and the associate editor for their thoughtful comments on the paper.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Information Systems and Supply Chain Management, Raj Soin College of Business, Wright State University, Dayton, USA
  2. Department of Information Systems and Operations Management, Miller College of Business, Ball State University, Muncie, USA
  3. Department of Management Science and Information Systems, Spears School of Business, Oklahoma State University, Stillwater, USA
