Social Media for Nowcasting Flu Activity: Spatio-Temporal Big Data Analysis

Information Systems Frontiers, volume 21, pages 743–760 (2019)

Abstract

Contagious diseases pose significant challenges to public healthcare systems worldwide. The rise in emerging contagious and infectious diseases has led to calls for new techniques and technologies capable of detecting, tracking, mapping, and managing behavioral patterns in such diseases. In this study, we used Big Data technologies to analyze two sets of flu (influenza) activity data: Twitter data were used to extract behavioral patterns from a location-based social network and to monitor flu outbreaks (and their locations) in the US, and the Cerner HealthFacts data warehouse was used to track real-world clinical encounters. We expected that integrating (mashing up) social media data with real-world clinical encounters could be a valuable enhancement to existing surveillance systems. Our results verified that flu-related traffic on social media is closely related to actual flu outbreaks. However, rather than using a simple Pearson correlation, which assumes zero lag between online and real-world activities, we used a multi-method data analytics approach to obtain the spatio-temporal cross-correlation between the two flu trends and to explain behavioral patterns during the flu season. We found that clinical flu encounters lag behind online posts. We also identified several public locations from which a majority of posts originated. These findings can help health authorities develop more effective interventions (behavioral and/or otherwise) during outbreaks to reduce their spread and impact, and to inform individuals about locations they should avoid during those periods.
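To illustrate why a zero-lag Pearson correlation can understate the relationship between two trends when one lags the other, the following is a minimal sketch of a temporal lagged cross-correlation. It is not the authors' multi-method approach and has no spatial component. The weekly series are synthetic and noise-free, and the function name `lagged_correlation`, the two-week lag, and the bump-shaped curves are all assumptions made purely for illustration.

```python
import numpy as np


def lagged_correlation(online, clinical, max_lag):
    """Pearson correlation between online[t] and clinical[t + lag], for lag = 0..max_lag.

    A positive best lag means the clinical series trails the online series.
    """
    online = np.asarray(online, dtype=float)
    clinical = np.asarray(clinical, dtype=float)
    results = {}
    for lag in range(max_lag + 1):
        # Align the series: drop the last `lag` online points and the first `lag` clinical points.
        x = online[: len(online) - lag]
        y = clinical[lag:]
        results[lag] = float(np.corrcoef(x, y)[0, 1])
    return results


# Synthetic, noise-free weekly counts (hypothetical data, not from the study):
# the clinical curve is the online curve shifted two weeks later.
weeks = np.arange(52)
assumed_lag = 2
online = np.exp(-0.5 * ((weeks - 20) / 4.0) ** 2)
clinical = np.exp(-0.5 * ((weeks - 20 - assumed_lag) / 4.0) ** 2)

corr_by_lag = lagged_correlation(online, clinical, max_lag=6)
best_lag = max(corr_by_lag, key=corr_by_lag.get)

print("zero-lag correlation:", round(corr_by_lag[0], 3))
print("best lag (weeks):", best_lag)
```

In this toy setting the correlation peaks at the lag built into the data, while the zero-lag value is lower. Real tweet and encounter counts would be noisy, so the peak would be less sharp and would need a significance check.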

Notes

  1. http://www.cdc.gov/flu/news/predict-flu-challenge.htm

Acknowledgments

This study was conducted with the data provided by, and the support from, the Center for Health Systems Innovation (CHSI) at Oklahoma State University (OSU) and the Cerner Corporation. The contents of this work are solely the responsibility of the authors and do not necessarily represent the official views of CHSI, OSU or the Cerner Corporation. We also want to thank the anonymous reviewers and the associate editor for very thoughtful comments on the paper.

Author information

Corresponding author

Correspondence to Amir Hassan Zadeh.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Hassan Zadeh, A., Zolbanin, H.M., Sharda, R. et al. Social Media for Nowcasting Flu Activity: Spatio-Temporal Big Data Analysis. Inf Syst Front 21, 743–760 (2019). https://doi.org/10.1007/s10796-018-9893-0

  • DOI: https://doi.org/10.1007/s10796-018-9893-0
