Abstract
Seasonal influenza epidemics affect millions of people with respiratory illnesses and cause 250,000 to 500,000 deaths worldwide each year. Rapid prediction of epidemic outbreaks enables earlier detection and control. In this study, we predicted influenza-like illness (ILI) activity from social media data derived from Twitter. Because tweet counts and patient counts are not always linearly correlated, we employed several time-series and neural network models, including nonlinear ones: autoregressive with exogenous inputs (ARX), autoregressive-moving-average with exogenous inputs (ARMAX), nonlinear autoregressive exogenous (NARX), deep multilayer perceptron (DeepMLP), and a convolutional neural network (CNN). We introduced two new features that significantly reduced the prediction errors: the product of the tweet data and Centers for Disease Control and Prevention (CDC) data, and the product of the tweet data and Google data. Furthermore, we introduced a new entropy-based method that decreased both the errors and the time complexity. Among the methods and features considered, the best results were obtained by combining the newly developed features with the deep neural network methods and the entropy-based method, which decreased the mean average error by up to 25%. Applying these methods to Twitter datasets from 2009–2010 and 2011–2014 showed that ILI outbreaks can be predicted 2–4 weeks ahead of the CDC.
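To illustrate the ARX setup and the product feature named in the abstract, the following is a minimal sketch rather than the authors' implementation. It assumes an ARX model of the form y[t] = a_1 y[t-1] + ... + a_na y[t-na] + b_1 u[t-1] + ... + b_nb u[t-nb], fitted by ordinary least squares, and it assumes the product feature is the element-wise product of the weekly tweet series and the weekly ILI series. The function names, the lag orders, and the synthetic data are all hypothetical and chosen only for the example.

```python
import numpy as np


def build_arx_design(y, u, na=2, nb=2):
    """Build ARX regressors: y[t-1..t-na] followed by u[t-1..t-nb] for each input."""
    y = np.asarray(y, dtype=float)
    u = np.asarray(u, dtype=float)
    if u.ndim == 1:
        u = u[:, None]  # treat a single exogenous series as one column
    start = max(na, nb)
    rows = []
    for t in range(start, len(y)):
        past_y = y[t - na:t][::-1]          # y[t-1], ..., y[t-na]
        past_u = u[t - nb:t][::-1].ravel()  # u[t-1], ..., u[t-nb], all inputs
        rows.append(np.concatenate([past_y, past_u]))
    return np.array(rows), y[start:]


def fit_arx(y, u, na=2, nb=2):
    """Estimate ARX coefficients by ordinary least squares."""
    X, target = build_arx_design(y, u, na, nb)
    theta, *_ = np.linalg.lstsq(X, target, rcond=None)
    return theta


def mean_absolute_error(actual, predicted):
    """Average absolute difference between two series."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(actual - predicted)))


if __name__ == "__main__":
    # Synthetic weekly data, for illustration only (not the study's data).
    rng = np.random.default_rng(0)
    weeks = np.arange(150)
    ili = 2.0 + 1.5 * np.sin(2 * np.pi * weeks / 52) ** 2
    ili = ili + 0.1 * rng.normal(size=weeks.size)
    # Assumption: tweet volume leads ILI by about two weeks.
    tweets = 100 * np.roll(ili, -2) + 5 * rng.normal(size=weeks.size)

    # Baseline exogenous input: tweets only.
    baseline_inputs = tweets
    # Assumed product feature: tweets multiplied element-wise by ILI.
    product_inputs = np.column_stack([tweets, tweets * ili])

    for name, inputs in [("tweets only", baseline_inputs),
                         ("tweets + product", product_inputs)]:
        theta = fit_arx(ili, inputs)
        X, target = build_arx_design(ili, inputs)
        mae = mean_absolute_error(target, X @ theta)
        print(f"{name}: in-sample MAE = {mae:.4f}")
```

The sketch reports in-sample error only. A fair comparison of feature sets would hold out later weeks for evaluation, and the nonlinear models (NARX, DeepMLP, CNN) would replace the least-squares step with a learned nonlinear mapping over the same lagged regressors.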
Acknowledgements
We thank M. J. Paul, M. Dredze, and D. Broniatowski for allowing us to use the Twitter dataset they collected.
Ethics declarations
Conflicts of interest
Soheila Molaei, Mohammad Khansari, Hadi Veisi and Mostafa Salehi declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Molaei, S., Khansari, M., Veisi, H. et al. Predicting the spread of influenza epidemics by analyzing twitter messages. Health Technol. 9, 517–532 (2019). https://doi.org/10.1007/s12553-019-00309-4
DOI: https://doi.org/10.1007/s12553-019-00309-4