Abstract.
Influenza can be a serious illness and can lead to death: hundreds of thousands of people die every year from seasonal flu. An early warning may help to prevent the spread of flu in the population. Such a warning can be produced using social media data together with big data tools and techniques. In this paper, a MapReduce and Spark-based analytic framework (MRSAF) using Twitter data is presented for faster flu outbreak detection. Different analysis cases are implemented using Apache Spark, Hadoop, and the Hadoop ecosystem to predict flu trends in different locations from Twitter data. The data were collected using a custom-developed crawler that works with the Twitter API to stream tweets and filter them by flu-related keywords. The crawler also pre-processes the retrieved tweets and removes unneeded attributes. The results of the proposed solution show a strong relationship with the weekly reports of the Centers for Disease Control and Prevention (CDC).
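The pipeline described in the abstract (keyword filtering, attribute cleaning, then a MapReduce-style aggregation by location) can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the keyword list, the tweet field names (`text`, `created_at`, `location`), the date format, and all function names are assumptions made for illustration only.

```python
from collections import Counter
from datetime import datetime

# Hypothetical keyword list; the abstract does not enumerate the keywords used.
FLU_KEYWORDS = ("flu", "influenza", "fever", "cough")

# Hypothetical subset of tweet attributes kept after cleaning.
KEPT_FIELDS = ("text", "created_at", "location")


def is_flu_related(tweet):
    """Return True if the tweet text contains any flu-related keyword."""
    words = tweet.get("text", "").lower().split()
    tokens = {w.strip(".,!?#@:;\"'") for w in words}
    return any(k in tokens for k in FLU_KEYWORDS)


def clean(tweet):
    """Drop every attribute except the ones needed for the analysis."""
    return {f: tweet.get(f) for f in KEPT_FIELDS}


def map_tweet(tweet):
    """Map step: emit ((location, ISO year, ISO week), 1) for one tweet."""
    day = datetime.strptime(tweet["created_at"], "%Y-%m-%d")  # assumed format
    year, week, _ = day.isocalendar()
    return (tweet["location"], year, week), 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted values for each key."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def weekly_flu_counts(raw_tweets):
    """Filter, clean, map, and reduce tweets into weekly counts per location."""
    cleaned = (clean(t) for t in raw_tweets if is_flu_related(t))
    located = (t for t in cleaned if t["location"])  # skip tweets without a location
    return reduce_counts(map_tweet(t) for t in located)
```

On a Spark cluster, the same map and reduce steps would typically correspond to RDD operations such as `map` and `reduceByKey`. The resulting weekly counts per location are the kind of series that could then be compared against weekly CDC reports.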
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Al Essa, A., Faezipour, M. (2017). MapReduce and Spark-Based Analytic Framework Using Social Media Data for Earlier Flu Outbreak Detection. In: Perner, P. (ed.) Advances in Data Mining. Applications and Theoretical Aspects. ICDM 2017. Lecture Notes in Computer Science, vol 10357. Springer, Cham. https://doi.org/10.1007/978-3-319-62701-4_19
DOI: https://doi.org/10.1007/978-3-319-62701-4_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-62700-7
Online ISBN: 978-3-319-62701-4
eBook Packages: Computer Science, Computer Science (R0)