MapReduce and Spark-Based Analytic Framework Using Social Media Data for Earlier Flu Outbreak Detection

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10357)

Abstract.

Influenza (flu) can be a serious illness and can lead to death: hundreds of thousands of people die every year from seasonal flu. An early warning may help to prevent the spread of flu in the population. Such a warning can be obtained by using social media data together with big data tools and techniques. In this paper, a MapReduce and Spark-based analytic framework (MRSAF) using Twitter data is presented for faster flu outbreak detection. Different analysis cases are implemented using Apache Spark, Hadoop, and the Hadoop ecosystem to predict flu trends in different locations using Twitter data. The data was collected using a purpose-built crawler that works with the Twitter API to stream and filter tweets based on flu-related keywords. The crawler is also designed to pre-process the retrieved tweets and remove unneeded attributes. The results of the proposed solution show a strong relationship with the weekly Centers for Disease Control and Prevention (CDC) reports.
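The keyword filtering and per-location counting described in the abstract can be illustrated with a minimal map/reduce-style sketch in plain Python. It is not the authors' MRSAF implementation: the keyword list, the tweet fields (`text`, `location`, `week`), and all function names are assumptions made for illustration, and a real deployment would run the same two steps on Hadoop or Spark rather than in a single process.

```python
from collections import defaultdict

# Hypothetical keyword list; the keywords used in the paper are not given here.
FLU_KEYWORDS = ("flu", "influenza")


def is_flu_related(text):
    """Return True if any keyword appears as a whole word in the tweet text."""
    words = text.lower().split()
    return any(word.strip(".,!?#") in FLU_KEYWORDS for word in words)


def map_tweet(tweet):
    """Map step: emit ((location, week), 1) for each flu-related tweet."""
    if is_flu_related(tweet["text"]):
        return [((tweet["location"], tweet["week"]), 1)]
    return []


def reduce_counts(pairs):
    """Reduce step: sum the counts for each (location, week) key."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


def flu_counts(tweets):
    """Run the map step over all tweets, then reduce the emitted pairs."""
    pairs = [pair for tweet in tweets for pair in map_tweet(tweet)]
    return reduce_counts(pairs)


# Made-up sample records, assumed to be already cleaned by a crawler.
SAMPLE_TWEETS = [
    {"text": "Down with the flu again", "location": "CT", "week": 1},
    {"text": "Influenza season is here!", "location": "CT", "week": 1},
    {"text": "Great weather today", "location": "NY", "week": 1},
    {"text": "Got my #flu shot", "location": "NY", "week": 2},
]

if __name__ == "__main__":
    print(flu_counts(SAMPLE_TWEETS))
```

Weekly counts keyed by location in this form are the kind of series that could then be compared against weekly CDC reports.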



Author information

Corresponding author

Correspondence to Miad Faezipour.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Al Essa, A., Faezipour, M. (2017). MapReduce and Spark-Based Analytic Framework Using Social Media Data for Earlier Flu Outbreak Detection. In: Perner, P. (eds) Advances in Data Mining. Applications and Theoretical Aspects. ICDM 2017. Lecture Notes in Computer Science, vol 10357. Springer, Cham. https://doi.org/10.1007/978-3-319-62701-4_19

  • DOI: https://doi.org/10.1007/978-3-319-62701-4_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-62700-7

  • Online ISBN: 978-3-319-62701-4

  • eBook Packages: Computer Science, Computer Science (R0)
