Detecting and Analyzing Influenza Epidemics with Social Media in China

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8443)

Abstract

In recent years, social media has become an important and omnipresent channel for social networking and information sharing. Researchers have begun to mine social media data to predict a variety of real-world phenomena related to society, the economy, health, and entertainment. In this paper, we show how social media data can be used to detect and analyze real-world phenomena with several data mining techniques. Specifically, we use posts from TencentWeibo to detect influenza and analyze influenza trends. We build a support vector machine (SVM) based classifier to identify influenza posts. In addition, we use association rule mining to extract strongly associated features and add them to posts as extra features, which compensates for the 140-character limit on post length. We also use sentiment analysis to classify reposts that contain no features and reposts without comments. The experimental results show that combining these techniques improves precision and recall by at least ten percent. Finally, we analyze the spatial and temporal patterns of positive influenza posts to determine when and where an influenza epidemic is more likely to occur.
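As an illustration of the feature-extension idea in the abstract, the following minimal Python sketch mines single-term association rules from tokenized posts and uses them to enrich a short post. It is not the implementation used in the paper: the function names (`mine_pair_rules`, `expand_features`), the thresholds, the restriction to term pairs, and the toy English posts are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def mine_pair_rules(posts, min_support=0.3, min_confidence=0.8):
    """Mine rules x -> y between single terms (hypothetical helper).

    support(x, y)  = fraction of posts containing both x and y
    confidence(x -> y) = count(x, y) / count(x)
    """
    n = len(posts)
    term_count = Counter()
    pair_count = Counter()
    for terms in posts:
        unique = sorted(set(terms))  # count each term once per post
        term_count.update(unique)
        pair_count.update(combinations(unique, 2))

    rules = {}
    for (a, b), count in pair_count.items():
        if count / n < min_support:
            continue  # pair is too rare to trust
        # Test the rule in both directions: a -> b and b -> a.
        for x, y in ((a, b), (b, a)):
            if count / term_count[x] >= min_confidence:
                rules.setdefault(x, set()).add(y)
    return rules


def expand_features(terms, rules):
    """Add every term that is strongly implied by a term in the post."""
    expanded = set(terms)
    for term in terms:
        expanded |= rules.get(term, set())
    return expanded


# Toy, already-tokenized posts (illustrative data, not from the paper).
posts = [
    ["fever", "cough", "flu"],
    ["fever", "flu"],
    ["cough", "flu", "tired"],
    ["fever", "flu", "tired"],
    ["movie", "tired"],
]

rules = mine_pair_rules(posts)
expanded = expand_features(["fever"], rules)
print(rules)
print(expanded)
```

In this toy corpus, a post that mentions only "fever" gains the associated term "flu", so a downstream classifier sees a richer feature set than the short post alone provides.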

This research is partially funded by the NSF of China (Grant No. 11271351 and 61303167) and the Basic Research Program of Shenzhen (Grant No. JCYJ20130401170306838 and JC201105190934A). Xin Wang’s research is partially funded by the NSERC Discovery Grant.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Zhang, F., Luo, J., Li, C., Wang, X., Zhao, Z. (2014). Detecting and Analyzing Influenza Epidemics with Social Media in China. In: Tseng, V.S., Ho, T.B., Zhou, ZH., Chen, A.L.P., Kao, HY. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2014. Lecture Notes in Computer Science, vol 8443. Springer, Cham. https://doi.org/10.1007/978-3-319-06608-0_8

  • DOI: https://doi.org/10.1007/978-3-319-06608-0_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-06607-3

  • Online ISBN: 978-3-319-06608-0

  • eBook Packages: Computer Science, Computer Science (R0)
