Personal and Ubiquitous Computing, Volume 19, Issue 7, pp 1125–1132

Chinese social media analysis for disease surveillance

  • Xiaohui Cui
  • Nanhai Yang
  • Zhibo Wang
  • Cheng Hu
  • Weiping Zhu
  • Hanjie Li
  • Yujie Ji
  • Cheng Liu
Original Article


Seasonal flu is reported to cause hundreds of thousands of deaths around the world every year, an estimated 250,000–500,000. Other diseases such as chickenpox and malaria are also serious threats to people’s physical and mental health. Proper techniques for disease surveillance are therefore in high demand. Social media analysis has recently come to be regarded as an efficient way to achieve this goal, and it is feasible because a growing number of people post their health information on social media such as blogs and personal websites. Previous work on social media analysis focused mainly on English-language material and hardly considered Chinese material, which hinders the application of such techniques to Chinese speakers. In this paper, we propose a new method of Chinese social media analysis for disease surveillance. More specifically, we compare different methods for the classification step and then propose a new way to process Chinese text data. Chinese Sina micro-blog data collected from September to December 2013 are used to validate the effectiveness of the proposed method. The results show an average classification precision of 87.49%. Compared with data from the official authority, the Chinese National Influenza Center, the method can predict the outbreak time of flu 5 days earlier.


Keywords: Social media, Chinese, SVMLIGHT, Classification, Prediction, Flu



This research is supported in part by the National Natural Science Foundation of China (No. 61440054), the Fundamental Research Funds for the Central Universities of China (No. 216-274213), the Natural Science Foundation of Hubei, China (No. 2014CFA048), and the Outstanding Academic Talents Startup Funds of Wuhan University (Nos. 216-410100003 and 216-410100004).



Copyright information

© Springer-Verlag London 2015

Authors and Affiliations

  • Xiaohui Cui (1), corresponding author
  • Nanhai Yang (1)
  • Zhibo Wang (1, 2)
  • Cheng Hu (1)
  • Weiping Zhu (1)
  • Hanjie Li (1)
  • Yujie Ji (1)
  • Cheng Liu (3)

  1. International School of Software, Wuhan University, Wuhan, China
  2. Software College, East China Institute of Technology, Nanchang, China
  3. Oak Ridge National Laboratory, Oak Ridge, USA
