Journal of Medical Systems, 40:236

Mining Health Social Media with Sentiment Analysis

  • Fu-Chen Yang
  • Anthony J.T. Lee
  • Sz-Chen Kuo
Part of the topical collection: Patient Facing Systems


With the rapid development of the Internet, a growing number of users turn to online health communities (forums) to find health-related information, share their medical stories and experiences, and interact with other community members. In this paper, we propose a framework for analyzing user-generated content in a health community. The framework has three phases. First, we extract medical terms, including conditions, symptoms, treatments, effectiveness and side effects, to form a virtual document for each question in the community. Next, we modify Latent Dirichlet Allocation (LDA) by adding a weighting scheme; the resulting model, called conLDA, clusters virtual documents with similar medical term distributions into a conditional topic (C-topic). Finally, we analyze the clustered C-topics by sentiment polarity and by physiological and psychological sentiment. The experimental results show that conLDA outperforms the original LDA and clusters relevant medical terms and relevant questions together, and that the C-topics produced by conLDA are more thematic than those produced by the original LDA. The results of the sentiment analysis may provide a quick reference and valuable insights for patients, caregivers and doctors.
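To make the first phase concrete, the following is a minimal Python sketch of how a virtual document might be built from a question and how a per-category weight could be applied to its term counts before topic modeling. It is an illustration under stated assumptions, not the authors' implementation: the lexicon, the category weights, the naive substring matching, and the function names `virtual_document` and `weighted_counts` are all hypothetical, and the abstract does not specify the actual conLDA weighting scheme.

```python
from collections import Counter

# Hypothetical mini-lexicon mapping a medical term to its category.
# The term-extraction resources actually used are not given in the abstract.
LEXICON = {
    "breast cancer": "condition",
    "fatigue": "symptom",
    "nausea": "side_effect",
    "chemotherapy": "treatment",
    "tamoxifen": "treatment",
}

# Hypothetical per-category weights (an assumption for illustration only).
CATEGORY_WEIGHTS = {
    "condition": 3.0,
    "treatment": 2.0,
    "symptom": 1.0,
    "side_effect": 1.0,
}


def virtual_document(question: str) -> list[str]:
    """Return the lexicon terms found in a question, one entry per occurrence.

    Matching is a naive case-insensitive substring count, and terms are
    emitted in lexicon order rather than in order of appearance.
    """
    text = question.lower()
    terms = []
    for term in LEXICON:
        terms.extend([term] * text.count(term))
    return terms


def weighted_counts(terms: list[str]) -> dict[str, float]:
    """Scale each term's raw count by the weight of its category."""
    counts = Counter(terms)
    return {t: n * CATEGORY_WEIGHTS[LEXICON[t]] for t, n in counts.items()}


question = (
    "I started chemotherapy for breast cancer and the nausea is awful. "
    "Does chemotherapy always cause fatigue?"
)
doc = virtual_document(question)
weights = weighted_counts(doc)
```

In this sketch, `doc` holds only the medical terms of the question, and `weights` is a weighted bag of terms that a topic model could consume in place of raw counts. How conLDA actually incorporates its weights into LDA inference would have to be taken from the full paper.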


Keywords: Health social media; Latent Dirichlet Allocation; Sentiment analysis



The authors are grateful to the anonymous referees for their helpful comments and suggestions. This research was supported in part by the Ministry of Science and Technology, Republic of China under Grant No. MOST 103-2410-H-002-109-MY3.



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  1. Department of Information Management, National Taiwan University, Taipei, Republic of China
  2. Big Data Laboratory, Chunghwa Telecom Laboratories, Taipei, Republic of China
