Accurate classification of socially generated medical discourse

  • Rana Alnashwan
  • Humphrey Sorensen
  • Adrian O’Riordan
  • Cathal Hoare
Regular Paper


The growth of online health communities, particularly those built on socially generated content, can provide considerable value for society. Participants can gain medical knowledge or interact with peers on medical forum platforms. Analysing the sentiment that members of a health community express in forum discourse can be of significant value: it can help to identify particular aspects of an information space, to determine the themes that predominate in a large data set, and to summarize the topics within it. In this paper, we identify sentiments expressed in online medical forums that discuss Lyme disease. Our research has two goals: first, to identify a complete and relevant set of categories that characterize Lyme disease discourse; and second, to investigate strategies, both individually and collectively, for automating the classification of medical forum posts into those categories. We present a feature-based model that comprises three feature sets: content-free, content-specific and meta-level features. Employing inductive learning algorithms to build a feature-based classification model, we assess the feasibility and accuracy of our automated classification. We further evaluate the model by assessing its ability to adapt to an online medical forum that discusses lupus. The experimental results demonstrate the effectiveness of our approach.
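As a rough illustration of how three such feature sets might be combined into one vector per forum post, consider the minimal Python sketch below. It is not the authors' implementation: the abstract does not list the individual features, so the interpretation used here (content-free as stylistic counts, content-specific as domain term counts, meta-level as lexicon-derived sentiment scores) is an assumption, and `TOY_LEXICON`, `TOY_VOCABULARY` and all function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical toy sentiment lexicon (assumption: the paper's actual
# lexical resources are not named in the abstract).
TOY_LEXICON = {"better": 1.0, "hope": 1.0, "thanks": 1.0,
               "pain": -1.0, "worse": -1.0, "tired": -1.0}

# Hypothetical domain vocabulary for the content-specific features.
TOY_VOCABULARY = ["lyme", "antibiotics", "doctor", "symptoms", "test"]


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def content_free_features(text):
    """Stylistic counts that do not depend on what the post is about."""
    tokens = tokenize(text)
    n_tokens = len(tokens)
    avg_len = sum(len(t) for t in tokens) / n_tokens if n_tokens else 0.0
    return {
        "cf_num_chars": float(len(text)),
        "cf_num_words": float(n_tokens),
        "cf_avg_word_len": avg_len,
        "cf_num_question_marks": float(text.count("?")),
        "cf_num_exclamations": float(text.count("!")),
    }


def content_specific_features(text):
    """Counts of terms from the (assumed) domain vocabulary."""
    counts = Counter(tokenize(text))
    return {f"cs_{term}": float(counts[term]) for term in TOY_VOCABULARY}


def meta_level_features(text):
    """Aggregate sentiment scores derived from the toy lexicon."""
    scores = [TOY_LEXICON[t] for t in tokenize(text) if t in TOY_LEXICON]
    positive = float(sum(s for s in scores if s > 0))
    negative = float(-sum(s for s in scores if s < 0))
    return {
        "ml_positive_score": positive,
        "ml_negative_score": negative,
        "ml_net_score": positive - negative,
    }


def extract_features(text):
    """Merge the three feature groups into a single feature dictionary."""
    features = {}
    features.update(content_free_features(text))
    features.update(content_specific_features(text))
    features.update(meta_level_features(text))
    return features


post = "My doctor said the Lyme test was negative, but the pain is worse. Any hope?"
vector = extract_features(post)
```

A dictionary such as `vector` could then be passed to any inductive learner that accepts numeric features. Prefixing the keys (`cf_`, `cs_`, `ml_`) keeps the groups separable, so each feature set can be evaluated individually as well as in combination.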


Big data, Multiclass sentiment classification, Machine learning, Online health community, Text mining, Feature extraction



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Computer Science, University College Cork, Cork, Ireland
