Predicting Medical Roles in Online Health Fora

  • Amine Abdaoui
  • Jérôme Azé
  • Sandra Bringay
  • Natalia Grabar
  • Pascal Poncelet
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8791)


Online health fora are increasingly visited by patients seeking help and information related to their health. However, these fora are not limited to patients: a significant number of health professionals actively participate in many discussions. Because they are experts, the information they post is very valuable: they are able to explain problems and symptoms clearly, correct false claims, and give useful advice. For someone interested in trustworthy medical information, retrieving only these kinds of posts can be very useful and informative. Unfortunately, extracting such knowledge requires navigating the fora and evaluating the information. Done manually, navigation and selection are time-consuming, tedious, difficult, and error-prone. It is thus important to propose a new method for automatically categorizing information posted by non-experts as well as by professionals in online health fora. In this paper, we propose a supervised approach that evaluates which components of a post are most representative, considering vocabularies, uncertainty markers, emotions, misspellings, and interrogative forms, in order to perform this categorization efficiently. Experiments conducted on two real fora show that our approach is efficient at extracting posts written by professionals.
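To make the idea of post-level components more concrete, the following is a minimal Python sketch, not the implementation evaluated in the paper. It turns a post into five numeric features that mirror the components listed above (vocabulary, uncertainty markers, emotions, misspellings, interrogative forms) and feeds them to a nearest-centroid classifier, which is used here only as a simple stand-in for a supervised learner. The toy English lexicons, the function names, and the labels "professional" and "patient" are all assumptions made for illustration.

```python
import re
from statistics import mean

# Hypothetical toy lexicons (assumptions): real lexical resources would be far larger.
MEDICAL_TERMS = {"diagnosis", "dosage", "symptom", "symptoms", "treatment", "biopsy"}
UNCERTAINTY_MARKERS = {"maybe", "perhaps", "possibly", "might", "probably"}
EMOTION_WORDS = {"afraid", "scared", "worried", "happy", "sad", "anxious"}
# Toy dictionary: any token outside it is counted as a misspelling.
KNOWN_WORDS = MEDICAL_TERMS | UNCERTAINTY_MARKERS | EMOTION_WORDS | {
    "i", "am", "the", "a", "is", "it", "this", "my", "your", "with",
    "suggests", "require", "serious", "normal", "and",
}


def extract_features(post):
    """Return one rate per component, normalised by the number of tokens."""
    tokens = re.findall(r"[a-z']+", post.lower())
    n = max(len(tokens), 1)  # avoid division by zero on empty posts
    return [
        sum(t in MEDICAL_TERMS for t in tokens) / n,        # medical vocabulary
        sum(t in UNCERTAINTY_MARKERS for t in tokens) / n,  # uncertainty markers
        sum(t in EMOTION_WORDS for t in tokens) / n,        # emotion words
        sum(t not in KNOWN_WORDS for t in tokens) / n,      # out-of-dictionary tokens
        post.count("?") / n,                                # interrogative forms
    ]


def train_centroids(posts, labels):
    """Average the feature vectors of each class (nearest-centroid training)."""
    rows_by_label = {}
    for post, label in zip(posts, labels):
        rows_by_label.setdefault(label, []).append(extract_features(post))
    return {
        label: [mean(column) for column in zip(*rows)]
        for label, rows in rows_by_label.items()
    }


def predict(centroids, post):
    """Assign the label whose centroid is closest in squared Euclidean distance."""
    features = extract_features(post)

    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))

    return min(centroids, key=lambda label: squared_distance(centroids[label]))


# Invented training posts, for illustration only.
TRAIN_POSTS = [
    "The diagnosis suggests a treatment with this dosage",
    "Your symptoms require a biopsy",
    "I am scared, maybe it is serious?",
    "I am worried, is this normall?",
]
TRAIN_LABELS = ["professional", "professional", "patient", "patient"]

centroids = train_centroids(TRAIN_POSTS, TRAIN_LABELS)
print(predict(centroids, "The dosage with this treatment is normal"))
print(predict(centroids, "I am afraid, is it serious?"))
```

Normalising each count by post length keeps long and short posts comparable. Inspecting the learned centroids gives a rough view of which components separate the two classes in the toy data.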


Keywords: Text categorization, Text mining, Online health fora



This paper is based on studies supported by the “Maison des Sciences de l’Homme de Montpellier” (MSH-M) within the framework of the French project “Patient’s mind”.



Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Amine Abdaoui (1)
  • Jérôme Azé (1)
  • Sandra Bringay (1)
  • Natalia Grabar (2)
  • Pascal Poncelet (1)
  1. LIRMM UM2 CNRS, UMR 5506, Montpellier, France
  2. STL UMR 8163 CNRS, Université Lille 3, Lille 1, France
