Abstract
Online remedy finders and health-related discussion forums have become increasingly popular in recent years. Ordinary web users describe their health problems there and request suggestions from experts or other users. As a result, these forums have become a huge repository of information and discussions on various health issues. An intelligent information retrieval system can help make use of this repository in various applications. In this paper, we propose a system that, given a new post, automatically identifies similar existing forum posts. The system is based on computing the similarity between two patient-authored texts. To compute the similarity between the current post and existing posts, the system uses a hybrid strategy based on template information, topic modelling, and latent semantic indexing. The system is tested on a set of real questions collected from a homeopathy forum, abchomeopathy.com. The relevance of the posts retrieved by the system is evaluated by human experts. The evaluation results show that the precision of the system is 88.87%.
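The retrieve-then-evaluate loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' method: plain TF-IDF cosine similarity stands in for the hybrid of template information, topic modelling, and latent semantic indexing, and the function names (`retrieve`, `precision`), the sample posts, and the relevance judgments are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict) per document, plus the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF (an assumption of this sketch, not taken from the paper).
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(toks).items()}
               for toks in tokenized]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(new_post, existing_posts, top_k=2):
    """Return indices of the top_k existing posts most similar to new_post."""
    vectors, idf = tfidf_vectors(existing_posts)
    # Terms unseen in the existing posts cannot match anything, so drop them.
    query = {t: tf * idf[t]
             for t, tf in Counter(tokenize(new_post)).items() if t in idf}
    scored = sorted(((cosine(query, v), i) for i, v in enumerate(vectors)),
                    reverse=True)
    return [i for score, i in scored[:top_k] if score > 0]


def precision(retrieved, relevant):
    """Fraction of retrieved posts that were judged relevant."""
    if not retrieved:
        return 0.0
    return sum(1 for i in retrieved if i in relevant) / len(retrieved)


# Hypothetical forum posts and a hypothetical new query.
posts = [
    "I have a severe headache and nausea every morning",
    "My child has a skin rash with itching on the arms",
    "Throbbing headache on the left side with nausea",
    "Knee joint pain after long walks",
]
new_post = "Headache with nausea since last week"

hits = retrieve(new_post, posts, top_k=2)
# Hypothetical expert judgment: posts 0 and 2 are relevant to the query.
print(hits, precision(hits, relevant={0, 2}))
```

In the paper the relevance judgments come from human experts; here the `relevant` set is hard-coded purely to show how a precision figure is derived from a ranked list.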
Funding
The authors declare that they have received no funding for the current study.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Informed consent
Informed consent was obtained from all individual participants included in the study.
About this article
Cite this article
Saha, S.K., Prakash, A. & Majumder, M. “Similar query was answered earlier”: processing of patient authored text for retrieving relevant contents from health discussion forum. Health Inf Sci Syst 7, 4 (2019). https://doi.org/10.1007/s13755-019-0067-3
DOI: https://doi.org/10.1007/s13755-019-0067-3